Table of Links
- Preliminaries
- Selfish Mining Attack
2 PRELIMINARIES
2.1 System Model
We define the formal model of blockchain systems that we consider throughout this work. The model describes how block mining proceeds and is governed by several parameters; different instantiations of these parameters yield formal models of blockchains based on PoW, PoStake, or PoST protocols. We assume that the miners in the blockchain protocol are either adversarial or honest. Honest miners all follow the prescribed protocol, whereas the adversarial miners pool their resources and work together as a coalition.
System model. We assume that block mining proceeds in discrete time steps, as in many discrete-time models of PoW [17, 24] and PoStake [9] protocols. For a miner that owns a fraction p ∈ [0, 1] of the total resources in the blockchain and that can mine on up to k > 0 blocks at any given time step, we define (p, k)-mining as follows: the probability of mining a block at a given time step is proportional to p · k, and the maximum number of blocks available for mining is k. One can think of k as a further constraint on the number of blocks a miner can mine on at any given point in time, e.g. due to the number of VDFs they own in PoST. Hence, (p, 1)-mining corresponds to mining in PoW-based blockchains, (p, k)-mining for k < ∞ to mining in PoST-based blockchains with k VDFs, and (p, ∞)-mining to mining in PoStake-based blockchains.
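As a sanity check, (p, k)-mining can be sketched as a simple Monte-Carlo process. This is an illustrative toy, not the paper's formal model: it assumes the per-step success probability is exactly min(1, p · k), whereas the model only fixes it up to proportionality.

```python
import random

def mine_step(p: float, k: float, rng: random.Random) -> bool:
    """One time step of (p, k)-mining: the miner succeeds with
    probability proportional to p * k.  The proportionality constant
    (here: 1, capped at probability 1) is a modelling assumption of
    this sketch, not fixed by the text."""
    return rng.random() < min(1.0, p * k)

def blocks_mined(p: float, k: float, steps: int, seed: int = 0) -> int:
    """Count successful steps over a horizon -- a toy stand-in for a
    miner's block production under (p, k)-mining."""
    rng = random.Random(seed)
    return sum(mine_step(p, k, rng) for _ in range(steps))
```

With p = 0.3 and k = 1 (PoW-style mining), roughly 30% of time steps yield a block.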
Adversarial and broadcast model. Let α ∈ [0, 1] denote the fraction of resources in the chain owned by the adversarial coalition. We assume the adversarial coalition participates in (α, k)-mining and the honest miners participate in (1 − α, 1)-mining, where the only block the honest miners mine on at any time step is the most recent block of the longest public chain. When there is a tie, i.e. when two longest chains are gossiped through the network, an honest miner mines on the chain that was gossiped to them first. In such situations, γ ∈ [0, 1] denotes the switching probability, i.e. the probability of honest miners switching to the adversary's chain.
2.2 Selfish Mining Objective
Chain quality. Chain quality is a measure of the number of honest blocks in any consecutive segment of the blockchain. A blockchain protocol is said to satisfy (μ, ℓ)-chain quality if for any segment of the chain of length ℓ, the fraction of honest blocks in that segment is at least μ [16].
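The (μ, ℓ)-chain quality condition is easy to check mechanically on a finished chain. A minimal sketch, assuming blocks are encoded as a string of 'H' (honest) and 'A' (adversarial) labels, an encoding chosen here purely for illustration:

```python
def satisfies_chain_quality(blocks: str, mu: float, ell: int) -> bool:
    """Check (mu, ell)-chain quality: every window of ell consecutive
    blocks must contain at least a mu fraction of honest blocks.
    'H' marks an honest block, 'A' an adversarial one."""
    for i in range(len(blocks) - ell + 1):
        window = blocks[i:i + ell]
        if window.count('H') / ell < mu:
            return False
    return True
```

For instance, the chain HAAH violates (0.5, 2)-chain quality because the window AA contains no honest block.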
Selfish mining objective. Let π denote an adversarial mining strategy. The selfish mining objective we analyse in our work is the expected relative revenue of the adversary. Formally, let revenueA and revenueH denote the numbers of adversarial and honest blocks in the main chain, respectively. The expected relative revenue (ERRev) of the adversary under strategy π is defined as

ERRev(π) = 𝔼^π [ revenueA / (revenueA + revenueH) ].
Note that the chain quality of the blockchain under an adversarial mining strategy π is simply 1 − ERRev(π), hence our selfish mining objective captures the expected change in the chain quality.
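On a concrete, finished chain the complementary relationship between relative revenue and chain quality can be made explicit in a few lines; the 'A'/'H' labelling is again an illustrative encoding, not from the paper:

```python
def relative_revenue(blocks: str) -> float:
    """Adversary's relative revenue on a finished chain:
    revenueA / (revenueA + revenueH), with blocks labelled
    'A' (adversarial) / 'H' (honest)."""
    return blocks.count('A') / len(blocks)

def chain_quality(blocks: str) -> float:
    """Fraction of honest blocks in the chain; by definition this
    equals 1 - relative_revenue(blocks)."""
    return 1.0 - relative_revenue(blocks)
```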
2.3 Markov Decision Processes
As mentioned in Section 1, we will reduce the problem of computing optimal selfish mining strategies for the expected relative revenue objective to solving MDPs with mean-payoff objectives. In what follows, we recall the necessary notions on MDPs and formally define the mean-payoff objectives. For a finite set X, we use 𝒟(X) to denote the set of all probability distributions over X.
Markov decision process. A Markov Decision Process (MDP) [26] is a tuple M = (S, A, P, s₀) where
• S is a finite set of states, with s₀ ∈ S being the initial state,
• A is a finite set of actions, overloaded to specify for each state s ∈ S the set of available actions A(s) ⊆ A, and
• P : S × A → 𝒟(S) is a transition function, prescribing to each (s, a) ∈ S × A a probability distribution over successor states.
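The tuple (S, A, P, s₀) translates directly into a data structure. A minimal sketch, where the field names and the toy instance are illustrative, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class MDP:
    """A finite MDP (S, A, P, s0) in the spirit of the definition
    above; this encoding is a modelling choice of the sketch."""
    states: list       # S
    actions: dict      # s -> list of available actions A(s)
    P: dict            # (s, a) -> {successor state: probability}
    s0: object         # initial state

    def successors(self, s, a) -> dict:
        """Distribution over successor states for the pair (s, a)."""
        return self.P[(s, a)]

# A two-state toy instance:
toy = MDP(
    states=['s0', 's1'],
    actions={'s0': ['stay', 'go'], 's1': ['stay']},
    P={('s0', 'stay'): {'s0': 1.0},
       ('s0', 'go'):   {'s1': 0.9, 's0': 0.1},
       ('s1', 'stay'): {'s1': 1.0}},
    s0='s0',
)
```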
A strategy in M is a recipe for choosing an action given a history of states and actions, i.e. it is a function π : (S × A)* × S → 𝒟(A). In general, a strategy may use both randomization and memory. A positional strategy uses neither randomization nor memory, i.e. it is a function π : S → A. We denote by Σ and Σ^p the sets of all strategies and of all positional strategies in M, respectively. A strategy π induces a probability measure ℙ^π[·] in M with an associated expectation operator 𝔼^π[·].
Mean-payoff objective. A reward function in M is a function r : S × A × S → ℝ which prescribes a real-valued reward to each state-action-state triple. Given an MDP M, a reward function r and a strategy π, we define the mean-payoff of π with respect to r via

MP(π) = lim inf_{T→∞} 𝔼^π [ (1/T) Σ_{t=0}^{T−1} r_t ],

where r_t = r(s_t, a_t, s_{t+1}) is the reward incurred at step t.
The mean-payoff MDP problem is to compute the maximal mean-payoff that a strategy in the MDP can achieve. A classic result in MDP analysis is that there always exists a positional strategy π* achieving the maximal mean-payoff, i.e. there is π* ∈ Σ^p such that MP(π*) = max_{π∈Σ^p} MP(π) = sup_{π∈Σ} MP(π) [26]. Furthermore, an optimal positional strategy and the mean-payoff it achieves can be computed in polynomial time [15, 26].
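The mean-payoff of a fixed positional strategy can also be estimated empirically by averaging rewards along a single long run. The exact polynomial-time algorithms of [15, 26] work differently (e.g. via linear programming); the Monte-Carlo sketch below, on a hypothetical two-state MDP, is only an illustration:

```python
import random

# A hypothetical two-state MDP used only for illustration:
# transition[(s, a)] lists (successor, probability) pairs, and
# reward[(s, a, s')] is the reward of that transition.
transition = {
    (0, 'a'): [(0, 0.5), (1, 0.5)],
    (0, 'b'): [(1, 1.0)],
    (1, 'a'): [(0, 1.0)],
    (1, 'b'): [(1, 1.0)],
}
reward = {(0, 'a', 0): 1.0, (0, 'a', 1): 0.0, (0, 'b', 1): 2.0,
          (1, 'a', 0): 0.0, (1, 'b', 1): 0.5}

def sample(pairs, rng):
    """Sample a successor from a list of (state, probability) pairs."""
    u, acc = rng.random(), 0.0
    for nxt, prob in pairs:
        acc += prob
        if u < acc:
            return nxt
    return pairs[-1][0]

def mean_payoff(policy, s0=0, horizon=200_000, seed=1):
    """Estimate MP(policy) for a positional strategy (a dict s -> a)
    by averaging the rewards along one long run."""
    rng, s, total = random.Random(seed), s0, 0.0
    for _ in range(horizon):
        a = policy[s]
        nxt = sample(transition[(s, a)], rng)
        total += reward[(s, a, nxt)]
        s = nxt
    return total / horizon
```

Under the positional strategy that always plays 'b', the run is absorbed in state 1 and the estimate converges to 0.5.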
Authors:
(1) Krishnendu Chatterjee, IST Austria, Austria ([email protected]);
(2) Amirali Ebrahimzadeh, Sharif University of Technology, Iran ([email protected]);
(3) Mehrdad Karrabi, IST Austria, Austria ([email protected]);
(4) Krzysztof Pietrzak, IST Austria, Austria ([email protected]);
(5) Michelle Yeo, National University of Singapore, Singapore ([email protected]);
(6) ΓorΔe Ε½ikeliΔ, Singapore Management University, Singapore ([email protected]).