Byzantine Fault Tolerance
Intro To Consensus and Byzantine
Terminology used will be pertaining to the blockchain and may not be representative of distributed systems as a whole, where such generalities are made, this article will explicitly state the term `distributed system`.
A distributed system cannot have a single point of failure. If a single node in a distributed system fails, the network as a whole must be able to complete its objective. This means that each node must effectively act independently of each other.
States and Transitions
A system that can remember events that have happened in the past is known as stateful. In this way, a human can be seen to be stateful as they have memories. These memories/ events that have happened in the past, arguably has changed the person. For example, if Bob did not pass his driving test, he would become sad.
State: Bob is Nervous
Event: Bob passes driving test
State: Bob is Happy
State: Bob has 0 NEO to spend
Event: Alice sends Bob 10 NEO
State: Bob has 10 NEO to spend
Moving from one state to another is called a state transition. In the example with Bob, changing emotions is in most cases harmless. In the case of a blockchain, we do not want the state to be easily changeable. Moreover, since prerequisite one also states that nodes must act independently, we also need all nodes to arrive at the same globally consistent shared state. This article will not discuss liveness and safety.
Clearly we need a way to monitor these state transitions. Transactions are the objects/events that change the state, and in NEO the transactions are batched into blocks. This is then analogous to saying that we need a way to monitor the blocks, only accepting blocks that are valid.
A natural question would be to ask, valid according to what set of rules?
These rules are known as the consensus protocol, any transaction that does not conform to this protocol will be denied access to change the state of the chain. This article is not about consensus and so we will not go into it in much depth.
Rules and Assumptions
In order to make rules, we must first have assumptions to which the rules are built upon. For example, the rule that we should not murder is based upon the assumption that no-one has the right to decide who should die. This rule ensures that people are not killed on a daily basis, because if you try to murder someone there are repercussions. What if this assumption was fundamentally wrong? What if people had the right to decide who dies? Then this rule would no longer be able to ensure that people were not killed on a daily basis.
This is the same with the consensus protocols. There are assumptions made to which the rules are made, and if the assumptions are weak, the consensus protocol will fall apart. Without a strong set of rules, we will have no safety against state transitions.
But despite some people today thinking they have the right to murder, this this rule seems fairly intact; I am not fearing for my safety when I leave my house. We can all agree then that there is a certain number that when passed, will no longer be able to guarantee your safety. This number that invalidates our assumptions in this example is not important. In a blockchain however, this number is seldom not mentioned as once this number is passed, the consensus protocol will no longer be able to function as intended. Depending on how strong the assumption was, problems could range from stalling consensus to double spending. This number will be called the threshold ratio.
But wait, what if a deadly virus infected a large amount of people and caused them a lot of agony? Here is a special case, where we can diagnose why the problem occurred in the first place and even though consensus would be broken, we would be able to vaccinate the population and consensus would be restored.
Now that the prerequisites are over, we will talk about a certain property that the consensus protocols have.
Byzantine Fault Tolerance
As we discussed earlier, a consensus protocol can function smoothly as long as the assumptions are not broken by a large number of nodes.
A byzantine failure is a failure that covers anything that could go wrong with the node, this could range from crashing to the node being hacked. A hacked node can pretend to operate as a normal node by sending valid messages, whilst infecting other nodes in the system. In NEO the assumption is that no more than 34% of nodes are dishonest. The threshold ratio is therefore 34%.
Each block must receive approval from 66% of the consensus nodes, or a new speaker is chosen. If more than 34% of nodes are hacked, then they will be able to stall the consensus algorithm by colluding and preventing this 66% number from being reached.
A use-case of Byzantine Fault Tolerance is the application of Delegated Byzantine Fault Tolerance in the NEO Blockchain.