Based of COMS 6998-006: Foundations of Blockchains by Tim Roughgarden.
Overview
These notes focus on the science and technology behind blockchains and the applications built on them. We won’t dive into the line-by-line details of smart contract implementation. Instead, the focus is on the core principle: consensus (think safety and liveness).
Why does this matter? You’ll be able to intelligently evaluate and compare blockchain applications—cut through all the marketing BS.
We might be watching new computer science emerge in real time. Cryptography, distributed systems, game theory, and finance are coming together. Chris Dixon’s Read Write Own paints a compelling picture of the impact this technology could have. Messari’s yearly reports also break down blockchain applications, including their 2024 theses.
Blockchain isn’t just about cryptocurrencies. It's a programmable computer no one owns, with no gatekeepers. You don’t need permission to use it. Cryptocurrency is just a tool to reward those who help maintain the protocol—it’s not the end goal.
Blockchain Stack
One representation of blockchain is that it's a stack of four layers:
Layer (3) – Application:
This is where user-facing applications and smart contracts live. Think of services like Coinbase, Uniswap, and OpenSea. Developers build decentralized applications (dApps) here, leveraging the layers below.
Layer (2) – Scaling:
Expand the capabilities of Layer 1, enhancing its capacity and efficiency to handle more transactions. They aim to make the system faster and more scalable without compromising security.
Layer (1) – Consensus and Compute:
This is the heart of the blockchain. It's responsible for keeping all the nodes in sync and ensuring that once consensus is reached, instructions are executed. Examples include Ethereum and Bitcoin. This layer determines the order of transactions (like smart contract calls) and updates the global state accordingly.
Layer (0) – The Internet
At the foundation is the Internet itself—a reliable way for untrusted parties to communicate directly.
Note that this is just one taxonomy that's constantly in flux.
Digital Signature Schemes
Digital signatures keep transactions secure on blockchains by verifying data without needing trust. They work through three algorithms:
- Key Generation
def key_generation(random_seed):
sk = derive_private_key(random_seed) // Derive private key from the seed
pk = derive_public_key(sk) // Derive public key from the private key
return (pk, sk) // Return key pair
- Signing
def signing_algorithm(message, private_key):
sig = sign_message_with_private_key(message, private_key) // Create signature using private key
signed_message = (message, sig) // Combine message with the signature
return signed_message // Return the signed message (message, sig)
- Verification
def verification_algorithm(message, public_key, signature):
valid = verify_signature(message, public_key, signature) // Check if the signature is valid for the given message and public key
if valid:
return "yes" // The signature is valid
else:
return "no" // The signature is not valid
Only the person with the private key can sign a message, but anyone with the public key can verify it.
Digital signatures lets you prove a message is legit. You sign it with a private key, and anyone can check it's real with the matching public key. The assumption is, without the private key, forging a signature is basically impossible in a well designed system. In theory, you could brute-force the private key. Try every possible key until one works. But with long enough keys, it would take longer than the universe’s lifetime. This works because attackers have limited computing power, and the number of possible keys grows exponentially with key length.
Brute force isn’t the only threat—some algorithms can solve hard problems fast, but we assume things like cracking private keys can’t be done efficiently, and the odds of guessing one by luck are so small they don’t matter. So, even if an attacker has a pile of signed messages, they have almost no chance of forging a new one.
Digital signatures aren’t just useful for signing off on transactions—they're key for how nodes agree in a blockchain. They prevent one node from lying about what another node said. If every message is signed, no one can make up fake messages. Nodes can only forward messages that have been signed by the original sender.
The SMR Problem
The state machine replication (SMR) problem arises in databases and blockchains, where the goal is to keep all nodes in sync by ensuring they follow the same sequence of changes—whether in database operations or blockchain transactions.
A classic example is managing a replicated database. Take IBM, which promises 99.999% uptime. To ensure this, they store copies of the database on multiple machines. The challenge is keeping these copies in sync: if a customer updates one, the others must update too, so all queries return the same result, no matter which machine is used.
In this context, "state" refers to the contents of a database, with each update changing that state. In blockchains, "state" similarly refers to the current balances of accounts or the status of smart contracts. Every transaction alters this state, like transferring funds between accounts.
Unlike IBM, where replication is about uptime, blockchains replicate data to maintain decentralization. No single entity controls the system—it’s distributed across many machines.
Here’s how it works:
- A set of nodes runs a consensus protocol, while clients submit transactions to these nodes.
- Each node maintains an append-only history of transactions. The order of these transactions matters—for instance, in a blockchain, a double-spend attempt must be resolved by deciding which transaction happened first.
The goal of SMR is to ensure that all nodes agree on this history. To achieve that, each node follows a protocol—a set of rules that manage computations and communication between nodes. Each node:
- Maintains local state and performs computations based on that state.
- Receives messages from other nodes and clients.
- Sends messages to other nodes.
This process is event-driven: when a node receives a message, it responds by performing calculations or sending new messages. The protocol ensures that, in the end, all nodes agree on the same transaction history.
For a protocol to solve the SMR problem, it must guarantee:
- Consistency: All nodes must agree on the transaction history. A node can lag, as long as it catches up and no two nodes disagree on transaction order. This prevents conflicting views of the system’s state.
- Liveness: Every transaction submitted to a node is eventually added to all nodes' histories, ensuring continuous progress.
Dolev-Strong Protocol
Some machines may fail or act maliciously when trying to keep multiple devices in sync. A protocol is needed to handle these failures. The Dolev-Strong Protocol ensures all nodes agree on the system’s state, even if some try to cause problems. This is crucial in systems like blockchains, where dishonest nodes can disrupt the process.
For the Dolev-Strong Protocol to work, it relies on a few key assumptions. These assumptions simplify the problem and make consensus possible:
- Fixed Nodes: The set of nodes is fixed and known upfront. Each node has a predefined identity and can talk to every other node.
- Public Keys: Each node has a public-private key pair, and everyone knows the public keys at the start.
- Synchronous Clock: All nodes share a global clock, and every message arrives within one time step.
- Limited Byzantine Nodes: There’s a set limit on how many faulty or malicious nodes (called Byzantine nodes) the protocol can handle, up to “f.”
Faulty/Byzantine Nodes
An honest node follows the protocol without deviation. A faulty node, on the other hand, deviates from the protocol, whether by mistake or on purpose.
There are different types of faulty nodes, ranging from benign to dangerous:
- Crash faults: The node just stops working. Imagine someone pulling the plug on it. Up until the crash, it follows the protocol, then it simply vanishes from the network.
- Omission faults: The node skips sending certain messages. This could be intentional or due to network delays, but the node doesn’t create fake messages—just omits some.
- Byzantine faults: These are the worst. A Byzantine node can do anything—send fake messages, send different messages to different nodes, and behave completely unpredictably. These faults are the hardest to handle, especially in systems like blockchains.
Protocols must be designed to tolerate Byzantine faults, allowing the honest nodes to continue functioning even when a few nodes go rogue. Typically, a protocol assumes that up to “f” nodes can be Byzantine and still work properly. If more than “f” nodes go bad, all bets are off.
Lazy SMR Protocol
What it is: The lazy SMR protocol assumes that a leader (a node) simply broadcasts its decision, and all other nodes agree without question. It’s like having one person always make the decision for everyone, without cross-checking.
Why it fails: This approach fails because if the leader is faulty or malicious (a Byzantine node), it can send different or false decisions to different nodes. Without a way for nodes to verify what others received, they can’t reach consensus. The system becomes vulnerable to a single bad actor.
Round Robin Protocol
What it is: To avoid relying on a single leader, this method rotates the leader role among nodes. Each node takes turns proposing a decision, and others vote on it. This spreads responsibility and reduces the risk of one faulty leader causing problems over time.
Why it fails: If Byzantine faults are present, a malicious leader in its turn can still disrupt consensus by sending different values to different nodes. Even with leader rotation, the protocol lacks mechanisms to detect or correct this misbehavior.
Byzantine Broadcast Problem
The Byzantine Broadcast Problem deals with how to get all honest nodes in a distributed system to agree on the same message, even when up to f nodes are faulty or malicious. The goal is simple: make sure every honest node receives the same message.
The difficulty comes from the fact that faulty nodes can:
- Send different messages to different nodes.
- Pretend to be other nodes, unless messages are authenticated.
- Collaborate with other faulty nodes to mislead the honest ones. Despite these challenges, the protocol has to ensure that all non-faulty nodes end up with the same, correct message, regardless of what the faulty nodes do.
The more formal way to define it is:
System Model:
- A network of nodes, where up to nodes may be Byzantine faulty ().
- Communication is synchronous; messages sent at time are received by time .
- Nodes have unique identifiers and can communicate directly with each other.
Roles:
- Sender: A designated node with a private input value from a set .
- Non-senders: The remaining nodes.
Objective:
Design a protocol satisfying the following properties:
- Termination: Every honest node eventually decides on an output value .
- Agreement: For all honest nodes and , .
- Validity: If the sender is honest, then for all honest nodes , .
Dolev-Strong
The insight of the Dolev-Strong protocol is using signature chains to catch Byzantine nodes trying to cheat. When a node receives a value, it needs more than just the sender's signature to believe it. The protocol builds up trust gradually over time.
Why this works: When a node sees a legitimate value, it adds its own signature and shares it around. As time passes, nodes expect to see more signatures on legitimate messages. By the end, a message must collect signatures from enough different nodes to prove it's real.
Similar to gathering witnesses, the more people independently confirming that they saw the same thing, the more believable it is. Signature chains are just independent confirmations. This makes it impossible for the Byzantine sender to trick the nodes. And because the protocol runs in the synchronous model where messages arrive on time, honest nodes always have enough time to share what they've learned with each other.
There is a more technical writeup here.
On Impossibility Results
The FLM impossibility result shows that under certain assumptions, no Byzantine broadcast protocol can guarantee all three properties: termination (every node eventually makes a decision), agreement (all nodes arrive at same decision), validity (decision reflects honest nodes' inputs).
There are two core elements to the proof:
- Simulation: Adversaries can mimic honest nodes.
- Indistinguishability: Honest nodes cannot differentiate between scenarios.
Just like Turing's halting problem proof, impossibility results: define what’s achievable (which compromises are unavoidable) and guide protocol design trade-off.