I got interested in blockchains and started searching for a relevant tutorial. There are many blockchain tutorials (videos and blogs), but I found most of them either too watered down and cartoonish or just describing the mechanics of blockchains without describing the rationale for the design decisions.
So, after reading a few of these tutorials, I think I finally gained a better understanding of what problem blockchains are trying to solve and how they do it. I am by no means an expert in the field, but I decided to write this article because:
- Writing about a topic, typically helps one to understand it better
- I can refer to it latter to recall these concepts
- Someone else with an engineering background may find this description useful
What Problem Blockchain is Trying to Solve
Suppose we want to create a list of immutable things (e.g., financial transactions) which everyone can trust its immutability. We can write this list on a piece of paper and give it to a person whom we trust. Every time we need to add something to the list, we ask the trusted person to add the new entry to the list.
Problems with this approach:
- Finding a trustworthy person who is trusted by everyone involved can is hard
- The trusted entity may become unavailable for various reasons (single point of failure)
A distributed approach where the same list is maintained by multiple entities seems a natural solution which does not have the problems mentioned above. However, the distributed approach has problems of its own:
- How to ensure immutability if the list is maintained by random entities who are not necessarily trustworthy?
- What incentive these random entities have in maintaining the immutable list?
How Blockchain Solves These Problems
The goal is to create a block that contains the data (that we want to make sure is not tampered with) and metadata which helps with making the data tamper-proof. Adding a hash that is computed on the data by itself does not protect the data against tampering because a bad actor can simply change the data and then re-compute the hash based on the modified data and everything checks out. This is shown in the figure below.
Now, suppose we add another metadata field which is the hash computed on the previous block.
After linking block A and block B together as shown in the figure above, if a bad actor attempts to modify the transaction field in block A, they have to re-compute block A’s hash and, therefore, the “hash of previous” field of block B. This, in turn, forces the bad actor to re-compute hash of block B as well (note that hash of block B is computed on transaction field of B and “hash of previous” field in block B). By induction, we can see that modifying block A has a ripple effect on the blocks that come after block A.
Computing a hash is not prohibitively expensive, so requiring to re-compute hash values for what comes after block A is not particularly burdensome. In other words, we still have not achieved our goal of making the blocks immutable. To that end, another field, namely, a Nonce field, is added to each block. The goal of including a Nonce field is to make it difficult to tamper with a large set of blocks.
Making Things Intentionally Difficult
Nonce (aka “Proof of Work”) is a 32-byte number with the following property:
SHA256(Transaction | Nonce | Hash_of_previous ) = 00…000xxxxxx
SHA256 is a hash function with desirable properties and is described here. Here the number of leading zeros is specified by the specific blockchain protocol and is referred to as “difficulty” of the blockchain algorithm.
There is no closed form solution to solve this equation for Nonce because hash functions are designed to be one-way (i.e., given an input, it is easy to compute the hash value, but going the other way is not possible and ill-defined). The only way to find a value of Nonce which satisfies the equation above is to try different values of Nonce and examine the output of the hash to make sure that the hash value has at least the desired number of leading zeros. The higher the number required leading zeros, the longer it takes to find a Nonce that satisfies this equation (hence, the name “difficulty”). For Bitcoin protocol, the difficulty is adjusted so that on average it takes about 10 minutes to find such Nonce value.
At first glance, the requirement to include a Nonce with the property mentioned above may sound strange. Why do we want to waste computer time to compute a Nonce with this arbitrary property? The answer is related to what we discovered in the previous section. Specifically, recall that tampering with the data in any block forces the bad actor to re-compute all the hash values for subsequent blocks. Re-computing the hash for a block changes the input to the hash function for the subsequent block. This, in turn, requires a new Nonce value to be computed. For a long blockchain, re-computing a large number of Nonce values that satisfy the “Proof of Work” equation will take a long time and consume a significant amount of computer resources.
In addition, even if the bad actor succeeds in re-generating all the Nonce values for the subsequent blocks, they still need to broadcast the modified blockchain to other entities who participate in validating the blockchain. Acceptance of a new blockchain is based on consensus, therefore, the bad actor will not succeed in getting the tampered block(s) accepted.
In practice, a block in the blockchain contains an index and a timestamp field as well and both of them are also inputs to the hash function.
What is Mining?
When a user submits a transaction to be added to blockchain (e.g., Bob gave Alice two Bitcoins), the transaction is added to a list of unverified transactions. A miner process, grabs some transactions from this list and adds an additional transaction (as the first transaction in the block) which rewards the miner itself with a certain number of Bitcoins (e.g., 12.5 BTC). This reward seems unusually generous for about 10 minutes worth of computer time (which is the average amount of time that it takes to compute the Nonce for a block). But, the system needs only a single valid block, and therefore, only the block from the first miner who constructs the block and publishes it will be accepted. The energy and time that is consumed by all the other miner is wasted.
So, the real mining reward is ‘12.5*probabilty_of_block_acceptence’ bitcoins. Given that there are many mining farms with very fast GPUs all competing to earn this reward for each block, the expected value of this mining reward is relatively low and may not even cover the cost of electricity in the region where the mining occurs. This site has a calculator for computing mining profitability.
Now we know what blockchains do and how they work. It is very easy to build your own blockchain experimenting with it. This is a great tutorial in python.
Blockchain: A Quick Primer was originally published in Data Driven Investor on Medium, where people are continuing the conversation by highlighting and responding to this story.