What is a hash function?

The mathematical functions that help Bitcoin work

Hash functions are mathematical functions used in cryptography, information security, and finance. Because of their unique properties and reliability, Bitcoin uses them for security, privacy, and mining.

A hash function takes an input and produces an output (called a “hash”). The input can be information of any size, while the output is always a fixed length, typically displayed as a string of numbers and letters.
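To see the fixed-length output and determinism for yourself, here’s a small sketch using Python’s built-in `hashlib` module with SHA-256 (the hash function Bitcoin relies on most). The sample inputs are arbitrary, chosen only for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a 64-character hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

short_input = b"a"                                   # a single character
long_input = b"It was the best of times. " * 10_000  # a book-sized input

for data in (short_input, long_input):
    digest = sha256_hex(data)
    # Input sizes differ wildly, but every digest is 64 hex characters (256 bits).
    print(f"{len(data):>7} bytes in -> {len(digest)} characters out: {digest}")

# Deterministic: hashing the same input again gives exactly the same hash.
print(sha256_hex(short_input) == sha256_hex(short_input))  # True
```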

It’s kinda like a machine, where raw materials are the inputs and a standard-sized finished product is the output.

The specific ways that hash functions take inputs to produce hashes, as well as the unique relationships between inputs and hashes, have led to their widespread use across different industries. Here’s a breakdown of the important features of hash functions:

  1. Variable input, fixed output: Regardless of the length of the input – whether it's a single character or an entire book – the hash that’s produced will always be the same length. This uniformity of outputs is crucial for predictability in storage and processing.
  2. Deterministic: Given a particular input, a hash function will always produce the exact same hash, no matter how many times you repeat it. This repeatability is useful for verification purposes, as any change to the input, such as changing a single letter in a book, will result in a different hash.
  3. One-way: Hash functions are designed to be one-way, meaning you can use an input to produce a hash, but you can’t take a hash and figure out what its input was. While generating a hash from an input is fast and easy, reversing the process to recover the input from a hash is, for all intents and purposes, impossible.
  4. Unpredictability of the hash: It's impossible to predict the hash of a given input without actually processing it through the hash function. This unpredictability is a security feature, as it prevents potential attackers from deducing the input just by looking at the hash.
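  5. Collision-resistant: It should be practically impossible to find two different inputs that produce the same hash, which is known as a “collision.” Since there are infinitely many possible inputs but only a fixed number of possible hashes, collisions must exist in theory; a well-designed hash function makes finding one computationally infeasible.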
  6. Creation vs. verification asymmetry: Since the hash is unpredictable, creating a hash with specific characteristics (like a hash that begins with 5 zeros: “00000”) is computationally intensive and time-consuming. The only known way to do this is trial and error: running the hash function over and over again with new inputs. However, once an input that produces the desired hash has been found, it’s quick and easy to verify that the input corresponds to the output. This asymmetry between producing a specific hash and verifying it is particularly important in systems that rely on proof-of-work, such as Bitcoin mining.
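The creation vs. verification asymmetry can be sketched in a few lines of Python. This is a toy proof-of-work, assuming an arbitrary message with a counter appended (the counter plays the role of a nonce). It searches for a SHA-256 hash starting with four zeros rather than five so it finishes quickly; each extra required zero multiplies the expected number of attempts by 16.

```python
import hashlib
from itertools import count

def attempt(message: str, n: int) -> str:
    """Hash the message with counter n appended; return the hex digest."""
    return hashlib.sha256(f"{message}{n}".encode()).hexdigest()

def search(message: str, prefix: str) -> tuple[int, str]:
    """Creation: trial and error over n until the hash starts with `prefix`."""
    for n in count():
        digest = attempt(message, n)
        if digest.startswith(prefix):
            return n, digest

def verify(message: str, n: int, prefix: str) -> bool:
    """Verification: a single hash computation, however long the search took."""
    return attempt(message, n).startswith(prefix)

n, digest = search("hypothetical message", "0000")
print(f"found after {n + 1:,} attempts: {digest}")
print("verified with 1 attempt:", verify("hypothetical message", n, "0000"))  # True
```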

You can think of it like a city library that prints a unique stamp on each of its books, which acts as a fingerprint of that book’s entire contents. Even though some books are short stories (small input data) and others are long novels (large input data), the stamp (aka the hash) is always exactly the same size. Identical copies of a book will always have identical stamps, but changing even a single letter in a book will completely change its stamp. What’s more, it’s super easy to check whether a book and a stamp match, but if you only have a stamp, there’s no way to determine which book it belongs to.
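The “one letter changes the whole stamp” property (often called the avalanche effect) is easy to check. This sketch hashes two hypothetical sentences that differ by a single letter and counts how many of the 256 output bits changed.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 SHA-256 output bits differ between two inputs."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")

original = b"Call me Ishmael."
edited = b"Call me Ishmaal."  # one letter changed

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(edited).hexdigest())
# For a good hash function, roughly half of the bits flip on average.
print(differing_bits(original, edited), "of 256 bits differ")
```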

How does Bitcoin use hash functions?

Bitcoin uses hash functions throughout its operations, including mining, protecting the integrity of its data, and generating Bitcoin addresses.

The most commonly used hash function in Bitcoin is called “SHA-256” (aka Secure Hash Algorithm 256-bit). This hash function is an industry-standard function used widely in banking, information security, and communication networks. In all likelihood, if you click your web browser’s 🔒 icon next to the website’s URL and view the security certificate of the website, you’ll find it’s using SHA-256 to secure your connection.

Below are the different ways hash functions are used in Bitcoin.

Bitcoin mining

Bitcoin mining is a process that serves the dual purpose of processing transactions and issuing new bitcoin into circulation. It involves computers (aka miners) using their computational power (aka hashpower) to produce hashes, competing to add the next block of transactions to Bitcoin’s blockchain and earn a reward of new bitcoin for doing so. It’s a system that democratizes the updating of Bitcoin’s record of transactions, without relying on any central authority.

How does it work?

Miners repeatedly hash a set of inputs, trying to produce a hash with specific properties required by the Bitcoin software: it must be numerically below a target value, which in practice means it starts with a certain number of zeros. The input includes both fixed and variable information:

  1. Fixed information: The Bitcoin software version, the previous block’s hash, a single hash summarizing all the transactions in the block being mined (aka the “Merkle root”), a timestamp, and the current difficulty target set by the Bitcoin software.
  2. Variable information: A number, called a “nonce.”

Since hashes are unpredictable, the only way to produce a valid hash is through trial and error, re-running the hash function with a new nonce each time. (Bitcoin actually runs SHA-256 twice on this input.) The first miner to find a valid hash broadcasts their block and wins the right to add it to the blockchain. Although producing a valid hash is hard, anyone can check one with a single hash computation, which makes verification trivially easy.
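Here’s a simplified sketch of that trial-and-error loop. It packs the header fields into Bitcoin’s 80-byte block header layout and applies SHA-256 twice, as Bitcoin does; the “bits” field is Bitcoin’s compact encoding of the target. The header contents (previous hash, transaction summary, timestamp) are hypothetical, and the difficulty is deliberately set far easier than the real network’s so the search finishes quickly.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """Bitcoin hashes block headers with SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def bits_to_target(bits: int) -> int:
    """Decode the compact 'bits' field into a 256-bit target
    (simplified: ignores the sign bit and very small exponents)."""
    exponent, mantissa = bits >> 24, bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))

def serialize_header(version, prev_hash, merkle_root, timestamp, bits, nonce) -> bytes:
    """Pack the fields into the 80-byte layout: little-endian integers,
    32-byte hashes in raw byte order."""
    return (struct.pack("<i", version) + prev_hash + merkle_root
            + struct.pack("<III", timestamp, bits, nonce))

def mine(version, prev_hash, merkle_root, timestamp, bits, max_nonce=2**32):
    """Try nonces until the header hash, read as a little-endian integer,
    is at or below the target. Returns (nonce, header) or None."""
    target = bits_to_target(bits)
    for nonce in range(max_nonce):
        header = serialize_header(version, prev_hash, merkle_root, timestamp, bits, nonce)
        if int.from_bytes(double_sha256(header), "little") <= target:
            return nonce, header
    return None  # nonce space exhausted

# Hypothetical header contents and a deliberately easy difficulty.
prev_hash = bytes(32)
merkle_root = double_sha256(b"hypothetical block transactions")
nonce, header = mine(0x20000000, prev_hash, merkle_root, 1_700_000_000, 0x1F00FFFF)

# Verification is one double hash, no matter how long the search took.
block_hash = double_sha256(header)
print("nonce:", nonce)
print("block hash:", block_hash[::-1].hex())  # explorers display the hash byte-reversed
```

Real mining hardware runs this same search at enormous speed; when all 2^32 nonces are used up, miners change other inputs (such as the timestamp or data in the block’s first transaction) and start over.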

The Bitcoin software automatically adjusts the difficulty of this process every 2,016 blocks (roughly every two weeks) by raising or lowering the target, which changes how many leading zeros a valid hash needs, so that new blocks are added on average every 10 minutes.
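The adjustment can be sketched as simple arithmetic. This follows the approach Bitcoin Core takes, scaling the old target by how long the last 2,016 blocks actually took compared with the intended two weeks and limiting any single adjustment to a factor of 4, but it is simplified (for example, it works on plain integers rather than the compact “bits” encoding). A larger target means an easier puzzle; the starting target below is hypothetical.

```python
TARGET_SPACING = 10 * 60                              # 10 minutes per block, in seconds
RETARGET_INTERVAL = 2016                              # blocks between adjustments
TARGET_TIMESPAN = RETARGET_INTERVAL * TARGET_SPACING  # two weeks, in seconds
POW_LIMIT = (1 << 224) - 1                            # easiest target allowed on mainnet

def retarget(old_target: int, actual_timespan: int) -> int:
    """Return the new target after a period that took `actual_timespan` seconds.
    Faster-than-intended blocks shrink the target (harder); slower ones grow it."""
    # Limit any single adjustment to a factor of 4 in either direction.
    clamped = max(TARGET_TIMESPAN // 4, min(actual_timespan, TARGET_TIMESPAN * 4))
    new_target = old_target * clamped // TARGET_TIMESPAN
    return min(new_target, POW_LIMIT)

old_target = 0xFFFF << 190  # hypothetical current target
# Blocks arrived every 5 minutes on average: the target halves, doubling the work.
print(retarget(old_target, TARGET_TIMESPAN // 2) == old_target // 2)  # True
# Blocks arrived far too slowly: the target grows, but by at most 4x.
print(retarget(old_target, TARGET_TIMESPAN * 10) == old_target * 4)   # True
```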

Ensuring blockchain integrity

Hash functions are used to ensure the integrity of the data within each block. Every block in the blockchain contains the hash of the preceding block, which itself contains the hash of its preceding block, and so on, forming an unbroken “chain” of blocks (aka blockchain) going all the way back to Bitcoin’s beginning.

Altering information in any previous block would dramatically change its hash, and consequently, the hash of every subsequent block since they are all connected through sequential hashes.

This makes tampering with block data essentially impossible, as any changes would be easily detectable and immediately rejected by all participants in the network.
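The chaining idea can be demonstrated with a toy model. Each hypothetical block below is a small dictionary holding its data and the previous block’s hash, serialized as JSON and hashed with SHA-256. This is not Bitcoin’s real block format, just the linking principle.

```python
import hashlib
import json
from typing import Optional

def block_hash(block: dict) -> str:
    """SHA-256 of a toy block's contents, including its link to the previous block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def build_chain(payloads: list[str]) -> list[dict]:
    """Create toy blocks where each one stores the hash of the block before it."""
    chain, prev = [], "0" * 64  # the first block points at a placeholder hash
    for data in payloads:
        block = {"prev_hash": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain

def find_broken_link(chain: list[dict]) -> Optional[int]:
    """Return the index of the first block whose stored prev_hash no longer matches
    the recomputed hash of its predecessor, or None if every link checks out."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None

chain = build_chain(["Alice pays Bob 1", "Bob pays Carol 2", "Carol pays Dan 3"])
print(find_broken_link(chain))            # None
chain[0]["data"] = "Alice pays Bob 100"   # tamper with the earliest block
print(find_broken_link(chain))            # 1
```

To hide the edit, a tamperer would have to recompute every later block’s link, and in Bitcoin each of those blocks would also need its proof-of-work redone.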

Ensuring transaction integrity

Hash functions are used to maintain transaction data integrity within each block. This is done using a data structure called a “Merkle tree,” which is a way to structure and organize hashes.

Here are the basics of how transactions are hashed into a Merkle tree:

  1. Hash transaction: Each transaction within a Bitcoin block is hashed.
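  2. Pair hashes: The individual transaction hashes are grouped into pairs, and each pair is hashed together. If there’s an odd number of hashes, the last one is paired with a copy of itself.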
  3. Repeat pairing: The process of pairing and hashing is repeated over and over, using the resulting hashes from the previous step.
  4. Create a Merkle root: The iterative process continues until one final hash remains, called the “Merkle root,” which is stored in the block’s header.
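The steps above translate into a short loop. This sketch follows Bitcoin’s conventions of using double SHA-256 and duplicating the last hash when a level has an odd count; the transactions themselves are hypothetical placeholder strings rather than real serialized transactions.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(tx_hashes: list[bytes]) -> bytes:
    """Pair and hash level by level until a single hash (the Merkle root) remains."""
    if not tx_hashes:
        raise ValueError("need at least one transaction hash")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last hash
        # Hash each adjacent pair together to form the next level up.
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Step 1: hash each (hypothetical) transaction.
txids = [double_sha256(f"hypothetical tx {i}".encode()) for i in range(5)]
print(merkle_root(txids).hex())
```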

The “Merkle root” represents a complete summary of all the block’s transaction data. It’s a quick and space-efficient way to summarize data and it can be used to verify whether a specific transaction is included in a block, without needing all of the block’s data. It can also prevent tampering, as any tiny change to transaction data will completely change that transaction’s hash and all subsequent hashes, including the Merkle root.
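Checking that a transaction is in a block without all of the block’s data relies on a “Merkle proof”: the handful of sibling hashes along the path from that transaction up to the root. The sketch below builds and checks such a proof, assuming a Bitcoin-style tree (double SHA-256, last hash duplicated on odd levels); the function names and transactions are hypothetical.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_proof(tx_hashes: list[bytes], index: int):
    """Return (proof, root). Each proof step is (sibling_hash, sibling_is_right)."""
    proof, level = [], list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # odd count: duplicate the last hash
        sibling = index ^ 1                  # the other member of this pair
        proof.append((level[sibling], sibling > index))
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2                          # move up to the parent's position
    return proof, level[0]

def verify_proof(tx_hash: bytes, proof, root: bytes) -> bool:
    """Recompute the path to the root using only the proof: one hash per level."""
    h = tx_hash
    for sibling, sibling_is_right in proof:
        h = double_sha256(h + sibling) if sibling_is_right else double_sha256(sibling + h)
    return h == root

txids = [double_sha256(f"hypothetical tx {i}".encode()) for i in range(5)]
proof, root = merkle_proof(txids, 3)
print(len(proof), verify_proof(txids[3], proof, root))  # 3 True
```

A verifier who only knows the root (from the block header) needs just these 3 sibling hashes instead of all 5 transactions, and the proof grows only logarithmically as blocks get larger.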

Handling private and public keys

A private key is a secret, randomly generated number used to sign transactions, providing proof of ownership and control over the bitcoin associated with a specific Bitcoin address. The public key is derived from the private key using elliptic curve multiplication (on a curve called secp256k1), a one-way operation rather than a hash: computing the public key from the private key is easy, but reversing it is computationally infeasible. This means the public key can be freely shared while the private key, crucial for accessing and transferring bitcoin, remains secret. Hash functions come in when a transaction is signed: the transaction data is hashed, and the private key signs that hash.
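To illustrate that the public key comes from elliptic curve multiplication rather than hashing, here’s an educational sketch of secp256k1 point arithmetic in plain Python. It is deliberately naive (not constant-time, no side-channel protection) and must never be used with real keys; the private key it generates is random and purely illustrative.

```python
import secrets

# secp256k1 parameters: the curve y^2 = x^3 + 7 over the integers modulo P.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)  # generator

def point_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a) = infinity
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent line (doubling)
    else:
        slope = (y2 - y1) * pow(x2 - x1, -1, P) % P   # line through both points
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P

def scalar_mult(k, point):
    """Compute k * point by double-and-add (one loop step per bit of k)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def public_key(private_key: int) -> bytes:
    """Compressed public key: prefix 02 (even y) or 03 (odd y), then the 32-byte x."""
    if not 1 <= private_key < N:
        raise ValueError("private key must be between 1 and N - 1")
    x, y = scalar_mult(private_key, G)
    return bytes([2 + (y & 1)]) + x.to_bytes(32, "big")

private_key = secrets.randbelow(N - 1) + 1  # random, illustrative only
print(public_key(private_key).hex())
```

Going forward takes a few hundred curve operations; going backward (the elliptic curve discrete logarithm problem) has no known efficient method.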

Generating Bitcoin addresses

The common address types that pay to a single public key (legacy and native SegWit addresses) are built with a combination of SHA-256 and another hash function called RIPEMD-160. A user’s public key, which is derived from their private key, is first hashed using SHA-256, then hashed again using RIPEMD-160, producing a 20-byte result. That result is then encoded, together with a checksum for catching typos, into the Bitcoin address that others use to send you bitcoin.
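Here’s a sketch of the legacy (P2PKH) address pipeline: `hash160` applies SHA-256 then RIPEMD-160, and Base58Check adds a version byte (0x00 for mainnet P2PKH) plus a 4-byte checksum taken from a double SHA-256 before encoding. The function names are illustrative, and `hashlib.new("ripemd160")` only works when Python’s underlying OpenSSL build provides RIPEMD-160, which some newer builds do not. SegWit addresses use a different encoding (Bech32) with its own checksum.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def hash160(public_key: bytes) -> bytes:
    """SHA-256, then RIPEMD-160 (requires an OpenSSL build that offers ripemd160)."""
    return hashlib.new("ripemd160", hashlib.sha256(public_key).digest()).digest()

def base58check_encode(version: bytes, payload: bytes) -> str:
    """Append a 4-byte double-SHA-256 checksum, then Base58-encode."""
    data = version + payload
    data += hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    n = int.from_bytes(data, "big")
    encoded = ""
    while n:
        n, rem = divmod(n, 58)
        encoded = BASE58_ALPHABET[rem] + encoded
    # Each leading zero byte is written as a literal '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded

def base58check_decode(address: str) -> bytes:
    """Reverse the encoding; reject the address if the checksum does not match."""
    n = 0
    for ch in address:
        n = n * 58 + BASE58_ALPHABET.index(ch)
    leading_ones = len(address) - len(address.lstrip("1"))
    data = b"\x00" * leading_ones + n.to_bytes((n.bit_length() + 7) // 8, "big")
    payload, checksum = data[:-4], data[-4:]
    if hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] != checksum:
        raise ValueError("checksum mismatch: address is mistyped or corrupted")
    return payload

# Hypothetical 20-byte payload (all zeros) standing in for hash160(public_key).
payload = bytes(20)
address = base58check_encode(b"\x00", payload)
print(address)  # begins with a run of '1's, one per leading zero byte
print(base58check_decode(address) == b"\x00" + payload)  # True
```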
