Motivation
Some cryptography based applications (such as cryptocurrencies) rely on the secure storage and usage of some secret, known as a cryptographic key. With more and more applications these days that make use of cryptography (especially as part of permissionless blockchains), users are required to maintain an increasing number of keys. Storing each key separately can cause a lot of mess and pain so instead another approach can be considered. We can have a single “seed” (which must be kept secret!) with which we can derive a large number of seeminly “random” keys. To keep keys organized, this derivation mechanism can be hierarchical, for example from this single seed we can want to be able to derivate multiple “sub-seeds”. Each of these sub-seeds will be used to derive cryptographic keys for a different application (for example a different cryptocurrency). You can also think of another hierarchy level where each of these “application-subseeds” will derivate verious “account-subseeds” so you can even have multiple accounts for each application. While the semantics here can change depending on the exact use case, the core idea is that from a single key, we want to deterministically derive large quantities of secrets in a hierarchical fashion. The terms “seed” or “subseed” will be substituted with just the term “key” for convenience.
These srtuctures are usually referred to as HD-wallets, since their most popular use case is for cryptocurrencies’ wallets. In this post we will see how can this hierarchical structure is generated and the different flavors of HD-wallets with their respective security considerations.
In this post I will assume that the reader is at least somewhat familiar with elliptic-curve public-key cryptography.
The Setting
First, let’s go over how HD wallets work, for exact definitions please refer to BIP-32.
Since we are trying to create a hierarchical structure, we can think of all the keys generated by this mechanism to be part of this hierarchy. Hierarchical structures usually visualized using trees. The single “seed” will be the root of the tree and each key from it derived can be used to the derive all the keys in its subtree. For example, consider the following figure:
Figure 1: An Example of a Key Derivation Tree
Starting from a key $k_0$ (the root of the tree, what we originally referred to as a “seed”) we can derive multiple keys $(k_1,k_2,k_3)$. These keys can also be used to derive even more keys, for example, in the figure $k_1$ was used to derive keys $k_4,k_5$. We also use the term “subkey” to tell that one key was derived from another, for instance $k_4$ is subkey number 1 of $k_1$ and $k_3$ is subkey number 2 of $k_0$. We will also use the $k[i]$ notation to denote the $i^{th}$ subkey of key $k$.
So far I’ve used the term “key” in a vague manner, this is because both private and public keys can be considered for this use case. While the scenario of a private key seems more “natural” (so the owner of a single key can generate as many addresses as he wants), the scenario of deriving public keys can also be useful. Let’s consider a real life example: Your favorite coffee shop decided to start accepting Bitcoin. To make an order, the customer is standing in front of a touch screen, selecting his drink from the menu and hitting the “Payment” button. At this point the machine has to present the client a unique payment address. We don’t want two clients to use the same payment address because it would be much more complicated to distinguish between payments of different clients. Instead, the machine is able to generate a fresh address for each order and present it to the customer. However, we don’t the machine to be able to derive the matching private keys as well as it would impose a security risk. So, the machine will be able to derive the public addresses to which customers pay but the owner of the store will be able to derive (in the same manner) the matching private keys which will be used to redeem the paid funds.
Notice that the public keys generated by the payment terminal should correspond to the private keys generated by the owner of the store.
How Does It Work?
So lets think of $f$ as the function generating subkeys, therefore, one could generate the $i^{th}$ subkey of a key $K$ using $f(K,i)$. However, this doesn’t work so naively, because if we have a private key $k$ and public key $K=k\cdot G$ for some elliptic-curve base-point $G$, we want that the $i^{th}$ private key generated by $k$ and the $i^{th}$ public-key generated by $K$ would match. In other words, $f(K,i) = f(k,i)\cdot G$. How can we construct $f$ to satisfy the property?
So, BIP-32, the standard for HD-wallets that was initially used by Bitcoin and later adopted by many other systems, solves to problem in an elegant manner. A public key $K$ can derive its $i^{th}$ subkey ($K[i]$) using (The “$||$” operator is concatenation): \(K[i] = K + h(K || i)\cdot G\)
Where $h $ is some hash function with sufficiently large output domain, so if we use the secp256k1 curve, like BIP-32, we need $h$ to yield at least 256-bits of pseudorandom output.
Now given a private key $k$ we can derive its $i^{th}$ subkey ($k[i]$) using: \(k[i] = k + h((k\cdot G) || i)\) Notice that since we know the private key $k$, we can compute the match public key $K=k\cdot G$ (used inside $h$). Now we can see that: \(\begin{align} k[i]\cdot G &= (k + h((k\cdot G) || i))\cdot G \\ &= k\cdot G + h((k\cdot G) || i)\cdot G \\ &= K + h(K || i)\cdot G \\ &= K[i] \end{align}\)
And we got the desired property that a private key and the matching public key will yield matching private keys and public keys.
It will, therefore, take you by no surprise the private key $k$ can derive its subkeys and also the subkeys of its matching public key $K$.
That’s it?
Well, as you might have noticed there is a subtle issue here which some could find as a security bug. Let’s see what is the difference between two consecutive private subkeys $k[1] $ and $k[2]$ for instance: \(\begin{align} k[2] - k[1] &= (k + h(K || 2)) - (k + h(K || 1) \\ &= h(K||2) - h(K||1) \end{align}\)
So, the difference between two consecutive private subkeys only depends on the public key, which isn’t supposed to be considered a secret! So if someone is given one of the subkeys $k[1]$ and the public key (which is, well … public!), then the next subkey can be computed using
\[k[2] = k[1] + h(K || 2) - h(K || 1)\]This, indeed, is a problem in some scenarios and therefore another notion of hardening was introduced. So far, the non-hardened HD-key-derivation was introduced, in which the difference between consecutive private/public keys can be computed using the public key alone. In the hardened version the $i^{th}$ private subkey of a private key $k$ is computed as: \(k[i] = k + h(k || i)\)
This makes the derivation more secure, in the sense that now the parent private key has to be compromised to compute the difference between consecutive private keys. However, given the matching public key of $K=k\cdot G$, one cannot compute the $K[i]$ in the hardened version, because $K[i] = K + h(k||i)\cdot G$ and the private key $k$ isn’t known to those who wish to derive only public keys!
In short hardened key derivation can only be used by knowing the parent private key. Therefore, in some scenarios, hardened key derivation isn’t applicable, for example, the touch screen in the coffee shop from the beginnig of the post.
Chain Codes
Ok, so we’re almost done, but there is one last caveat before you can tell everyone that you really know BIP-32. The thing is that we don’t want necessarily everyone to be able to derive keys. That is, if we use non-hardened key derivation, we can’t assume the public keys of the parents aren’t public, because these are public keys after all! So, in order to derive keys we may want the derivation to depend not only the public keys (in non-hardened mode) or the private keys (in hardened mode) but also on another source of entropy, which is known as chain codes. A little bit more formally, the derivation function $h$ will take not only the key $K/k$ and the index $i$ but also another input $c$, the chain code, so only those who know the chain code can derive the subkey. And to separate the chain code from the key/index we will use an HMAC function $h(k,M)$ instead of simply a hash function. We will also want that each key will be derived alongside a fresh chain code, therefore, we will use $h$ of double the output size (for example, for secp256k1 keys we will use $h$ with 512 bits of output), where the first half of the output (denoted $h_L$) will be used to derive the new chain code and the second half of the output (denoted $h_R$) will be used to derive the key. For a given key $K$ we will denote the $i^{th}$ generated chain code using $c_K[i]$.
So we get the following formulas:
Non Hardened | Hardened | |
---|---|---|
Public Key | \(\begin{aligned}K[i]&=K+h_R(c,K|i)\\ c_K[i] &= h_L(c,K|i)\end{aligned}\) | X |
Private Key | \(\begin{aligned}k[i]&=K+h_R(c,(k\cdot G)|i)\\ c_k[i] &= h_L(c,(k\cdot G)|i)\end{aligned}\) | \(\begin{aligned}k[i]&=k+h_R(c,k|i)\\ c_k[i] &= h_L(c,k|i)\end{aligned}\) |
One last note about chain-codes is that they are not mandatory and we can have done great without them. Either using them or not using them only depends on the security assumptions one it willing to make specifically about the secrecy of public-keys that can derive additional children.
That’s it! At this point you should pretty much know how BIP-32 is used and works. Notice that some technical aspects weren’t discussed:
- How do you hash an elliptic curve point (inside $h$).
- How numbers are encoded.
- Which hash/hmac function do to use.
For these and other technical details, please refer to BIP-32.