Why Are Reference Strings So Common?

Introduction

In this post I assume the reader is familiar with the term NIZK (Non-Interactive Zero-Knowledge), what people these days also call “zk-SNARKs”. I’m by all means not an expert in ZK (Zero Knowledge) and as so you are not expected to be familiar with all the low level technical details used to construct such argument systems. If you’ve never heard about NIZK/SNARKs check out Wikipedia’s page about it until maybe some day I’ll write something about it myself!

I’ll also be assuming some familiarity with basic concepts from complexity theory, but if there’s anything on this part that is left uncovered, feel free to contact (see the contact page).

Ok, so you’ve probably heard something about SNARKs in the past and have heard about different projects which incorporate some sort of a Non-Interactive ZK to establish some application, typically a distributed ledger. Some of these projects include: StarkWare, zk-Sync, ZCash, Mina, Dark Forest, Aleo and many others.

These projects rely on what’s called a proof system (or better yet, an argument system). This system is typically the core foundation of the cryptography enabling these projects. In short, these systems are the NIZK/SNARKs and in the past few years a myriad of such systems were published, pushing the boundaries of efficiency and security one step at a time. Just to name a few of these systems, we have Gorth16, PLONKs, Bulletproofs, STARKs, Ligero, Halo, Marlin, Sonic and probably by the time I end writing this post at least one more will be published.

Setups, setups everywhere…

Now if you ever get into any of these projects’ documentations you’ll find (or at least I really hope you’ll find) some information containing the term “setup”, typically accompanied by the term “trusted” or “trustless”. This term is used to convey the message that any of these projects was initially bootstrapped by a process (a setup) which is either “trustless” or “trusted”. In the “trusted” flavor a user will have to believe that the parties who took part in such setup behaved honestly while in the “trustless” flavor no such belief is required. The result of this setup (either trusted or trustless) is what is typically called a CRS (Common Reference String). This is a public string containing some parameters that are later used by the argument system. Just as an example the Groth16 (paper link) system trusted setup yields what’s known as the “Powers of Tau”, since the CRS simply contains a set of powers of some group element, denoted with $ \tau $ (the greek letter Tau).

Some of you might have asked yourselves, “Why is a trusted setup needed?”. You probably won’t be surprised to find the same question on stackexchange. While the answers to this question may have their merit (I’m not here to criticize any of them), they typically revolve around why a trusted setup is used, compared to a trustless setup. None of these answers tell why is the setup needed at all! This is also portrayed well in Vialik Buterin’s great post about trusted setups. Let me quote the beginning of his post:

Many cryptographic protocols, especially in the areas of data availability sampling and ZK-SNARKs depend on trusted setups. A trusted setup ceremony is a procedure that is done once to generate a piece of data that must then be used every time some cryptographic protocol is run.

Why is this piece of data must be used? After all, even these trustless setup systems, still have to run some setup procedure and this setup typically makes at least some cryptographic assumtions. For example, the STARK CRS is secure only by assuming CRH (collision resistant hashing). Now I’m obviously not here to disprove the CRH assumption, but if there’s something more secure than having a trustless setup is having no setup at all, right? If so, why haven’t we seen “setup-less” NIZK systems?

The answer to this question is going to be the topic of this post. Let’s begin!

A History Class for a Complexity Class

The notion of interactive proof (IP) systems followed by the notion of Zero-Knowledge proofs was established in the mid-1980s in the seminal work by Goldwasser, Micali and Rackoff from the [GMR] paper (Title: “The Knowledge Complexity Of Interactive Proof Systems”) in which they show how a proof system may possess the property of zero-knowledge. The main idea in the paper is that if a prover $ P $ wants to prove something to a verifier $ V $ usually in the process of proving some infomation could leak. Let’s have an example. Say we want to design a proof system to show that some number $ n $ is composite (i.e. it isn’t prime). The most naive way to do so would be having the prover send some number $ k $ to the verifier who will check that the given number $ k $ really divides $ n $. This is great, but notice that the verifier not only learned that $ n $ is composite, but also learned that $ k $ is a divisor of $ n $.

So, in their paper they have defined a new complexity class, denoted $ KC(0) $, which stands for “Knowledge-Complexity 0” that became later known as zero-knowledge as we know it today.

It didn’t take long until it was proven that all problems in $ NP $ have a zero-knowledge proof systems in the amazing paper by Goldreich, Micali and Wigderson link, also known as GMW. (Title: Proofs that Yield Nothing But Their Validity or All Languages in NP Have Zero-Knowledge Proof Systems).

As a side note for those who don’t know about $ NP $, think of it as the set of problems for which a prover can give a proof which isn’t too long and that can be efficiently verified by the verifier.

While this sounds really exciting, the constructed proof-system for all problems in $ NP $ was still interactive, but we’re interested in non-interactive zero-knowledge (NIZK) proof systems! So, a short time after GMW, Blum, Feldman and Micali published a paper about NIZK (Title: Non-interactive zero-knowledge and its applications). In their paper they show how any language in $ NP $ can have a NIZK proof-system if some short random string is shared between the prover and the verifier, this is the common reference string (CRS)!

Some Bad News

But still, the fact that some construction used a CRS doesn’t mean it is mandatory, right? Yes, and this is exactly what led Goldreich and Oren to publish in the mid 1990s their paper dealing exactly with this question (Title: Definitions and Properties of Zero-Knowledge Proof Systems). In their paper they prove that any NIZK not using a CRS is limited to proofs for problems in $BPP$.

Roughly spekaing, $ BPP $ is the set of problems having efficient randomized algorithms to decide them with good probability.

Unless something very unexpected happens in the future, we believe that $ BPP $ isn’t a very “Big” set of interesting problems. More concretely, when using your favorite SNARK, you create a proof that a certain circuit is satisfiable. The circuit satisfiability problem is in $ NP $ and unless some very unexpected news come from complexity theory guys, this problem isn’t in $ BPP $. So, according to Goldreich and Oren to create a NIZK for circuit satisfiability we must have a CRS shared between the prover and the verifier. In the rest of the post we’ll try to prove it, because I think the idea behind the proof is quite simple and worth understanding.

But what is Zero-Knowledge?

Before proving let’s recall what Zero-Knowledge really means. Conceptually, Zero-Knowledge means that no information is leaked in the process of proving. If no information is leaked, then obviously the verifier can generate a similar proof by itself, without any access to the proof, right? Because if it can do this then it learned nothing from the proof and if it can’t do this then there’s something in the proof itself that is helping the verifier create other “similar” proofs. In other words the verifier can “simulate” the process of proving by itself. More formally, given a prover $ P $ who wants to prove some statement $ x $ to a verifier $ V $, we say that the proof system is Zero-Knowledge if there exists a randomized algorithm $ S $ which we call a simulator which on input $ x $ yields an output denoted $ S(x) $ that can be interpreted as a simulation of the interaction between the prover and the verifier. The simulator also has to be efficient. This definition is not 100% accurate, but let’s leave it like that as it grasps the core notion of the full definition. The important thing to pay attention to is the fact that the simulator can create such transcripts without any access to a proof for the validity of the statement $ x $! In any case, we say that the proof system is Zero-Knowledge if there’s no (efficient) way, given a string $ s $ to tell whether it is the output of such simulator $ S(x) $ or an original proof transcript created by the prover $ P $ and the verifier $ V $. This bascially means that if someone else gives you a transcript from a ZK-proof interaction between a prover and a verifier, you can’t just believe him! This transcript could have been faked using such a simulator!

The Proof

The idea of the proof by Goldreich and Oren goes like this: If a NIZK proof-system exists (and we know it does) it also has a simulator $ S(x) $. The output of the simulator should look just like the interaction between the prover and the verifier. However, since this is non-interactive proof system, the whole transcript is just a single message sent from the prover to the verifier. So, the simulator should be able to create such transcripts which are composed only by a single message from the prover to the verifier without being given a valid proof. But, if it does so, doesn’t it mean that the simulator actually is able to create valid proofs for $ x $? After all, the whole transcript is just the proof itself, unlike transcripts for interactive proof systems, and if a randomized, efficient algorithm can create proofs for it, then the problem for which we have created the proof system is in $ BPP $.

That’s pretty much it, without getting into the mathematical notation and rigorous definitions used by Goldreich and Oren.

Interactivity is Meaningful

As you can see, this “loophole” exploited in the proof can only exist in the case of non-interactive proof systems, where the transcript of the simulator is also a valid proof! This all means that being interactive gives a lot of power to such interactive zero knowledge proof systems and only because this interactivity lacks in NIZK we need some extra information to create a simulator. Typically the CRS contains information referred to as “simulation trapdoor”, because it gives the simulator some degrees of freedom on deciding the contents of the CRS to allow it to create such proofs without yielding valid proofs, like what happened in the proof when we didn’t have the CRS.

That’s all for this time!