#### Summary

* Zero-knowledge proofs are a powerful cryptographic technique at the heart of self-sovereign identity (SSI). This post should help you understand what they are and how they can be used.*

Suppose Peggy needs to prove to Victor that she is in possession of a secret without revealing the secret. Can she do so in a way that convinces Victor that she really does know the secret? This is the question at the heart of one of the most powerful cryptographic processes we can employ in identity systems: *zero-knowledge proofs (ZKPs)*. Suppose for example that Peggy has a digital driver's license and wants to prove to Victor, the bartender, that she's over 21 without just handing over her driver's license or even showing him her birthdate. ZKPs allow Peggy to prove her driver's license says she's at least 21 in a way that convinces Victor without Peggy having to reveal anything else (i.e., there's zero *excess* knowledge).

This problem was first explored by MIT researchers Shafi Goldwasser, Silvio Micali and Charles Rackoff in the 1980s as a way of combatting information leakage. The goal is to reduce the amount of *extra* information the verifier, Victor, can learn about the prover, Peggy.

One way to understand how ZKPs work is the story of the Cave of Alibaba, first published by cryptographers Quisquater, Guillou, and Berson^{1}. The following diagram provides an illustration.

The Cave of Alibaba has two passages, labeled A and B, that split off a single passageway connected to the entrance. Peggy possesses a secret code that allows her to unlock a door connecting A and B. Victor wants to buy the code but won't pay until he's sure Peggy knows it. Peggy won't share it with Victor until he pays.

The algorithm for Peggy proving she knows the code proceeds as follows:

- Victor stands outside the cave while Peggy enters and selects one of the passages. Victor is not allowed to see which path Peggy takes.
- Victor enters the cave and calls out "A" or "B" at random.
- Peggy emerges from the correct passageway because she can easily unlock the door regardless of which choice she made when entering.
- Of course, Peggy could have just gotten lucky and guessed right, so Peggy and Victor repeat the experiment many times.

If Peggy can always come back by whichever passageway Victor selects, then there is an increasing probability that Peggy really knows the code. After 20 tries, there's less than one chance in a million that Peggy is simply guessing which letter Victor will call. This constitutes a *probabilistic proof* that Peggy knows the secret.

This algorithm not only allows Peggy to convince Victor she knows the code, but it does it in a way that ensures Victor can't convince anyone else Peggy knows the code. Suppose Victor records the entire transaction. The only thing an observer sees is Victor calling out letters and Peggy emerging from the right tunnel. The observer can't be sure Victor and Peggy didn't agree on a sequence of letters in advance to fool observers. Note that this property relies on the algorithm using a good pseudo-random number generator with a high-entropy seed so that Peggy and third-party observers can't predict Victor's choices.

Thus, while Peggy cannot deny to Victor that she knows the secret, she can deny that she knows the secret to other third parties. This ensures that anything she proves to Victor stays between them and Victor cannot leak it—at least in a cryptographic way that proves it came from Peggy. Peggy retains control of both her secret and the fact that she knows it.

When we say "zero knowledge" and talk about Victor learning nothing beyond the proposition in question, that's not perfectly true. In the cave of Alibaba, Peggy proves in zero knowledge that she knows the secret. But there are many other things that Victor learns about Peggy that ZKPs can do nothing about. For example, Victor knows that Peggy can hear him, speaks his language, walk, and is cooperative. He also might learn things about the cave, like approximately how long it takes to unlock the door. Peggy learns similar things about Victor. So, the reality is that the proof is *approximately* zero knowledge not *perfectly* zero knowledge.

## ZKP Systems

The example of Alibaba's Cave is a very specific use of ZKPs, what's called a *zero-knowledge proof of knowledge*. Peggy is proving she knows (or possesses something). More generally, Peggy might want to prove many facts to Victor. These could include propositional phrases or even values. ZKPs can do that as well.

To understand how we can prove propositions in zero knowledge, consider a different example, sometimes called the Socialist Millionaire Problem. Suppose Peggy and Victor want to know if they're being paid a fair wage. Specifically, they want to know whether they are paid the same amount, but don't want to disclose their specific hourly rate to each other or even a trusted third party. In this instance, Peggy isn't proving she knows a secret, rather, she's proving an equality (or inequality) proposition.

For simplicity, assume that Peggy and Victor are being paid one of $10, $20, $30, or $40 per hour. The algorithm works like this:

- Peggy buys four lock boxes and labels them $10, $20, $30, and $40.
- She throws away the keys to every box except the one labeled with her wage.
- Peggy gives all the locked boxes to Victor who privately puts a slip of paper with a "+" into the slot at the top of the box labeled with his salary. He puts a slip with a "-" in all the other boxes.
- Victor gives the boxes back to Peggy who uses her key in private to open the box with her salary on it.
- If she finds a "+" then they make the same amount. Otherwise, they make a different amount. She can use this to prove the fact to Victor.

This is called an *oblivious transfer* and proves the proposition `VictorSalary = PeggySalary`

true or false in zero knowledge (i.e., without revealing any other information).

For this to work, Peggy and Victor must trust that the other will be forthcoming and state their real salary. Victor needs to trust that Peggy will throw away the three other keys. Peggy must trust that Victor will put only one slip with a "+" on it in the boxes.

Just like digital certificates need a PKI to establish confidence beyond what would be possible with self-issued certificates alone, ZKPs are more powerful in a system that allows Peggy and Victor to prove facts from things others say about them, not just what they say about themselves. For example, rather than Peggy and Victor self-asserting their salary, suppose they could rely on a signed document from the HR department in making their assertion so that both know that the other is stating their true salary. Verifiable Credentials provide a system for using ZKPs to prove many different facts alone or in concert, in ways that give confidence in the method and trust in the data.

## Non-Interactive ZKPs

In the previous examples, Peggy was able to prove things to Victor through a series of interactions. For ZKPs to be practical, interactions between the prover and the verifier should be minimal. Fortunately, a technique called SNARK allows for non-interactive zero knowledge proofs.

SNARKs have the following properties (from whence they derive their name):

- Succinct: the sizes of the messages are small compared to the length of the actual proof.
- Non-interactive: other than some setup, the prover sends only one message to the verifier.
- ARguments: this is really an argument that something is correct, not a proof as we understand it mathematically. Specifically, the prover theoretically could prove false statements given enough computational power. So, SNARKs are "computationally sound" rather than "perfectly sound".
- of Knowledge: the prover knows the fact in question.

You'll typically see "zk" (for zero-knowledge) tacked on the front to indicate that during this process, the verifier learns nothing other than the facts being proved.

The mathematics underlying zkSNARKs involves homomorphic computation over high-degree polynomials. But we can understand how zkSNARKs work without knowing the underlying mathematics that ensures that they're sound. If you'd like more details of the mathematics, I recommend Christian Reitwiessner's "zkSNARKs in a Nutshell".

As a simple example, suppose Victor is given a `sha256`

hash, `H`

, of some value. Peggy wants to prove that she knows a value `s`

such that `sha265(s) == H`

without revealing `s`

to Victor. We can define a function `C`

that captures the relationship:

C(x, w) = ( sha256(w) == x )

So, `C(H, s) == true`

, while other values for `w`

will return `false`

.

Computing a zkSNARK requires three functions `G`

, `P`

, and `V`

. `G`

is the key generator that takes a secret parameter called `lambda`

and the function `C`

and generates two public keys, the proving key `pk`

and the verification key `vk`

. They need only be generated once for a given function `C`

. The parameter `lambda`

must be destroyed after this step since it is not needed again and anyone who has it can generate *fake* proofs.

The prover function `P`

takes as input the proving key `pk`

, a public input `x`

, and a private (secret) witness `w`

. The result of executing `P(pk,x,w)`

is a proof, `prf`

, that the prover knows a value for `w`

that satisfies `C`

.

The verifier function `V`

computes `V(vk, x, prf)`

which is true if the proof `prf`

is correct and false otherwise.

Returning to Peggy and Victor, Victor chooses a function `C`

representing what he wants Peggy to prove, creates a random number `lambda`

, and runs `G`

to generate the proving and verification keys:

(pk, vk) = G(C, lambda)

Peggy must not learn the value of `lambda`

. Victor shares `C`

, `pk`

, and `vk`

with Peggy.

Peggy wants to prove she knows the value `s`

that satisfies `C`

for `x = H`

. She runs the proving function `P`

using these values as inputs:

prf = P(pk, H, s)

Peggy presents the proof `prf`

to Victor who runs the verification function:

V(vk, H, prf)

If the result is true, then the Victor can be assured that Peggy knows the value `s`

.

The function `C`

does not need to be limited to a hash as we did in this example. Within limits of the underlying mathematics, `C`

can be quite complicated and involve any number of values that Victor would like Peggy to prove, all at one time.

### Notes

- Quisquater, Jean-Jacques; Guillou, Louis C.; Berson, Thomas A. (1990). How to Explain Zero-Knowledge Protocols to Your Children (PDF). Advances in Cryptology – CRYPTO '89: Proceedings. Lecture Notes in Computer Science. 435. pp. 628–631. doi:10.1007/0-387-34805-0_60. ISBN 978-0-387-97317-3.

Photo Credit: Under 25? Please be prepared to show proof of age when buying alcohol from Gordon Joly (CC BY-SA 2.0)