Our first post, Why We Like KERI, described our overall fondness for KERI (Key Event Receipt Infrastructure) as the Distributed Public Key Infrastructure (DPKI) scheme of the future. This is the first in a group of posts that we hope will shed some light on various pieces of KERI. We thought we’d start with everyone’s favorite piece of computing: encoding schemes!!
😴😴😴
The author hated encodings in his school days and still gets sleepy sometimes working with them even though he’s spent most of his career dealing with the weird quirks and oddities of various encodings as they exist in the dark bowels of the Internet. Machine Learning at scale isn't always beautiful or elegant. Usually the opposite.
Strangely enough, his career has primarily focused on, and extensively required, a lot of other things he used to find boring in school, like Databases, Statistics, and ETL Pipelines. But that's life: messy details are what separate the professionals from the scrubs.
However, we will attempt to keep this post at a high enough level that you don’t have to take a quick afternoon nap halfway through. So let’s talk about why we focus on encodings in the KERI community and why you should too.
There are three main assertions that this series should cover and if you want to stop right there, feel free:
1) In a future of digital identity that operates in a decentralized manner (or perhaps even a centralized one) there will be FAR MORE CRYPTOGRAPHY on the wire and on disk than today. This is the only pathway to a secure and verifiable future. We call this assertion: There will be a lot of cryptography in the future.
2) Most modern encodings and serialization formats were designed without cryptography in mind, and while they are okay in a web-based system, they aren’t quite up to the task of dealing with the increase from 1). Our second point, then, is: using encoding and serialization formats that exist today in your digital identity systems will cost you time and money in that future as your business and operational needs scale.
3) Creating an encoding scheme that’s “cryptography first” will be a necessary competitive advantage in order to drive down costs (in both computing time and money) at the scale required for a safe, verifiable Internet. We call this thesis: CESR will save you time and money (CESR is the name of the encoding that KERI and ToIP’s Trust Spanning Protocol use, but we’ll get to that).
For this first post we’re mainly going to focus on 1), with 2) and 3) to follow in future posts. We’ll start with a short history of the spread and adoption of cryptographic material in human communications, move on to a wonderful document from the W3C on various “DID use cases”, and then consider a thought experiment: if those use cases come about, how much cryptographic material might be generated? The answer is a lower bound estimate of how much cryptographic material there might be in the future.
Abridged History of Cryptography
Apologies in advance: like most concise stories about the past, this one will be mostly wrong and simply one narrative of how history unfolded. However, it will remain true enough on average to stand in as useful background for this post. For a better history I liked this book, although it was published in 2018 so it will not contain everything post-Snowden.
Ancient times: ciphers invented that can now be broken by school children. A variety of physical devices serve similar functions in commerce to the cryptography we use today. Probably less than a thousand people on Earth actually use these tools in any kind of regular manner but hiding information with codes is as old as writing from what we can tell.
Modern times: ciphers created by machines start coming to the fore by the late 19th century. These ciphers are broken by other machines and the burgeoning field of probability and statistics. Still largely a domain of the military and political establishments.
Modern cryptanalysis and cryptography are born out of that work and soon we have the underpinnings of much of the modern cryptography we know today.
Governments try to keep this stuff to themselves but mathematics flourishes, and global commerce and the Internet inevitably demand protection of commerce via secret codes. In a globalized, interconnected, digital world it’s too easy to make “fake” data.
Snowden lets us all know that all the governments of the world are spying on all of us all the time, even the democracies. Especially the democracies. This makes some people uncomfortable. Most of the large corporations react to public pressure and start encrypting everything they can. Maybe it’s successful; only those in Fort Meade and a few other locations worldwide really know.
Today: nearly all Internet transport is now encrypted. What we put on the wire is assumed to be mostly safe from prying eyes.
Here’s a snapshot of the graph of pages loaded over HTTPS in Chrome, taken from the Google Transparency Report:
https://transparencyreport.google.com/https/overview?hl=en
This particular graph only goes back to 2019 but we can clearly see that for web traffic some kind of critical mass of data transport is currently being encrypted on the web. We’ve gone from less than half of all web traffic being loaded over HTTPS to near 100% of it being encrypted within the span of a decade. A similar story can be told for data encrypted at rest in many large enterprises (but not all of them sadly enough).
This is just the beginning. As the policy and technological choices behind many of the digital identity movements that exist in the world today expand, more and more things will need to be encrypted, or rather, have cryptographic material attached in order to make things safe, attributable, private, secure, etc. The frontline solution to many of the problems that plague Internet security today will be the tools of cryptography. What does that mean exactly?
Salts, ciphers, hashes, digests, proofs, keys. These and other schemes are the building blocks for anyone doing authentication, verification, non-repudiation, secret sharing, hiding, or securing to fix the glaring problems that are coming up, whether they be privacy advocates who want to hide from all governments or governments who want to authenticate all citizens. They’re all using the same tools, but differently. Almost none of them, except the most authoritarian states that rely on surveillance (for various reasons), want plaintext information transmitted over the wire (a turn of phrase: wireless, infrared, carrier pigeon, any transport you like to imagine).
Today’s use cases are segregated: each uses only a few pieces of cryptographic know-how based on its particular intended use and calls it a day. If we want to encrypt data at rest, we generate one key and encrypt the data at rest. Hopefully that key is secure. If we want to encrypt data in transport, we create a key used for that session and a website operator obtains a signed certificate to authenticate the other side. If we want to present proof that we hold some data, we pass one key and a proof to show that we hold that data. It’s all disconnected and none of it interoperates, as you probably know if you’ve ever even heard of systems integration projects within your business or workplace.
In the world of digital identity it’s typically the same way. For example, suppose I have a brand new mobile driver’s license (mDL).
I don’t actually; North Carolina is so far behind that curve it’s sad.
This mDL is signed by an authoritative key, it calls home to make sure that authoritative key or this particular mDL hasn’t been revoked, it can present proofs of issue, and even sign things. I have an identity credential but only a few pieces of cryptographic material.
This use case is straightforward: we’re using the old-fashioned version of a credential in the model of “one super strong identity to rule them all” and I don’t need much cryptography to make that work.
However, the prediction in this post (and it doesn’t take a rocket scientist to see this) is that in a people-first digital identity system, this is going to look like a very trivial use case, especially next to the decentralized, futuristic use cases that will soon exist.
A great document of use cases
To consider specifics of this future let’s review a wonderful document on DID use cases published by the W3C DID working group a while ago. If you don’t know, DID stands for decentralized identifier, a very important piece of a future decentralized and verifiable Internet. We won’t go into DIDs here in depth, but this video is a particularly good introduction to them if you’re interested in the details. For this blog post, just know that they’re exactly what they sound like: a decentralized identifier, typically coupled with something called a “DIDdoc” that binds that identifier to some type of cryptographically verifiable information.
KERI supports DIDs via two methods: did:keri, a DID method that acts as a decentralized identifier for KERI-based identities called Autonomic Identifiers (AIDs), and did:webs, a DID method that binds a KERI-based identifier to a traditional web domain name in a secure and verifiable manner. More on these in later posts.
When DIDs were being created, one of the most persuasive documents the DID working group formulated was the Use Cases and Requirements for Decentralized Identifiers document I linked above.
It contains a set of 12 use cases, some features and benefits that DIDs can provide to applications similar to those 12 use cases, some DID properties necessary to meet those use cases, and finally some “focal use cases” to really drive home various domains that people want to apply DIDs to. We won’t go into everything in this document in detail, but we think anyone in digital identity should consider this document canonical in the space.
Cryptographic Material Thought Experiment
So instead of going into the details of this document, what we’re going to do is use it to imagine that all of the use cases it envisions actually come true in the future. DIDs (or KERI identifiers, or any kind of digital identity scheme) gain sway and the People rejoice. Huzzah!
While we’re imagining that bright and shining utopia, we’ll just add up all the cryptographic material we might need along the way and that might be a good lower bound estimate on all the cryptographic data we might have to deal with in the future.
We’ll start with the online shopper use case (#1). From earlier in the post you’ll remember that we already have our mDL. No more physical driver’s licenses for us! Everyone in the future possesses this credential. So we have a ready made ID that can be used with things like Amazon, Walmart, and www.frenchbroadchocolates.com. These websites don’t necessarily want to know all our PII, they mostly just want a pseudonymous secure identifier that they can use to track purchases and order flow, handle secure communication, and maybe collect statistics and behavioral profiles to better serve us ads on the site.
The mDL could be used as a bare identifier, but probably for security’s sake Amazon will produce some digest, hash, or proof of the mDL so that they don’t carry the security vulnerability of holding PII that could be hacked. That digest, hash, or proof will be our first piece of cryptographic material.
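To make that concrete, here’s a minimal sketch in Python of how a shop might derive a pseudonymous identifier from an mDL without storing the mDL itself. Everything in it is an assumption for illustration: the `pseudonymous_id` function, the placeholder credential bytes, and the choice of a salted SHA-256 digest. A real system would likely use a proper proof scheme rather than a bare hash.

```python
import hashlib
import secrets


def pseudonymous_id(mdl_bytes: bytes, salt: bytes) -> str:
    """Return a hex SHA-256 digest over a per-shop salt plus the mDL bytes."""
    return hashlib.sha256(salt + mdl_bytes).hexdigest()


# Hypothetical stand-in for a serialized mDL; not a real credential format.
mdl = b"hypothetical-mDL-credential-bytes"

# Each shop keeps its own random salt, so the same mDL yields
# different, hard-to-correlate identifiers at different shops.
amazon_salt = secrets.token_bytes(16)
walmart_salt = secrets.token_bytes(16)

amazon_id = pseudonymous_id(mdl, amazon_salt)
walmart_id = pseudonymous_id(mdl, walmart_salt)

print(amazon_id)
print(walmart_id)
```

Each identifier is 32 bytes of cryptographic material (64 hex characters) that the shop now has to store and transmit, and that’s before any keys or signatures show up.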
For similar reasons, secure communication between Amazon and us will require a key, symmetric or asymmetric, but maybe this is just a TLS key signed by our mDL. Maybe this is just the second piece of cryptographic material.
We move to the second use case and buy a car from Amazon. That car comes with a DID and that DID is then connected to all of the components of that car via their DIDs. Assuming each of these DIDs just has one piece of cryptographic material with which to verify each component, now we’ve got a car’s worth of cryptographic material!!!
I don’t know how many components are in a car so I asked ChatGPT. It claims that, depending on the make and model, a modern car has about 30,000 individual components. I’m no car guy but that seems within at least an order of magnitude of what I’d guess, so we’ll go with that. Now we have 30,000 + 3 pieces of cryptographic material: one per component, plus the digest, the key, and the car’s own DID.
From there we move on to the “Master Data of Entities” database in use case 4 that’s particularly relevant to cars, the DMV. That DID of the car will be associated with the DID of the mDL. There’s another piece of cryptographic material.
We join a car sharing service (use case 5), and the car’s DID, plus our mDL as an identifier, plus a bank account DID so that we can be paid, are associated with the sharing service. Four more pieces of cryptographic material.
We borrow a car because that one we just bought happened to be broken almost immediately and we receive an invoice for sharing. We’re freelance journalists so we submit this invoice alongside our hours to the company we work for using (once again) our mDL, our bank account id, and we cryptographically “sign” the invoice and bank statements (alongside the bank’s and share service cryptographic signatures) leading to three more pieces of cryptographic material to make this assertion.
Our original purchased car broke because one of the components was faulty. The supply chain had pseudonymous supply chain identifiers so the company could track down what was happening. So for each of 30,000 components, the car company can track back through n steps of a supply chain to see where the component was sourced from, and either decide to let the supplier know, write it off, or cut them out for being too low in quality. Each of these steps will have at least one piece of cryptographic material.
Finally, we’ll use an e-receipt “correlation controlled service” to prove to our credit company that we’re using the auto loan they provided to buy the car to actually buy a car and as a way of lowering our interest payments (as they can now see our cash flow without invading our privacy). For each receipt we’re issuing a piece of cryptographic material to verify the receipts.
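If you’d like to see the running total, here’s a rough tally of the walkthrough above as a small Python sketch. The fixed counts are the ones stated in the text. The number of supply chain steps and the number of receipts are never pinned down in the thought experiment, so the values passed in at the bottom (3 steps, 12 receipts) are purely assumptions for illustration, as is the `lower_bound` function name.

```python
COMPONENTS_PER_CAR = 30_000  # the ChatGPT estimate quoted above

# Pieces of cryptographic material stated in the walkthrough.
FIXED_PIECES = {
    "digest or proof of the mDL held by the shop": 1,
    "secure communication key": 1,
    "car DID": 1,
    "DMV association of car DID and mDL": 1,
    "car sharing service": 4,
    "signed invoice": 3,
}


def lower_bound(supply_chain_steps: int, receipts: int) -> int:
    """Add up a lower bound on pieces of cryptographic material."""
    component_dids = COMPONENTS_PER_CAR  # one piece per component DID
    # At least one piece per component per supply chain step.
    supply_chain = COMPONENTS_PER_CAR * supply_chain_steps
    return sum(FIXED_PIECES.values()) + component_dids + supply_chain + receipts


# Ignoring the supply chain and receipts entirely:
print(lower_bound(supply_chain_steps=0, receipts=0))   # 30011

# Assumed values: 3 supply chain steps and 12 receipts.
print(lower_bound(supply_chain_steps=3, receipts=12))  # 120023
```

Even with the supply chain ignored, the total is dominated by the car’s components, and every extra supply chain step adds another 30,000 pieces.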
What’s the point?
The point is just to demonstrate, in a simple manner, that using a few imagined use cases from one document about future developments, over the course of a day or two, we’ve connected our future selves to a graph of cryptographic material that numbers at least tens of thousands of items (datums if you will) that someone is going to have to transmit and store in their clouds, computers, or mobile devices. We’ve done so while really only having an mDL, a few services, and a car to contend with. As we imagine someone going through their day-to-day life collecting digital identities for all kinds of services and “things” they might interact with, this material is only going to accumulate. And this is just the consumer side. For the operators of the DMV, Amazon, or any other large concern, these data imaginings are going to become data realities. Hard drive vendors are not going out of business anytime soon.
There’s going to be a lot of this stuff. So how are we currently envisioning writing this data to those hard drives? I’ll show you. Here’s a DID doc with a few methods and two pieces of cryptographic material (a JSON Web Signature and a public key) attached.
{
  "@context": [
    "https://www.w3.org/ns/did/v1"
  ],
  "id": "did:example:123456789abcdefghi",
  "verificationMethod": [
    {
      "id": "did:example:123456789abcdefghi#keys-1",
      "type": "JsonWebKey2020",
      "controller": "did:example:123456789abcdefghi",
      "publicKeyJwk": {
        "kty": "EC",
        "crv": "P-256",
        "x": "f83OJ3D2xF4QqcoGk-3RL_lUw7d3vfbX0vD8HhGqsBA",
        "y": "x_FEzRu9MQF0ICp7fV19bc5R4uIXyPDY_djT7F5iNUw"
      }
    }
  ],
  "authentication": [
    "did:example:123456789abcdefghi#keys-1"
  ],
  "assertionMethod": [
    "did:example:123456789abcdefghi#keys-1"
  ],
  "proof": {
    "type": "JsonWebSignature2020",
    "created": "2023-06-24T19:23:24Z",
    "proofPurpose": "assertionMethod",
    "verificationMethod": "did:example:123456789abcdefghi#keys-1",
    "jws": "eyJhbGciOiJFUzI1NiIsImtpZCI6ImRpZDpleGFtcGxlOjEyMzQ1Njc4OWFiY2RlZmdoaSNrZXlzLTEifQ..ZXV1ZJIzjc9TQ-8ZrFeGhtUjKN-s9PVXw77Dh-dqZT0HQ"
  }
}
Now imagine 30,000 of these. Imagine this but with cross signing (that’s 30,000 * 2). Imagine 30,000 of these with cross signing and proving that multiple conditions hold at once. Imagine a supply chain’s worth of these things from all the entities that touch a component. The scale of what we’re going to create is immense and its growth will be, at the least, super-linear. This is going to be a big problem. No matter how fast your computer is, it can only read so many bits per second. No matter how much storage you’ve got, this kind of interdependency can eat up bits if you’re not careful, and at scale you can be nickeled and dimed right into bankruptcy. We’ll expand upon the unsuitability of this example and other suggested formats in our next post.
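For a rough feel of the numbers, here’s a small Python sketch that measures the DID doc above and compares it to the raw public key it carries. It’s a back-of-the-envelope estimate under stated assumptions, not a benchmark: the doc is serialized as compact JSON (one choice among many), only the key’s x and y coordinates are decoded, and the sample signature is treated as an opaque string. The `b64url_decode` helper is our own name, not part of any library.

```python
import base64
import json

did = "did:example:123456789abcdefghi"
key_id = did + "#keys-1"

# The DID doc shown above, rebuilt as a Python dict.
did_doc = {
    "@context": ["https://www.w3.org/ns/did/v1"],
    "id": did,
    "verificationMethod": [{
        "id": key_id,
        "type": "JsonWebKey2020",
        "controller": did,
        "publicKeyJwk": {
            "kty": "EC",
            "crv": "P-256",
            "x": "f83OJ3D2xF4QqcoGk-3RL_lUw7d3vfbX0vD8HhGqsBA",
            "y": "x_FEzRu9MQF0ICp7fV19bc5R4uIXyPDY_djT7F5iNUw",
        },
    }],
    "authentication": [key_id],
    "assertionMethod": [key_id],
    "proof": {
        "type": "JsonWebSignature2020",
        "created": "2023-06-24T19:23:24Z",
        "proofPurpose": "assertionMethod",
        "verificationMethod": key_id,
        "jws": "eyJhbGciOiJFUzI1NiIsImtpZCI6ImRpZDpleGFtcGxlOjEyMzQ1Njc4OWFiY2RlZmdoaSNrZXlzLTEifQ..ZXV1ZJIzjc9TQ-8ZrFeGhtUjKN-s9PVXw77Dh-dqZT0HQ",
    },
}


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url, re-adding the padding JWKs omit."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


jwk = did_doc["verificationMethod"][0]["publicKeyJwk"]
raw_key_bytes = len(b64url_decode(jwk["x"])) + len(b64url_decode(jwk["y"]))

# Size of the whole doc as compact JSON (no extra whitespace).
doc_bytes = len(json.dumps(did_doc, separators=(",", ":")).encode("utf-8"))
total_bytes = 30_000 * doc_bytes

print(f"raw public key: {raw_key_bytes} bytes")
print(f"one DID doc:    {doc_bytes} bytes")
print(f"30,000 docs:    {total_bytes / 1_000_000:.1f} MB")
```

The public key itself is only 64 bytes (two 32-byte P-256 coordinates). Everything else in the document is structure, repeated identifiers, and text encoding wrapped around it, and that wrapper gets multiplied by every component, every cross signature, and every supply chain step.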
Conclusion
We introduced this series on CESR and why we in the KERI community focus on encodings, particularly cryptography-first encodings. We went through a short history of cryptography and how its use has been accelerating, and then took a short walk through some use cases from the wonderful “DID Use Cases” document put together by the W3C.
In our next post we’ll cover the various encoding schemes that have been imagined so far and why they’ll fall short in this imagined future. We hope this post has convinced you that if even some of the things imagined for digital identity’s future come to pass, there will be quite a lot of cryptographic material to deal with. Till next time.