
I am continuing my series where I discuss security concepts that I believe developers should know and help them improve their security skills. My first article was on threat modeling.
This time, I will be discussing a couple of cryptography concepts that will be useful. I want to note that covering these two topics properly could fill a university course, so I won't be using the type of precision I would normally use in a more academic setting. However, my goal is to ensure that you have enough knowledge to use these concepts properly in a practical setting.
If you are going to get anything out of this newsletter, it should be that you shouldn't build your own cryptography unless you have substantial expertise. Even as someone who did a PhD in this area, I don't feel comfortable building my own cryptography without working with a group of experts who have done this before.
As I said above, I won't be able to cover these two topics comprehensively, but if you're interested, you can read more about them in this free book by Dan Boneh and Victor Shoup. If you like a more structured course, I recommend this Coursera course. It's been around for a while, and the content is pretty good. (Fun fact: I was the TA for the actual Stanford course that ran parallel to the first iteration of this course!)
Anyway, let's get started.
Hashing
The core concept in hashing is a hash function. A hash function takes any size input and outputs a fixed-sized value. For a hash function to be useful in a security sense, it has to be cryptographically secure. What does that mean? It means it has to have the following properties for a hash function H:
Given an output y where y = H(x), it should be hard to find any x such that H(x) = y. This is known as preimage resistance. It's important to note that since H can take arbitrary-sized input but the output length is fixed, many values of x can map to the same y. This is a basic application of the pigeonhole principle, and it matters for the next property.
Given a value x, it should be hard to find a different value y, such that H(x) = H(y). This is known as second-preimage resistance.
Finally, it should be hard to find any two distinct values x and y, such that H(x) = H(y). This is known as collision resistance.
There's an important correctness property that ties this all together: H has to be deterministic, meaning the same input always produces the same output.
Finally, one last observation is that hash functions are irreversible, meaning that once you apply the hash function to a value x to get H(x) = y, there's no general way to recover x from y. This is because the value is "compressed": H takes an arbitrary-sized value and outputs a fixed-sized value.
Some examples of cryptographically secure hash functions are SHA-2 and SHA-3. Those are typically the core building blocks of other hash functions.
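To make the fixed-size and deterministic properties concrete, here is a minimal sketch using Python's standard `hashlib` module. The helper name `sha256_hex` is just something I made up for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 (a member of the SHA-2 family) digest as hex."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)

# Fixed-size output: 256 bits = 64 hex characters, whatever the input size.
print(len(short_digest), len(long_digest))

# Deterministic: hashing the same input twice gives the same digest.
print(sha256_hex(b"hello") == sha256_hex(b"hello"))

# A one-character change in the input gives an unrelated-looking digest.
print(sha256_hex(b"hello"))
print(sha256_hex(b"hellp"))

# SHA-3 is exposed through the same interface.
print(hashlib.sha3_256(b"hello").hexdigest())
```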
To wrap this up, I loosely used the word "hard" above. What does "hard" mean? In practice, it means that an attacker with reasonable amounts of computation and memory wouldn't be able to do this. It's not enough for the computation to be merely expensive, because an attacker can always pre-compute values ahead of time. Common attacks that use this are called rainbow table attacks.
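Here is a rough sketch of why pre-computation matters. To be clear, this is a plain lookup table, which is a simplification; real rainbow tables use a more compact chain structure. The password list and function names are hypothetical.

```python
import hashlib
from typing import Optional

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical list of common passwords; real lists have millions of entries.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]

# Done once, ahead of time: map each digest back to its password.
PRECOMPUTED = {sha256_hex(p.encode()): p for p in COMMON_PASSWORDS}

def lookup(digest: str) -> Optional[str]:
    """Return the password for a leaked digest if it was pre-computed."""
    return PRECOMPUTED.get(digest)

leaked = sha256_hex(b"letmein")
print(lookup(leaked))  # found instantly, no brute force at attack time
print(lookup(sha256_hex(b"a long, unusual passphrase")))  # not in the table
```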
Use Cases
What are some use cases of hash functions? We find hash functions in many places in software. One common usage is to generate seemingly random values called pseudorandom values. These values are designed to be indistinguishable from the random values that you can otherwise get from a source like /dev/urandom. All you need to do is "seed" a hash function with a secret random value, and you can derive a practically unlimited number of seemingly random values.
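As a sketch of the seeding idea, one simple construction is to hash the seed together with a counter. The function name and the exact `seed || counter` layout are my own assumptions for illustration. In real code, I would reach for Python's `secrets` module rather than rolling this by hand.

```python
import hashlib

def pseudorandom_blocks(seed: bytes, count: int) -> list:
    """Derive `count` 32-byte blocks as SHA-256(seed || counter).

    Illustrative only: the output is only as unpredictable as the seed.
    """
    return [
        hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        for i in range(count)
    ]

blocks = pseudorandom_blocks(b"secret seed value", 3)
for block in blocks:
    print(block.hex())

# Same seed, same stream: this is what "pseudo" random means.
print(blocks == pseudorandom_blocks(b"secret seed value", 3))
```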
Another common usage is to verify integrity. You hash a file and send both the file and the hash. Once someone receives the file, they can hash it themselves, and if the hashes match, they know that it hasn't been modified. This is commonly used when downloading files or with JavaScript libraries.
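Here is a minimal sketch of that check in Python. The function names and the temporary file standing in for a download are assumptions for illustration. Keep in mind this only helps if the expected hash comes from somewhere an attacker can't also modify.

```python
import hashlib
import hmac
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one."""
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())

# Demo with a throwaway file standing in for a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"contents of the release archive")
    path = tmp.name

published_hash = file_sha256(path)         # what the sender would publish
ok_before = matches(path, published_hash)  # file is unchanged

with open(path, "ab") as f:                # simulate tampering in transit
    f.write(b"!")
ok_after = matches(path, published_hash)   # digest no longer matches

os.remove(path)
print(ok_before, ok_after)
```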
Another use case is password verification. It's risky to store passwords in plaintext, so databases store a hash of each password instead (ideally salted and computed with a deliberately slow password-hashing function). This way, an application can check a password without storing the password itself.
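A sketch of password verification, using PBKDF2 from Python's standard library. The per-user salt and the iteration count are my additions rather than something covered above, and the count here is only an illustrative value that you would tune for your own hardware. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor, illustrative only

def hash_password(password: str) -> tuple:
    """Return (salt, digest); store both, never the password itself."""
    salt = secrets.token_bytes(16)  # random per-user salt defeats pre-computed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest from the login attempt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```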
Finally, cryptocurrencies use hash functions as proof of work. They ask miners to find a value x whose hash H(x) meets a target condition (for example, being smaller than some threshold), which requires them to iterate through values of x until they find one that works.
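In practice, proof-of-work schemes such as Bitcoin's are phrased as finding an input whose hash falls below a target, rather than inverting one specific hash. Here is a toy sketch of that search. The single SHA-256 call, the 8-byte nonce, and the function names are simplifications I chose, not any real coin's format.

```python
import hashlib
from itertools import count

def pow_hash(block_data: bytes, nonce: int) -> int:
    """Interpret SHA-256(block_data || nonce) as a 256-bit integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Try nonces until the hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if pow_hash(block_data, nonce) < target:
            return nonce

def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution takes one hash; finding it takes many."""
    return pow_hash(block_data, nonce) < (1 << (256 - difficulty_bits))

nonce = mine(b"example block", 16)  # about 65,000 attempts on average
print(nonce, verify(b"example block", nonce, 16))
```

Each extra bit of difficulty doubles the expected work for the miner, while verification stays a single hash.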
Encryption
Another cryptography concept commonly used in software is encryption. This is different from hashing. The main purposes of hashing are integrity and verification, i.e. showing that data hasn't been modified or that it is what you say it is. The purpose of encryption is confidentiality, i.e. hiding data. Because they serve different purposes, it's possible to use encryption and hashing together.
To explain encryption, I will discuss symmetric key encryption for simplicity, which means there is one key shared by all parties. It's the responsibility of the parties involved to ensure the key is securely distributed. I will discuss public key encryption in a different newsletter, but many of the core algorithms and concepts are the same.
Unlike hashing, encryption has three main functions:
key generation
encryption
decryption
In the key generation function, a random key is generated. Typically, we use a pseudorandom function to generate this. (This is an application of the hash function discussed above.) We won't discuss how to distribute this key securely, so we will assume the relevant parties have received it. If you're interested, you can read about key exchanges. A common one is the Diffie-Hellman key exchange, which is the foundation for the key exchange in protocols such as TLS.
The encryption function takes a key and a message as input and outputs a ciphertext. The decryption function takes a key and a ciphertext as input and outputs the original message.
A few observations:
For an encryption algorithm to be considered correct, given an encryption function E, a decryption function D, and a key k, we need D(k, E(k, m)) = m for every message m.
Everything is public except the key. To maintain security, the key must be kept private. If the key is leaked, there are no longer guarantees of security.
The ciphertext should be indistinguishable from a random string. This way, there's no way to learn any information about the original message from the ciphertext alone.
Finally, unlike hash functions, the output of the encryption function, i.e. the ciphertext, has to be at least as long as the original message. Otherwise, there would be no way to recover the original message with the decryption function.
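To see the three functions and the correctness property in one place, here is a toy sketch using a one-time pad (XOR with a random key). I picked it only because it fits in a few lines. It is not AES, the key must be as long as the message, and a key must never be reused. For anything real, use a vetted library instead of code like this. The function names are hypothetical.

```python
import secrets

def generate_key(length: int) -> bytes:
    """Key generation: a fresh random key as long as the message."""
    return secrets.token_bytes(length)

def encrypt(key: bytes, message: bytes) -> bytes:
    """E(k, m): XOR each message byte with the matching key byte."""
    if len(key) != len(message):
        raise ValueError("one-time pad key must match the message length")
    return bytes(k ^ m for k, m in zip(key, message))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """D(k, c): XOR is its own inverse, so decryption repeats the XOR."""
    return encrypt(key, ciphertext)

message = b"meet me at noon"
key = generate_key(len(message))
ciphertext = encrypt(key, message)

print(ciphertext.hex())                        # looks like random bytes
print(decrypt(key, ciphertext) == message)     # D(k, E(k, m)) = m
print(len(ciphertext) >= len(message))         # ciphertext is not shorter
```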
The most common encryption functions use the AES block cipher as the foundation. Only two block ciphers have been approved for encryption (by NIST): AES and triple-DES, and triple-DES is being phased out.
Encryption has one broad use case: to hide information with the ability to recover it given the proper secret. It's important to note that a key difference between encryption and hashing is that with encryption, it's possible to recover the original message/input.
I hope this was a helpful overview of two key cryptographic concepts that are widely used. If there are other topics people believe I should cover, please feel free to let me know!