Security for developers: Hashing and Encryption

Key cryptography concepts

Apr 10, 2024

I am continuing my series where I discuss security concepts that I believe developers should know and help them improve their security skills. My first article was on threat modeling.

Security for developers: Threat Modeling (Frank Wang, March 26, 2024)

This time, I will be discussing a couple of cryptography concepts that will be useful. I want to note that covering these two topics alone is a university course, so I won’t be using the type of precision I would normally use in a more academic setting. However, my goal is to ensure that you have enough knowledge to use these concepts properly in a practical setting.

If you are going to get anything out of this newsletter, it should be that you shouldn’t build your own cryptography unless you have substantial expertise. Even as someone who did a PhD in this area, I don’t feel comfortable building my own cryptography without working with a group of experts who have done this before.

As I said above, I won’t be able to cover these two topics comprehensively, but if you’re interested, you can read more about them in this free book by Dan Boneh and Victor Shoup. If you like a more structured course, I recommend this Coursera course. It’s been around for a while, and the content is pretty good. (Fun fact: I was the TA for the actual Stanford course that ran parallel to the first iteration of this course!)

Anyway, let’s get started.

Hashing

The core concept in hashing is a hash function. A hash function takes an input of any size and outputs a fixed-size value. For a hash function to be useful for security, it has to be cryptographically secure. What does that mean? It means that a hash function H has to have the following properties:

Preimage resistance: given an output y, it should be hard to find any x such that H(x) = y. It’s important to note that since H can take arbitrary-sized input but the output length is fixed, there are far more possible inputs than outputs, so many different values of x must map to the same y. This is a basic application of the pigeonhole principle, and it’s important for the next two properties.
Second-preimage resistance: given a value x, it should be hard to find a different value y (y ≠ x) such that H(x) = H(y).
Collision resistance: finally, it should be hard to find any two distinct values x and y such that H(x) = H(y).

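There’s an important correctness property that ties this all together: H has to be deterministic, meaning the same input always produces the same output.

Finally, one last observation is that hash functions are one-way: once you apply the hash function to a value x to get H(x) = y, there’s no efficient way to recover x from y alone. Part of the reason is that the value is “compressed” (H takes an arbitrary-size value and outputs a fixed-size one, so information is lost), but the real guarantee comes from the first property above. Note that if x comes from a small set of possibilities, an attacker can still find it by hashing every candidate and comparing.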

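Some examples of cryptographically secure hash functions are SHA-2 (which includes SHA-256 and SHA-512) and SHA-3. Those are typically the core building blocks of other constructions, such as HMAC and PBKDF2. Older functions such as MD5 and SHA-1 no longer qualify, because practical collisions have been found for both.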
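To make these properties a bit more concrete, here is a minimal sketch using Python’s standard hashlib module. The helper name sha256_hex is my own, chosen for illustration. It shows the fixed-size output and determinism; it can’t demonstrate the “hard to find” properties, which are about what an attacker cannot do.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)

# Fixed-size output: 256 bits = 64 hex characters, whatever the input size.
print(len(short_digest), len(long_digest))

# Deterministic: hashing the same input twice gives the same digest.
print(sha256_hex(b"hello") == sha256_hex(b"hello"))

# A one-character change in the input gives an unrelated-looking digest.
print(sha256_hex(b"hello"))
print(sha256_hex(b"hellp"))

# SHA-3 is a different function, so the same input hashes differently.
print(hashlib.sha3_256(b"hello").hexdigest())
```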

To wrap this up, I loosely used the word “hard” above. What does “hard” mean? In practice, it means that an attacker with realistic amounts of computation and memory wouldn’t be able to do this. It’s insufficient for a single attempt to be computationally intensive, because when the set of likely inputs is small (passwords, for example), an attacker can pre-compute the hashes once and reuse them against every target. Common attacks that use this are called rainbow table attacks.
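Here is a rough sketch of the pre-computation idea. It builds a plain lookup table from digest to input; a real rainbow table is a more space-efficient variant of the same idea. The password list and function names are made up for illustration.

```python
import hashlib
from typing import Optional


def h(password: str) -> str:
    """Unsalted SHA-256 of a password, as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical list of common passwords; real lists have millions of entries.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein", "dragon"]

# One-time pre-computation: map each digest back to its input.
PRECOMPUTED = {h(p): p for p in COMMON_PASSWORDS}


def lookup(digest: str) -> Optional[str]:
    """Return the input for a known digest, or None if it isn't in the table."""
    return PRECOMPUTED.get(digest)


leaked_digest = h("letmein")
print(lookup(leaked_digest))                       # letmein
print(lookup(h("correct horse battery staple")))   # None
```

The attacker pays the hashing cost once, and every later lookup is a dictionary access. The hash function itself isn’t broken here; the input space is simply too small.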

Use Cases

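What are some use cases of hash functions? We find hash functions in many places in software. One common usage is to generate seemingly random values called pseudorandom values. As long as the seed is itself random and secret, these values are computationally indistinguishable from truly random ones, and they are cheap to produce. All you need to do is “seed” a hash function (for example, by hashing the seed together with a counter), and you can get a practically unlimited number of seemingly random values. In application code, you rarely need to build this yourself: the operating system’s generator, exposed as /dev/urandom on Linux, is itself a pseudorandom generator seeded from hardware entropy.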
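As a sketch of the seeding idea, the function below derives values as SHA-256(seed || counter). The name hash_stream and the counter encoding are my own assumptions, not a standard construction. For real applications, use the operating system’s generator (in Python, the secrets module) or a vetted design rather than rolling your own.

```python
import hashlib
import secrets


def hash_stream(seed: bytes, count: int) -> list[bytes]:
    """Derive `count` 32-byte pseudorandom values as SHA-256(seed || counter)."""
    return [
        hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        for counter in range(count)
    ]


# The seed must be random and secret; here it comes from the OS generator.
seed = secrets.token_bytes(32)
values = hash_stream(seed, 3)

for value in values:
    print(value.hex())

# Same seed, same stream: anyone who learns the seed can reproduce every value.
print(hash_stream(seed, 3) == values)
```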

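Another common usage is to verify integrity. You hash a file and publish the hash alongside the file. Once someone receives the file, they can hash it themselves, and if the hashes match, they know that it hasn’t been modified. This only helps if the hash arrives over a channel the attacker can’t tamper with; otherwise, the attacker can replace both the file and the hash. This is commonly used when downloading files or when loading JavaScript libraries from a CDN (Subresource Integrity).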
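A minimal sketch of the download check might look like this. The function names are illustrative, and I’m assuming the expected digest was published as a hex string somewhere you trust.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 matches the published digest."""
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())
```

You would call verify_file with the path of the downloaded file and the digest listed on the project’s download page, and refuse to use the file if it returns False.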

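Another use case is password verification. It’s risky to store passwords in plaintext, so databases store a hash of each password instead. This way, an application can check a password by hashing the submitted value and comparing, without storing the password itself. Because passwords are easy to guess, a plain SHA-2 hash isn’t enough here: in practice, each password is hashed with a unique random salt using a deliberately slow function such as bcrypt, scrypt, Argon2, or PBKDF2, which defeats pre-computed tables and slows down guessing.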
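Here is one way password storage and verification could look using only Python’s standard library. It uses PBKDF2 with a per-password random salt; the iteration count and function names are my assumptions, and in production you would more likely reach for a maintained library that wraps bcrypt, scrypt, or Argon2.

```python
import hashlib
import hmac
import secrets

# Assumed work factor; tune it for your own hardware and latency budget.
DEFAULT_ITERATIONS = 600_000


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in the database instead of the password."""
    salt = secrets.token_bytes(16)  # unique per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = DEFAULT_ITERATIONS) -> bool:
    """Re-derive the digest from the submitted password and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

Because every password gets its own salt, two users with the same password end up with different stored digests, and a table pre-computed without the salt is useless.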

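Finally, cryptocurrencies use hash functions for proof of work. They ask miners to find a value x whose hash H(x) meets some condition, for example that it falls below a target value (equivalently, that it starts with a certain number of zero bits). Because the output of H is unpredictable, the only way to do this is to iterate through values of x until one works, while anyone can verify the result with a single hash.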
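To give a feel for the iteration involved, here is a simplified proof-of-work sketch in the style used by Bitcoin-like systems: search for a nonce so that the hash of the data plus the nonce starts with a given number of zero bits. The data format, the single SHA-256, and the function names are simplifications I chose for illustration; real systems differ in the details.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def pow_hash(data: bytes, nonce: int) -> bytes:
    return hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()


def mine(data: bytes, difficulty_bits: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash has enough leading zero bits."""
    nonce = 0
    while leading_zero_bits(pow_hash(data, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def check(data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs one hash, no matter how long mining took."""
    return leading_zero_bits(pow_hash(data, nonce)) >= difficulty_bits


block = b"example block contents"
nonce = mine(block, difficulty_bits=16)  # about 65,000 attempts on average
print(nonce, pow_hash(block, nonce).hex())
print(check(block, nonce, 16))
```

Each extra bit of difficulty doubles the expected work for the miner, while verification stays at a single hash.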

Encryption

Another cryptography concept commonly used in software is encryption. This is different from hashing. The main purposes of hashing are integrity and verification, i.e., checking that data hasn’t been modified or that it is what you say it is. The purpose of encryption is confidentiality, i.e., hiding data from anyone who doesn’t hold the key. The two are complementary, so they are often used together.

To explain encryption, I will discuss symmetric key encryption for simplicity, which means there is one key shared by everyone who needs to encrypt or decrypt. It’s the responsibility of the parties involved to ensure the key is securely distributed. I will discuss public key encryption in a different newsletter; the algorithms are different, but many of the core concepts are the same.

Unlike hashing, encryption has three main functions:

key generation
encryption
decryption

In the key generation function, a random key is generated. Typically, we use a cryptographically secure pseudorandom generator, seeded with randomness from the operating system, to generate this. (This is an application of the pseudorandom values discussed in the hashing section above.) We won’t discuss how to distribute this key securely, so we will assume the relevant parties receive this key. If you’re interested, you can read about key exchanges. A common one is the Diffie-Hellman key exchange, and it’s the foundation for the key exchange in protocols such as TLS.

The encryption function takes a key and a message as input and outputs a ciphertext. The decryption function takes a key and a ciphertext as input and outputs the original message.

A few observations:

For an encryption algorithm to be considered correct, given an encryption function E and decryption function D with a key k, D(k, E(k,m)) = m, where m is an arbitrary message.
Everything is public except the key. To maintain security, the key must be kept private. If the key is leaked, there are no longer guarantees of security.
The ciphertext should be indistinguishable from a random string to anyone who doesn’t have the key. This way, there’s no practical way to learn any information about the original message (other than its length) from the ciphertext alone.
Finally, unlike hash functions, the output of the encryption function, i.e. the ciphertext, has to be at least as long as the original message. Otherwise, there would be no way to recover the original message with the decryption function.
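The three functions and the observations above can be illustrated with the one-time pad, the simplest textbook encryption scheme: the key is a random string as long as the message, and encryption is a byte-wise XOR. This is a teaching sketch only, with function names of my choosing. The key must never be reused, which makes the scheme impractical, so real systems should use a vetted library built on AES instead.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Key generation: a fresh random key, as long as the message."""
    return secrets.token_bytes(length)


def encrypt(key: bytes, message: bytes) -> bytes:
    """Encryption: XOR each message byte with the matching key byte."""
    if len(key) != len(message):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(k ^ m for k, m in zip(key, message))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Decryption: XOR with the same key undoes the encryption."""
    return encrypt(key, ciphertext)


message = b"attack at dawn"
key = generate_key(len(message))
ciphertext = encrypt(key, message)

print(ciphertext.hex())                       # looks random without the key
print(decrypt(key, ciphertext) == message)    # correctness: D(k, E(k, m)) = m
print(len(ciphertext) >= len(message))        # ciphertext is not shorter than m
```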

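The most common encryption functions use the AES block cipher as the foundation, combined with a mode of operation such as GCM to handle messages longer than a single block. Historically, NIST has approved only two block ciphers for encryption: AES and Triple-DES. Triple-DES has since been deprecated, so AES is the one to use in new systems.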

Encryption has one broad use case: to hide information with the ability to recover it given the proper secret. It’s important to note that a key difference between encryption and hashing is that with encryption, it’s possible to recover the original message/input.

I hope this was a helpful overview of two key cryptographic concepts that are widely used. If there are other topics people believe I should cover, please feel free to let me know!

Frankly Speaking
