Intro to Symmetric Encryption

What is Symmetric Encryption?

Encryption is a broad topic covering transformations of text, based on a secret, in a way that makes it difficult to reveal the original meaning without knowing something secret. The act of using the secret to reveal the original meaning of text is decrypting and the act of using a secret to hide the original meaning is encrypting.

Encryption is broadly broken into two categories: symmetric, where the secret used to encode is the same secret used to decode; and asymmetric, where the secret used to encode is different from, but related in some way to, the secret used to decode. These secrets used to encrypt and decrypt are called keys because they 'lock' and 'unlock' the text. Encrypted text ('locked') is called ciphertext, and un-encrypted text ('unlocked') is called plaintext. A cipher is the procedure (algorithm) used to perform the encryption and decryption. This article is a general introduction to symmetric encryption.

A Simple Symmetric Cipher: ROT-13

There are many different kinds of encryption, but we'll illustrate using a very simple method called a rotation, a type of substitution cipher. Substitution ciphers work by defining a mapping between characters. For example, if we take the alphabet, A-Z, and assign A to N, B to O, C to P, D to Q, E to R, continuing on until we map M to Z, effectively rotating the letters 13 spaces to the right in their alphabetical order, we will have a complete mapping of substitutions for English letters, and a justification for calling the process a rotation. To encrypt we simply replace each letter with the letter 13 spaces to the right in the alphabet (looping from Z to A when we reach the end), and to decrypt we do the same. This is also a symmetric cipher. Since there are 26 letters, and half of 26 is 13, our first rotation by 13 is undone by the second rotation by 13. In other words the same operation and values are used to encrypt and decrypt, the definition of a symmetric cipher.

The key for such a cipher might look like this:

A B C D E F G H I J K L M
↑↓ ↑↓ ↑↓ ↑↓ ↑↓ ↑↓ ↑↓ ↑↓ ↑↓ ↑↓ ↑↓ ↑↓ ↑↓
N O P Q R S T U V W X Y Z

Where to find the substitution you locate the letter and replace it with the letter above or below. If encoding the word 'HELLO' we would end up with the following output:

URYYB

And as expected, reversing the process reveals out original message:

HELLO

This cipher is referred to as ROT-13, short for rotation by 13.

Basic evaluation of strength

Substitution ciphers are simple to understand, but most are not very secure. One of the big reasons they become insecure is that they re-use letter to letter mappings (better thought of as symbols or characters, since we're not always dealing with English letters) to determine which substitution to make. This is an issue because written language follows easily predictable patterns, in this case how frequently letters appear. You can see how this might work by looking at our ROT-13 encrypted 'HELLO', which encrypted to 'URYYB'. The YY may draw your eye. If I collected enough encrypted text, and found that the frequency of Y's aligned with the expected frequency of L's (which it would begin to fairly quickly), it is likely an L. We can augment that by the knowledge that there are only a limited number of letters that appear twice in succession in common English words, and only a certain number of 5 letter words with such double letters. In general it becomes trivial to design a program to analyze these frequencies and decrypt the text without using the key or algorithm intended. With such a simple cipher we would quickly be able to recreate the key and decrypt in real time. We have 'cracked' the encryption (equivalent to breaking or picking a lock).

There are other methods of 'cracking' encryption without knowing the key, but all rely on the encrypted text holding some sort of information about the underlying data being protected. Some methods are simple, some are complex, and others are simply ingenious or insidious, but they all rely on information leaking into the encrypted text.

The cause of the poor strength in our substitution cipher is that our mappings are predictable. An A will always map to an N, leaking information that can be exploited. If we couldn't predict what a specific letter would map to (or not map to), our solution would not leak information. In other words, the probability that a letter would map to another at any given time, including mapping to itself, would need to be equal. If it were, each iteration of substitution would be as likely to return any of the possible replacements. An observer would not be able to predict the next outcome. The idea of an equal probability of possible outcomes is a definition of randomness. In theory this is fairly straightforward, but implementing that theory in a computer turns out to be more involved, primarily because computers are incapable of generating randomness.

Randomness

There are two types of randomness. When we think of randomness most of us are thinking of what Computer Scientists call Truly Random. I defined this above as being an equal probability of all possible outcomes occurring. There is debate about whether true random actually exists. For our purposes we will assume it does, though most of the random we actual encounter in life is not truly random, it is instead what we refer to as pseudo random. Pseudo random, though not truly random, is expected to be difficult enough to predict to be effectively random for the purpose intended.

Computers, which are entirely incapable of producing true randomness, are quite capable of creating pseudo-randomness. For instance, a video game may need a random value to determine the outcome of an action, say if a player lives or dies, with 50/50 probability. Internal to the computer is a clock that ticks every billionth of a second. An effective source of random, in this situation, may be to divide the number of ticks on the clock by 2 and look at the remainder (either a 0 or 1), using that to determine the outcome. This is not truly random, if we knew the number of ticks, and could control our actions down to the billionth of a second we could predict the outcome. It would be trivial to write a program to do that for us. In practice it may or may not matter. For a single player game, if the owner were to go through the trouble of tracking their ticks and timing their actions, it would really have no negative impact on anything. The same strategy used in a competitive multi-player game with a prize fund could be exploited to the detriment of others, making it a completely unacceptable random source.

There are other schemes used to get psudo-randomness from computers, such as what is displayed on your monitor, or the characteristics of how your computer access your hard disk, as well as others. Each has varying levels of visibility to exploit, but they all are predictable on one level or another.

I want to stress this. To those not technically inclined the idea of predicting things down to the billionth of a second, or predict what someone might see on their screen seems impossible, but it can actually be quite easy. For instance, if you are connected to a network your traffic can be monitored and that alone can be used to determine the timing of your PC clock or predict what is being displayed on your screen (be this a warning to always use HTTPS, but some of this is out of your control). This can be accurate enough to individually identify computers, and sync clocks with them. Another great example are audio attacks, where simply by listening to the sound a computer makes one can begin to predict what it is doing.

Though computers are incapable of generating true randomness, that doesn't prevent them from using it, and there are hardware devices that will supply randomness to the computer that are truly random, at least in the sense that it is not possible using current knowledge to predict their output. An example of one method used is to monitor the decay of a radiation source (quantum randomness), though there are a number of such schemes that can be considered truly random. These sources should be used by any encryption that requires high security. When implemented properly they truly are unpredictable.

XOR - A little elementary math to tie it all together

At this point we have a great theory. We know that if we randomly choose our mappings (the key) we can avoid leaking information into our ciphertext, and we know ways to get that randomness. But how do we tie that together? How do we leverage that randomness to produce mappings that can be undone with the same key? Remember, we're talking symmetric encryption here, the same key does both encryption and decryption.

We start with something everyone takes for granted, that computers store everything as binary, 1's and 0's. Letters are represented by numbers that can be represented by those 1's and 0's. For instance the letter A corresponds to the number 65, which in binary is 01000001, B corresponds to 66 or 01000010 in binary, C to 67 or 01000011, and so on. This is also how the computer stores randomness, as a string of 1's and 0's where the probability of a 1 or a 0 in any position is equal.

The last piece of our puzzle is a math operation called Exclusive Or, XOR for short. XOR is sometimes referred to as "addition without carry", which is an apt description, since it is the same addition algorithm you learned in elementary school, with the difference that you do not carry to the next place. In other words, 0 + 0 = 0, 1 + 0 = 1, 0 + 1 = 1, and 1 + 1 = 0. The last one, 1 + 1 = 0, is the interesting case. In binary 1 + 1 = 10, zero in the ones place and carry the 1 to the two's place, but since this is addition without carry we simply drop the 1 and are left with 0. To illustrate we will XOR two binary numbers, 01001001 and 00101101

01001001
00101101
01100100

When evaluated left to right we perform the following: 0+0=0. 1+0=1. 0+1=1. 0+0=0. 1+1=0. 0+1=1. 0+0=0. And finally, 1+1=0.

The XOR operation has two very special properties that we are seeking. The first is that if one of the operands is truly random, the result is also truly random. This is because one operand is actually defining a mapping between the other operand and the result, just like we wanted to do with our letters. It may be more difficult to see that at first because we are mapping just two values 0 and 1, but when we combine these mappings to form a larger string, like the 8 bits used to form our letters, we actual end up mapping an entire 8 bit sequence to another 8 bit sequence that either represents a letter or some other character (like a dollar sign, $). It's exactly what we need to represent our random mappings.

The second property is that if you XOR one operand with the result of a previous XOR, you will get the other operand back. In other words, when we XOR our original text with a key, and XOR that result with the same key, we will get our original test back, the property that gives us our asymmetric cipher.

To illustrate, we already saw that 01001001 XOR 00101101 is 01100100. Now if we take that result, 01100100, and XOR it with one of the original operands, 01001001, we will get back the other operand, 00101101.

01100100
01001001
00101101

Evaluated from left to right this is: 0+0=0. 1+1=0. 1+0=1. 0+0=0. 0+1=1. 1+0=1. 0+0=0. 0+1=1.

XOR is commutative, A XOR B is the same as B XOR A, just like addition.

The one time pad

With a source of randomness to get keys from, and the features that XOR provides, we are ready to perform a far more advanced and secure encryption, using completely random mappings. We'll encrypt the text 'HELLO', which in binary is the string 0100100001000101010011000100110001001111. We'll use the random string 0101100001110010101000100010100110111111, and XOR them together

0100100001000101010011000100110001001111
0101100001110010101000100010100110111111
0001000000110111111011100110010111110000

The result, 0001000000110111111011100110010111110000, as normal letters would be '7îeð'. This may look a little odd, some of it isn't using typical English letters, but these are just characters used on the computer that correspond with the numbers represented in the binary string. They matter not because they should be meaningless until decrypted.

Let's reverse the process by taking our key which is the random value we used when encrypting, 0101100001110010101000100010100110111111, and XOR it with the encrypted ciphertext, 0001000000110111111011100110010111110000.

0101100001110010101000100010100110111111
0001000000110111111011100110010111110000
0100100001000101010011000100110001001111

Our result, 0100100001000101010011000100110001001111, matches our binary representation of 'HELLO', it decrypted to the expected value.

This kind of a cipher is called a One Time Pad. The One Time part of the name comes from the fact that to remain secure you can only use a key once to encrypt. If the same key is used to encrypt another message the mappings will be repeated. That is precisely why we developed this system, to avoid repeated mappings that would be predictable. The only way to ensure secrecy of the data protected is to never use the key again so all mappings are always entirely random compared to all other mappings used.

The pad part of the name comes from the way they were traditionally used. Before computers, the random keys would be written on pads of paper; 2 identical pads, one used for encryption and the other for decryption. Each page would have a serial number. A message would be encrypted with the key on a page and the serial number noted on the communication. The encryption key page would be torn from the pad and destroyed. The receiving end would find their corresponding page with the same serial number (and copy of the key), decrypt the message, tear the page from the pad, and destroy it. Assuming that the pad's containing the keys do not become compromised, if this procedure is followed precisely for every message, the messages will remain secure.

Perfect Secrecy

When I say the messages will remain secure I don't mean probably, most likely, or until computers become faster. I mean it is a provable guarantee that it will remain secure, forever. This is because the One Time Pad exhibits a feature called Perfect Secrecy. Literally this means that without the key, it is impossible to retrieve the original message and know that is what you retrieved. I will say that again, it is IMPOSSIBLE to know the original message without knowing the key.

Without boring you with a proof I will give an example that helps to illustrate why this is. We know we aren't leaking any information (other than the length of the string, which doesn't matter for sufficiently large strings, and smaller strings can be padded with random bits to make them sufficiently large) because we can't predict anything about the random key we used, and we have never reused a mapping. So our only plausible strategy is to try every possible key. We start with all 0's as the key, XOR, see if the result makes sense. Next we XOR with all 0's + 1, see if the result makes sense. We continue to add one to our key and XOR until we have tried every possible key, checking each time to see if the result makes sense, like a military order, or is just jibberish. This is called a brute force attack and against some types of 'locks', can be successful. With a One Time Pad a brute force attack is not successful, and you can see why by XORing all the permutations of a smaller 2 or 3 bit value, you will just end up with every possible result, all the numbers from 0 to the length of what you are trying to decrypt. The message you were looking for will certainly be one of those results, but you will never be able to tell which it is because there will be lots of other results that would make just as much sense.

To illustrate, using the message "Attack at dawn from the N", when encrypted one may try every possible key to decrypt. Many of the results will be jibberish and hold no meaning. One result will be the original message "Attack at dawn from the N", but another result would be "Attack at dawn from the S", and yet another "Attack at dawn from the E", and another still "Attack at noon from the W". Worse yet, there many many other possible decryptions you will stumble upon, like "WITHDRAW NOW OR SURRENDER" or "this string is 25 letters". Eventually you will simply end up with every possible permutation of strings of that length, and as a result ever possible grammatically correct combination of words. You will never know which was the original message!