Notes / Let's Implement AES

(This was originally on my old website, I have resurrected it here.)

We start by going over the maths underlying AES, a symmetric cipher based on Rijndael, and standardised by NIST in 2001. Then produce a working implementation in C. Also we implement the cipher block chaining mode of operation to extend the cipher to work on data that has length divisible by the block size. We do not consider padding schemes that are required to operate on arbitrary length data.

Obligatory disclaimer

The code we produce prioritises readability and not efficiency or constant time execution, it is for educational purposes only. Also if knowledge of cryptography is illegal in your jurisdiction, avert thy gaze, ;). That being said, please enjoy.

Introduction

AES is a block cipher that operates on a 128 bit piece of data, and can have key sizes of 128, 192 or 256 bits. We will implement 128 bit AES only, but our implementation can be trivially extended to support other key sizes. The standard will be our essential guide, and should be read along side this.

The data is put into the state matrix by filling column first. In the implementation it would be nice if we could just cast the data array and treat it as the state matrix, because of C's semantics to get the correct filling we need to use column-major indexing for the state.

Fields

AES utilises finite field arithmetic, in particular Rijndael's field.

But, what is a field?

Definition

A field is a set \( \mathcal{F} \), and two functions \( + \), \( \cdot \), which we name addition and multiplication, and write infix.

Furthermore the following must hold:

What does this mean?

This means a field is some set with a structure that behaves similarly to how arithmetic with the reals works. The above rules encode these familiar behaviours in a general way. This is analogous to an interface / type class with extra laws to obey.

Some familiar examples are the reals, fractions, and complex numbers, with the usual addition and multiplication.

Isomorphisms

One further concept is that of an isomorphism.

Given two fields \( \mathcal{F} \), \( \mathcal{G} \), an isomorphism between them is an invertible function \( f : \mathcal{F} \to \mathcal{G} \) such that:

This just says that an isomorphism is a relabelling that preserves the structure.

Galois fields

Galois fields, are fields that have finitely many elements.

There is exactly one field (up to isomorphism) of size \( p^n \), for each \( p \) prime, \( n \geq 1 \), we write this field as \( \text{GF}(p^n) \). There are no finite fields that have a non prime power number of elements.

A nice family of examples are the \( \text{GF}(p) \), these can be easily realised as integers modulo \( p \). In particular another familiar realisation of \( \text{GF}(2) \) is as \( \{\text{true},\;\text{false}\} \) with \( + = XOR \) and \( \cdot = AND \).

The best method of realising \( \text{GF}(p^n) \) is using polynomials.

Polynomial rings

A commutative ring is a structure like a field except multiplication doesn't need to have inverses.

A polynomial ring over a field \( \mathcal{F} \), written \( \mathcal{F}[X] \), is all the expressions of the form \( a_0 + a_1X + \cdots + a_nX^n \) with the \( a_i \in \mathcal{F} \). Equivalently arbitrary finite length tuples with components in \( \mathcal{F} \). The symbol \( X \) is called an indeterminate and doesn't mean anything, it's a convenient way of tracking indices and makes the definition of multiplication nice.

Addition is just termwise.

Multiplication is expand the product and rewrite \( X^nX^m \) as \( X^{n + m} \).

An irreducible polynomial is a non-constant polynomial that doesn't factor into the product of non-constant polynomials.

We need one more thing before we can construct \( \text{GF}(p^n) \), quotient rings.

Quotient rings

Note to form quotients we need a special subset of the ring called an ideal, this subset satisfies some conditions that ensures the quotient construction returns a ring. These conditions are not important to us here, but for completion/interest a subset of a ring is an ideal if it is a subgroup of the ring's additive group, and absorbs multiplication by elements of the ring.

Given ring \( \mathcal{R} \) and ideal \( \mathcal{I} \) we write the quotient as \( \mathcal{R}/\mathcal{I} \).

The elements of the quotient ring are the cosets \( a + \mathcal{I} = \{a + i \;|\; i \in \mathcal{I}\} \), notice that \( a + \mathcal{I} = b + \mathcal{I} \) if \( a - b \in \mathcal{I} \). That is we consider elements that differ by an element of the ideal the same in the quotient ring.

Addition is \( (a + \mathcal{I}) + (b + \mathcal{I}) = (a + b) + \mathcal{I} \).

Multiplication is \( (a + \mathcal{I}) \cdot (b + \mathcal{I}) = (a \cdot b) + \mathcal{I} \).

Constructing \( \text{GF}(p^n) \)

Our starting ring is \( \text{GF}(p) \), we need an irreducible polynomial, \( f \), over \( \text{GF}(p) \) of degree \( n \).

Our ideal is \( (f) = \{r\cdot f\;|\;r\in\text{GF}(p)\} \), that is all of the multiples of our irreducible polynomial.

The quotient can be interpreted as all of the polynomials in \( \text{GF}(p)[X] \) of degree less than \( n \), the addition is regular polynomial addition, and multiplication is multiplication modulo \( f \).

Rijndael's finite field

The arithmetic in AES happens in the field \( \text{GF}(2^8) = \text{GF}(256) \), we realise this as \( \text{GF}(2)[X]/(X^8 + X^4 + X^3 + X + 1) \). For implementing this we represent the polynomials as bytes.

$$ a_7X^7 + a_6X^6 + a_5X^5 + a_4X^4 + a_3X^3 + a_2X^2 + a_1X + a_0 \mapsto (a_7a_6a_5a_4a_3a_2a_1a_0)_2 $$

Since addition mod 2 is equivalent to xoring, to add polynomials in the byte representation we XOR the bytes.

We can use an inefficient implementation of multiplication, viz. multiply then use long division to get the remainder, to bootstrap to an efficient implementation, viz. log, then add, then exp.

We need an element, called a generator, \( g \), such that any element can be written as \( g^n = \overset{n\;\text{times}}{\overbrace{g\cdot g\;\cdots\;g}} \), for some \( n \). A brute force search reveals that \( X + 1 \mapsto 0\text{x}03 \) is such an element.

To get the exp array just keep multiplying 0x03 by itself, and the log array by swapping the indices and values in the exp array.

Using this we get the following

// A name for unused/undefined values.
#define XXXX 0x00

// Exp base 0x03 in GF(2^8).
uint8_t gf256_exp[256] = {
    0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF, 0x1A, 0x2E, 0x72, 0x96, 0xA1, 0xF8, 0x13, 0x35,
    0x5F, 0xE1, 0x38, 0x48, 0xD8, 0x73, 0x95, 0xA4, 0xF7, 0x02, 0x06, 0x0A, 0x1E, 0x22, 0x66, 0xAA,
    0xE5, 0x34, 0x5C, 0xE4, 0x37, 0x59, 0xEB, 0x26, 0x6A, 0xBE, 0xD9, 0x70, 0x90, 0xAB, 0xE6, 0x31,
    0x53, 0xF5, 0x04, 0x0C, 0x14, 0x3C, 0x44, 0xCC, 0x4F, 0xD1, 0x68, 0xB8, 0xD3, 0x6E, 0xB2, 0xCD,
    0x4C, 0xD4, 0x67, 0xA9, 0xE0, 0x3B, 0x4D, 0xD7, 0x62, 0xA6, 0xF1, 0x08, 0x18, 0x28, 0x78, 0x88,
    0x83, 0x9E, 0xB9, 0xD0, 0x6B, 0xBD, 0xDC, 0x7F, 0x81, 0x98, 0xB3, 0xCE, 0x49, 0xDB, 0x76, 0x9A,
    0xB5, 0xC4, 0x57, 0xF9, 0x10, 0x30, 0x50, 0xF0, 0x0B, 0x1D, 0x27, 0x69, 0xBB, 0xD6, 0x61, 0xA3,
    0xFE, 0x19, 0x2B, 0x7D, 0x87, 0x92, 0xAD, 0xEC, 0x2F, 0x71, 0x93, 0xAE, 0xE9, 0x20, 0x60, 0xA0,
    0xFB, 0x16, 0x3A, 0x4E, 0xD2, 0x6D, 0xB7, 0xC2, 0x5D, 0xE7, 0x32, 0x56, 0xFA, 0x15, 0x3F, 0x41,
    0xC3, 0x5E, 0xE2, 0x3D, 0x47, 0xC9, 0x40, 0xC0, 0x5B, 0xED, 0x2C, 0x74, 0x9C, 0xBF, 0xDA, 0x75,
    0x9F, 0xBA, 0xD5, 0x64, 0xAC, 0xEF, 0x2A, 0x7E, 0x82, 0x9D, 0xBC, 0xDF, 0x7A, 0x8E, 0x89, 0x80,
    0x9B, 0xB6, 0xC1, 0x58, 0xE8, 0x23, 0x65, 0xAF, 0xEA, 0x25, 0x6F, 0xB1, 0xC8, 0x43, 0xC5, 0x54,
    0xFC, 0x1F, 0x21, 0x63, 0xA5, 0xF4, 0x07, 0x09, 0x1B, 0x2D, 0x77, 0x99, 0xB0, 0xCB, 0x46, 0xCA,
    0x45, 0xCF, 0x4A, 0xDE, 0x79, 0x8B, 0x86, 0x91, 0xA8, 0xE3, 0x3E, 0x42, 0xC6, 0x51, 0xF3, 0x0E,
    0x12, 0x36, 0x5A, 0xEE, 0x29, 0x7B, 0x8D, 0x8C, 0x8F, 0x8A, 0x85, 0x94, 0xA7, 0xF2, 0x0D, 0x17,
    0x39, 0x4B, 0xDD, 0x7C, 0x84, 0x97, 0xA2, 0xFD, 0x1C, 0x24, 0x6C, 0xB4, 0xC7, 0x52, 0xF6, 0x01
};

// Log base 0x03 in GF(2^8).
uint8_t gf256_log[256] = {
    XXXX, 0x00, 0x19, 0x01, 0x32, 0x02, 0x1A, 0xC6, 0x4B, 0xC7, 0x1B, 0x68, 0x33, 0xEE, 0xDF, 0x03,
    0x64, 0x04, 0xE0, 0x0E, 0x34, 0x8D, 0x81, 0xEF, 0x4C, 0x71, 0x08, 0xC8, 0xF8, 0x69, 0x1C, 0xC1,
    0x7D, 0xC2, 0x1D, 0xB5, 0xF9, 0xB9, 0x27, 0x6A, 0x4D, 0xE4, 0xA6, 0x72, 0x9A, 0xC9, 0x09, 0x78,
    0x65, 0x2F, 0x8A, 0x05, 0x21, 0x0F, 0xE1, 0x24, 0x12, 0xF0, 0x82, 0x45, 0x35, 0x93, 0xDA, 0x8E,
    0x96, 0x8F, 0xDB, 0xBD, 0x36, 0xD0, 0xCE, 0x94, 0x13, 0x5C, 0xD2, 0xF1, 0x40, 0x46, 0x83, 0x38,
    0x66, 0xDD, 0xFD, 0x30, 0xBF, 0x06, 0x8B, 0x62, 0xB3, 0x25, 0xE2, 0x98, 0x22, 0x88, 0x91, 0x10,
    0x7E, 0x6E, 0x48, 0xC3, 0xA3, 0xB6, 0x1E, 0x42, 0x3A, 0x6B, 0x28, 0x54, 0xFA, 0x85, 0x3D, 0xBA,
    0x2B, 0x79, 0x0A, 0x15, 0x9B, 0x9F, 0x5E, 0xCA, 0x4E, 0xD4, 0xAC, 0xE5, 0xF3, 0x73, 0xA7, 0x57,
    0xAF, 0x58, 0xA8, 0x50, 0xF4, 0xEA, 0xD6, 0x74, 0x4F, 0xAE, 0xE9, 0xD5, 0xE7, 0xE6, 0xAD, 0xE8,
    0x2C, 0xD7, 0x75, 0x7A, 0xEB, 0x16, 0x0B, 0xF5, 0x59, 0xCB, 0x5F, 0xB0, 0x9C, 0xA9, 0x51, 0xA0,
    0x7F, 0x0C, 0xF6, 0x6F, 0x17, 0xC4, 0x49, 0xEC, 0xD8, 0x43, 0x1F, 0x2D, 0xA4, 0x76, 0x7B, 0xB7,
    0xCC, 0xBB, 0x3E, 0x5A, 0xFB, 0x60, 0xB1, 0x86, 0x3B, 0x52, 0xA1, 0x6C, 0xAA, 0x55, 0x29, 0x9D,
    0x97, 0xB2, 0x87, 0x90, 0x61, 0xBE, 0xDC, 0xFC, 0xBC, 0x95, 0xCF, 0xCD, 0x37, 0x3F, 0x5B, 0xD1,
    0x53, 0x39, 0x84, 0x3C, 0x41, 0xA2, 0x6D, 0x47, 0x14, 0x2A, 0x9E, 0x5D, 0x56, 0xF2, 0xD3, 0xAB,
    0x44, 0x11, 0x92, 0xD9, 0x23, 0x20, 0x2E, 0x89, 0xB4, 0x7C, 0xB8, 0x26, 0x77, 0x99, 0xE3, 0xA5,
    0x67, 0x4A, 0xED, 0xDE, 0xC5, 0x31, 0xFE, 0x18, 0x0D, 0x63, 0x8C, 0x80, 0xC0, 0xF7, 0x70, 0x07
};

And we have an efficient multiplication function.

// Multiply two elements of GF(2^8).
uint8_t gf256_mul(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) {
        return 0;
    }
    int temp = gf256_log[a] + gf256_log[b];
    if (temp > 255) {
        temp -= 255;
    }
    return gf256_exp[temp];
}

AES overview

AES has a substitution box that has been designed to be resistant to differential cryptanalysis. It is the precomputed result of a particular transform.

// Rijndael's S-box, generated by applying a particular affine transformation to
// the inverse of a given number in GF(2^8) (with zero mapped to zero for
// inverses). Note that GF(2^8) ≈ F2[X]/(X^8 + X^4 + X^3 + X + 1).
uint8_t sbox[256] = {
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
    0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
    0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
    0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
    0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
    0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
    0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
    0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
    0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
    0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
    0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
    0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
    0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
    0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
    0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
    0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16
 };

// Inverse S-box, inv_sbox ∘ sbox = sbox ∘ inv_sbox = id.
uint8_t inv_sbox[256] = {
    0x52, 0x09, 0x6A, 0xD5, 0x30, 0x36, 0xA5, 0x38, 0xBF, 0x40, 0xA3, 0x9E, 0x81, 0xF3, 0xD7, 0xFB,
    0x7C, 0xE3, 0x39, 0x82, 0x9B, 0x2F, 0xFF, 0x87, 0x34, 0x8E, 0x43, 0x44, 0xC4, 0xDE, 0xE9, 0xCB,
    0x54, 0x7B, 0x94, 0x32, 0xA6, 0xC2, 0x23, 0x3D, 0xEE, 0x4C, 0x95, 0x0B, 0x42, 0xFA, 0xC3, 0x4E,
    0x08, 0x2E, 0xA1, 0x66, 0x28, 0xD9, 0x24, 0xB2, 0x76, 0x5B, 0xA2, 0x49, 0x6D, 0x8B, 0xD1, 0x25,
    0x72, 0xF8, 0xF6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xD4, 0xA4, 0x5C, 0xCC, 0x5D, 0x65, 0xB6, 0x92,
    0x6C, 0x70, 0x48, 0x50, 0xFD, 0xED, 0xB9, 0xDA, 0x5E, 0x15, 0x46, 0x57, 0xA7, 0x8D, 0x9D, 0x84,
    0x90, 0xD8, 0xAB, 0x00, 0x8C, 0xBC, 0xD3, 0x0A, 0xF7, 0xE4, 0x58, 0x05, 0xB8, 0xB3, 0x45, 0x06,
    0xD0, 0x2C, 0x1E, 0x8F, 0xCA, 0x3F, 0x0F, 0x02, 0xC1, 0xAF, 0xBD, 0x03, 0x01, 0x13, 0x8A, 0x6B,
    0x3A, 0x91, 0x11, 0x41, 0x4F, 0x67, 0xDC, 0xEA, 0x97, 0xF2, 0xCF, 0xCE, 0xF0, 0xB4, 0xE6, 0x73,
    0x96, 0xAC, 0x74, 0x22, 0xE7, 0xAD, 0x35, 0x85, 0xE2, 0xF9, 0x37, 0xE8, 0x1C, 0x75, 0xDF, 0x6E,
    0x47, 0xF1, 0x1A, 0x71, 0x1D, 0x29, 0xC5, 0x89, 0x6F, 0xB7, 0x62, 0x0E, 0xAA, 0x18, 0xBE, 0x1B,
    0xFC, 0x56, 0x3E, 0x4B, 0xC6, 0xD2, 0x79, 0x20, 0x9A, 0xDB, 0xC0, 0xFE, 0x78, 0xCD, 0x5A, 0xF4,
    0x1F, 0xDD, 0xA8, 0x33, 0x88, 0x07, 0xC7, 0x31, 0xB1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xEC, 0x5F,
    0x60, 0x51, 0x7F, 0xA9, 0x19, 0xB5, 0x4A, 0x0D, 0x2D, 0xE5, 0x7A, 0x9F, 0x93, 0xC9, 0x9C, 0xEF,
    0xA0, 0xE0, 0x3B, 0x4D, 0xAE, 0x2A, 0xF5, 0xB0, 0xC8, 0xEB, 0xBB, 0x3C, 0x83, 0x53, 0x99, 0x61,
    0x17, 0x2B, 0x04, 0x7E, 0xBA, 0x77, 0xD6, 0x26, 0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D
 };

Key expansion

The Rijndael key schedule is an algorithm that takes the 128 byte secret key, and expands it into 128 bit keys for each round.

Round functions

AES-128 has ten rounds, these functions transform the state, and are used to build the rounds. For decryption we apply the inverses in the reverse order to that of encryption.

SubBytes()

This is just byte substitution, replacing entries in the state with the value at that index in the sbox.

The inverse is the same except using the inverse sbox.

ShiftRows()

A cyclic shift of the rows. Row \( i \) is shifted by \( i \) to the left, i.e. row 0 unchanged, row 1 one position, etc.

For the inverse we shift right instead.

MixColumns()

We consider each column of the state as an element of \( \text{GF}(2^8)[X] \), multiply by \( 0\text{x}03 X^3 + 0\text{x}01 X^2 + 0\text{x}01 X + 0\text{x}02 \), then reduce mod \( X^4 + 1 \).

For the inverse the \( 0\text{x}0B X^3 + 0\text{x}0D X^2 + 0\text{x}09 X + 0\text{x}0E \) is used instead.

This operation can equivalently be considered as matrix multiplication.

AddRoundKey()

The state is XORed with the round key, in a column first ordering.

This is self inverse.

Implementation

First our header.

{ aes128.h }

#if !defined(__AES128_H__)
#define __AES128_H__

#include <stddef.h>
#include <stdint.h>

void aes128_init(uint8_t* key);
void aes128_encrypt(uint8_t* block);
void aes128_decrypt(uint8_t* block);

#endif // __AES128_H__

Now the implementation, read alongside the standard.

{ aes128.c }

#include "aes128.h"

// A name for unused/undefined values.
#define XXXX 0x00

// Rijndael's S-box, generated by applying a particular affine transformation to
// the inverse of a given number in GF(2^8) (with zero mapped to zero for
// inverses). Note that GF(2^8) ≈ F2[X]/(X^8 + X^4 + X^3 + X + 1).
static const uint8_t sbox[256] = {
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
    0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
    0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
    0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
    0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
    0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
    0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
    0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
    0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
    0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
    0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
    0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
    0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
    0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
    0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
    0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16
 };

// Inverse S-box, inv_sbox ∘ sbox = sbox ∘ inv_sbox = id.
static const uint8_t inv_sbox[256] = {
    0x52, 0x09, 0x6A, 0xD5, 0x30, 0x36, 0xA5, 0x38, 0xBF, 0x40, 0xA3, 0x9E, 0x81, 0xF3, 0xD7, 0xFB,
    0x7C, 0xE3, 0x39, 0x82, 0x9B, 0x2F, 0xFF, 0x87, 0x34, 0x8E, 0x43, 0x44, 0xC4, 0xDE, 0xE9, 0xCB,
    0x54, 0x7B, 0x94, 0x32, 0xA6, 0xC2, 0x23, 0x3D, 0xEE, 0x4C, 0x95, 0x0B, 0x42, 0xFA, 0xC3, 0x4E,
    0x08, 0x2E, 0xA1, 0x66, 0x28, 0xD9, 0x24, 0xB2, 0x76, 0x5B, 0xA2, 0x49, 0x6D, 0x8B, 0xD1, 0x25,
    0x72, 0xF8, 0xF6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xD4, 0xA4, 0x5C, 0xCC, 0x5D, 0x65, 0xB6, 0x92,
    0x6C, 0x70, 0x48, 0x50, 0xFD, 0xED, 0xB9, 0xDA, 0x5E, 0x15, 0x46, 0x57, 0xA7, 0x8D, 0x9D, 0x84,
    0x90, 0xD8, 0xAB, 0x00, 0x8C, 0xBC, 0xD3, 0x0A, 0xF7, 0xE4, 0x58, 0x05, 0xB8, 0xB3, 0x45, 0x06,
    0xD0, 0x2C, 0x1E, 0x8F, 0xCA, 0x3F, 0x0F, 0x02, 0xC1, 0xAF, 0xBD, 0x03, 0x01, 0x13, 0x8A, 0x6B,
    0x3A, 0x91, 0x11, 0x41, 0x4F, 0x67, 0xDC, 0xEA, 0x97, 0xF2, 0xCF, 0xCE, 0xF0, 0xB4, 0xE6, 0x73,
    0x96, 0xAC, 0x74, 0x22, 0xE7, 0xAD, 0x35, 0x85, 0xE2, 0xF9, 0x37, 0xE8, 0x1C, 0x75, 0xDF, 0x6E,
    0x47, 0xF1, 0x1A, 0x71, 0x1D, 0x29, 0xC5, 0x89, 0x6F, 0xB7, 0x62, 0x0E, 0xAA, 0x18, 0xBE, 0x1B,
    0xFC, 0x56, 0x3E, 0x4B, 0xC6, 0xD2, 0x79, 0x20, 0x9A, 0xDB, 0xC0, 0xFE, 0x78, 0xCD, 0x5A, 0xF4,
    0x1F, 0xDD, 0xA8, 0x33, 0x88, 0x07, 0xC7, 0x31, 0xB1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xEC, 0x5F,
    0x60, 0x51, 0x7F, 0xA9, 0x19, 0xB5, 0x4A, 0x0D, 0x2D, 0xE5, 0x7A, 0x9F, 0x93, 0xC9, 0x9C, 0xEF,
    0xA0, 0xE0, 0x3B, 0x4D, 0xAE, 0x2A, 0xF5, 0xB0, 0xC8, 0xEB, 0xBB, 0x3C, 0x83, 0x53, 0x99, 0x61,
    0x17, 0x2B, 0x04, 0x7E, 0xBA, 0x77, 0xD6, 0x26, 0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D
 };

// Round constants, rcon: GF(2^8) -> GF(2^8), i |-> 2^(i - 1).
// rcon[0] is unused.
static const uint8_t rcon[11] = {XXXX, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36};

// Exp base 0x03 in GF(2^8).
static const uint8_t gf256_exp[256] = {
    0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF, 0x1A, 0x2E, 0x72, 0x96, 0xA1, 0xF8, 0x13, 0x35,
    0x5F, 0xE1, 0x38, 0x48, 0xD8, 0x73, 0x95, 0xA4, 0xF7, 0x02, 0x06, 0x0A, 0x1E, 0x22, 0x66, 0xAA,
    0xE5, 0x34, 0x5C, 0xE4, 0x37, 0x59, 0xEB, 0x26, 0x6A, 0xBE, 0xD9, 0x70, 0x90, 0xAB, 0xE6, 0x31,
    0x53, 0xF5, 0x04, 0x0C, 0x14, 0x3C, 0x44, 0xCC, 0x4F, 0xD1, 0x68, 0xB8, 0xD3, 0x6E, 0xB2, 0xCD,
    0x4C, 0xD4, 0x67, 0xA9, 0xE0, 0x3B, 0x4D, 0xD7, 0x62, 0xA6, 0xF1, 0x08, 0x18, 0x28, 0x78, 0x88,
    0x83, 0x9E, 0xB9, 0xD0, 0x6B, 0xBD, 0xDC, 0x7F, 0x81, 0x98, 0xB3, 0xCE, 0x49, 0xDB, 0x76, 0x9A,
    0xB5, 0xC4, 0x57, 0xF9, 0x10, 0x30, 0x50, 0xF0, 0x0B, 0x1D, 0x27, 0x69, 0xBB, 0xD6, 0x61, 0xA3,
    0xFE, 0x19, 0x2B, 0x7D, 0x87, 0x92, 0xAD, 0xEC, 0x2F, 0x71, 0x93, 0xAE, 0xE9, 0x20, 0x60, 0xA0,
    0xFB, 0x16, 0x3A, 0x4E, 0xD2, 0x6D, 0xB7, 0xC2, 0x5D, 0xE7, 0x32, 0x56, 0xFA, 0x15, 0x3F, 0x41,
    0xC3, 0x5E, 0xE2, 0x3D, 0x47, 0xC9, 0x40, 0xC0, 0x5B, 0xED, 0x2C, 0x74, 0x9C, 0xBF, 0xDA, 0x75,
    0x9F, 0xBA, 0xD5, 0x64, 0xAC, 0xEF, 0x2A, 0x7E, 0x82, 0x9D, 0xBC, 0xDF, 0x7A, 0x8E, 0x89, 0x80,
    0x9B, 0xB6, 0xC1, 0x58, 0xE8, 0x23, 0x65, 0xAF, 0xEA, 0x25, 0x6F, 0xB1, 0xC8, 0x43, 0xC5, 0x54,
    0xFC, 0x1F, 0x21, 0x63, 0xA5, 0xF4, 0x07, 0x09, 0x1B, 0x2D, 0x77, 0x99, 0xB0, 0xCB, 0x46, 0xCA,
    0x45, 0xCF, 0x4A, 0xDE, 0x79, 0x8B, 0x86, 0x91, 0xA8, 0xE3, 0x3E, 0x42, 0xC6, 0x51, 0xF3, 0x0E,
    0x12, 0x36, 0x5A, 0xEE, 0x29, 0x7B, 0x8D, 0x8C, 0x8F, 0x8A, 0x85, 0x94, 0xA7, 0xF2, 0x0D, 0x17,
    0x39, 0x4B, 0xDD, 0x7C, 0x84, 0x97, 0xA2, 0xFD, 0x1C, 0x24, 0x6C, 0xB4, 0xC7, 0x52, 0xF6, 0x01
};

// Log base 0x03 in GF(2^8).
static const uint8_t gf256_log[256] = {
    XXXX, 0x00, 0x19, 0x01, 0x32, 0x02, 0x1A, 0xC6, 0x4B, 0xC7, 0x1B, 0x68, 0x33, 0xEE, 0xDF, 0x03,
    0x64, 0x04, 0xE0, 0x0E, 0x34, 0x8D, 0x81, 0xEF, 0x4C, 0x71, 0x08, 0xC8, 0xF8, 0x69, 0x1C, 0xC1,
    0x7D, 0xC2, 0x1D, 0xB5, 0xF9, 0xB9, 0x27, 0x6A, 0x4D, 0xE4, 0xA6, 0x72, 0x9A, 0xC9, 0x09, 0x78,
    0x65, 0x2F, 0x8A, 0x05, 0x21, 0x0F, 0xE1, 0x24, 0x12, 0xF0, 0x82, 0x45, 0x35, 0x93, 0xDA, 0x8E,
    0x96, 0x8F, 0xDB, 0xBD, 0x36, 0xD0, 0xCE, 0x94, 0x13, 0x5C, 0xD2, 0xF1, 0x40, 0x46, 0x83, 0x38,
    0x66, 0xDD, 0xFD, 0x30, 0xBF, 0x06, 0x8B, 0x62, 0xB3, 0x25, 0xE2, 0x98, 0x22, 0x88, 0x91, 0x10,
    0x7E, 0x6E, 0x48, 0xC3, 0xA3, 0xB6, 0x1E, 0x42, 0x3A, 0x6B, 0x28, 0x54, 0xFA, 0x85, 0x3D, 0xBA,
    0x2B, 0x79, 0x0A, 0x15, 0x9B, 0x9F, 0x5E, 0xCA, 0x4E, 0xD4, 0xAC, 0xE5, 0xF3, 0x73, 0xA7, 0x57,
    0xAF, 0x58, 0xA8, 0x50, 0xF4, 0xEA, 0xD6, 0x74, 0x4F, 0xAE, 0xE9, 0xD5, 0xE7, 0xE6, 0xAD, 0xE8,
    0x2C, 0xD7, 0x75, 0x7A, 0xEB, 0x16, 0x0B, 0xF5, 0x59, 0xCB, 0x5F, 0xB0, 0x9C, 0xA9, 0x51, 0xA0,
    0x7F, 0x0C, 0xF6, 0x6F, 0x17, 0xC4, 0x49, 0xEC, 0xD8, 0x43, 0x1F, 0x2D, 0xA4, 0x76, 0x7B, 0xB7,
    0xCC, 0xBB, 0x3E, 0x5A, 0xFB, 0x60, 0xB1, 0x86, 0x3B, 0x52, 0xA1, 0x6C, 0xAA, 0x55, 0x29, 0x9D,
    0x97, 0xB2, 0x87, 0x90, 0x61, 0xBE, 0xDC, 0xFC, 0xBC, 0x95, 0xCF, 0xCD, 0x37, 0x3F, 0x5B, 0xD1,
    0x53, 0x39, 0x84, 0x3C, 0x41, 0xA2, 0x6D, 0x47, 0x14, 0x2A, 0x9E, 0x5D, 0x56, 0xF2, 0xD3, 0xAB,
    0x44, 0x11, 0x92, 0xD9, 0x23, 0x20, 0x2E, 0x89, 0xB4, 0x7C, 0xB8, 0x26, 0x77, 0x99, 0xE3, 0xA5,
    0x67, 0x4A, 0xED, 0xDE, 0xC5, 0x31, 0xFE, 0x18, 0x0D, 0x63, 0x8C, 0x80, 0xC0, 0xF7, 0x70, 0x07
};

// State is the block to operate on, column-major indexing (so we can cast the
// block s.t. successive 4 bytes form a column).
static uint8_t (*state)[4][4];

// Round keys.
static uint8_t w[176];


// Cyclic left shift of the bytes in a word.
static void rot_word(uint8_t* word) {
    uint8_t temp = word[0];
    word[0] = word[1];
    word[1] = word[2];
    word[2] = word[3];
    word[3] = temp;
}


// Apply the S-box to each byte in a word.
static void sub_word(uint8_t* word) {
    word[0] = sbox[word[0]];
    word[1] = sbox[word[1]];
    word[2] = sbox[word[2]];
    word[3] = sbox[word[3]];
}


// Rijndael key schedule.
void aes128_init(uint8_t* key) {
    uint8_t temp[4];

    for (size_t i = 0; i < 4; ++i) {
        w[4*i] = key[4*i];
        w[4*i + 1] = key[4*i + 1];
        w[4*i + 2] = key[4*i + 2];
        w[4*i + 3] = key[4*i + 3];
    }

    for (size_t i = 4; i < 44; ++i) {
        for (size_t j = 0; j < 4; ++j) {
            temp[j] = w[4*(i - 1) + j];
        }

        if (i % 4 == 0) {
            rot_word(temp);
            sub_word(temp);
            temp[0] = temp[0] ^ rcon[i/4];
        }

        w[4*i] = w[4*(i - 4)] ^ temp[0];
        w[4*i + 1] = w[4*(i - 4) + 1] ^ temp[1];
        w[4*i + 2] = w[4*(i - 4) + 2] ^ temp[2];
        w[4*i + 3] = w[4*(i - 4) + 3] ^ temp[3];
    }
}


// Apply the S-box to the bytes in the state.
static void sub_bytes(void) {
    for (size_t c = 0; c < 4; ++c) {
        for (size_t r = 0; r < 4; ++r) {
            (*state)[c][r] = sbox[(*state)[c][r]];
        }
    }
}


// Shift row i of the state i columns to the left.
static void shift_rows(void) {
    uint8_t temp;

    // Row 1.
    temp           = (*state)[0][1];
    (*state)[0][1] = (*state)[1][1];
    (*state)[1][1] = (*state)[2][1];
    (*state)[2][1] = (*state)[3][1];
    (*state)[3][1] = temp;

    // Row 2.
    temp           = (*state)[0][2];
    (*state)[0][2] = (*state)[2][2];
    (*state)[2][2] = temp;

    temp           = (*state)[1][2];
    (*state)[1][2] = (*state)[3][2];
    (*state)[3][2] = temp;

    // Row 3.
    temp           = (*state)[0][3];
    (*state)[0][3] = (*state)[3][3];
    (*state)[3][3] = (*state)[2][3];
    (*state)[2][3] = (*state)[1][3];
    (*state)[1][3] = temp;
}


// Multiply two elements of GF(2^8).
static uint8_t gf256_mul(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) {
        return 0;
    }
    int temp = gf256_log[a] + gf256_log[b];
    if (temp > 255) {
        temp -= 255;
    }
    return gf256_exp[temp];
}


// Considering each column of the state as an element of GF(2^8)[X], multiply by
// 0x03 X^3 + 0x01 X^2 + 0x01 X + 0x02, then reduce mod X^4 + 1.
static void mix_columns(void) {
    uint8_t temp[4];
    for (size_t c = 0; c < 4; ++c) {
        for (size_t r = 0; r < 4; ++r) {
            temp[r] = (*state)[c][r];
        }
        (*state)[c][0] = gf256_mul(temp[0], 0x02) ^ gf256_mul(temp[1], 0x03) ^ gf256_mul(temp[2], 0x01) ^ gf256_mul(temp[3], 0x01);
        (*state)[c][1] = gf256_mul(temp[0], 0x01) ^ gf256_mul(temp[1], 0x02) ^ gf256_mul(temp[2], 0x03) ^ gf256_mul(temp[3], 0x01);
        (*state)[c][2] = gf256_mul(temp[0], 0x01) ^ gf256_mul(temp[1], 0x01) ^ gf256_mul(temp[2], 0x02) ^ gf256_mul(temp[3], 0x03);
        (*state)[c][3] = gf256_mul(temp[0], 0x03) ^ gf256_mul(temp[1], 0x01) ^ gf256_mul(temp[2], 0x01) ^ gf256_mul(temp[3], 0x02);
    }
}


// Add the round key to the state.
static void add_round_key(uint8_t round) {
    for (size_t c = 0; c < 4; ++c) {
        for (size_t r = 0; r < 4; ++r) {
            (*state)[c][r] ^= w[16*round + 4*c + r];
        }
    }
}


// Shift row i of the state i columns to the right.
static void inv_shift_rows(void) {
    uint8_t temp;

    // Row 1.
    temp           = (*state)[3][1];
    (*state)[3][1] = (*state)[2][1];
    (*state)[2][1] = (*state)[1][1];
    (*state)[1][1] = (*state)[0][1];
    (*state)[0][1] = temp;

    // Row 2.
    temp           = (*state)[0][2];
    (*state)[0][2] = (*state)[2][2];
    (*state)[2][2] = temp;

    temp           = (*state)[1][2];
    (*state)[1][2] = (*state)[3][2];
    (*state)[3][2] = temp;

    // Row 3.
    temp           = (*state)[0][3];
    (*state)[0][3] = (*state)[1][3];
    (*state)[1][3] = (*state)[2][3];
    (*state)[2][3] = (*state)[3][3];
    (*state)[3][3] = temp;
}


// Apply the inverse S-box to the bytes in the state.
static void inv_sub_bytes(void) {
    for (size_t c = 0; c < 4; ++c) {
        for (size_t r = 0; r < 4; ++r) {
            (*state)[c][r] = inv_sbox[(*state)[c][r]];
        }
    }
}


// Similar to mix_columns but using the polynomial 0x0B X^3 + 0x0D X^2 + 0x09 X + 0x0E.
static void inv_mix_columns(void) {
    uint8_t temp[4];
    for (size_t c = 0; c < 4; ++c) {
        for (size_t r = 0; r < 4; ++r) {
            temp[r] = (*state)[c][r];
        }
        (*state)[c][0] = gf256_mul(temp[0], 0x0E) ^ gf256_mul(temp[1], 0x0B) ^ gf256_mul(temp[2], 0x0D) ^ gf256_mul(temp[3], 0x09);
        (*state)[c][1] = gf256_mul(temp[0], 0x09) ^ gf256_mul(temp[1], 0x0E) ^ gf256_mul(temp[2], 0x0B) ^ gf256_mul(temp[3], 0x0D);
        (*state)[c][2] = gf256_mul(temp[0], 0x0D) ^ gf256_mul(temp[1], 0x09) ^ gf256_mul(temp[2], 0x0E) ^ gf256_mul(temp[3], 0x0B);
        (*state)[c][3] = gf256_mul(temp[0], 0x0B) ^ gf256_mul(temp[1], 0x0D) ^ gf256_mul(temp[2], 0x09) ^ gf256_mul(temp[3], 0x0E);
    }
}


// Encrypt a block.
void aes128_encrypt(uint8_t* block) {
    state = (uint8_t (*)[4][4])block;

    add_round_key(0);

    for (size_t round = 1; round < 10; ++round) {
        sub_bytes();
        shift_rows();
        mix_columns();
        add_round_key(round);
    }

    sub_bytes();
    shift_rows();
    add_round_key(10);
}


// Decrypt a block.
void aes128_decrypt(uint8_t* block) {
    state = (uint8_t (*)[4][4])block;

    add_round_key(10);

    for (size_t round = 10; round-- > 1;) {
        inv_shift_rows();
        inv_sub_bytes();
        add_round_key(round);
        inv_mix_columns();
    }

    inv_shift_rows();
    inv_sub_bytes();
    add_round_key(0);
}

That's the basic block cipher done, but it isn't too useful as all we can do is encrypt 128 bits of data, let's remedy this.

Cipher block chaining

In cipher block chaining each plaintext block is XORed with the previous ciphertext block, the first block has no previous block so we also supply an initialisation vector.

See these diagrams from Wikipedia.

{ aes128_cbc.h }

#if !defined(__AES128_CBC_H__)
#define __AES128_CBC_H__

#include "aes128.h"

void aes128_cbc_init(uint8_t* key, uint8_t* init_vec);
void aes128_cbc_encrypt(uint8_t* data, size_t len);
void aes128_cbc_decrypt(uint8_t* data, size_t len);

#endif // __AES128_CBC_H__

{ aes128_cbc.c }

#include <string.h>

#include "aes128_cbc.h"

static uint8_t* iv;


void aes128_cbc_init(uint8_t* key, uint8_t* init_vec) {
    aes128_init(key);
    iv = init_vec;
}


static void xor_equals(uint8_t* lblock, uint8_t* rblock) {
    for (size_t i = 0; i < 16; i++) {
        lblock[i] ^= rblock[i];
    }
}


void aes128_cbc_encrypt(uint8_t* data, size_t len) {
    xor_equals(&data[0], &iv[0]);
    aes128_encrypt(&data[0]);
    for (size_t blk_idx = 1; blk_idx < len/16; blk_idx++) {
        xor_equals(&data[blk_idx*16], &data[(blk_idx - 1)*16]);
        aes128_encrypt(&data[blk_idx*16]);
    }
}


void aes128_cbc_decrypt(uint8_t* data, size_t len) {
    uint8_t cipher_text_buff[16];
    uint8_t xor_buff[16];

    memcpy(&cipher_text_buff[0], &data[0], 16*sizeof(uint8_t));

    aes128_decrypt(&data[0]);
    xor_equals(&data[0], &iv[0]);
    for (size_t blk_idx = 1; blk_idx < len/16; blk_idx++) {
        memcpy(&xor_buff[0], &cipher_text_buff[0], 16*sizeof(uint8_t));
        memcpy(&cipher_text_buff[0], &data[blk_idx*16], 16*sizeof(uint8_t));
        aes128_decrypt(&data[blk_idx*16]);
        xor_equals(&data[blk_idx*16], &xor_buff[0]);
    }
}

Conclusion

I hope you enjoyed reading this, the code can be found here, have fun, an exercise might be to read up on some modes of operation and implement them.