Notes / Reversing CRC and Forcing Collisions

We'll take a look at Cyclic Redundacny Checks in general, how to reverse them and force collisions, and in particular look at CRC32. Accompanying code is here. I've had to force collisions for Ethernet frames before, so these notes are for me too!

What is a CRC?

Cyclic redundancy checks are a type of error correcting code, an example use is detecting if any errors occurred during the transmission of data. They are not cryptographically secure hash functions because they are easily reversible, they are based on polynomial division.

Notation/background

A field is a set with a notion of addition and multiplication, along with some rules about how those operations behave. A familiar example is the real numbers along with regular addition and multiplication.

A finite field is a field with finitely many elements, it turns out that all of these fields have a prime power number of elements. They are also called Galois fields and the notation $ \text{GF}(p^n) $ is common. For example a representation of $ \text{GF}(p) $ is integer arithmetic modulo $ p $. Another example is $ \text{GF}(256) $, Rijndael's field from AES. A very useful representation of $ \text{GF}(2) $ is bits with addition as XOR and multiplication as AND.

A ring is also a set with a notion of addition and multiplication, however the rules to be a ring are slightly weaker than those to be a field. All rings are fields. Polynomial rings are a common type of ring, they are written as $ F[X] $, where $ F $ is the field the coefficients come from, the arithmetic is polynomial arithmetic. The degree $ \text{deg}\;p $ of polynomial $ p $ is its highest power of $ X $.

Data can be interpreted as a polynomial in this way $$ (abc \cdots d)_2 \leftrightarrow aX^n + bX^{n - 1} + cX^{n - 2} + \cdots + d $$

Calculation

Let $ m, g \in \text{GF}(2)[X] $, $ m $ is our message/data, and $ g $ is a special polynomial called the generator. Then the CRC code is $$ \text{crc}_g(m) = m \cdot X^{\text{deg}\;g}\;(\text{mod}\;g) $$ Note that multiplying by $ X^{\text{deg}\;g} $ is essentially padding the message with $ \text{deg}\;g $ zeros. Also $ \text{crc}_g(m) $ has $ \text{deg}\;g - 1 $ bits.

Example

Let's look at a worked example with $$ m = 1001100 = X^6 + X^3 + X^2 $$ $$ g = 1101 = X^3 + X^2 + 1 $$

Then $$ m \cdot X^3 = 1001100000 $$

To work out the residue mod $ g $, we use long division, esentially moving the generator along and XORing to clear any bits.

$$ \require{enclose} \begin{array}{rl} 1111101\phantom{000} & \\[-3pt] 1101 \enclose{longdiv}{1001100000}\kern-.2ex \\[-3pt] \underline{1101\phantom{000000}} & \\[-3pt] 100100000 & \\[-3pt] \underline{1101\phantom{00000}} & \\[-3pt] 10000000 & \\[-3pt] \underline{1101\phantom{0000}} & \\[-3pt] 1010000 & \\[-3pt] \underline{1101\phantom{000}} & \\[-3pt] 111000 & \\[-3pt] \underline{1101\phantom{00}} & \\[-3pt] 1100 & \\[-3pt] \underline{1101} & \\[-3pt] 1 \end{array} $$

We can read off the remainder and conculde that in this case $ \text{crc}_g(m) = 001 $. The value of the quotient can be discarded since we only care about the remainder.

CRC32

Let's look at the particular CRC used to calulate the frame check sequence in an Ethernet packet. It uses an algorithm with some extra steps, but the core is still polynomial division.

The spec can be viewed here, below are the relevant sections.

3.2.9 Frame Check Sequence (FCS) field

A cyclic redundancy check (CRC) is used by the transmit and receive algorithms to generate a CRC value
for the FCS field. The FCS field contains a 4-octet (32-bit) CRC value. This value is computed as a function
of the contents of the protected fields of the MAC frame: the Destination Address, Source Address, Length/
Type field, MAC Client Data, and Pad (that is, all fields except FCS). The encoding is defined by the
following generating polynomial.

G(x) = x32 + x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x + 1

Mathematically, the CRC value corresponding to a given MAC frame is defined by the following procedure:

a) The first 32 bits of the frame are complemented.
b) The n bits of the protected fields are then considered to be the coefficients of a polynomial M(x) of
   degree n – 1. (The first bit of the Destination Address field corresponds to the x(n–1) term and the last
   bit of the MAC Client Data field (or Pad field if present) corresponds to the x0 term.)
c) M(x) is multiplied by x32 and divided by G(x), producing a remainder R(x) of degree ≤ 31.
d) The coefficients of R(x) are considered to be a 32-bit sequence.
e) The bit sequence is complemented and the result is the CRC.

The 32 bits of the CRC value are placed in the FCS field so that the x31 term is the left-most bit of the first
octet, and the x0 term is the right most bit of the last octet. (The bits of the CRC are thus transmitted in the
order x31, x30,…, x1, x0.) See Hammond, et al. [B36].

3.3 Order of bit transmission

Each octet of the MAC frame, with the exception of the FCS, is transmitted least significant bit first.

NOTE: I had to reverse the order of the bits as the final step to get my output to match that of other implementations.

The procedure is

Reverse the order of bits within each byte of the message.
Pad with 32 zero bits on the right.
Flip the first 32 bits.
Calulate the remainder when dividing by the polynomial 0x104C11DB7.
Flip the bits.
Reverse the order of the bits.

Implementation

GENERATOR = BitVector(intVal=0x104C11DB7)

def crc32(message: BitVector) -> BitVector:
    message = reverse_bits_in_bytes(message)
    message += hex_to_bv("00000000")
    message[0:32] ^= hex_to_bv("FFFFFFFF")
    quotient, remainder = message.gf_divide_by_modulus(GENERATOR, 32)
    remainder ^= hex_to_bv("FFFFFFFF")
    remainder = reverse_bits(remainder)

    return remainder

NOTE: The Python BitVector library is used to do the polynomial arithmetic we need.

We can test this out on a few values to see it agrees with other CRC32 implementations, 0x00, 0x03, "123456789".

$$ \text{crc32}(\text{0x00}) = \text{0xD202EF8D} $$

$$ \text{crc32}(\text{0x03}) = \text{0x4B0BBE37} $$

$$ \text{crc32}(\unicode{x201C}\text{123456789}\unicode{x201D}) = \text{0xCBF43926} $$

Reversing it

By reversing CRC we mean finding some function $ \text{crc}_g^{-1} $, such that $ \text{crc}_g(\text{crc}_g^{-1}(c)) = c $. (This doesn't mean that $ \text{crc}_g^{-1}(\text{crc}_g(m)) = m $ though.)

$$ \text{crc}_g(m) = m \cdot X^{\text{deg}\;g}\;(\text{mod}\;g) = c $$

$$ \text{crc}_g^{-1}(c) = c \cdot (X^{\text{deg}\;g})^{-1}\;(\text{mod}\;g) = m^\prime $$

$$ \text{crc}_g(m) = \text{crc}_g(m^\prime) = c $$

Implementation

Since CRC32 has some extra steps we reverse those too.

X_32_INVERSE = BitVector(intVal=0x100000000).gf_MI(GENERATOR, 32) # 0xCBF1ACDA

def crc32_inverse(code: BitVector) -> BitVector:
    code = reverse_bits(code)
    code ^= hex_to_bv("FFFFFFFF")
    message = code.gf_multiply_modular(X_32_INVERSE, GENERATOR, 32)
    message[0:32] ^= hex_to_bv("FFFFFFFF")
    message = reverse_bits_in_bytes(message)

    return message

Let's try it out on the CRCs we computed before.

$$ \text{crc32_inverse}(\text{0xD202EF8D}) = \text{0x7ED9D15C} $$

$$ \text{crc32}(\text{0x7ED9D15C}) = \text{0xD202EF8D} $$

$$ \text{crc32_inverse}(\text{0x4B0BBE37}) = \text{0x7ED9D15F} $$

$$ \text{crc32}(\text{0x7ED9D15F}) = \text{0x4B0BBE37} $$

$$ \text{crc32_inverse}(\text{0xCBF43926}) = \text{0x2A0DCDF2} $$

$$ \text{crc32}(\text{0x2A0DCDF2}) = \text{0xCBF43926} $$

Forcing collisions

For a given message $ m $ and target code $ t $, we want to find some padding bytes $ p $ such that $ \text{crc}_g(m\;||\;p) = t $, that is appending the padding to the message then calculating the CRC should give us the desired target code. ($ || $ is append, and $ \oplus $ is XOR.)

$$ \text{crc}_g(m\;||\;p) = t $$

$$ \text{crc}_g((m\;||\;\overbrace{00\cdots0}^{\text{deg}\;g})\;\oplus\;p) = t $$

$$ \text{crc}_g(m\;||\;00\cdots0)\;\oplus\;\text{crc}_g(p) = t $$

$$ \text{crc}_g(p) = t\;\oplus\;\text{crc}_g(m\;||\;00\cdots0) $$

$$ p = crc_g^{-1}(t\;\oplus\;\text{crc}_g(m\;||\;00\cdots0)) $$

Implementation

def magic_padding(message: BitVector, target_crc: BitVector) -> BitVector:
    return crc32_inverse(
        target_crc ^ crc32(message + hex_to_bv("00000000")) ^ hex_to_bv("FFFFFFFF")
    ) ^ hex_to_bv("FFFFFFFF")

NOTE: Need to invert the bits before and after applying crc32_inverse 🤷‍♂️, probably because the modifications in CRC32 mean it isn't actually linear.

$$ message = \unicode{x201C}\text{hello foo}\unicode{x201D} = \text{0x68656C6C6F20666F6F} $$

$$ target\_crc = \text{crc32}(\unicode{x201C}\text{hello bar}\unicode{x201D}) = \text{0x98776A67} $$

$$ padding = \text{0x4A478A26} $$

$$ \text{crc32}(message\;||\;padding) = \text{0x98776A67} $$

Conclusion

We've seen how to calculate, reverse, and force collisions for CRC32, a popular error detecting code.