`(This was originally on my old website, I have resurrected it here.)`

After reading Internet Gambling Exploit I started thinking about card shuffling, and how I'd do it, we explore this seemingly simple problem and discover that implementation details can violate the assumptions of a mathematically correct algorithm.

This post was written in haste, so there are probably things that should be added to it / could be better written.

We want an algorithm to generate a permutation uniformly at random, that is all permutations are equiprobable.

We use standard interval notation \( [,] \) for inclusive of the endpoints and \( (,) \) for exclusive endpoints.

Many languages have a function \( f \) that samples a discrete \( \text{Unif}[0, N] \) distribution for some fixed \( N \), e.g. C has \( f = \text{rand} \) with \( N = \text{RAND_MAX} \). S'pose we need to sample a discrete \( \text{Unif}[0, N] \) distribution, where \( n < N \), we shouldn't use the residues modulo \( n + 1 \), as these are not uniform.

For example \( N = 10 \), \( n = 2 \),

- \( 0, 3, 6, 9\;\mod\;3 = 0\;\;\mathbb{P}(0) = 4/11 \)
- \( 1, 4, 7, 10\;\mod\;3 = 1\;\;\mathbb{P}(1) = 4/11 \)
- \( 2, 5, 8\;\mod\;3 = 2\;\;\mathbb{P}(2) = 3/11 \)

Note: don't rely on rand for good quality numbers, many implementations use a linear congruential generator with badly chosen constants.

I default to using a CSPRNG unless I have good reason not to, speed is no longer an issue, libsodium has bindings for most languages.

We'll get the algorithm most people first come up with out of the way.

```
incorrect(a) {
for i in (n - 1)..1 {
j <- unif[0, n - 1]
swap(a[i], a[j])
}
}
```

This is non-uniform because there will be \( n^n \) swaps but there are \( n! \) permutations and \( n^n \) is not divisible by \( n! \), since \( n - 1 \) divides \( n! \) but not \( n^n \). So some permutations are the result of more of the \( n^n \) swap sequences than others.

This shuffle is uniform.

```
fisher_yates(a) {
for i in (n - 1)..1 {
j <- unif[0, i]
swap(a[i], a[j])
}
}
```

This is uniform because

- there are n! distinct sequences of random numbers that can be generated during the shuffle
- each of these sequences is equiprobable
- each generates a distinct permutation
- there are n! different permutations

We run into a problem when we come to implement this, most built-in PRNGs are seeded with a 32 bit value, this means they produce at most \( 2^{32} \) different random number streams, so we can get at most \( 2^{32} \) different permutations. However, there are \( 52! ≈ 2^{225.58\ldots} \) different permutations so we can only reach \( 2^{32}/52! ≈ 2^{-193.58\ldots} \) of the total permutations. So an implementation would be uniform over the reachable permutations.

We can (potentially) reach more of the permutations by using an offset into the PRNG stream before generating a permutation, suppose \( \text{offset} \gets \text{unif}[0, 2^{32}) \), then we can reach, at best, a factor \( 2^{32} \) more of the permutations, still only \( 2^{-161.58...} \) of the total.

This doesn't really matter if we are only using a small proportion of the permutations we can generate. But for an online poker system with millions of players, we would quickly exhaust these permutations.

A valid solution would be to go to use a PRNG with a 64 bit seed, but let's see if we can come up with a scheme such that all permutations are reachable, we need a PRNG with a least a 226 bit seed.

There is an interesting enumeration of permutations in lexicographic order, if we use factorial base numbers.

Lexicographic order is e.g. for \( n = 3 \)

- \( 0 \to [0, 1, 2] \)
- \( 1 \to [0, 2, 1] \)
- \( 2 \to [1, 0, 2] \)
- \( 3 \to [1, 2, 0] \)
- \( 4 \to [2, 0, 1] \)
- \( 5 \to [2, 1, 0] \)

Naturals can be uniquely expressed as

$$ n = (a_k\cdots a_1)_! = \sum_{i = 1}^{k}a_i (i - 1)! $$

with \( 0 \leq a_i < i \), a number in this form is called a factoriadic.

Example \( 341010_! = 463_{10} \).

To get the \( k \)th permutation of \( n \) items, we

- write \( k \) as a factoriadic
- leftpad, ;), with zeros so \( n \) total digits
- the factoriadic digits give the indices to pop

Example, \( n = 4 \), \( k = 4 \)

- \( a = [0, 1, 2, 3] \)
- \( k = 200_! \)
- \( k = 0200_! \)
- \( a' = [0, 3, 1, 2] \)

We can get a CSPRNG by using a block cipher in CTR mode, encrypting successive values of a counter.

The birthday paradox in this context, since block ciphers are pseudo-random permutations, for an \( n \) bit block size, the first \( 2^n \) output blocks will be unique. However for random data \( \mathbb{P}(\text{collision}) = 1/2 \) after approx. \( 2^{n/2} \) blocks.

For a block size of 128 bits the birthday bound is \( 16\;\text{B} * 2^{64} = 256\;\text{EiB} \), similarly for 64 bit blocks the bound is \( 32\;\text{GiB} \).

For this reason if you ever use a block cipher with a small block size, it is not safe to use the same key to encrypt too many blocks.

Even though a 128 bit block size affords us a huge amount of pseudo-random data, it is still beneficial to change keys after a certain amount of data, e.g. for mitigating the damage caused by a compromised key.

We can use AES256 as our block cipher, there are enough key choices to make all permutations possible.

We can use /dev/urandom to generate our key, infact some /dev/random implementations use a Fortuna PRNG, of which a block cipher scheme like this is a part of.

I just realised I made a mistake at the end, the key cannont be our seed because since the keys index the permutations of 128 bits of data, so there are many redundant keys. However we can fix this by spliting the seed accross the key and initial cipher state.

Also doing some research I came accross this pcg-random.org, also see this reddit comment.