random()

2018/06/10

rand5()

Developer An wants to get a random integer from the range of $[1, 5]$. Here comes one of the most common mistakes in the history of programming:

random_number = random() % 5 + 1;

The problem is that: In most programming languages (such as C), the random() function will return a random number from the range $ [0, RAND\_MAX) $. Unless 5 divides RAND_MAX, the outcomes are not equally distributed.

Why’s that the case? For the sake of simplicity, let’s just assume that RAND_MAX is 7, which implies the random() function will return 0, 1, 2, 3, 4, 5, 6 each with a probability of $ \frac{1}{7} $.

Thus, one can obtain the table:

random_number required outcomes of random() probability
1 0, 6 27
2 1, 5 27
3 2 17
4 3 17
5 4 17

So how do we guarantee the desired probability distribution? Well, the solution is fairly simple: While the outcome of random() is greater than or equal to 5, just sample it again.

int rand5() {
    int x;
    do {
        x = rand();
    } while(x >= 5);
    
    return x + 1;
}

Let’s see what the probability of having 1 as the outcome of rand5() is.

Defines $ E_n $ to be the event of getting a 1 after $ n $ samples of random() without breaking the loop. Our desired event $ rand5() = 1 $, $ E $, can be broken down into the following mutually exclusive events:

Since all the above events are mutually exclusive, it holds that:

$$ P(E) = \sum _n^{\infty} P(E_i) = P(E_1) + P(E_2) + P(E_3) + ... P(E_n) + ... $$

$$ = \frac{1}{7} + \frac{1}{7} \cdot \frac{2}{7} + \frac{1}{7} \cdot {\frac{2}{7}}^{2} + \frac{1}{7} \cdot {\frac{2}{7}}^{3} + ... \frac{1}{7} \cdot {\frac{2}{7}}^{n-1} + ... $$

$$ = \frac{1}{7} \cdot (1 + \frac{2}{7} + {\frac{2}{7}}^2 + {\frac{2}{7}}^3 + ...) = \frac{1}{7} \cdot \frac{1}{1 - \frac{2}{7}} = \frac{1}{7} \cdot \frac{7}{5} = \frac{1}{5} $$

We can apply the above methodology to prove $ P(rand5() = x) = 1/5 $ for any $ x \in \{2, 3, 4, 5\} $.

rand7()

Given that we have a function rand5() that will return an integer from the range $ [0, 4] $ with equal probability. Based on this function, how can one construct a function rand7() that will return an integer from the range $ [1, 7] $ with equal probability?

Here’s my approach:

int rand7() {
    int d1, d2, x;
    do {
        d1 = rand5();
        d2 = rand5();
        x = 5 * d1 + d2;
    } while(x >= 21);
    
    return x % 7 + 1;
}

Our very goal is to have the integer x to be equally distributed over the smallest range posible that is greater than $ [0, 6] $, than we can “cut” it out and re-map that range into the desired range. A conventional choice is to choose the range $ [0, 24] $, as one can easily sample all possible 5-ary 2-digits numbers with equal probability by using only two rand5() function calls.

Note that we can map the range $ [0, 20] $ to $ [0, 6] $ by the mod operator. Now we can apply the same trick as before: Since x is equally distributed over $ [0, 24] $ inside the do(...) while(); loop, then by the same methodology above, one can prove that $ P(rand5() = x) = \frac{1}{7} $ for any $ x \in \{1, 2, ..., 7\} $.