rand5()

Developer An wants to get a random integer from the range of $[1, 5]$ . Here comes one of the most common mistakes in the history of programming:

random_number = random() % 5 + 1;

The problem is that: In most programming languages (such as C), the random() function will return a random number from the range $ [0, RAND\_MAX) $ . Unless 5 divides RAND_MAX, the outcomes are not equally distributed.

Why’s that the case? For the sake of simplicity, let’s just assume that RAND_MAX is 7, which implies the random() function will return 0, 1, 2, 3, 4, 5, 6 each with a probability of $ \frac{1}{7} $ .

Thus, one can obtain the table:

`random_number`	required outcomes of `random()`	probability
1	0, 6	²⁄₇
2	1, 5	²⁄₇
3	2	¹⁄₇
4	3	¹⁄₇
5	4	¹⁄₇

So how do we guarantee the desired probability distribution? Well, the solution is fairly simple: While the outcome of random() is greater than or equal to 5, just sample it again.

int rand5() {
    int x;
    do {
        x = rand();
    } while(x >= 5);
    
    return x + 1;
}

Let’s see what the probability of having 1 as the outcome of rand5() is.

Defines $ E_n $ to be the event of getting a 1 after $ n $ samples of random() without breaking the loop. Our desired event $ rand5() = 1 $ , $ E $ , can be broken down into the following mutually exclusive events:

Event $ E_1 $ : We get 1 after 1 sample. This event is equivalent to the event of $ rand() = 0 $ at the first sample. Thus, $ P(E_1) = \frac{1}{7} $ .
Event $E_2$ : We get 1 after 2 samples without breaking the inner loop. This event is equivalent to the event of having a resample in the first time (ie. $ rand() \in \{5, 6\} $ at the first sample), and $rand() = 0$ at the second sample. The probability is $ \frac{2}{7} \cdot \frac{1}{7}$ .
The event $ E_n $ of getting a 1 after n samples, without breaking the inner loop, will have the probability of $ {\frac{2}{7}}^{n-1} \cdot \frac{1}{7} $ .

Since all the above events are mutually exclusive, it holds that:

$$ P(E) = \sum _n^{\infty} P(E_i) = P(E_1) + P(E_2) + P(E_3) + ... P(E_n) + ... $$

$$ = \frac{1}{7} + \frac{1}{7} \cdot \frac{2}{7} + \frac{1}{7} \cdot {\frac{2}{7}}^{2} + \frac{1}{7} \cdot {\frac{2}{7}}^{3} + ... \frac{1}{7} \cdot {\frac{2}{7}}^{n-1} + ... $$

$$ = \frac{1}{7} \cdot (1 + \frac{2}{7} + {\frac{2}{7}}^2 + {\frac{2}{7}}^3 + ...) = \frac{1}{7} \cdot \frac{1}{1 - \frac{2}{7}} = \frac{1}{7} \cdot \frac{7}{5} = \frac{1}{5} $$

We can apply the above methodology to prove $ P(rand5() = x) = 1/5 $ for any $ x \in \{2, 3, 4, 5\} $ .

rand7()

Given that we have a function rand5() that will return an integer from the range $ [0, 4] $ with equal probability. Based on this function, how can one construct a function rand7() that will return an integer from the range $ [1, 7] $ with equal probability?

Here’s my approach:

int rand7() {
    int d1, d2, x;
    do {
        d1 = rand5();
        d2 = rand5();
        x = 5 * d1 + d2;
    } while(x >= 21);
    
    return x % 7 + 1;
}

Our very goal is to have the integer x to be equally distributed over the smallest range posible that is greater than $ [0, 6] $ , than we can “cut” it out and re-map that range into the desired range. A conventional choice is to choose the range $ [0, 24] $ , as one can easily sample all possible 5-ary 2-digits numbers with equal probability by using only two rand5() function calls.

Note that we can map the range $ [0, 20] $ to $ [0, 6] $ by the mod operator. Now we can apply the same trick as before: Since x is equally distributed over $ [0, 24] $ inside the do(...) while(); loop, then by the same methodology above, one can prove that $ P(rand5() = x) = \frac{1}{7} $ for any $ x \in \{1, 2, ..., 7\} $ .

random()

2018/06/10

rand5()

rand7()