# rand5()

Developer An wants to get a random integer from the range of $[1, 5]$. Here comes one of the most common mistakes in the history of programming:

random_number = random() % 5 + 1;


The problem is that: In most programming languages (such as C), the random() function will return a random number from the range $[0, RAND\_MAX)$. Unless 5 divides RAND_MAX, the outcomes are not equally distributed.

Why’s that the case? For the sake of simplicity, let’s just assume that RAND_MAX is 7, which implies the random() function will return 0, 1, 2, 3, 4, 5, 6 each with a probability of $\frac{1}{7}$.

Thus, one can obtain the table:

random_number required outcomes of random() probability
1 0, 6 27
2 1, 5 27
3 2 17
4 3 17
5 4 17

So how do we guarantee the desired probability distribution? Well, the solution is fairly simple: While the outcome of random() is greater than or equal to 5, just sample it again.

int rand5() {
int x;
do {
x = rand();
} while(x >= 5);

return x + 1;
}


Let’s see what the probability of having 1 as the outcome of rand5() is.

Defines $E_n$ to be the event of getting a 1 after $n$ samples of random() without breaking the loop. Our desired event $rand5() = 1$, $E$, can be broken down into the following mutually exclusive events:

• Event $E_1$: We get 1 after 1 sample. This event is equivalent to the event of $rand() = 0$ at the first sample. Thus, $P(E_1) = \frac{1}{7}$.

• Event $E_2$: We get 1 after 2 samples without breaking the inner loop. This event is equivalent to the event of having a resample in the first time (ie. $rand() \in \{5, 6\}$ at the first sample), and $rand() = 0$ at the second sample. The probability is $\frac{2}{7} \cdot \frac{1}{7}$.

• The event $E_n$ of getting a 1 after n samples, without breaking the inner loop, will have the probability of ${\frac{2}{7}}^{n-1} \cdot \frac{1}{7}$.

Since all the above events are mutually exclusive, it holds that:

$$P(E) = \sum _n^{\infty} P(E_i) = P(E_1) + P(E_2) + P(E_3) + ... P(E_n) + ...$$

$$= \frac{1}{7} + \frac{1}{7} \cdot \frac{2}{7} + \frac{1}{7} \cdot {\frac{2}{7}}^{2} + \frac{1}{7} \cdot {\frac{2}{7}}^{3} + ... \frac{1}{7} \cdot {\frac{2}{7}}^{n-1} + ...$$

$$= \frac{1}{7} \cdot (1 + \frac{2}{7} + {\frac{2}{7}}^2 + {\frac{2}{7}}^3 + ...) = \frac{1}{7} \cdot \frac{1}{1 - \frac{2}{7}} = \frac{1}{7} \cdot \frac{7}{5} = \frac{1}{5}$$

We can apply the above methodology to prove $P(rand5() = x) = 1/5$ for any $x \in \{2, 3, 4, 5\}$.

# rand7()

Given that we have a function rand5() that will return an integer from the range $[0, 4]$ with equal probability. Based on this function, how can one construct a function rand7() that will return an integer from the range $[1, 7]$ with equal probability?

Here’s my approach:

int rand7() {
int d1, d2, x;
do {
d1 = rand5();
d2 = rand5();
x = 5 * d1 + d2;
} while(x >= 21);

return x % 7 + 1;
}


Our very goal is to have the integer x to be equally distributed over the smallest range posible that is greater than $[0, 6]$, than we can “cut” it out and re-map that range into the desired range. A conventional choice is to choose the range $[0, 24]$, as one can easily sample all possible 5-ary 2-digits numbers with equal probability by using only two rand5() function calls.

Note that we can map the range $[0, 20]$ to $[0, 6]$ by the mod operator. Now we can apply the same trick as before: Since x is equally distributed over $[0, 24]$ inside the do(...) while(); loop, then by the same methodology above, one can prove that $P(rand5() = x) = \frac{1}{7}$ for any $x \in \{1, 2, ..., 7\}$.