rand5()
Developer An wants to get a random integer from the range of $[1, 5]$
. Here comes
one of the most common mistakes in the history of programming:
random_number = random() % 5 + 1;
The problem is that: In most programming languages (such as C), the random()
function will return a random number from the range $ [0, RAND\_MAX) $
. Unless
5 divides RAND_MAX
, the outcomes are not equally distributed.
Why’s that the case? For the sake of simplicity, let’s just assume that RAND_MAX
is 7, which implies the random()
function will return 0, 1, 2, 3, 4, 5, 6
each with a probability of $ \frac{1}{7} $
.
Thus, one can obtain the table:
random_number |
required outcomes of random() |
probability |
---|---|---|
1 | 0, 6 | 2⁄7 |
2 | 1, 5 | 2⁄7 |
3 | 2 | 1⁄7 |
4 | 3 | 1⁄7 |
5 | 4 | 1⁄7 |
So how do we guarantee the desired probability distribution? Well, the solution
is fairly simple: While the outcome of random()
is greater than or equal to 5,
just sample it again.
int rand5() {
int x;
do {
x = rand();
} while(x >= 5);
return x + 1;
}
Let’s see what the probability of having 1
as the outcome of rand5()
is.
Defines $ E_n $
to be the event of getting a 1
after $ n $
samples of random()
without breaking the loop. Our desired event $ rand5() = 1 $
, $ E $
, can be broken down into the following mutually exclusive events:
Event
$ E_1 $
: We get1
after 1 sample. This event is equivalent to the event of$ rand() = 0 $
at the first sample. Thus,$ P(E_1) = \frac{1}{7} $
.Event
$E_2$
: We get1
after 2 samples without breaking the inner loop. This event is equivalent to the event of having a resample in the first time (ie.$ rand() \in \{5, 6\} $
at the first sample), and$rand() = 0$
at the second sample. The probability is$ \frac{2}{7} \cdot \frac{1}{7}$
.The event
$ E_n $
of getting a1
after n samples, without breaking the inner loop, will have the probability of$ {\frac{2}{7}}^{n-1} \cdot \frac{1}{7} $
.
Since all the above events are mutually exclusive, it holds that:
$$ P(E) = \sum _n^{\infty} P(E_i) = P(E_1) + P(E_2) + P(E_3) + ... P(E_n) + ... $$
$$
= \frac{1}{7} + \frac{1}{7} \cdot \frac{2}{7}
+ \frac{1}{7} \cdot {\frac{2}{7}}^{2}
+ \frac{1}{7} \cdot {\frac{2}{7}}^{3}
+ ... \frac{1}{7} \cdot {\frac{2}{7}}^{n-1} + ...
$$
$$
= \frac{1}{7} \cdot (1 + \frac{2}{7} + {\frac{2}{7}}^2 + {\frac{2}{7}}^3 + ...)
= \frac{1}{7} \cdot \frac{1}{1 - \frac{2}{7}}
= \frac{1}{7} \cdot \frac{7}{5}
= \frac{1}{5}
$$
We can apply the above methodology to prove $ P(rand5() = x) = 1/5 $
for any $ x \in \{2, 3, 4, 5\} $
.
rand7()
Given that we have a function rand5()
that will return an integer from the range
$ [0, 4] $
with equal probability. Based on this function, how can one construct
a function rand7()
that will return an integer from the range $ [1, 7] $
with
equal probability?
Here’s my approach:
int rand7() {
int d1, d2, x;
do {
d1 = rand5();
d2 = rand5();
x = 5 * d1 + d2;
} while(x >= 21);
return x % 7 + 1;
}
Our very goal is to have the integer x
to be equally distributed over the
smallest range posible that is greater than $ [0, 6] $
, than we can “cut” it out
and re-map that range into the desired range. A conventional choice is to choose
the range $ [0, 24] $
, as one can easily sample all possible 5-ary 2-digits
numbers with equal probability by using only two rand5()
function calls.
Note that we can map the range $ [0, 20] $
to $ [0, 6] $
by the mod
operator.
Now we can apply the same trick as before: Since x
is equally distributed
over $ [0, 24] $
inside the do(...) while();
loop, then by the same methodology
above, one can prove that $ P(rand5() = x) = \frac{1}{7} $
for any $ x \in \{1, 2, ..., 7\} $
.