# rand5()

Developer An wants to get a random integer from the range of `$[1, 5]$`

. Here comes
one of the most common mistakes in the history of programming:

```
random_number = random() % 5 + 1;
```

The problem is that: In most programming languages (such as C), the `random()`

function will return a random number from the range `$ [0, RAND\_MAX) $`

. Unless
5 divides `RAND_MAX`

, the outcomes are not equally distributed.

Why’s that the case? For the sake of simplicity, let’s just assume that `RAND_MAX`

is 7, which implies the `random()`

function will return 0, 1, 2, 3, 4, 5, 6
each with a probability of `$ \frac{1}{7} $`

.

Thus, one can obtain the table:

`random_number` |
required outcomes of `random()` |
probability |
---|---|---|

1 | 0, 6 | ^{2}⁄_{7} |

2 | 1, 5 | ^{2}⁄_{7} |

3 | 2 | ^{1}⁄_{7} |

4 | 3 | ^{1}⁄_{7} |

5 | 4 | ^{1}⁄_{7} |

So how do we guarantee the desired probability distribution? Well, the solution
is fairly simple: While the outcome of `random()`

is greater than or equal to 5,
just sample it again.

```
int rand5() {
int x;
do {
x = rand();
} while(x >= 5);
return x + 1;
}
```

Let’s see what the probability of having `1`

as the outcome of `rand5()`

is.

Defines `$ E_n $`

to be the event of getting a `1`

after `$ n $`

samples of `random()`

without breaking the loop. Our desired event `$ rand5() = 1 $`

, `$ E $`

, can be broken down into the following mutually exclusive events:

Event

`$ E_1 $`

: We get`1`

after 1 sample. This event is equivalent to the event of`$ rand() = 0 $`

at the first sample. Thus,`$ P(E_1) = \frac{1}{7} $`

.Event

`$E_2$`

: We get`1`

after 2 samples without breaking the inner loop. This event is equivalent to the event of having a resample in the first time (ie.`$ rand() \in \{5, 6\} $`

at the first sample), and`$rand() = 0$`

at the second sample. The probability is`$ \frac{2}{7} \cdot \frac{1}{7}$`

.The event

`$ E_n $`

of getting a`1`

after n samples, without breaking the inner loop, will have the probability of`$ {\frac{2}{7}}^{n-1} \cdot \frac{1}{7} $`

.

Since all the above events are mutually exclusive, it holds that:

`$$ P(E) = \sum _n^{\infty} P(E_i) = P(E_1) + P(E_2) + P(E_3) + ... P(E_n) + ... $$`

```
$$
= \frac{1}{7} + \frac{1}{7} \cdot \frac{2}{7}
+ \frac{1}{7} \cdot {\frac{2}{7}}^{2}
+ \frac{1}{7} \cdot {\frac{2}{7}}^{3}
+ ... \frac{1}{7} \cdot {\frac{2}{7}}^{n-1} + ...
$$
```

```
$$
= \frac{1}{7} \cdot (1 + \frac{2}{7} + {\frac{2}{7}}^2 + {\frac{2}{7}}^3 + ...)
= \frac{1}{7} \cdot \frac{1}{1 - \frac{2}{7}}
= \frac{1}{7} \cdot \frac{7}{5}
= \frac{1}{5}
$$
```

We can apply the above methodology to prove `$ P(rand5() = x) = 1/5 $`

for any `$ x \in \{2, 3, 4, 5\} $`

.

# rand7()

Given that we have a function `rand5()`

that will return an integer from the range
`$ [0, 4] $`

with equal probability. Based on this function, how can one construct
a function `rand7()`

that will return an integer from the range `$ [1, 7] $`

with
equal probability?

Here’s my approach:

```
int rand7() {
int d1, d2, x;
do {
d1 = rand5();
d2 = rand5();
x = 5 * d1 + d2;
} while(x >= 21);
return x % 7 + 1;
}
```

Our very goal is to have the integer `x`

to be equally distributed over the
smallest range posible that is greater than `$ [0, 6] $`

, than we can “cut” it out
and re-map that range into the desired range. A conventional choice is to choose
the range `$ [0, 24] $`

, as one can easily sample all possible 5-ary 2-digits
numbers with equal probability by using only two `rand5()`

function calls.

Note that we can map the range `$ [0, 20] $`

to `$ [0, 6] $`

by the `mod`

operator.
Now we can apply the same trick as before: Since `x`

is equally distributed
over `$ [0, 24] $`

inside the `do(...) while();`

loop, then by the same methodology
above, one can prove that `$ P(rand5() = x) = \frac{1}{7} $`

for any `$ x \in \{1, 2, ..., 7\} $`

.