Draw integers independently and uniformly at random from 1 to $N$


I want to draw integers from 1 to some specific $N$ by rolling some number of fair six-sided dice (d6). A good answer will explain why its method produces uniform and independent integers.

As an illustrative example, it would be helpful to explain how a solution works for the case $N=150$.

Moreover, I want the procedure to be as efficient as possible: roll the fewest d6 on average for each number generated.

Conversions from senary to decimal are admissible.

This question was inspired by this meta thread.

— Sycorax says Reinstate Monica

The set $\Omega(d,n)$ of distinct identifiable outcomes in $n$ independent rolls of a die with $d=6$ faces has $d^n$ elements. When the die is fair, that means each outcome of one roll has probability $1/d$, and independence means that each of these outcomes will have probability $(1/d)^n:$ that is, they have a uniform distribution $\mathbb{P}_{d,n}.$

Suppose you have devised some procedure $t$ that either determines $m$ outcomes of a $c$-sided die ($c=150$), that is, an element of $\Omega(c,m)$, or else reports failure (which means you will have to repeat it to obtain an outcome). That is,

$t:\Omega(d,n)\to\Omega(c,m)\cup\{\text{Failure}\}.$

Let $F$ be the probability that $t$ results in failure, and note that $F$ is some integral multiple of $d^{-n},$ say

$F = \Pr(t(\omega)=\text{Failure}) = N_F\, d^{-n}.$

(For future reference, note that the expected number of times $t$ must be run before it does not fail is $1/(1-F).$)

The requirement that these outcomes in $\Omega(c,m)$ be uniform and independent, conditional on $t$ not reporting failure, means that $t$ preserves probability in the sense that for every event $\mathcal{A}\subset\Omega(c,m),$

$\frac{\mathbb{P}_{d,n}\left(t^{*}\mathcal{A}\right)}{1-F}= \mathbb{P}_{c,m}\left(\mathcal{A}\right) \tag{1}$

where

$t^{*}\left(\mathcal A\right) = \{\omega\in\Omega\mid t(\omega)\in\mathcal{A}\}$

is the set of die rolls that the procedure $t$ assigns to the event $\mathcal A.$

Consider an atomic event $\mathcal A = \{\eta\}\subset\Omega(c,m)$, which must have probability $c^{-m}.$ Let $t^{*}\left(\mathcal A\right)$ (the dice rolls associated with $\eta$) have $N_\eta$ elements. $(1)$ becomes

$\frac{N_\eta d^{-n}}{1 - N_F d^{-n}} = \frac{\mathbb{P}_{d,n}\left(t^{*}\mathcal{A}\right)}{1-F}= \mathbb{P}_{c,m}\left(\mathcal{A}\right) = c^{-m}.\tag{2}$

It is immediate that the $N_\eta$ are all equal to some integer $N.$ It remains only to find the most efficient procedures $t.$ The expected number of non-failures per roll of the $c$-sided die is

$\frac{1}{m}\left(1 - F\right).$

There are two immediate and obvious implications. One is that if we can keep $F$ small as $m$ grows large, then the effect of reporting a failure is asymptotically zero. The other is that for any given $m$ (the number of rolls of the $c$-sided die to simulate), we want to make $F$ as small as possible.

Let's take a closer look at $(2)$ by clearing the denominators:

$N c^m = d^n - N_F \gt 0.$

This makes it obvious that in a given context (determined by $c,d,n,m$), $F$ is made as small as possible by making $d^n-N_F$ equal the largest multiple of $c^m$ that is less than or equal to $d^n.$ We may write this in terms of the greatest integer function (or "floor") $\lfloor*\rfloor$ as

$N = \lfloor \frac{d^n}{c^m} \rfloor.$
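A minimal Python sketch of this step, assuming exact integer arithmetic (the helper name `failure_counts` is hypothetical, not from the answer): it returns $N$, $N_F$ and $F$ for given $d,c,n,m$. For $(n,m)=(3,1)$ it should give $N=1$, $N_F=66$ and $F=66/216=11/36$.

```python
from fractions import Fraction

def failure_counts(d, c, n, m):
    """Return (N, N_F, F) when n rolls of a d-sided die are mapped onto
    m rolls of a c-sided die.

    N   = floor(d**n / c**m): d-outcomes assigned to each c-outcome
    N_F = d**n - N * c**m:    d-outcomes assigned to Failure
    F   = N_F / d**n:         exact failure probability
    """
    total, block = d ** n, c ** m
    if block > total:
        raise ValueError("need d**n >= c**m so that N >= 1")
    N = total // block           # floor division on Python's big integers
    N_F = total - N * block      # leftover outcomes reported as Failure
    return N, N_F, Fraction(N_F, total)

print(failure_counts(6, 150, 3, 1))
print(failure_counts(6, 150, 14, 5))
```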

Finally, it is clear that $N$ ought to be as small as possible for highest efficiency, because it measures redundancy in $t$. Specifically, the expected number of rolls of the $d$-sided die needed to produce one roll of the $c$-sided die is

$N \times \frac{n}{m} \times \frac{1}{1-F}.$
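The same formula can be evaluated exactly with Python's `fractions` module. This is only a sketch (the name `expected_rolls_per_outcome` is made up here), assuming $d^n \ge c^m$. For $(n,m)=(3,1)$ it gives $4.32$ d6 rolls per d150 outcome, and for $(14,5)$ about $2.889$.

```python
from fractions import Fraction

def expected_rolls_per_outcome(d, c, n, m):
    """Expected rolls of the d-sided die per simulated c-sided outcome,
    N * (n / m) * 1 / (1 - F), with N = floor(d**n / c**m)."""
    total, block = d ** n, c ** m
    if block > total:
        raise ValueError("need d**n >= c**m")
    N = total // block
    F = Fraction(total - N * block, total)   # exact failure probability
    return N * Fraction(n, m) / (1 - F)

print(float(expected_rolls_per_outcome(6, 150, 3, 1)))    # 4.32
print(float(expected_rolls_per_outcome(6, 150, 14, 5)))   # about 2.889
```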

Thus, our search for high-efficiency procedures ought to focus on the cases where $d^n$ is equal to, or just barely greater than, some power $c^m.$

The analysis ends by showing that for given $d$ and $c,$ there is a sequence of multiples $(n,m)$ for which this approach approximates perfect efficiency. This amounts to finding $(n,m)$ for which $d^n/c^m \ge 1$ approaches $N=1$ in the limit (automatically guaranteeing $F\to 0$ ). One such sequence is obtained by taking $n=1,2,3,\ldots$ and determining

$m = \lfloor \frac{n\log d}{\log c} \rfloor.\tag{3}$
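To see formula $(3)$ at work, the following hypothetical Python helpers compute $m$ both from the floating-point formula and by exact comparison of $c^{m}$ with $d^n$ (the exact version avoids any rounding doubt). The loop prints $n/m$, the d6 rolls spent per d150 outcome when no failure occurs, which moves toward $\log_6 150 \approx 2.796489$ as $n$ grows.

```python
import math

def m_from_formula(n, d=6, c=150):
    """Formula (3): m = floor(n log d / log c), in floating point."""
    return math.floor(n * math.log(d) / math.log(c))

def m_exact(n, d=6, c=150):
    """Largest m with c**m <= d**n, using exact integer arithmetic."""
    target, m = d ** n, 0
    while c ** (m + 1) <= target:
        m += 1
    return m

limit = math.log(150) / math.log(6)   # log_6(150), about 2.796489
for n in (3, 14, 50, 300):
    m = m_exact(n)
    print(n, m, n / m)
```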

The proof is straightforward.

This all means that when we are willing to roll the original $d$-sided die a sufficiently large number of times $n,$ we can expect to simulate nearly $\log d / \log c = \log_c d$ outcomes of a $c$-sided die per roll. Equivalently,

It is possible to simulate a large number $m$ of independent rolls of a $c$-sided die with a fair $d$-sided die, using an average of $\log(c)/\log(d) + \epsilon = \log_d(c) + \epsilon$ rolls per outcome, where $\epsilon$ can be made arbitrarily small by choosing $m$ sufficiently large.

Examples and algorithms

In the question, $d=6$ and $c=150,$ whence

$\log_d(c) = \frac{\log(c)}{\log(d)} \approx 2.796489.$

Thus, the best possible procedure will require, on average, at least $2.796489$ rolls of a d6 to simulate each d150 outcome.

The analysis shows how to do this. We don't need to resort to number theory to carry it out: we can just tabulate the powers $d^n=6^n$ and the powers $c^m=150^m$ and compare them to find pairs with $c^m \le d^n$ where the two are close. This brute-force calculation gives $(n,m)$ pairs

$(n,m) \in \{(3,1), (14,5), \ldots\}$

for instance, corresponding to the numbers

$(6^n, 150^m) \in \{(216,150), (78364164096,75937500000), \ldots\}.$
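A brute-force version of this tabulation, sketched in Python with exact integers (the function `record_pairs` and the cutoff `n_max=100` are illustrative choices, not from the answer). For each $n$ it finds the largest $m$ with $150^m \le 6^n$ and reports the pairs at which $6^n/150^m$ gets closer to $1$ than at any smaller $n$.

```python
from fractions import Fraction

def record_pairs(d=6, c=150, n_max=100):
    """For each n take the largest m with c**m <= d**n, and keep (n, m)
    whenever d**n / c**m is the closest to 1 seen so far."""
    best, records = None, []
    for n in range(1, n_max + 1):
        total, m = d ** n, 0
        while c ** (m + 1) <= total:
            m += 1
        if m == 0:
            continue                    # d**n < c: not even one outcome yet
        ratio = Fraction(total, c ** m)  # always >= 1 by construction
        if best is None or ratio < best:
            best = ratio
            records.append((n, m))
    return records

print(record_pairs())   # -> [(3, 1), (14, 5)]
```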

In the first case $t$ would associate $216-150=66$ of the outcomes of three rolls of a d6 to Failure and the other $150$ outcomes would each be associated with a single outcome of a d150.

In the second case $t$ would associate $78364164096-75937500000=2426664096$ of the outcomes of 14 rolls of a d6 to Failure (about 3.1% of them all) and otherwise would output a sequence of 5 outcomes of a d150.

A simple algorithm to implement $t$ labels the faces of the $d$-sided die with the numerals $0,1,\ldots, d-1$ and the faces of the $c$-sided die with the numerals $0,1,\ldots, c-1.$ The $n$ rolls of the first die are interpreted as an $n$-digit number in base $d.$ This is converted to a number in base $c.$ If it has at most $m$ digits, the sequence of the last $m$ digits is the output. Otherwise, $t$ returns Failure by invoking itself recursively.
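This algorithm can be sketched in Python as follows. The names `t` and `simulate`, and the use of the standard `random` module as the source of fair rolls, are assumptions for illustration. Faces are labelled $0,\ldots,d-1$, `t` returns `None` to stand for Failure, and the retry loop lives in `simulate` rather than in a recursive call.

```python
import random

def t(rolls, c=150, m=1, d=6):
    """Read n faces (0..d-1) as a base-d number; if it is below c**m,
    return its m base-c digits (most significant first, leading zeros kept).
    Otherwise return None, meaning Failure."""
    value = 0
    for r in rolls:
        if not 0 <= r < d:
            raise ValueError("faces must be labelled 0..d-1")
        value = value * d + r            # append one base-d digit
    if value >= c ** m:
        return None                      # Failure: the caller must roll again
    digits = []
    for _ in range(m):
        value, digit = divmod(value, c)  # peel off base-c digits, least first
        digits.append(digit)
    return digits[::-1]

def simulate(c=150, m=1, n=3, d=6, rng=random):
    """Repeat t until it succeeds; return (outcomes, number of d-rolls used)."""
    used = 0
    while True:
        rolls = [rng.randrange(d) for _ in range(n)]
        used += n
        result = t(rolls, c, m, d)
        if result is not None:
            return result, used
```

For example, `simulate(c=150, m=5, n=14)` spends 14 d6 rolls per attempt and returns five d150 outcomes (each in $0,\ldots,149$) once an attempt succeeds.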

For much longer sequences, you can find suitable pairs $(n,m)$ by considering every other convergent $n/m$ of the continued fraction expansion of $x=\log(c)/\log(d).$ The theory of continued fractions shows that these convergents alternate between being less than $x$ and greater than it (assuming $x$ is not already rational). Choose those that are greater than $x,$ because $n/m > x$ is equivalent to $d^n > c^m.$

In the question, the first few such convergents are

$3, 14/5, 165/59, 797/285, 4301/1538, 89043/31841, 279235/99852, 29036139/10383070 \ldots.$
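The convergents can be generated with a short Python sketch (the function `upper_convergents` is hypothetical). It uses the `decimal` module so that $x$ is known to far more digits than a float provides, and keeps the odd-indexed convergents, which are the ones exceeding $x$ (equivalently $6^n > 150^m$). With the default arguments it should reproduce the first five entries of the list above.

```python
from decimal import Decimal, localcontext

def upper_convergents(c=150, d=6, count=5, digits=60):
    """First `count` convergents n/m of x = log(c)/log(d) lying above x."""
    found = []
    with localcontext() as ctx:
        ctx.prec = digits                      # working precision in digits
        x = Decimal(c).ln() / Decimal(d).ln()
        num_prev, num = 0, 1                   # numerators h_{-2}, h_{-1}
        den_prev, den = 1, 0                   # denominators k_{-2}, k_{-1}
        y, index = x, 0
        while len(found) < count:
            a = int(y)                         # partial quotient floor(y), y > 0
            num_prev, num = num, a * num + num_prev
            den_prev, den = den, a * den + den_prev
            if index % 2 == 1:                 # odd-indexed convergents exceed x
                found.append((num, den))
            y = 1 / (y - a)                    # complete quotient for next step
            index += 1
    return found

print(upper_convergents())
# [(3, 1), (14, 5), (165, 59), (797, 285), (4301, 1538)]
```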

In the last case, a sequence of 29,036,139 rolls of a d6 will produce a sequence of 10,383,070 rolls of a d150 with a failure rate less than $2\times 10^{-8},$ for an efficiency of $2.79649$, indistinguishable from the asymptotic limit.

— whuber

Amazing as always, it almost looks like this answer was formatted and prepared even before the question was asked!

— Łukasz Grad


Thank you, @ŁukaszGrad. However, I am innocent of any such machinations and I'm sure sharp-eyed readers will find evidence of the haste with which I have written this out, for which I apologize in advance.

— whuber

Shouldn't it also be taken into account that when $d$ isn't prime, the sample space $\Omega(d,1)$ can be partitioned into subsets of equal probability? For example, you can use a d6 as a d2 or a d3, and a sample space with 162 elements (closer to 150 than 216 is) is then achievable with 4 rolls, 1d6+3d3. (That gives the same expected number of rolls as the 3d6 solution, but a lower variance.)

— Scortchi - Reinstate Monica
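Scortchi's comparison can be checked exactly, assuming (as in the answers here) that the number of attempts is geometric with success probability $p = 150/(\text{sample space size})$. This hypothetical helper returns the mean and variance of the total number of rolls: both schemes have mean $108/25 = 4.32$, while the variance drops from $3564/625 \approx 5.70$ for 3d6 to $864/625 \approx 1.38$ for 1d6+3d3.

```python
from fractions import Fraction

def rejection_stats(rolls_per_attempt, space, n=150):
    """Mean and variance of total rolls when each attempt uses
    `rolls_per_attempt` rolls to produce `space` equally likely outcomes and
    keeps the first n. Attempts ~ Geometric(p = n / space)."""
    p = Fraction(n, space)
    mean = rolls_per_attempt / p
    var = rolls_per_attempt ** 2 * (1 - p) / p ** 2
    return mean, var

print(rejection_stats(3, 6 ** 3))       # 3d6: 216 outcomes
print(rejection_stats(4, 6 * 3 ** 3))   # 1d6 + 3d3 (d6 read as a d3): 162 outcomes
```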

@Scortchi You describe a slightly different setting in which one has a choice of dice to use to simulate draws from a uniform distribution. A similar analysis applies; you might find it amusing to carry it out.

— whuber


For the case of $N=150$, rolling a d6 three times creates $6^3=216$ distinct outcomes.

The desired result can be tabulated in this way:

Roll a d6 three times in sequence. This produces results $a,b,c$. The result is uniform because all combinations of $a,b,c$ are equally likely (the dice are fair, and we are treating each roll as distinct).
Subtract 1 from each.
This is a senary number: each digit (place value) goes from 0 to 5 by powers of 6, so you can write the number in decimal using $(a-1) \times 6^2 + (b-1) \times 6^1 + (c-1)\times 6^0$.
Add 1.
If the result exceeds 150, discard the result and roll again.
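The steps above can be sketched in Python (a minimal illustration using the standard `random` module; the function names are hypothetical). `triple_to_number` is the senary conversion, and `draw_1_to_150` adds the rejection step.

```python
import random

def triple_to_number(a, b, c):
    """Map three d6 faces (1..6) to 1..216 via senary place values."""
    return (a - 1) * 6 ** 2 + (b - 1) * 6 + (c - 1) + 1

def draw_1_to_150(rng=random):
    """Roll 3d6, convert, and discard results above 150 (rejection)."""
    while True:
        a, b, c = (rng.randint(1, 6) for _ in range(3))
        x = triple_to_number(a, b, c)
        if x <= 150:
            return x
```

Because `triple_to_number` is a bijection from the 216 ordered triples onto $1,\ldots,216$, keeping only $1,\ldots,150$ leaves each of those values equally likely.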

The probability of keeping a result is $p=\frac{150}{216}=\frac{25}{36}$. All rolls are independent, and we repeat the procedure until a "success" (a result in $1,2,\dots,150$), so the number of attempts to generate 1 draw between 1 and 150 is distributed as a geometric random variable, which has expectation $p^{-1}=\frac{36}{25}$. Therefore, using this method to generate 1 draw requires $\frac{36}{25}\times 3 = 4.32$ dice rolls on average (because each attempt rolls 3 dice).
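A quick seeded Monte Carlo check of the $4.32$ figure, sketched in Python (the seed and trial count are arbitrary choices): it counts the d6 rolls used by the 3d6 rejection method for each draw and averages them.

```python
import random

def rolls_for_one_draw(rng):
    """Count d6 rolls used by the 3d6 rejection method until a value in 1..150."""
    rolls = 0
    while True:
        rolls += 3
        a, b, c = (rng.randint(1, 6) for _ in range(3))
        if (a - 1) * 36 + (b - 1) * 6 + (c - 1) + 1 <= 150:
            return rolls

rng = random.Random(2024)        # fixed seed so the run is reproducible
trials = 100_000
average = sum(rolls_for_one_draw(rng) for _ in range(trials)) / trials
print(average)                   # should be close to 36/25 * 3 = 4.32
```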

Credit to @whuber for suggesting this in chat.

— Sycorax says Reinstate Monica

I believe Henry's method does not produce a uniform distribution. That's because the recycling will cause some digits to be favored. I'm not completely sure about that because I don't completely understand how the recycling is intended to be performed.

— whuber


@whuber AH! I understand your concern now. I just tried to explain the process to myself and I realized why my intuition was flawed: the probability of rolling an additional die may change the assignment of probabilities to decimal numbers and make it non-uniform because we don't know ahead of time how many dice we're rolling.

— Sycorax says Reinstate Monica


Here is an even simpler alternative to the answer by Sycorax for the case where $N=150$ . Since $150 = 5 \times 5 \times 6$ you can perform the following procedure:

Generating a uniform random number from 1 to 150:

Make three ordered rolls of 1D6 and denote these as $R_1, R_2, R_3$ .

If either of the first two rolls is a six, reroll it until it is not 6.

The number $(R_1, R_2, R_3)$ is a uniform number using positional notation with a radix of 5-5-6. Thus, you can compute the desired number as: $X = 30 \cdot (R_1-1) + 6 \cdot (R_2-1) + (R_3-1) + 1.$
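The procedure can be sketched in Python as follows (function names are hypothetical, and `random` stands in for physical dice). `mixed_radix_556` is the positional formula for $X$, and `draw_556` adds the rerolling of sixes in the first two positions and also reports how many d6 rolls were used.

```python
import random

def mixed_radix_556(r1, r2, r3):
    """X = 30(R1-1) + 6(R2-1) + (R3-1) + 1 for R1, R2 in 1..5 and R3 in 1..6."""
    return 30 * (r1 - 1) + 6 * (r2 - 1) + (r3 - 1) + 1

def roll_until_not_six(rng):
    """Reroll a d6 until it shows 1..5; return (face, rolls used)."""
    rolls = 0
    while True:
        rolls += 1
        face = rng.randint(1, 6)
        if face != 6:
            return face, rolls

def draw_556(rng=random):
    """One draw from 1..150; returns (X, total d6 rolls used)."""
    r1, n1 = roll_until_not_six(rng)
    r2, n2 = roll_until_not_six(rng)
    r3 = rng.randint(1, 6)          # the last digit may be any face
    return mixed_radix_556(r1, r2, r3), n1 + n2 + 1
```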

This method can be generalised to larger $N$ , but it becomes a bit more awkward when the value has one or more prime factors larger than $6$ .

— Reinstate Monica

Can you state the efficiency of this method in terms of the expected number of rolls per draw generated, and clarify why the outcome is uniform on $1,2,\ldots,150$?

— Sycorax says Reinstate Monica

The probability of getting an outcome that requires no re-rolling is $25/36$, which is the same as in your answer. To understand why it is uniform, note that you are effectively just generating a uniform number using positional notation with radix 5-5-6 (i.e., the last digit is the units, the second-last digit is the "sixes" and the third-last digit is the "thirties").

— Reinstate Monica


The method is effectively just a very slight variation on the method in your answer. In your answer you create a uniform number on the 6-6-6 number scale and then discard invalid values, whereas in my answer you discard invalid values first to generate a number on the 5-5-6 scale.

— Reinstate Monica


+1 As a practical matter this is an appealing algorithm. It is intriguing, and perhaps suggestive of a broader analysis, that it implements a finite state automaton driven by the die rolls. It has four states, {Start, A, B, Accept}. Start transitions to A upon rolling 1..5; A transitions to B upon rolling 1..5; and B transitions to Accept upon rolling anything. Each transition saves the value of the roll that caused it, so upon reaching Accept you output that sequence of three stored rolls and transition automatically back to Start.

— whuber
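A direct transcription of this automaton into a Python generator, as a sketch. One assumption not spelled out in the comment: rolling a 6 in Start or A leaves the state unchanged, which corresponds to rerolling. The transient Accept state is folded into the transition out of B, which emits the stored triple and resets to Start.

```python
def automaton(rolls):
    """Four-state automaton {Start, A, B, Accept} driven by d6 rolls.

    Start -> A and A -> B on a roll of 1..5 (a 6 leaves the state as is);
    B -> Accept on any roll. Each transition stores the roll that caused it;
    on Accept the stored (R1, R2, R3) is yielded and the machine restarts.
    """
    state, stored = "Start", []
    for r in rolls:
        if state in ("Start", "A"):
            if 1 <= r <= 5:
                stored.append(r)
                state = "A" if state == "Start" else "B"
        else:                       # state == "B": any roll leads to Accept
            stored.append(r)
            yield tuple(stored)     # output the triple, then back to Start
            state, stored = "Start", []

print(list(automaton([6, 1, 6, 2, 3, 5, 5, 6])))   # [(1, 2, 3), (5, 5, 6)]
```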


You reject as often as @Sycorax, but make fewer rolls on average. The expected number of rolls per variate is $\frac{6}{5} + \frac{6}{5} + 1= 3.4$.

— Scortchi - Reinstate Monica


As an illustration of an algorithm to choose uniformly between $150$ values using six-sided dice, try this, which uses each roll to multiply the available values by $6$ and makes each of the new values equally likely:

After $0$ rolls, you have $1$ possibility, not enough to distinguish $150$ values
After $1$ roll, you have $6$ possibilities, not enough to distinguish $150$ values
After $2$ rolls, you have $36$ possibilities, not enough to distinguish $150$ values
After $3$ rolls, you have $216$ possibilities, enough to distinguish $150$ values but with $66$ remaining values; the probability you stop now is $\frac{150}{216}$
If you have not stopped, then after $4$ rolls you have $396$ remaining possibilities, enough to distinguish $150$ values two ways but with $96$ remaining values; the probability you stop now is $\frac{300}{1296}$
If you have not stopped, then after $5$ rolls you have $576$ remaining possibilities, enough to distinguish $150$ values three ways but with $126$ remaining values; the probability you stop now is $\frac{450}{7776}$
If you have not stopped, then after $6$ rolls you have $756$ remaining possibilities, enough to distinguish $150$ values five ways but with $6$ remaining values; the probability you stop now is $\frac{750}{46656}$

If you are on one of the $6$ remaining values after $6$ rolls then you are in a similar situation to the position after $1$ roll. So you can continue in the same way: the probability you stop after $7$ rolls is $\frac{0}{279936}$ , after $8$ rolls is $\frac{150}{1679616}$ etc.

Weight each number of rolls by its stopping probability and add these up, and you find that the expected number of rolls needed is about $3.39614$. It provides a uniform selection from the $150$, as you only select a value at a time when you can select each of the $150$ with equal probability.
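The stopping probabilities and the expected number of rolls can be computed exactly with Python's `fractions` module. This sketch (the function name is hypothetical) truncates the series at 100 rolls, where the neglected tail is negligible; it gives stopping probabilities $150/216$, $300/1296$, $450/7776$, $750/46656$, $0$, $150/1679616,\ldots$ and an expectation of about $3.39614$.

```python
from fractions import Fraction

def stopping_distribution(n=150, d=6, max_rolls=100):
    """Exact P(stop after exactly k rolls) for k = 1..max_rolls.

    `remaining` counts the equally likely possibilities still in play. Each
    roll multiplies it by d; the largest multiple of n is used to stop, and
    the leftover possibilities carry over to the next roll.
    """
    remaining, probs = 1, {}
    for k in range(1, max_rolls + 1):
        remaining *= d
        used = remaining - remaining % n      # largest multiple of n
        probs[k] = Fraction(used, d ** k)
        remaining -= used
    return probs

probs = stopping_distribution()
expected = sum(k * p for k, p in probs.items())
print(float(expected))   # about 3.39614
```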

Sycorax asked in the comments for a more explicit algorithm:

First, I will work in base- $6$ with $150_{10}=410_6$
Second, rather than target values $1_6$ to $410_6$, I will subtract one so the target values are $0_6$ to $405_6$
Third, each die should have values $0_6$ to $5_6$ , and rolling a die involves adding a base $6$ digit to the right hand side of the existing generated number. Generated numbers can have leading zeros, and their number of digits is the number of rolls so far

The algorithm is successive rolls of dice:

Roll the first three dice to generate a number from $000_6$ to $555_6$ . Since $1000_6 \div 410_6 = 1_6 \text{ remainder } 150_6$ you take the generated value (which is also its remainder on division by $410_6$ ) if the generated value is strictly below $1000_6-150_6=410_6$ and stop;
If continuing, roll the fourth die so you have now generated a number from $4100_6$ to $5555_6$ . Since $10000_6 \div 410_6 = 12_6 \text{ remainder } 240_6$ you take the remainder of the generated value on division by $410_6$ if the generated value is strictly below $10000_6-240_6=5320_6$ and stop;
If continuing, roll the fifth die so you have now generated a number from $53200_6$ to $55555_6$ . Since $100000_6 \div 410_6 = 123_6 \text{ remainder } 330_6$ you take the remainder of the generated value on division by $410_6$ if the generated value is strictly below $100000_6-330_6=55230_6$ and stop;
If continuing, roll the sixth die so you have now generated a number from $552300_6$ to $555555_6$ . Since $1000000_6 \div 410_6 = 1235_6 \text{ remainder } 10_6$ you take the remainder of the generated value on division by $410_6$ if the generated value is strictly below $1000000_6-10_6=555550_6$ and stop;
etc.
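This base-$6$ procedure can be sketched in Python as below; it is an illustrative reading of the steps above, not Henry's own code. `roll()` is assumed to return a fair face labelled $0,\ldots,5$, and the result lies in $0,\ldots,149$ (add one for $1,\ldots,150$). The cutoff after $k$ rolls is $6^k - (6^k \bmod 150)$, which equals $410_6$, $5320_6$, $55230_6$ and $555550_6$ for $k=3,4,5,6$, matching the thresholds in the steps.

```python
import random

def henry_draw(roll, n=150, d=6):
    """Append base-d digits from roll() until the generated value falls
    strictly below total - (total % n), where total = d**k after k rolls;
    then return the value's remainder on division by n."""
    value, total = 0, 1
    while True:
        value = value * d + roll()      # add a digit on the right-hand side
        total *= d
        cutoff = total - total % n      # largest multiple of n not above total
        if value < cutoff:
            return value % n

rng = random.Random(7)
print(henry_draw(lambda: rng.randrange(6)) + 1)   # shift to 1..150
```

Enumerating all $6^6$ six-roll sequences, each value in $0,\ldots,149$ is produced by exactly $311$ of them (the remaining $6$ sequences need a seventh roll), which is one way to check uniformity.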

— Henry

(+1) This answer would be clearer if you explained how you map the outcomes of, say, 4d6 or 5d6 to 1, 2, ..., 150.

— Sycorax says Reinstate Monica

@Sycorax - I have now provided a base $6$ mapping

— Henry


Entropy considerations indicate you can do substantially better than this algorithm. It also remains to show that your algorithm actually produces independently distributed values with uniform distributions.

— whuber

@whuber - My algorithm produces exactly one integer from $150$ possibilities and does so uniformly, provided the dice rolls are uniform and independent. At each step, if reached, each of the $150$ values is equally likely to be selected. It does not produce multiple values (unlike your answer).

— Henry


I misunderstood what you meant, then, in writing "the algorithm is successive rolls of dice." (I should have read through more carefully.) In doing so, it seems to me your algorithm does not produce a uniform distribution, but I'm not sure because I haven't been able to figure out what the general algorithm is intended to be. It would be good to see a demonstration that it does produce uniform values.

— whuber