Draw integers independently and uniformly at random from 1 to


18

I want to draw integers from 1 to some specific $N$ by rolling some number of fair six-sided dice (d6). A good answer will explain why its method produces uniform and independent integers.

As an illustrative example, it would be helpful to explain how a solution works for the case of $N=150$.

Furthermore, I want the procedure to be as efficient as possible: roll the fewest d6 on average for each number generated.

Conversions from senary to decimal are permissible.


This question was inspired by this Meta thread.

Answers:


12

The set $\Omega(d,n)$ of distinct identifiable outcomes in $n$ independent rolls of a die with $d=6$ faces has $d^n$ elements. When the die is fair, each outcome of one roll has probability $1/d$, and independence means that each of these outcomes therefore has probability $(1/d)^n$: that is, they have a uniform distribution $P_{d,n}$.

Suppose you have devised some procedure $t$ that either determines $m$ outcomes of a $c\ (=150)$-sided die (that is, an element of $\Omega(c,m)$) or else reports failure (which means you will have to repeat it in order to obtain an outcome). That is,

$$t:\Omega(d,n)\to\Omega(c,m)\cup\{\text{Failure}\}.$$

Let $F$ be the probability that $t$ results in failure, and note that $F$ is some integral multiple of $d^{-n}$, say

$$F=\Pr(t(\omega)=\text{Failure})=N_F\,d^{-n}.$$

(For future reference, note that the expected number of times $t$ must be invoked before not failing is $1/(1-F)$.)

The requirement that these outcomes in $\Omega(c,m)$ be uniform and independent, conditional on $t$ not reporting failure, means that $t$ preserves probability in the sense that for every event $A\subset\Omega(c,m)$,

$$\frac{P_{d,n}(t^{*}A)}{1-F}=P_{c,m}(A) \tag{1}$$

where

$$t^{*}(A)=\{\omega\in\Omega\mid t(\omega)\in A\}$$

is the set of die rolls that the procedure $t$ assigns to the event $A$.

Consider an atomic event $A=\{\eta\}\subset\Omega(c,m)$, which must have probability $c^{-m}$. Let $t^{*}(A)$ (the dice rolls associated with $\eta$) have $N_\eta$ elements. $(1)$ becomes

$$\frac{N_\eta d^{-n}}{1-N_F d^{-n}}=\frac{P_{d,n}(t^{*}A)}{1-F}=P_{c,m}(A)=c^{-m}. \tag{2}$$

It is immediate that the $N_\eta$ are all equal to some integer $N$. It remains only to find the most efficient procedures $t$. The expected number of non-failures per roll of the $c$-sided die is

$$\frac{1}{m}(1-F).$$

There are two immediate and obvious implications. One is that if we can keep $F$ small as $m$ grows large, then the effect of reporting a failure is asymptotically zero. The other is that for any given $m$ (the number of rolls of the $c$-sided die to simulate), we want to make $F$ as small as possible.

Let's take a closer look at $(2)$ by clearing the denominators:

$$N c^m=d^n-N_F>0.$$

This makes it obvious that in a given context (determined by $c,d,n,m$), $F$ is made as small as possible by making $d^n-N_F$ equal the largest multiple of $c^m$ that is less than or equal to $d^n$. We may write this in terms of the greatest integer function (or "floor") as

$$N=\left\lfloor\frac{d^n}{c^m}\right\rfloor.$$
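
As a quick numerical check of these definitions, the following Python sketch computes $N$, $N_F$ and $F$ with exact integer arithmetic. The function name `block_counts` is purely illustrative and is not part of the answer.

```python
from fractions import Fraction

def block_counts(d, c, n, m):
    """Return (N, N_F, F) for simulating m rolls of a c-sided die
    from n rolls of a d-sided die."""
    total = d ** n             # size of Omega(d, n)
    target = c ** m            # size of Omega(c, m)
    N = total // target        # floor(d^n / c^m): roll sequences per target outcome
    if N == 0:
        raise ValueError("need d**n >= c**m")
    N_F = total - N * target   # roll sequences that must be mapped to Failure
    F = Fraction(N_F, total)   # failure probability N_F * d^(-n)
    return N, N_F, F

print(block_counts(6, 150, 3, 1))   # N = 1, N_F = 66, F = 11/36
```

For three rolls of a d6 and one roll of a d150 this gives $N=1$ with 66 of the 216 sequences assigned to Failure.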

Finally, it is clear that $N$ ought to be as small as possible for highest efficiency, because it measures redundancy in $t$. Specifically, the expected number of rolls of the $d$-sided die needed to produce one roll of the $c$-sided die is

$$N\times\frac{n}{m}\times\frac{1}{1-F}.$$

Thus, our search for high-efficiency procedures ought to focus on the cases where $d^n$ is equal to, or just barely greater than, some power $c^m$.

The analysis ends by showing that for given $d$ and $c$, there is a sequence of multiples $(n,m)$ for which this approach approximates perfect efficiency. This amounts to finding $(n,m)$ for which $d^n/c^m\ge 1$ approaches $N=1$ in the limit (automatically guaranteeing $F\to 0$). One such sequence is obtained by taking $n=1,2,3,\ldots$ and determining

$$m=\left\lfloor\frac{n\log d}{\log c}\right\rfloor. \tag{3}$$

The proof is straightforward.

This all means that when we are willing to roll the original $d$-sided die a sufficiently large number of times $n$, we can expect to simulate nearly $\log d/\log c=\log_c d$ outcomes of a $c$-sided die per roll. Equivalently,

It is possible to simulate a large number $m$ of independent rolls of a $c$-sided die using a fair $d$-sided die using an average of $\log(c)/\log(d)+\epsilon=\log_d(c)+\epsilon$ rolls per outcome, where $\epsilon$ can be made arbitrarily small by choosing $m$ sufficiently large.
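
To see the limit emerge numerically, here is a small Python sketch that evaluates $m=\lfloor n\log d/\log c\rfloor$ for a few values of $n$ and prints $m/n$, the number of $c$-sided outcomes per roll in a single successful attempt, next to $\log_c d$. The helper name `largest_m` and the chosen values of $n$ are assumptions made for illustration; the floor is confirmed with exact integers so that floating-point rounding cannot shift it.

```python
import math

def largest_m(d, c, n):
    """Largest m with c**m <= d**n, i.e. floor(n*log(d)/log(c))."""
    m = int(n * math.log(d) / math.log(c))   # floating-point first guess
    while c ** m > d ** n:                   # guess too large: step down
        m -= 1
    while c ** (m + 1) <= d ** n:            # guess too small: step up
        m += 1
    return m

d, c = 6, 150
limit = math.log(d) / math.log(c)            # log_c(d)
for n in (3, 14, 165, 1000):
    m = largest_m(d, c, n)
    print(n, m, m / n, limit)
```

The ratio $m/n$ ignores failed attempts, so it only illustrates how closely $m/n$ can approach $\log_c d$ from below.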


Examples and algorithms

In the question, $d=6$ and $c=150$, whence

$$\log_d(c)=\frac{\log(c)}{\log(d)}\approx 2.796489.$$

Thus, the best possible procedure will require, on average, at least $2.796489$ rolls of a d6 to simulate each d150 outcome.

The analysis shows how to do this. We don't need to resort to number theory to carry it out: we can just tabulate the powers $d^n=6^n$ and the powers $c^m=150^m$ and compare them to find where $c^m\le d^n$ are close. This brute force calculation gives $(n,m)$ pairs

$$(n,m)\in\{(3,1),(14,5),\ldots\}$$

for instance, corresponding to the numbers

$$(6^n,150^m)\in\{(216,150),(78364164096,75937500000),\ldots\}.$$

In the first case $t$ would associate $216-150=66$ of the outcomes of three rolls of a d6 to Failure and the other $150$ outcomes would each be associated with a single outcome of a d150.

In the second case $t$ would associate $78364164096-75937500000$ of the outcomes of 14 rolls of a d6 to Failure -- about 3.1% of them all -- and otherwise would output a sequence of 5 outcomes of a d150.
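
The brute-force tabulation can be sketched in a few lines of Python. The sketch assumes only one block of $150^m$ outcomes is used per attempt (that is, $N=1$), so an attempt costs $n$ rolls, succeeds with probability $1-F$, and yields $m$ outcomes, for $n/(m(1-F))$ rolls per outcome. The cutoff `0.35` used to pick out the "close" pairs is an arbitrary illustrative choice.

```python
from fractions import Fraction

d, c = 6, 150
rows = []
for m in range(1, 6):
    n = 1
    while d ** n < c ** m:               # smallest n with d^n >= c^m
        n += 1
    F = 1 - Fraction(c ** m, d ** n)     # failure probability when N = 1
    rolls_per_outcome = Fraction(n, m) / (1 - F)
    rows.append((n, m, float(F), float(rolls_per_outcome)))

for row in rows:
    print(row)

# Pairs whose powers are "close": small wasted fraction F.
good = [(n, m) for n, m, F, _ in rows if F < 0.35]
print(good)
```

Under these assumptions the pair $(3,1)$ costs 4.32 rolls per outcome, while $(14,5)$ costs roughly 2.89, already close to the bound of about 2.7965.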

A simple algorithm to implement $t$ labels the faces of the $d$-sided die with the numerals $0,1,\ldots,d-1$ and the faces of the $c$-sided die with the numerals $0,1,\ldots,c-1$. The $n$ rolls of the first die are interpreted as an $n$-digit number in base $d$. This is converted to a number in base $c$. If it has at most $m$ digits, the sequence of the last $m$ digits is the output. Otherwise, $t$ returns Failure by invoking itself recursively.
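
A minimal Python sketch of this base-conversion procedure follows. Two details are assumptions made for convenience rather than part of the description: faces are taken as $1..d$ and $1..c$ (shifted by one internally), and a retry loop replaces the recursive call. The names `decode` and `simulate` are illustrative.

```python
import random

def decode(rolls, d, c, m):
    """Map die faces (each 1..d) to m faces of a c-sided die (each 1..c),
    or return None to signal Failure."""
    value = 0
    for face in rolls:                  # read the rolls as a base-d number
        value = value * d + (face - 1)
    if value >= c ** m:                 # needs more than m base-c digits
        return None
    digits = []
    for _ in range(m):                  # exactly m base-c digits, leading zeros kept
        value, digit = divmod(value, c)
        digits.append(digit + 1)        # relabel 0..c-1 as 1..c
    return digits[::-1]

def simulate(d, c, n, m, rng=random):
    """Repeat blocks of n rolls until one decodes successfully."""
    while True:
        rolls = [rng.randint(1, d) for _ in range(n)]
        result = decode(rolls, d, c, m)
        if result is not None:
            return result

print(simulate(6, 150, 14, 5))          # five values, each in 1..150
```

Keeping `decode` separate from the dice rolling makes it possible to check the mapping on fixed inputs, for example that three sixes are rejected when $n=3$, $m=1$.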

For much longer sequences, you can find suitable pairs $(n,m)$ by considering every other convergent $n/m$ of the continued fraction expansion of $x=\log(c)/\log(d)$. The theory of continued fractions shows that these convergents alternate between being less than $x$ and greater than it (assuming $x$ is not already rational). Choose those that are greater than $x$, because $n/m\ge x$ is equivalent to $d^n\ge c^m$.

In the question, the first few such convergents are

$$3,\ 14/5,\ 165/59,\ 797/285,\ 4301/1538,\ 89043/31841,\ 279235/99852,\ 29036139/10383070.$$

In the last case, a sequence of 29,036,139 rolls of a d6 will produce a sequence of 10,383,070 rolls of a d150 with a failure rate less than $2\times 10^{-8}$, for an efficiency of $2.79649$--indistinguishable from the asymptotic limit.
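
The convergents can be reproduced with Python's standard `decimal` module. This is only a sketch: the precision of 60 digits and the helper name `convergents` are assumptions, and the filter keeps the pairs with $6^n\ge 150^m$, which is the condition the procedure needs.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60                    # ample precision for the first few terms

def convergents(x, count):
    """First `count` continued-fraction convergents of x > 0 as (p, q) pairs."""
    p_prev, q_prev = 1, 0                 # p_{-1}, q_{-1}
    p, q = int(x), 1                      # p_0, q_0 (int() is the floor for x > 0)
    result = [(p, q)]
    frac = x - int(x)
    for _ in range(count - 1):
        x = 1 / frac
        a = int(x)                        # next partial quotient
        frac = x - a
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

x = Decimal(150).ln() / Decimal(6).ln()   # log(c)/log(d) for c = 150, d = 6
all_convergents = convergents(x, 8)
usable = [(n, m) for n, m in all_convergents if 6 ** n >= 150 ** m]
print(usable)
```

Checking $6^n\ge 150^m$ with exact integers avoids relying on the parity of the convergent's index.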


2
Amazing as always, it almost looks like this answer was formatted and prepared even before the question was asked!
Łukasz Grad

1
Thank you, @ŁukaszGrad. However, I am innocent of any such machinations and I'm sure sharp-eyed readers will find evidence of the haste with which I have written this out, for which I apologize in advance.
whuber

Shouldn't it also be taken into account that when $d$ isn't prime, the sample space $\Omega(d,1)$ can be partitioned into subsets of equal probability? For example, you can use a d6 as a d2 or a d3, & a sample space with 162 elements - closer to 150 than 216 is - is then achievable with 4 rolls, 1d6+3d3. (That gives the same expected no. rolls as the 3d6 solution, but a lower variance.)
Scortchi - Reinstate Monica

@Scortchi You describe a slightly different setting in which one has a choice of dice to use to simulate draws from a uniform distribution. A similar analysis applies--you might find it amusing to carry it out.
whuber

7

For the case of $N=150$, rolling a d6 three times distinctly creates $6^3=216$ outcomes.

The desired result can be tabulated in this way:

  • Record a d6 three times sequentially. This produces results $a,b,c$. The result is uniform because all values of $a,b,c$ are equally likely (the dice are fair, and we are treating each roll as distinct).
  • Subtract 1 from each.
  • This is a senary number: each digit (place value) goes from 0 to 5 by powers of 6, so you can write the number in decimal using $(a-1)\times 6^2+(b-1)\times 6^1+(c-1)\times 6^0$
  • Add 1.
  • If the result exceeds 150, discard the result and roll again.

The probability of keeping a result is $p=\frac{150}{216}=\frac{25}{36}$. All rolls are independent, and we repeat the procedure until a "success" (a result in $1,2,\ldots,150$), so the number of attempts to generate 1 draw between 1 and 150 is distributed as a geometric random variable, which has expectation $p^{-1}=\frac{36}{25}$. Therefore, using this method to generate 1 draw requires $\frac{36}{25}\times 3=4.32$ dice rolls on average (because each attempt rolls 3 dice).
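
The steps above can be sketched in Python as follows. The function name `draw_1_to_150` and the `rng` parameter are illustrative assumptions; the expected-roll figure is computed exactly from $p$ rather than estimated by simulation.

```python
import random
from fractions import Fraction

def draw_1_to_150(rng=random):
    """Rejection sampling with three d6.
    Returns (value in 1..150, number of dice rolled)."""
    rolled = 0
    while True:
        a, b, c = (rng.randint(1, 6) for _ in range(3))
        rolled += 3
        # Senary digits (a-1, b-1, c-1) converted to decimal, then add 1.
        value = (a - 1) * 36 + (b - 1) * 6 + (c - 1) + 1
        if value <= 150:
            return value, rolled

p = Fraction(150, 216)              # probability that an attempt is kept
expected_rolls = 3 / p              # geometric number of attempts, 3 dice each
print(p, float(expected_rolls))     # 25/36 4.32
```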


Credit to @whuber for suggesting this in chat.


I believe Henry's method does not produce a uniform distribution. That's because the recycling will cause some digits to be favored. I'm not completely sure about that because I don't completely understand how the recycling is intended to be performed.
whuber

1
@whuber AH! I understand your concern now. I just tried to explain the process to myself and I realized why my intuition was flawed: the probability of rolling an additional die may change the assignment of probabilities to decimal numbers and make it non-uniform because we don't know ahead of time how many dice we're rolling.
Sycorax says Reinstate Monica

4

Here is an even simpler alternative to the answer by Sycorax for the case where $N=150$. Since $150=5\times 5\times 6$ you can perform the following procedure:

Generating uniform random number from 1 to 150:

  • Make three ordered rolls of 1D6 and denote these as $R_1,R_2,R_3$.
  • If either of the first two rolls is a six, reroll it until it is not 6.
  • The number $(R_1,R_2,R_3)$ is a uniform number using positional notation with a radix of 5-5-6. Thus, you can compute the desired number as: $X=30(R_1-1)+6(R_2-1)+(R_3-1)+1.$

This method can be generalised to larger $N$, but it becomes a bit more awkward when the value has one or more prime factors larger than 6.
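
A short Python sketch of this 5-5-6 procedure is given below. The helper names `roll_not_six` and `draw_556` are hypothetical. The expected roll count uses the standard fact that rerolling a fair d6 until it is not a six takes $6/5$ rolls on average.

```python
import random
from fractions import Fraction

def roll_not_six(rng):
    """Roll a d6 until it is not a 6. Returns (face in 1..5, rolls used)."""
    used = 0
    while True:
        face = rng.randint(1, 6)
        used += 1
        if face != 6:
            return face, used

def draw_556(rng=random):
    """Mixed-radix 5-5-6 draw. Returns (X in 1..150, number of dice rolled)."""
    r1, k1 = roll_not_six(rng)
    r2, k2 = roll_not_six(rng)
    r3 = rng.randint(1, 6)
    x = 30 * (r1 - 1) + 6 * (r2 - 1) + (r3 - 1) + 1
    return x, k1 + k2 + 1

expected_rolls = Fraction(6, 5) + Fraction(6, 5) + 1
print(float(expected_rolls))   # 3.4
```

Because each of the $5\times 5\times 6$ digit triples maps to a different value of $X$, uniform digits give a uniform $X$ on $1,\ldots,150$.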


1
Can you state the efficiency of this method in terms of the expected number of rolls per draw generated, and clarify why the outcome is uniform on 1,2,....,150?
Sycorax says Reinstate Monica

The probability of getting an outcome that requires no re-rolling is 25/36, which is the same as in your answer. To understand why it is uniform, note that you are effectively just generating a uniform number using positional notation with radix 5-5-6 (i.e., the last digit is the units, the second-last digit is the "sixes" and the third-last digit is the "thirties").
Reinstate Monica

1
The method is effectively just a very slight variation on the method in your answer. In your answer you create a uniform number on the 6-6-6 number scale and then discard invalid values, whereas in my answer you discard invalid values first to generate a number on the 5-5-6 scale.
Reinstate Monica

3
+1 As a practical matter this is an appealing algorithm. It is intriguing, and perhaps suggestive of a broader analysis, that it implements a finite state automaton driven by the die rolls. It has four states, {Start, A, B, Accept}. Start transitions to A upon rolling 1..5; A transitions to B upon rolling 1..5; and B transitions to Accept upon rolling anything. Each transition saves the value of the roll that caused it, so upon reaching Accept you output that sequence of three stored rolls and transition automatically back to Start.
whuber

4
You reject as often as @Sycorax, but make fewer rolls on average. The expected no. rolls per variate is $\frac{6}{5}+\frac{6}{5}+1=3.4$.
Scortchi - Reinstate Monica

2

As an illustration of an algorithm to choose uniformly between 150 values using six-sided dice, try this, which uses each roll to multiply the available values by 6, making each of the new values equally likely:

  • After 0 rolls, you have 1 possibility, not enough to distinguish 150 values
  • After 1 roll, you have 6 possibilities, not enough to distinguish 150 values
  • After 2 rolls, you have 36 possibilities, not enough to distinguish 150 values
  • After 3 rolls, you have 216 possibilities, enough to distinguish 150 values but with 66 remaining values; the probability you stop now is $\frac{150}{216}$
  • If you have not stopped, then after 4 rolls you have 396 remaining possibilities, enough to distinguish 150 values two ways but with 96 remaining values; the probability you stop now is $\frac{300}{1296}$
  • If you have not stopped, then after 5 rolls you have 576 remaining possibilities, enough to distinguish 150 values three ways but with 126 remaining values; the probability you stop now is $\frac{450}{7776}$
  • If you have not stopped, then after 6 rolls you have 756 remaining possibilities, enough to distinguish 150 values five ways but with 6 remaining values; the probability you stop now is $\frac{750}{46656}$

If you are on one of the 6 remaining values after 6 rolls then you are in a similar situation to the position after 1 roll. So you can continue in the same way: the probability you stop after 7 rolls is $\frac{0}{279936}$, after 8 rolls is $\frac{150}{1679616}$, etc.

Add these up and you find that the expected number of rolls needed is about $3.39614$. It provides a uniform selection from the 150, as you only select a value at a time when you can select each of the 150 with equal probability.
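
The sum can be checked numerically. The sketch below follows the bookkeeping described above (multiply the remaining possibilities by 6 at each roll, stop on as many full blocks of 150 as fit) and truncates the series after 200 rolls, by which point the leftover probability is negligible. The function name and the `max_rolls` cutoff are assumptions made for illustration.

```python
from fractions import Fraction

def expected_rolls(target=150, sides=6, max_rolls=200):
    """Sum k * P(stop at roll k) for k = 1..max_rolls."""
    remaining = 1                  # equally likely possibilities not yet assigned
    alive = Fraction(1)            # probability of not having stopped yet
    total = Fraction(0)
    for k in range(1, max_rolls + 1):
        remaining *= sides
        usable = (remaining // target) * target   # possibilities that select a value now
        stop_now = alive * Fraction(usable, remaining)
        total += k * stop_now
        alive -= stop_now
        remaining -= usable
    return float(total)

print(round(expected_rolls(), 5))   # 3.39614
```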


Sycorax asked in the comments for a more explicit algorithm

  • First, I will work in base-6 with $150_{10}=410_6$
  • Second, rather than target values $1_6$ to $410_6$, I will subtract one so the target values are $0_6$ to $405_6$
  • Third, each die should have values $0_6$ to $5_6$, and rolling a die involves adding a base 6 digit to the right hand side of the existing generated number. Generated numbers can have leading zeros, and their number of digits is the number of rolls so far

The algorithm is successive rolls of dice:

  • Roll the first three dice to generate a number from $000_6$ to $555_6$. Since $1000_6\div 410_6=1_6$ remainder $150_6$ you take the generated value (which is also its remainder on division by $410_6$) if the generated value is strictly below $1000_6-150_6=410_6$ and stop;

  • If continuing, roll the fourth die so you have now generated a number from $4100_6$ to $5555_6$. Since $10000_6\div 410_6=12_6$ remainder $240_6$ you take the remainder of the generated value on division by $410_6$ if the generated value is strictly below $10000_6-240_6=5320_6$ and stop;

  • If continuing, roll the fifth die so you have now generated a number from $53200_6$ to $55555_6$. Since $100000_6\div 410_6=123_6$ remainder $330_6$ you take the remainder of the generated value on division by $410_6$ if the generated value is strictly below $100000_6-330_6=55230_6$ and stop;

  • If continuing, roll the sixth die so you have now generated a number from $552300_6$ to $555555_6$. Since $1000000_6\div 410_6=1235_6$ remainder $10_6$ you take the remainder of the generated value on division by $410_6$ if the generated value is strictly below $1000000_6-10_6=555550_6$ and stop;

  • etc.
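
One way to read these steps as code is sketched below in Python. It works with ordinary integers instead of base-6 strings, and the stopping threshold after $k$ rolls is $6^k-(6^k \bmod 150)$, which reproduces the thresholds $410_6$, $5320_6$, $55230_6$ and $555550_6$ listed above. The function name `draw_sequential` is an assumption, and the output is shifted back to $1..150$.

```python
import random

def draw_sequential(rng=random, target=150, sides=6):
    """Roll one die at a time. Returns (value in 1..target, number of rolls)."""
    value, power, rolls = 0, 1, 0
    while True:
        # Append one base-`sides` digit (0..sides-1) on the right.
        value = value * sides + (rng.randint(1, sides) - 1)
        power *= sides                         # equals sides**rolls after this roll
        rolls += 1
        threshold = power - power % target     # e.g. 216 - 66 = 150 after 3 rolls
        if value < threshold:
            return value % target + 1, rolls
```

Enumerating every sequence of four rolls confirms that, in this sketch, each of the 150 values is selected equally often at the third roll and equally often at the fourth.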


(+1) This answer would be more clear if you explained how you map the outcomes of, say, 4d6 or 5d6 to 1,2, ..., 150.
Sycorax says Reinstate Monica

@Sycorax - I have now provided a base 6 mapping
Henry

1
Entropy considerations indicate you can do substantially better than this algorithm. It also remains to show that your algorithm actually produces independently distributed values with uniform distributions.
whuber

@whuber - My algorithm produces exactly one integer from 150 possibilities and does so uniformly providing the dice rolls are uniform and independent. At each step if reached, each of the 150 values is equally likely to be selected. It does not produce multiple values (unlike your answer)
Henry

1
I misunderstood what you meant, then, in writing "the algorithm is successive rolls of dice." (I should have read through more carefully.) In doing so, it seems to me your algorithm does not produce a uniform distribution, but I'm not sure because I haven't been able to figure out what the general algorithm is intended to be. It would be good to see a demonstration that it does produce uniform values.
whuber
Licensed under cc by-sa 3.0 with attribution required.