Mean Number of Samples from Uniform Distribution to Achieve a Desired Total


Problem Statement

I'm interested in the mean number of samples from a given uniform distribution of integers I would need to take to reach a desired total or more.

For example, a six-sided die gives a uniform distribution of integers from $1$ through $6$. The number of times you have to roll the die ($r$) to achieve a desired total or greater ($t$) produces a distribution. For example, here is a graph produced by sampling dice rolls, aiming for a total of $80$ per run:

Graph of six-sided dice rolls with a target of $80$

The mean of this distribution is approximately $23.332448$. This is the number I am interested in deriving; simply $23$ would be too imprecise for what I need. Right now I determine it by sampling large numbers of dice rolls and averaging the resulting data set. However, I would like to understand the statistics involved and compute the mean directly, or at least a reasonable approximation of it.
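Concretely, the sampling I'm doing looks something like this (a minimal Python sketch rather than my actual code; the seed and number of runs are arbitrary):

```python
import random

def rolls_to_reach(target, sides=6):
    """Count rolls of a fair die until the running total reaches `target`."""
    total = rolls = 0
    while total < target:
        total += random.randint(1, sides)
        rolls += 1
    return rolls

random.seed(0)  # arbitrary seed, for reproducibility
runs = 20_000
est = sum(rolls_to_reach(80) for _ in range(runs)) / runs
print(est)  # hovers around 23.33 for large run counts
```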

One might try to approximate the mean number of rolls needed by simply taking the mean of the uniform distribution and dividing the total $t$ by it:

$$\frac{t}{(max-min)/2+min} = \frac{80}{(6-1)/2+1} \approx 22.857$$

I will call this the 'naive' approximation. As you can see, its result can be significantly off. Sometimes it happens to be close to the right answer; however, I am interested in many cases where it is unacceptably imprecise.

Ultimately, I want a method or formula that takes an arbitrary discrete uniform distribution of integers (such as a die) and a desired total, and tells me the mean number of samples from that distribution I would need to take in order to reach that total.

Prior Research

I've looked at a few posts that talk about this subject, however, none of them were able to provide the complete answer to me. Here are two of the things I've found and read:

A Similar Question on This Forum

Number of dice rolls taken to reach a certain sum
This was useful for creating an approximation of the actual distribution I'm trying to find. I turned the recursive function below into code (in Rust, with a dependency on the cached crate to memoize the recursive results, which improves performance to something usable):

$$P(S_t=s) = \sum_{i=1}^6P(S_{t-1} = s-i)/6$$

With the following terminators:

  • $P(S_0=0) = 1$
  • $P(S_0<0) = P(S_{t>0}=0) = 0$
use cached::proc_macro::cached;

/// P(S_t = s): probability that t rolls of a fair six-sided die sum to exactly s.
#[cached]
pub fn p(t: i32, s: i32) -> f64 {
    // Terminator: P(S_0 = 0) = 1.
    if s <= 0 && t <= 0 {
        return 1.0;
    }

    // Terminators: P(S_0 < 0) = P(S_{t > 0} = 0) = 0.
    if s <= 0 || t <= 0 {
        return 0.0;
    }

    // Sum over the value i of the final roll.
    (1..=std::cmp::min(6, s)).map(|i| p(t - 1, s - i)).sum::<f64>() / 6.0
}

When run, this can be used to create a graph comparing the frequency of the sampled results against the frequency predicted by the algorithm. In the graph, the blue line is the result of the algorithm, while the orange line is sampled data from a simple PRNG using a uniform distribution (the same data as in the first graph).

Note that to get this graph, you have to divide the results of this algorithm by the inverse of the mean of the uniform distribution (so $3.5^{-1}$ in this example). The author of the original answer suggests dividing by $P(S_t=36)$ (where $36$ was their target), but this is just an approximation of the inverse of the mean. I'm not actually sure why the mean of the uniform distribution pops out here, so that's an interesting tidbit.

This looks pretty good; however, as best I can tell it does not provide the mean number of rolls. The author of the answer to the linked question states that the mean in their example is "about $10.5238577$", but they don't state how they derived that number, and I haven't been able to figure it out.

It's also not generalised to an arbitrary uniform distribution; however, I don't believe it would be difficult to generalise, so I don't anticipate this being an obstacle once the mean can be derived from that answer.
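For what it's worth, here is one way that generalisation might look in Python (a sketch; the names are mine): the same recursion as the Rust code, but parameterised over the minimum and maximum face values, with exact fractions and lru_cache in place of the cached crate. It assumes the minimum face value is at least $1$.

```python
from fractions import Fraction
from functools import lru_cache

def make_p(lo, hi):
    """Generalised p(): P(S_t = s), the probability that t samples from a
    uniform integer distribution on [lo, hi] sum to exactly s. Assumes lo >= 1."""
    n = hi - lo + 1

    @lru_cache(maxsize=None)
    def p(t, s):
        if t == 0:
            return Fraction(1 if s == 0 else 0)  # P(S_0 = 0) = 1, else 0
        if s <= 0:
            return Fraction(0)                   # can't be at 0 after t > 0 throws
        # Condition on the value i of the final sample.
        return sum((p(t - 1, s - i) for i in range(lo, hi + 1) if s - i >= 0),
                   Fraction(0)) / n

    return p

p = make_p(1, 6)  # an ordinary six-sided die
print(p(2, 7))    # 1/6: six of the 36 ordered pairs of rolls sum to 7
```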

A Similar Question on Reddit

On average, how many rolls of a fair die would it take to get a cumulative score above n?
This was useful for creating a different approach to the problem: I iterated $r$ (the number of rolls) upwards from $1$, each time approximating the distribution of the combined rolls with a normal distribution. The first normal distribution whose cumulative distribution function at the given target fell below $0.5$ indicated the integer mean number of rolls. Like the previous solution, however, this only produced an integer number of rolls; I need to know the mean number of rolls more precisely than that.
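My reconstruction of that iteration (a sketch, not the Reddit poster's code), using the standard library's NormalDist: the sum of $r$ rolls is approximated as normal with mean $3.5r$ and variance $\frac{35}{12}r$, and we look for the first $r$ whose approximate CDF at the target drops below $0.5$.

```python
from statistics import NormalDist

def integer_mean_rolls(target, sides=6):
    """First r for which the normal approximation of the sum of r rolls
    satisfies P(S_r >= target) > 0.5."""
    mu = (sides + 1) / 2             # mean of one roll (3.5 for a d6)
    var = (sides * sides - 1) / 12   # variance of one roll (35/12 for a d6)
    r = 1
    while NormalDist(mu * r, (var * r) ** 0.5).cdf(target) >= 0.5:
        r += 1
    return r

print(integer_mean_rolls(80))  # 23
```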

There are a number of answers beyond the top-voted one on this Reddit post, with differing amounts of clear information. The answer from the user u/possiblywrong shows an interesting cumulative distribution function which I wonder whether it would solve my problem; however, I am unable to reproduce their ideas as code or to fully understand them. For completeness' sake, here is an image of that response in case it is deleted in the future.

Thanks

Thank you for any time you spend on this; it is doing my head in. It feels like it should be so easy to model, and yet I've lost an entire day researching it, only to find vague answers that are either too complicated for me or missing information I need to understand them. Hopefully, you can help.

Best Answer

Let $f_{s}(t)$ be the expected number of rolls to get a sum greater than or equal to $t$ with an $s$-sided die. Then you have the recurrence $$f_s(t) = 1+\frac{1}{s}\sum_{k=1}^s f_s(t-k)$$ with the base cases $f_s(t) = 0$ for $t \le 0$.

The general solution to this (ignoring the base cases) is $$f_s(t) = \frac{2}{1+s}t+\sum_{n=1}^{s}c_nr_n^t$$

where $r_n$ goes over the distinct roots of $sr^{s+1}-(s+1)r^s+1$. Alternately, since $r=1$ is a double root, this becomes $$f_s(t) = \frac{2}{1+s}t+c_1+\sum_{n=2}^{s}c_nr_n^t$$

where $r_n$ instead goes over the roots of $$\sum_{m=0}^{s-1}(m+1)r^m = 1+2r+3r^2+...+sr^{s-1} \tag 1$$
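As a quick numerical sanity check (my addition, using numpy, for the $s=6$ case): the roots of $(1)$ can be found with np.roots, and all of them lie strictly inside the unit disc, which is what makes the $c_nr_n^t$ terms decay.

```python
import numpy as np

# Coefficients of 1 + 2r + 3r^2 + ... + 6r^5, highest degree first.
coeffs = [6, 5, 4, 3, 2, 1]
roots = np.roots(coeffs)
print(max(abs(roots)))  # < 1: every root lies inside the unit disc
```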

To find the exact values of $c_1, ..., c_s$, you would need to solve the equations $f_s(t) = 0$ for $t = 1-s, 2-s, ..., 0$.

Alternately, you could just build up from $t = 0$ (using a computer) to find that $$f_6(80) = \frac{694905206512232068400374286878415315634626408538515111020013047}{29781651707669509088572079548239633038047628833600290523447296} \approx 23.333$$

If you don't need the exact value, since the magnitude of each root of $(1)$ is less than $1$, an approximation would be $f_s(t) \approx \frac{2}{1+s}t+c_1 = \frac{2}{1+s}t+O(1)$
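Building up from $t = 0$ is a few lines of Python with exact fractions (my sketch of the suggestion above; names are mine). It reproduces the exact value and shows the nearly linear growth $\frac{2}{1+s}t + c_1$:

```python
from fractions import Fraction

def expected_rolls(s, t_max):
    """f[t] = expected rolls of an s-sided die to reach a sum >= t, built
    bottom-up from f(t) = 1 + (1/s) * sum_{k=1}^{s} f(t-k), with f(t<=0) = 0."""
    f = [Fraction(0)] * (t_max + 1)
    for t in range(1, t_max + 1):
        f[t] = 1 + sum(f[max(t - k, 0)] for k in range(1, s + 1)) / Fraction(s)
    return f

f = expected_rolls(6, 80)
print(float(f[80]))                        # ≈ 23.3333, the exact value above
print(float(f[80] - Fraction(2, 7) * 80))  # ≈ 0.4762, an estimate of c_1
```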

Second Answer

I will do the reasoning with the die, but the generalisation is straightforward. I do not have a closed formula for the mean. Nevertheless, I provide a piece of code in Python which computes the precise number for given values of N (in this case $80$) and SIDES (in this case $6$). I also derived an approximate formula, which might be sufficient (unless I have made a mistake in my reasoning).

Mathematical view

Let us define a Markov chain with states $S_i$ for $i$ between $0$ and $N$, where $N$ is the target value. A state records how many dots we have thrown in total so far. The only exception is the last state $S_N$, which we reach if we have thrown $\geq N$ dots.

Then we have natural transition probabilities, i.e. $$p(S_i \to S_j) = \frac{1}{6},$$ if $i < j \leq i + 6$ and $j \not = N$.

For the last state, the probabilities differ: $$p(S_{N-i} \to S_N) = \frac{7-i}{6},$$ for $i \in 1, \dots, 6$ (e.g. from $S_{N-1}$ every throw reaches the target, while from $S_{N-6}$ only a $6$ does).
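Given those transition probabilities, one standard route (my sketch, not the conditioning argument developed below) is to compute the expected absorption time of the chain directly: writing $h_i$ for the expected number of throws to go from $S_i$ to $S_N$, we have $h_N = 0$ and $h_i = 1 + \frac{1}{6}\sum_{k=1}^{6} h_{\min(i+k,\,N)}$, which can be evaluated backwards from $i = N-1$.

```python
from fractions import Fraction

N, SIDES = 80, 6

h = [Fraction(0)] * (N + 1)      # h[N] = 0: already absorbed
for i in range(N - 1, -1, -1):   # backward induction over the states
    h[i] = 1 + sum(h[min(i + k, N)] for k in range(1, SIDES + 1)) / Fraction(SIDES)

print(float(h[0]))  # ≈ 23.3333, the expected number of throws from sum 0
```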

The law of total expectation

Let $X_i$ be a random variable denoting the number of throws needed to get to $S_i$. We will compute $\mathbb E[X_i]$, denoted $e_i$ for simplicity, by induction using the law of total expectation. The partition of the sample space will be over the configuration of the last step: a configuration $C$ consists of the previous state and the number of dots thrown in the last step.

That is,

$$e_i = \mathbb E \left[\mathbb E[X_i|C] \right] = \sum_{(j,d)} \mathbb E[X_i|C=(j,d)]P(C=(j,d)),$$

where $S_j$ was the last state and $d$ is the number of thrown dots, such that throwing $d$ dots gets us to $S_i$.

Usually, we have exactly $6$ configurations: the previous state was $S_{i-k}$ for $k \in 1, \dots, 6$, and the last throw gave $k$ dots. This fails only for the initial states and the last state.

The initial states have only a limited number of configurations (e.g. we cannot reach state $S_3$ by throwing a $6$), while from $S_{N-1}$ any number suffices to reach $S_N$, and so on.

Decomposing the formula

The following holds:

$$\mathbb E[X_i|C=(j,d)] = e_j + 1,$$

as the LHS is the expected number of throws to get to $S_i$, given that the last throw moves us from $S_j$ to $S_i$.

A bit more tricky is computing the probability $P(C=(j,d))$. One of the answers you mention shows how to compute $P(A_j)$, where $A_j$ is the event that we visit $S_j$ during the process. We can then compute the probability that we have thrown $d$ dots while being in state $S_j$ (these two are obviously independent events), which is $\frac{1}{6}P(A_j)$.

The tricky part, in my opinion, is realizing that we need to normalize these probabilities to get $P(C=(j,d))$.

That is,

$$P(C=(j,d)) = \frac{\frac{1}{6}P(A_j)}{\sum_{(j',d')} \frac{1}{6}P(A_{j'})},$$

where $(j',d')$ are all valid configurations.

It is basically the Bayes theorem to get $$P(\text{particular configuration $C = (j,d)$ happen}\;|\;\text{configuration $C$ is the last step})$$

$$P(\text{$C = (j,d)$}\;|\;\text{$C$ is last}) = \frac{P(C = (j,d) \land \text{$C$ is last})}{P(\text{$C$ is last})},$$

where $P(C = (j,d) \land \text{$C$ is last})$ reduces to $P(C = (j,d))$, and $P(\text{$C$ is last})$ is computed by summing over all valid configurations.

This allows us to compute $e_N$.

Programming part

There are several parts. First, we compute the probabilities $P(A_i)$ for $i \in 0, \dots, N-1$, i.e. the probability that we visit $S_i$. Then we compute all expectations except the target one, which is computed separately.

Probabilities

This is the lengthier part, but it is basically about exploiting the recurrence relations from the mentioned answer to the previous problem.

from fractions import Fraction

N = 80
SIDES = 6


def prepare_probabilities_array():
    # N x N table indexed [throws][num]; placeholders are overwritten below
    p = [
        [None for _ in range(N)]
        for _ in range(N)
    ]

    p[0][0] = Fraction(1)

    for i in range(1, N):
        p[0][i] = Fraction()
        p[i][0] = Fraction()

    return p

def fill_array(p):
    for throw in range(1, N):
        for num in range(1, N):
            p[throw][num] = sum(
                p[throw - 1][num - dots]
                for dots in range(1, min(num + 1, SIDES + 1))
            ) / SIDES

def get_prob_of_num(p, num):
    return sum(p[throw][num] for throw in range(0, N))


def get_probabilities_of_num(p):
    return [get_prob_of_num(p, num) for num in range(0, N)]

# [throws][num]
PROBABILITIES = prepare_probabilities_array()  # set initial conditions
fill_array(PROBABILITIES)  # do some dynamic programming
NUM_PROB = get_probabilities_of_num(PROBABILITIES)  # prob of visit, i.e. P(A_i)

print([(i,float(v)) for i,v in enumerate(NUM_PROB)])

See that for large values of $i$ we don't even need to bother with the actual computation, as the probabilities quickly approach $\frac{2}{7}$, i.e. $1$ over the mean of one throw.
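That limit can also be checked with a one-dimensional recurrence (my sketch): conditioning on the last throw before reaching sum $i$ gives $P(A_i) = \frac{1}{6}\sum_{k=1}^{\min(i,6)} P(A_{i-k})$ with $P(A_0)=1$.

```python
from fractions import Fraction

SIDES = 6
q = [Fraction(1)]  # q[0] = P(A_0) = 1: we start at sum 0
for i in range(1, 81):
    q.append(sum(q[i - k] for k in range(1, min(i, SIDES) + 1)) / SIDES)

print(float(q[80]))  # ≈ 0.285714 = 2/7
```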

Expectations

Then we compute the array of expectations, again for $i \in 0, \dots, N-1$; $e_N$ will be computed separately.

def compute_expectations(n_p):
    e = [Fraction()]
    for num in range(1, N):
        normalization_factor = Fraction()
        exp_contrib = Fraction()
        for dots in range(1, min(SIDES + 1, num + 1)):
            prob = n_p[num - dots]
            normalization_factor += prob
            exp_contrib += prob * (e[num - dots] + 1)
        e.append(exp_contrib / normalization_factor)
    return e

EXPECTATIONS = compute_expectations(NUM_PROB)
print([(i,float(v)) for i,v in enumerate(EXPECTATIONS)])

Note that I omitted the factor $\frac{1}{6}$ in the computation, as it always cancels out.

See that the differences between the expectations of $S_i$ and $S_{i+1}$ also tend to $\frac{2}{7}$ for large $i$, which is a good basis for an approximate formula.

The target expectation

def compute_target(n_p, e):
    normalization_factor = Fraction()
    exp_contrib = Fraction()
    for min_dots in range(0, SIDES):
        prob = Fraction(SIDES - min_dots, SIDES) \
               * n_p[N - min_dots - 1]
        normalization_factor += prob
        exp_contrib += prob * (e[N - min_dots - 1] + 1)
    return exp_contrib / normalization_factor

print(float(compute_target(NUM_PROB, EXPECTATIONS)))  # 23.33333333333144

Instead of computing each valid last configuration explicitly, it is more convenient to work with the transition probabilities for the last state defined at the beginning.

Approximative closed formula

As hinted in the previous section, we can form a closed formula which approximates the result pretty well by assuming the following:

$$e_{N-i} \approx e_c + (N-i-c)T,$$

where $T$ is the inverse of the expected value of one roll, i.e. $\frac{2}{k+1}$ for $k$ being the number of sides, and $c$ is a sufficiently large constant.

Thus, $$ e_N \approx \frac{\sum_{i=1}^k \frac{k+1-i}{k}(e_c+(N-i-c)T + 1)}{\sum_{i=1}^k\frac{k+1-i}{k}},$$ which can be further simplified.

$$ e_N \approx \frac{\sum_{i=1}^k \frac{k+1-i}{k}(e_c+NT-cT+1-iT)}{\frac{k+1}{2}}$$

$$ e_N \approx \frac{\frac{k+1}{2}(e_c+NT-cT+1) - \sum_{i=1}^k \frac{k+1-i}{k}iT}{\frac{k+1}{2}}$$

$$ e_N \approx e_c+NT-cT+1 - \frac{\frac{T}{k}\sum_{i=1}^k (k+1-i)i}{\frac{k+1}{2}}$$

$$ e_N \approx e_c+NT-cT+1 - \frac{4}{k(k+1)^2}\sum_{i=1}^k (k+1-i)i$$

$$ e_N \approx e_c+NT-cT+1 - \frac{4}{k(k+1)^2} \left( \sum_{i=1}^k i(k+1)- \sum_{i=1}^k i^2 \right)$$

$$ e_N \approx e_c+NT-cT+1 - \frac{4}{k(k+1)^2} \left( \frac{k(k+1)^2}{2} - \frac{k(k+1)(2k+1)}{6} \right)$$

$$ e_N \approx e_c+NT-cT+1 - \frac{2(k+2)}{3(k+1)}$$

$$ e_N \approx e_c+1 + \frac{2}{k+1} \left( N-c-\frac{k+2}{3} \right)$$

c = 40

approx = EXPECTATIONS[c] + 1 + 2/(SIDES+1)*(N - c - (SIDES+2)/3)
print(approx)  # 23.3333469998339
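As a quick sanity check of the algebra above (my own check, with exact fractions): the overshoot term $\frac{4}{k(k+1)^2}\sum_{i=1}^k (k+1-i)i$ should equal $\frac{2(k+2)}{3(k+1)}$ for every number of sides $k$.

```python
from fractions import Fraction

for k in range(1, 21):
    lhs = Fraction(4, k * (k + 1) ** 2) * sum((k + 1 - i) * i for i in range(1, k + 1))
    rhs = Fraction(2 * (k + 2), 3 * (k + 1))
    assert lhs == rhs, k
print("overshoot identity holds for k = 1..20")
```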

Conclusion

I provided (at least I hope) a mathematical method leading to code which computes the desired value precisely (at least for small $N$, such that Fraction can handle the computation).

The further observations and assumptions allowed us to form a closed formula which also does reasonably well (although I do not know what precision you require). Its accuracy can be tuned by selecting a convenient value of $c$.

I did not try to compute the probabilities $P(A_i)$ by solving the equations explicitly; I assume it would be quite a mess. Nevertheless, that might be a way to reach a precise closed formula for your problem.