Evolution of a discrete distribution of probability


I am designing a virtual card game and I defined an evolution rule for its probabilities, but I lack the background in this area to work out how they will evolve. I hope you can help me here, ideally with some references and explanations.

This is the system: I have a discrete probability distribution with probabilities 5, 15, 20, 25, and 35%. When one of the five events happens, its probability diminishes by 10%. Example: if the 5%-event happens, its new probability will be 5 · 0.9 = 4.5%.

To preserve the property that it is a probability distribution, the probability lost by one event will be distributed equally to the other 4 events. Continuing the example above, I will add to each of the other probabilities an amount of $\frac{0.5\%}{4}=0.125\%$.
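In code, one update step of this rule can be sketched as follows (a Python sketch with made-up names, not the game's actual implementation):

```python
def update(p, k, factor=0.10):
    """Event k loses `factor` of its probability; the lost amount is
    shared equally among the remaining events."""
    lost = factor * p[k]
    q = [pi + lost / (len(p) - 1) for pi in p]  # everyone gains an equal share
    q[k] = p[k] - lost                          # the event itself only loses
    return q

p = [0.05, 0.15, 0.20, 0.25, 0.35]
p = update(p, 0)  # the 5%-event happens: p[0] becomes 4.5%, sum stays 1
```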

So every time an event happens, the entire distribution changes. My question is: as the number of events tends to infinity, is there a predictable limit distribution? My guess is that the limit is either a constant distribution or an oscillating one, so that every probability sits at 20% or oscillates close to that number... but I have no clue how to test this assumption mathematically.

And, if there is a constant limit distribution or a limit cycle, how fast does the original distribution evolve towards it?


Best answer

You are looking at the temporal evolution of a probability distribution. Since the evolution rule is probabilistic (a stochastic process), each of the probabilities has a probability distribution itself. WW1 derived the deterministic equations for the evolution of the expectation value of these distributions, but as Did pointed out, this is only part of the answer. In particular, the fact that the expectation values converge does not necessarily mean that the distributions do, too.

The problem is that these distributions are discrete but can attain more and more different values over time, which makes an analytic treatment very difficult. So let's look at a simulation. Rather than Excel (✝ vade retro satanas!), I'll use Matlab:

p0 = [0.05 0.15 0.20 0.25 0.35];            % initial distribution
n = numel(p0);                              % number of events

ns = 150;                                   % number of time steps
nr = 1e5;                                   % number of repetitions
P = nan(nr, ns, n);
for r = 1 : nr                              % repetitions
    p = p0;
    for t = 1 : ns                          % time
        P(r, t, :) = p;

        k = sum(rand > cumsum(p)) + 1;      % sample event k from p
        d = 0.1 * p(k);                     % probability lost by event k
        p = p + d / 4;                      % distribute it to the others ...
        p(k) = p(k) - d / 4 - d;            % ... and subtract it from k
    end
end

First, let's look at the evolution of the expected value:

E = squeeze(mean(P));
plot(E)
ylim([0 0.4])
xlabel time
ylabel E(p)

This agrees nicely with WW1's analytic solution.

Now, let's look at the evolution of the distributions themselves. For this we use a kernel density estimate, with a small bandwidth in order to capture discrete distributions at least approximately:

ts = [1 20 40 60 80 100];
for ti = 1 : numel(ts)
    t = ts(ti);
    subplot(2, 3, ti)
    title(sprintf('time = %d', t))
    hold on
    for i = 1 : n
        ksdensity(P(:, t, i), 'bandwidth', 0.001)
    end
    xlabel p
end

The distributions do indeed converge, and all of them converge to the same distribution with expectation 20%. The distribution of the first probability, starting from 5%, takes longest to get there, but after about 100 time steps convergence is complete for practical purposes.

Note, however, that this common distribution has a standard deviation of about 3.25%, meaning that the individual values of $p$ can still deviate substantially from the expectation of 20%, and these deviations also occur within an individual realization:

plot(squeeze(P(1, :, :)))
ylim([0 0.4])
xlabel time
ylabel p


Now, is there any way to figure out what this distribution is? Let's take a wild guess: the $p_i$ are the parameters of a multinomial distribution $\mathrm{Mult}(N, p_1, p_2, \ldots, p_n)$, where $N$ is the number of underlying (generalized-Bernoulli) realizations and $n$ is the number of alternatives. In Bayesian statistics, the distribution commonly used to describe the parameters of a multinomial distribution is the Dirichlet distribution $\mathrm{Dir}(\alpha_1, \alpha_2, \ldots, \alpha_n)$, its conjugate prior. What if the stationary distribution of our process were a Dirichlet distribution with constant parameters, $\alpha_i = \alpha$?

The expectation value of the Dirichlet is $$ \mathrm E(p_i) = \frac{\alpha_i}{\alpha_0} \quad \text{with}\quad \alpha_0 = \sum_i \alpha_i, $$ which for constant parameters becomes $$ \mathrm E(p_i) = \frac{\alpha}{n ~ \alpha} = \frac1n \quad \text{since}\quad \alpha_0 = n ~ \alpha, $$ which is consistent with our observations.

The variance of the Dirichlet is $$ \mathrm{var}(p_i) = \frac{\alpha_i ~ (\alpha_0 - \alpha_i)}{\alpha_0^2 ~ (\alpha_0 + 1)} = \frac{n - 1}{n^2 ~ (n \alpha + 1)}. $$
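Solving this relation for $\alpha$ gives the moment-matching estimator used in the next code snippet: $$ v = \frac{n-1}{n^2 ~ (n \alpha + 1)} \quad\Longrightarrow\quad n \alpha + 1 = \frac{n-1}{n^2 ~ v} \quad\Longrightarrow\quad \alpha = \frac{n-1}{n^3 ~ v} - \frac1n. $$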

Let's use this to estimate $\alpha$ from the simulation:

v = mean(var(squeeze(P(:, end, :))));
alpha = (n - 1) / n^3 / v - 1 / n

The result is 30.0856361440411.
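As an independent sanity check (not part of the original answer), one can sample a Dirichlet distribution with a known constant parameter and verify that the moment-matching formula recovers it; a Python sketch using NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha_true = 5, 30.0

# many draws from Dir(30, 30, 30, 30, 30)
samples = rng.dirichlet([alpha_true] * n, size=200_000)

# invert var(p_i) = (n - 1) / (n^2 (n alpha + 1)) by moment matching
v = samples.var(axis=0).mean()
alpha_hat = (n - 1) / n**3 / v - 1 / n  # should come out close to 30
```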

We have looked only at the distributions of the single $p_i$, not at their joint distribution. The marginal distribution of the Dirichlet is the Beta distribution, $p_i \sim \mathrm{Beta}(\alpha_i, \alpha_0 - \alpha_i)$. Fitting a Beta distribution to the combined data of all the $p_i$, P(:, end, :), using Matlab's Statistics and Machine Learning Toolbox results in a nice match.

The slightly different estimate of $\alpha$, 30.1693, is explained by the fact that it results not from moment matching but from maximum likelihood. It appears that at least the marginal distributions are well described by the Dirichlet approach.
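The Beta-marginal claim can be checked numerically as well (again my own sketch, not the original fit): the marginal of one component of $\mathrm{Dir}(\alpha, \ldots, \alpha)$ with $n$ components should match $\mathrm{Beta}(\alpha, (n-1)\alpha)$ in mean and variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 5, 30.0

# empirical marginal of the first Dirichlet component
p1 = rng.dirichlet([alpha] * n, size=100_000)[:, 0]

# theoretical moments of Beta(alpha, (n - 1) * alpha)
a, b = alpha, (n - 1) * alpha
beta_mean = a / (a + b)                          # = 1/n = 0.2
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))  # sd ~ 3.25%, as observed

mean_err = abs(p1.mean() - beta_mean)
var_err = abs(p1.var() - beta_var)
```

Note that the theoretical Beta standard deviation, $\sqrt{\mathrm{var}}$, comes out at about 3.25%, matching the simulated standard deviation quoted above.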

Second answer

Define five sequences $a_n, b_n, c_n, d_n, e_n$ with $a_0=5$%, $b_0=15$%, etc., where $a_n$ is the expected value of the first probability after $n$ events.

Then the sequences evolve as follows: with probability $a_n$ the first event happens and the first probability loses $0.1\,a_n$; with probability $b_n$ the second event happens and the first probability gains $0.025\,b_n$; and so on (treating the probabilities themselves as deterministic). This gives

$$a_{n+1} = a_n - 0.1 a_n^2 + 0.025( b_n^2 +c_n^2 +d_n^2 +e_n^2) $$

$$b_{n+1} = b_n - 0.1 b_n^2 + 0.025( a_n^2 +c_n^2 +d_n^2 +e_n^2) $$

$$c_{n+1} = c_n - 0.1 c_n^2 + 0.025( b_n^2 +a_n^2 +d_n^2 +e_n^2) $$

$$d_{n+1} = d_n - 0.1 d_n^2 + 0.025( b_n^2 +c_n^2 +a_n^2 +e_n^2) $$

$$e_{n+1} = e_n - 0.1 e_n^2 + 0.025( b_n^2 +c_n^2 +d_n^2 +a_n^2) $$

This can be nicely modelled in a spreadsheet ...

(spreadsheet chart of the five sequences)

So you are right: all probabilities converge to 20% (no sign of any limit cycle). Interestingly, the probability that started at 20% actually goes up at first and then comes back down; all the others approach 20% monotonically (either always increasing or always decreasing). It takes 59 events before the 5% probability grows to 19%.
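The recurrences above are also easy to iterate directly in code; here is a quick Python check (my own sketch, not the original spreadsheet) confirming convergence to 20% while the total stays at 1:

```python
# expected-value recurrences: each p_i loses 0.1 p_i^2 and gains
# 0.025 times the sum of the squares of the other four probabilities
p = [0.05, 0.15, 0.20, 0.25, 0.35]

for _ in range(1000):
    s = sum(x * x for x in p)  # sum of all five squares
    p = [x - 0.1 * x * x + 0.025 * (s - x * x) for x in p]

# after 1000 events all five values are (numerically) at 0.2
```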