A 'Kinda' Normal Distribution Over the Unit Interval? (the ladybug won't die)


Update: I added the stochastic-processes tag.

For my non-theoretical computer model/application, I'm trying to estimate the success probability of a Bernoulli random variable. So we are 'watching the ladybug' to get a real-time estimate of the probability as each observation is processed.

The application uses an $\alpha$ of $1\%$. This is a filtering parameter - we allow for the fact that the Bernoulli distribution itself can change over time. If things (strategic adaptations) changed quickly, an $\alpha = 10\%$ might work better - we 'open up the filter' to place more weight on the current reading.


Let $\alpha = 0.01$.

A ladybug is walking on the open unit interval $(0,1)$. She starts at the midpoint $\frac{1}{2}$. Every minute she makes a move to either the right (a larger number) or to the left (a smaller number).

The ladybug makes the move by flipping a coin.

If the coin comes up heads and she is at position $x$, she moves to the right to the number

$\tag 1 x + \alpha \, (1 - x) = (1 - \alpha)\,x + \alpha$

If the coin comes up tails and she is at position $x$, she moves to the left to the number

$\tag 2 (1 - \alpha) \, x$
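The two moves above are easy to simulate. Here is a minimal sketch in Python (the function name `ladybug_walk` and its defaults are my own, not part of the question):

```python
import random

def ladybug_walk(alpha=0.01, p=0.5, steps=100, x0=0.5, rng=None):
    """Simulate the ladybug's position after `steps` coin flips.

    Heads (probability p): x -> x + alpha*(1 - x) = (1 - alpha)*x + alpha
    Tails:                 x -> (1 - alpha)*x
    Both moves keep x strictly inside (0, 1).
    """
    rng = rng or random.Random()
    x = x0
    for _ in range(steps):
        if rng.random() < p:           # heads: move right, rule (1)
            x = (1 - alpha) * x + alpha
        else:                          # tails: move left, rule (2)
            x = (1 - alpha) * x
    return x
```

Note that both rules are the same affine map $x \mapsto (1-\alpha)x + \alpha S$ with $S \in \{0, 1\}$, i.e. the position is an exponentially weighted moving average of the coin flips.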

If the ladybug does this for many weeks, is there a continuous random variable that statisticians use to estimate probabilities that the ladybug is in some sub-interval?

Note: I am just curious about this and would expect any answer to be a bit esoteric from my perspective. For example, it might involve Beta distributions, something that I've never studied.


As Kimchi Lover notes in the comments, this is quite a complex problem related to a specially structured random walk. The answer below gives a "one-shot" look at it but is incomplete.


I doubt that this follows any nice distribution exactly. Let $X_0$ be the initial position of the ladybug (i.e. $X_0=1/2$). Then the next position, $X_1$, is given by $$ X_1=(1-\alpha)X_0+\alpha{S_0},$$ where $S_0$ is a Bernoulli random variable with mean $p$ ($p=1/2$ in your example): $S_0=1$ recovers move $(1)$ and $S_0=0$ recovers move $(2)$. Note that $X_1$ is now a random variable. Repeating in this manner, we can derive an expression for $X_{n+1}$ for arbitrary $n$: $$ X_{n+1}=(1-\alpha)^{n+1}X_0+\alpha\sum_{k=0}^n(1-\alpha)^kS_{n-k}.$$ This is just a constant plus a linear combination of Bernoulli random variables. It may be possible to work out the CDF of $X_n$ in closed form, but in general a linear combination of Bernoulli random variables doesn't follow any standard distribution exactly.
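The closed-form expression can be checked against the step-by-step recursion numerically. A quick sanity check in Python (the helper names `closed_form` and `iterate` are my own):

```python
import random

def closed_form(alpha, x0, flips):
    """Closed form after n = len(flips) steps:
    X_n = (1-a)^n * X_0 + a * sum_{k=0}^{n-1} (1-a)^k * S_{n-1-k}."""
    n = len(flips)
    return (1 - alpha) ** n * x0 + alpha * sum(
        (1 - alpha) ** k * flips[n - 1 - k] for k in range(n)
    )

def iterate(alpha, x0, flips):
    """Apply the recursion X_{m+1} = (1-a) X_m + a S_m step by step."""
    x = x0
    for s in flips:
        x = (1 - alpha) * x + alpha * s
    return x

# arbitrary coin-flip sequence; both routes should agree to rounding error
rng = random.Random(42)
flips = [rng.randint(0, 1) for _ in range(200)]
assert abs(closed_form(0.01, 0.5, flips) - iterate(0.01, 0.5, flips)) < 1e-12
```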

However, this is well approximated by a normal distribution (cf. the central limit theorem) with mean $p$ and variance $$ \frac{p(1-p)\,\alpha^2}{1-(1-\alpha)^2}=\frac{p(1-p)\,\alpha}{2-\alpha} $$ (I obtained these expressions by computing the mean and variance of $X_n$ above and taking the limit as $n\to\infty$). We can see that this approximation is rather poor for small $n$ (say, $n=10$):
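The limiting mean and variance can be checked empirically by running many independent walks past the burn-in and comparing sample statistics to the formula. A sketch in Python (function names and the burn-in/sample counts are my own choices):

```python
import random
import statistics

def stationary_sd(alpha, p):
    """Theoretical limiting std. dev.: sqrt(p(1-p)*alpha^2 / (1-(1-alpha)^2))."""
    return (p * (1 - p) * alpha ** 2 / (1 - (1 - alpha) ** 2)) ** 0.5

def sample_positions(alpha, p, burn_in, n_samples, rng):
    """Run independent walks for `burn_in` steps and record the final positions."""
    out = []
    for _ in range(n_samples):
        x = 0.5
        for _ in range(burn_in):
            s = 1 if rng.random() < p else 0
            x = (1 - alpha) * x + alpha * s
        out.append(x)
    return out

rng = random.Random(7)
xs = sample_positions(alpha=0.01, p=0.5, burn_in=1000, n_samples=1000, rng=rng)
emp_sd = statistics.stdev(xs)
# for alpha = 0.01, p = 1/2 the theoretical sd is about 0.035;
# the empirical sd should land within a few percent of it
```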

[Figure: only ten samples gives a poor fit.]

but looks better as $n\to\infty$ (approximated here by $n=5000$):

[Figure: looks great as the CLT kicks in.]

(these plots are for $p=1/2$ and $X_0=1/2$, per the question).

However, as brought up in the comments, this normal approximation seems to hold only when $\alpha\ll1$ (so that $\sigma^2$ is small). Here's a similar plot to the above, this time with $\alpha=1/2$:

[Figure: poor approximation!]

More study needed!