What is the variance of this random variable: number of items


Let us assume that we have a capacity $n$, which we will eventually let tend to infinity.

We have an infinite sequence of random variables $X_1, X_2, \dotsc$, independent and identically distributed with mean $\mu > 0$ and variance $\sigma^2$. Moreover, $P[X_i \geq 0] = 1$.

Now we are interested in the number of items $X_i$ we can take before the capacity $n$ is used up.

That is, we want to find a description for the random variable $I$ such that $\sum_{i=1}^I X_i \leq n$ and $\sum_{i=1}^{I+1} X_i > n$.

Now I would like to compute $E[I]$ and $Var[I]$ and I am not sure how to do that, as I am not sure whether it is that easy to calculate $E[I^2]$.

First I thought that the mean would be easy to calculate as $E[I]=\frac{n}{\mu}$, but obviously this is only a guess.

Then I thought one could compute it via conditional expectations, using $E[\sum_{i=1}^I X_i ] = E[I] \mu$; but this identity fails, since the random variables $I$ and $X_i$ are not independent.

Thanks to Did for pointing that out. His first comment shows that my first thought is wrong and his answer showed that my second thought was wrong.
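Both failures can be seen concretely in a quick Monte Carlo sketch. The step distribution (Uniform$[0,2]$, so $\mu = 1$) and the capacity $n = 1$ are illustrative choices of mine, not part of the question:

```python
import random

def estimate(n_cap=1.0, trials=200_000, seed=0):
    """Estimate E[I] and E[S_I] for i.i.d. Uniform[0,2] steps (mu = 1)."""
    rng = random.Random(seed)
    tot_i = tot_s = 0.0
    for _ in range(trials):
        s, i = 0.0, 0
        while True:
            x = rng.uniform(0.0, 2.0)
            if s + x > n_cap:      # taking item i+1 would exceed the capacity
                break
            s += x
            i += 1
        tot_i += i                 # this trial's I
        tot_s += s                 # this trial's S_I
    return tot_i / trials, tot_s / trials

EI, ESI = estimate()
# Exact values for this setup: E[I] = e^{1/2} - 1 ≈ 0.649 (not n/mu = 1),
# and E[S_I] = 2 e^{1/2} - 3 ≈ 0.297 (not mu * E[I] ≈ 0.649).
```

So for these steps, $E[I] \approx 0.649$ rather than $n/\mu = 1$, and $E[S_I] \approx 0.297$ rather than $\mu\,E[I] \approx 0.649$: both guessed identities are off.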

This question is continued in the following post

Accepted answer:

The identity $E(I)=n/\mu$ and its more cautious variants in revised versions of the question are far from being obvious, in fact they are wrong in general, hence we first try to eviscerate the misconceptions attached to them.

Recall that $n\gt0$ is fixed and that $I+1=\inf\{k\geqslant0\mid S_k\gt n\}$ where $S_k=X_1+\cdots+X_k$ for every $k\geqslant1$ and $S_0=0$, equivalently, $I=\sup\{k\geqslant0\mid\forall0\leqslant i\leqslant k, S_i\leqslant n\}$. Thus, $$S_I=\sum_{k=1}^IX_k=\sum_{k=1}^\infty X_k\mathbf 1_{k\leqslant I}.$$ The first summation is more natural but using it without caution often leads to disasters, hence one can definitely recommend using the second summation.

For example, the (revised) question states that $$E(S_I\mid I)=\mu I,$$ presenting no real proof of this step... which happens to be quite wrong. Actually, $$E(S_I\mid I)=\sum_{k=1}^\infty E(X_k\mathbf 1_{k\leqslant I}\mid I)=\sum_{k=1}^\infty E(X_k\mid I)\mathbf 1_{k\leqslant I},$$ and there is no reason why $E(X_k\mid I)$ should equal $E(X_k)$ for every $k$. As a matter of fact, $X_k$ and $I$ are not independent in general, as the definition of $I$ shows.
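By contrast, $I+1$ is a stopping time (the event $[I+1=k]$ depends only on $X_1,\dotsc,X_k$), so Wald's identity $E(S_{I+1})=\mu\,E(I+1)$ does hold. A sketch checking this numerically, with illustrative Uniform$[0,2]$ steps ($\mu=1$) and $n=1$, choices of mine rather than the answer's:

```python
import random

def wald_check(n_cap=1.0, trials=200_000, seed=1):
    """Estimate E[I+1] and E[S_{I+1}] for i.i.d. Uniform[0,2] steps (mu = 1)."""
    rng = random.Random(seed)
    tot_t = tot_s = 0.0
    for _ in range(trials):
        s, k = 0.0, 0
        while s <= n_cap:              # stop at the first k with S_k > n
            s += rng.uniform(0.0, 2.0)
            k += 1
        tot_t += k                     # k = I + 1
        tot_s += s                     # s = S_{I+1}
    return tot_t / trials, tot_s / trials

ET, EST = wald_check()
# Wald's identity: E[S_{I+1}] = mu * E[I+1]; with mu = 1 the two estimates
# agree, both near e^{1/2} ≈ 1.649 for this setup.
```

Note that $E(S_{I+1})\approx1.65\gt n$, so the summation stopped strictly past the capacity; it is the overshooting step $X_{I+1}$ that breaks the naive identity for $S_I$.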

To be even more specific about this crucial point, note that, for every $i\geqslant0$, $[I=i]$ depends on $(X_j)_{1\leqslant j\leqslant i+1}$ hence $X_k$ and $[I=i]$ are independent for every $k\geqslant i+2$ but not (or, at least, not obviously) for every $k\leqslant i+1$. For example, $[I=0]\subseteq[X_1\gt n]$.
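This dependence is easy to exhibit numerically. With my illustrative Uniform$[0,2]$ steps ($\mu=1$) and $n=1$, and since the steps here are nonnegative, $[I=0]$ reduces to $[X_1\gt n]$, and conditioning on it shifts the mean of $X_1$:

```python
import random

def x1_given_I_is_0(n_cap=1.0, trials=200_000, seed=2):
    """Estimate E[X_1 | I = 0] for i.i.d. Uniform[0,2] steps (mu = 1)."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        x1 = rng.uniform(0.0, 2.0)
        if x1 > n_cap:                 # here [I = 0] coincides with [X_1 > n]
            total += x1
            count += 1
    return total / count

M = x1_given_I_is_0()
# E[X_1 | I = 0] = E[X_1 | X_1 > 1] = 1.5 in this setup, far from E[X_1] = 1.
```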

Getting acquainted with some renewal theory might prove useful in this context. For example, if the steps are only nonnegative, that is, if $P(X_1\geqslant0)=1$ (and this is a big "if"), one knows that $I_n/n\to1/E(X_1)$ almost surely and that $E(I_n)/n\to1/E(X_1)$ when $n\to\infty$ (but, even in this easier context, note that these are only asymptotics: they yield no exact value of $E(I_n)$ for any finite $n$).
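A small simulation of the renewal-theoretic limit $E(I_n)/n\to1/E(X_1)$, again with my illustrative Uniform$[0,2]$ steps (so $1/E(X_1)=1$), shows the ratio approaching the limit only as the capacity grows:

```python
import random

def renewal_rate(n_cap, trials=2_000, seed=3):
    """Estimate E[I_n] / n for i.i.d. Uniform[0,2] steps (mu = 1)."""
    rng = random.Random(seed)
    tot = 0
    for _ in range(trials):
        s, i = 0.0, 0
        while True:
            x = rng.uniform(0.0, 2.0)
            if s + x > n_cap:      # next item would exceed the capacity
                break
            s += x
            i += 1
        tot += i
    return tot / trials / n_cap

# E[I_n]/n creeps toward 1/mu = 1 as the capacity n grows.
rates = [renewal_rate(n) for n in (1.0, 10.0, 100.0)]
```

At $n=1$ the ratio is still far below $1$ (about $0.65$ for these steps), which illustrates the answer's warning that the asymptotics give no exact value for finite $n$.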