Independence of Bernoulli random variables defined by a uniform random number


Suppose we choose a random number $b$ from the interval $[0,1]$ with a uniform distribution. Based on this number, we define the following random variables: $$X_n=\begin{cases} 1 & b\in \left(\frac{1}{2n},\frac{1}{2n}+\frac{1}{3}\right)\\ 0 & \text{otherwise} \end{cases}$$

These random variables are Bernoulli random variables with an expected value of $\frac{1}{3}$. Are they independent? Explain your answer.

I think they are not independent. For example, if we know that $X_1=1$, then we can be sure that $X_4=0$, since the corresponding intervals are disjoint. So they are dependent. But when calculating $\operatorname{var}(X_1+X_2+\dots+X_n)$, I noticed that my instructor used $n\operatorname{var}(X_1)$. How did he get this result? Using covariance, or something else?
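(As a quick sanity check on the disjointness claim, here is a small Python sketch; the helper name `interval` is mine, not from the problem.)

```python
from fractions import Fraction

def interval(n):
    # the interval (1/(2n), 1/(2n) + 1/3) on which X_n = 1
    lo = Fraction(1, 2 * n)
    return lo, lo + Fraction(1, 3)

a1, b1 = interval(1)   # (1/2, 5/6)
a4, b4 = interval(4)   # (1/8, 11/24)

# The intervals are disjoint, so X_1 = 1 forces X_4 = 0.
disjoint = b4 <= a1 or b1 <= a4
print(disjoint)  # True
```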

I appreciate any help!

1 Answer

The $X_i$ are dependent if they are realized from the same $b$. So for instance, if you generate a single $b$ and then compute the entire vector $(X_1, X_2, \ldots, X_n)$ from the same $b$, then the components $X_i$ are dependent.

However, if each $X_i$ is generated from its own $b_i$, then they are independent: each $X_i$ is a deterministic function of its own $b_i$, and if the $b_i$ are independent, then so are the $X_i$. So the question of independence comes down to how the $X_i$ are generated. The way your problem is worded suggests the former procedure, so your assertion that they are dependent is consistent with the wording as you have provided it.
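To see the two regimes concretely, here is a rough Monte Carlo sketch in Python (variable names are my own). It estimates $\Pr[X_1 = 1, X_2 = 1]$ under both sampling schemes: with a shared $b$ the joint probability is the overlap length $\tfrac{7}{12} - \tfrac{1}{2} = \tfrac{1}{12}$, while with independent draws it factors to $\tfrac{1}{3}\cdot\tfrac{1}{3} = \tfrac{1}{9}$.

```python
import random

random.seed(0)
m = 200_000

def x(n, b):
    # indicator of b in (1/(2n), 1/(2n) + 1/3)
    return 1 if 1 / (2 * n) < b < 1 / (2 * n) + 1 / 3 else 0

# Shared b: joint probability is the overlap length 1/12 ~ 0.083.
shared = sum(x(1, b) * x(2, b)
             for b in (random.random() for _ in range(m))) / m

# Fresh b per variable: joint probability factors to 1/9 ~ 0.111.
independent = sum(x(1, random.random()) * x(2, random.random())
                  for _ in range(m)) / m

print(shared, independent)
```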

It is important to remember that the $X_i$ inherit their randomness from the $b$ that generated them. So we may define another random variable $$Y_n = \sum_{i=1}^n X_i = \sum_{i=1}^n \mathbb 1 \left(\tfrac{1}{2i} < b < \tfrac{1}{2i} + \tfrac{1}{3}\right).$$ Its expectation is simply $$\operatorname{E}[Y_n] = \sum_{i=1}^n \Pr\left[\tfrac{1}{2i} < b < \tfrac{1}{2i} + \tfrac{1}{3}\right] = \frac{n}{3}.$$

Next, we calculate $$\begin{align} \operatorname{E}[Y_n^2] &= \operatorname{E}\left[\left(\sum_{i=1}^n X_i \right)^2 \right] \\ &= \operatorname{E}\left[\sum_{i=1}^n \sum_{j=1}^n X_i X_j \right] \\ &= \sum_{i=1}^n \sum_{j=1}^n \operatorname{E}[X_i X_j] \\ &= \sum_{i=1}^n \operatorname{E}[X_i^2] + 2 \!\!\! \sum_{1 \le i < j \le n} \operatorname{E}[X_i X_j]. \end{align}$$ Note that $X_i^2 = X_i$, hence $\operatorname{E}[X_i^2] = \operatorname{E}[X_i] = \frac{1}{3}$. In the case where $i < j$, we have $\frac{1}{2j} < \frac{1}{2i}$, so the two intervals intersect in $\left(\tfrac{1}{2i}, \tfrac{1}{2j} + \tfrac{1}{3}\right)$ whenever that interval is nonempty, giving $$\begin{align} \operatorname{E}[X_i X_j] &= \Pr\left[(\tfrac{1}{2i} < b < \tfrac{1}{2i} + \tfrac{1}{3}) \cap (\tfrac{1}{2j} < b < \tfrac{1}{2j} + \tfrac{1}{3})\right] \\ &= \Pr\left[\tfrac{1}{2i} < b < \tfrac{1}{2j} + \tfrac{1}{3}\right] \\ &= \max\left(0, \frac{1}{3} + \frac{1}{2j} - \frac{1}{2i}\right). \end{align}$$

Therefore, $$\begin{align} \operatorname{E}[Y_n^2] &= \frac{n}{3} + 2 \!\! \sum_{1 \le i < j \le n} \left[ \frac{1}{3} + \max \left(-\frac{1}{3}, \frac{1}{2j} - \frac{1}{2i}\right) \right] \\ &= \frac{n^2}{3} - S(n), \end{align}$$ where $$S(n) = \sum_{1 \le i < j \le n} \min \left( \frac{2}{3}, \frac{1}{i} - \frac{1}{j} \right) > 0.$$ Hence the variance is $$\operatorname{Var}[Y_n] = \frac{n^2}{3} - S(n) - \frac{n^2}{9} = \frac{2n^2}{9} - S(n).$$ Had we assumed independence, we would have obtained $$n \operatorname{Var}[X_1] = n \left( \operatorname{E}[X_1^2] - \operatorname{E}[X_1]^2\right) = n \operatorname{E}[X_1] \left(1 - \operatorname{E}[X_1]\right) = \frac{2}{9}n.$$
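The closed-form pieces above are easy to check exactly. Here is a short Python sketch (function names mine) computing $S(n)$ and $\operatorname{Var}[Y_n]$ with exact rationals:

```python
from fractions import Fraction

def S(n):
    # S(n) = sum over 1 <= i < j <= n of min(2/3, 1/i - 1/j)
    return sum(min(Fraction(2, 3), Fraction(1, i) - Fraction(1, j))
               for i in range(1, n) for j in range(i + 1, n + 1))

def var_Y(n):
    # Var[Y_n] = 2 n^2 / 9 - S(n), per the derivation above
    return Fraction(2 * n * n, 9) - S(n)

print(var_Y(5))  # 373/180
```

For $n = 1$ this reduces to $\operatorname{Var}[X_1] = \tfrac{2}{9}$, as it must.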

Simulation of the first scenario (single $b$ for all $X_i$) may be performed in Mathematica as follows for the case $n = 5$ and $m = 10^7$ simulations:

Variance[ParallelTable[Sum[Boole[1/(2i) < # < 1/(2i) + 1/3], {i, 1, 5}]&[RandomReal[]], {10^7}]]
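For readers without Mathematica, roughly the same experiment can be sketched in Python (with a smaller sample size; all names are my own):

```python
import random

random.seed(0)
n, m = 5, 200_000

def y(b):
    # Y_n computed from a single shared draw b
    return sum(1 for i in range(1, n + 1)
               if 1 / (2 * i) < b < 1 / (2 * i) + 1 / 3)

samples = [y(random.random()) for _ in range(m)]
mean = sum(samples) / m
var = sum((s - mean) ** 2 for s in samples) / (m - 1)
print(var)  # close to the theoretical 373/180 ~ 2.072
```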

This yielded $2.07316$ for my run, which closely matches the theoretical value obtained with

V[n_] := 2n^2/9 - Sum[Min[2/3, 1/i - 1/j], {i, 1, n - 1}, {j, i + 1, n}]

which for $n = 5$ yields $\operatorname{Var}[Y_5] = \frac{373}{180} \approx 2.07222$. Note that the independence-based value $\frac{2n}{9} = \frac{10}{9} \approx 1.111$ is far off.

Now, what if we simulated the second scenario? This could be done with just a minor change in the code:

Variance[ParallelTable[Sum[Boole[1/(2i) < RandomReal[] < 1/(2i) + 1/3], {i, 1, 5}], {10^7}]]
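The analogous Python sketch of the second scenario draws a fresh $b_i$ for every indicator (again, names are mine):

```python
import random

random.seed(1)
n, m = 5, 200_000

def y_indep():
    # Y_n with a fresh uniform draw for each X_i
    total = 0
    for i in range(1, n + 1):
        b = random.random()
        total += 1 / (2 * i) < b < 1 / (2 * i) + 1 / 3
    return total

samples = [y_indep() for _ in range(m)]
mean = sum(samples) / m
var = sum((s - mean) ** 2 for s in samples) / (m - 1)
print(var)  # close to the theoretical 2n/9 = 10/9 ~ 1.111
```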

This gave me approximately $1.11099$, which is now consistent with the theoretical $\frac{2n}{9} = \frac{10}{9}$. Consequently, you will want to confirm which interpretation of the problem was intended. One argument in favor of the second interpretation is that $S(n)$ has no elementary closed form.