Two different approaches to find expectation yield different results


Recently I was working on this probability problem:

In a factory, the number of accidents that happen every week is a random variable (say $N$) with mean $\mu_{N}$ and variance $\sigma^2_{N}$. The number of workers injured in each accident is also a random variable, with mean $\mu_X$ and variance $\sigma^2_X$. If all these random variables are independent of each other, what are the expected value and the variance of "the number of workers that get injured in a week"?

This is one approach to solve this problem:

Take $X_i$ as the number of workers that get injured in the $i$th accident during a week. Then the quantities we want are $E(\sum_{i=1}^{N}X_i)$ and $Var(\sum_{i=1}^{N}X_i)$. We can compute these two using conditional expectation and variance:

$$E(\sum_{i=1}^{N}X_i)=E_N(E(\sum_{i=1}^{N} X_i|N))=E_N(N\mu_X)=\mu_N\mu_X$$

$$Var(\sum_{i=1}^{N}X_i)=Var_N(E(\sum_{i=1}^{N} X_i|N))+E_N(Var(\sum_{i=1}^{N} X_i|N))=Var_N(N\mu_X)+E_N(N\sigma^2_X)=\mu_X^2\sigma^2_N+\mu_N\sigma^2_X$$
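As a sanity check, these two formulas can be verified exactly on a small example. The sketch below assumes hypothetical toy distributions that are not part of the original problem ($N$ uniform on $\{0,\dots,4\}$ and each $X_i$ uniform on $\{1,2,3\}$) and enumerates every outcome of the random sum with exact fractions:

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy distributions, chosen only for illustration.
N_VALUES = [0, 1, 2, 3, 4]   # accidents per week, uniform
X_VALUES = [1, 2, 3]         # injuries per accident, uniform


def uniform_pmf(values):
    """Return a dict {value: probability} for a uniform distribution."""
    p = Fraction(1, len(values))
    return {v: p for v in values}


def mean_var(pmf):
    """Return (mean, variance) of a pmf given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    second = sum(v * v * p for v, p in pmf.items())
    return mean, second - mean * mean


def compound_sum_pmf(n_pmf, x_pmf):
    """Exact pmf of X_1 + ... + X_N, with i.i.d. X_i independent of N."""
    out = {}
    for n, pn in n_pmf.items():
        # For n = 0 this loop runs once with an empty tuple (sum is 0).
        for combo in product(x_pmf.items(), repeat=n):
            total = sum(v for v, _ in combo)
            prob = pn
            for _, p in combo:
                prob *= p
            out[total] = out.get(total, 0) + prob
    return out


n_pmf, x_pmf = uniform_pmf(N_VALUES), uniform_pmf(X_VALUES)
mu_n, var_n = mean_var(n_pmf)
mu_x, var_x = mean_var(x_pmf)
mean_y, var_y = mean_var(compound_sum_pmf(n_pmf, x_pmf))

print(mean_y, mu_n * mu_x)                        # 4 4
print(var_y, mu_x**2 * var_n + mu_n * var_x)      # 28/3 28/3
```

The enumeration agrees with $\mu_N\mu_X$ and $\mu_X^2\sigma^2_N+\mu_N\sigma^2_X$ for this particular choice; it is a check on one example, not a proof.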

But one might think of another approach to solve this problem. Let $X$ be the number of injuries in each accident. The number of injuries in a week can be expressed as $Y=NX$. Then, because of independence we can write:

$$E(Y)=E(NX)=E(N)E(X)=\mu_N\mu_X$$

Well, everything is good till now, but:

$$E(X^2)=Var(X)+E^2(X)=\sigma^2_X+\mu^2_X$$

$$E(N^2)=Var(N)+E^2(N)=\sigma^2_N+\mu^2_N$$

$$\Rightarrow Var(Y)=E(Y^2)-E^2(Y)=E(N^2X^2)-E^2(NX)=E(N^2)E(X^2)-E^2(N)E^2(X)$$

$$\Rightarrow Var(Y)=(\sigma^2_N+\mu^2_N)(\sigma^2_X+\mu^2_X) - \mu^2_N\mu^2_X=\sigma^2_N\sigma^2_X+\sigma^2_N\mu^2_X+\sigma^2_X\mu^2_N$$
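The algebra here can be checked on its own terms. The sketch below assumes hypothetical toy distributions that are not part of the original problem ($N$ uniform on $\{0,\dots,4\}$, $X$ uniform on $\{1,2,3\}$) and computes the exact variance of the product $NX$, to see whether it matches the last formula:

```python
from fractions import Fraction

# Hypothetical toy distributions, chosen only for illustration.
N_VALUES = [0, 1, 2, 3, 4]   # accidents per week, uniform
X_VALUES = [1, 2, 3]         # injuries per accident, uniform


def mean_var(pmf):
    """Return (mean, variance) of a pmf given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    return mean, sum(v * v * p for v, p in pmf.items()) - mean ** 2


def product_pmf(n_values, x_values):
    """Exact pmf of N * X for independent, uniformly distributed N and X."""
    p = Fraction(1, len(n_values) * len(x_values))
    pmf = {}
    for n in n_values:
        for x in x_values:
            pmf[n * x] = pmf.get(n * x, 0) + p
    return pmf


mu_n, var_n = mean_var({n: Fraction(1, len(N_VALUES)) for n in N_VALUES})
mu_x, var_x = mean_var({x: Fraction(1, len(X_VALUES)) for x in X_VALUES})
mean_y, var_y = mean_var(product_pmf(N_VALUES, X_VALUES))

formula = var_n * var_x + var_n * mu_x**2 + var_x * mu_n**2
print(mean_y, var_y, formula)  # 4 12 12
```

For this example the formula does describe the variance of the single product $NX$ correctly, so any discrepancy with the first approach must come from what $NX$ models rather than from the computation.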

This result is not the same as the result of the previous approach. Which approach is wrong, and why?


There are 2 best solutions below

2
On BEST ANSWER

Let $Y_1 := \sum_{i=1}^N X_i$ be the "sum" approach, and $Y_2:=NX$ be the "product" approach.

The "sum" approach is the correct one, because the quantity of interest is "the number of workers that get injured in a week", which is the sum of the numbers of workers injured in the individual accidents.

It is possible to use a product-based approach like $Y_3:=N\bar{X}$, but then you need to use the average, $\bar{X}=\frac{1}{N}\sum_{i=1}^N X_i$, and now you're back to using the "sum" approach.

Variance difference

It's quite natural that the "product" approach gives a higher variance than the "sum" approach.

In the "sum" approach, you treat the number of injuries in each accident as independent of the number in every other accident. In the "product" approach, however, you assume that every accident in the week injures exactly the same number of workers, $X$.

Intuitively, the variance in the "sum" approach is lower than in the "product" approach because a high number of injuries in one accident may be offset by a low number in another. You can see this directly in the following example: for i.i.d. $X_1, X_2$, we have $$Var(X_1+X_2) = 2Var(X_1) \le 4Var(X_1) = Var(2X_1) = Var(X_1+X_1).$$
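The same comparison can be made for a random $N$ by subtracting the two variance formulas from the question. This is a short worked step, assuming (as in the problem) that $N$ takes nonnegative integer values:

$$Var(NX)-Var\Big(\sum_{i=1}^{N}X_i\Big)=\sigma^2_N\sigma^2_X+\sigma^2_X\mu^2_N-\mu_N\sigma^2_X=\sigma^2_X\big(E(N^2)-E(N)\big)=\sigma^2_X\,E\big(N(N-1)\big)\ge 0.$$

The difference vanishes only if $\sigma^2_X=0$ or $N\in\{0,1\}$ with probability 1, that is, when there is never more than one accident whose injuries could offset each other.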

0
On

These are two different cases:
In the sum approach, the number of workers injured may differ from one accident to the next.
In the product approach, the number of workers injured is assumed to be the same in every accident.
The sum approach is the more sensible model for this problem.