Variance of discrete probability distribution


I was wondering how I should calculate the variance of the following discrete probability distribution:

$$P(y = 0\mid X) = w + (1-w)e^{-\mu}$$ $$P(y = j\mid X) = (1-w)e^{-\mu}\mu^{j}/j! \qquad j=1,2,\dots$$

The variance is supposed to be:

$$Var(y|w, \mu) = (1-w)[\mu + w\mu^{2}]$$

I suppose the total variance is the sum of the variance associated with the case $y=0$ and the cases $y=j$ for $j>0$. Correct me if I'm wrong, but I don't think something like $Var(X + Y) = Var(X)+Var(Y)+2\,Cov(X,Y)$ applies here, because $y$ is a single random variable taking several values, not a sum of two random variables. However, the $y=0$ branch is a single point, so $Var(y=0\mid w,\mu)$ should be zero. The $y=j$ part could be something like:

$$\sum_{y=1}^{\infty} (y-(\mu-w \mu))^{2}(1-w)e^{-\mu}\mu^{y}/y!$$

The term $\mu - w\mu$ is the expected value of the distribution $P(y=j\mid X)$:

$$\sum_{y=1}^{\infty} y(1-w)e^{-\mu}\mu^{y}/y!=\mu - w\mu$$

This expected value is correct. However, the variance calculated as described above is not: it gives a much more complicated expression. Interestingly, summing from $0$ to $\infty$ produces a result very similar to the correct answer:

$$\sum_{y=0}^{\infty} (y-(\mu-w \mu))^{2}(1-w)e^{-\mu}\mu^{y}/y!=(1-w)\mu(1+w^{2}\mu)$$
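These two sums are easy to check numerically. A minimal sketch in Python (the question uses Mathematica, but any language serves for a sanity check; the values of $w$ and $\mu$ are arbitrary):

```python
import math

w, mu = 0.3, 2.0          # arbitrary parameter values
m = (1 - w) * mu          # E[Y] = mu - w*mu, as derived above
N = 100                   # truncation point; the Poisson tail beyond this is negligible

def term(y):
    """(y - E[Y])^2 times (1-w) * Poisson(mu) pmf at y."""
    return (y - m) ** 2 * (1 - w) * math.exp(-mu) * mu**y / math.factorial(y)

from_1 = sum(term(y) for y in range(1, N))  # the attempted sum over y >= 1
from_0 = sum(term(y) for y in range(0, N))  # the same sum including y = 0

print(from_0, (1 - w) * mu * (1 + w**2 * mu))  # these two agree
print((1 - w) * (mu + w * mu**2))              # the correct Var(y | w, mu) differs
```

Neither sum reproduces the stated variance $(1-w)[\mu + w\mu^{2}]$, because both use the scaled Poisson term in place of the full pmf at $y=0$.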

By the way, I'm using Mathematica to obtain these results. I'm only interested in the correct approach to obtain this kind of variance, not in the details of the calculation.


Best answer:

This is known as a zero-inflated Poisson (ZIP) model. Here, $Y$ has pmf $f(y)$:

$$f(y) = \begin{cases} w + (1-w)e^{-\mu} & y = 0 \\[4pt] (1-w)e^{-\mu}\mu^{y}/y! & y = 1, 2, \dots \end{cases}$$

Then, $Var(Y)$ is:

$$Var(Y) = (1-w)\big(\mu + w\mu^{2}\big)$$

where I am using the Var function from the mathStatica package for Mathematica.

As to how to calculate the variance:

There are two standard approaches to calculating variance:

  1. $Var(Y) = E[Y^2] - (E[Y])^2$
  2. $Var(Y) = E\big[(Y- E[Y])^2\big]$

If you use the FIRST approach, $Var(Y) = E[Y^2] - (E[Y])^2$, note that the first and second moments are sums of the form:

$$\sum_{y=0}^{\infty} y\,f \quad \text{and} \quad \sum_{y=0}^{\infty} y^2 f$$

Note that when $y=0$, the expression $y^i f$ equals $0$ (for $i = 1, 2$), so the $y = 0$ line of your piecewise pmf contributes $0$ to both the first and second moments. Accordingly, you can calculate both while excluding the $y = 0$ line, simply summing from $y = 1$ to $\infty$.

In Mathematica (which you are using), with f set to the $y \geq 1$ branch of the pmf (the $y=0$ point mass contributes nothing to these moments), you would enter:

f = (1 - w) Exp[-\[Mu]] \[Mu]^y/y!;
Sum[y^2 f, {y, 0, Infinity}] - Sum[y f, {y, 0, Infinity}]^2   // Simplify

Summing from 1 yields the same outcome:

Sum[y^2 f, {y, 1, Infinity}] - Sum[y f, {y, 1, Infinity}]^2   // Simplify

However, if you use the SECOND approach, $Var(Y) = E\big[(Y- E[Y])^2\big]$, then you CANNOT exclude the $y = 0$ case from the summation, because $(y- E[Y])^2 f$ is NOT equal to $0$ when $y = 0$.
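The asymmetry between the two approaches can be verified numerically. A minimal sketch in Python (arbitrary $w$ and $\mu$; the thread itself uses Mathematica):

```python
import math

w, mu = 0.3, 2.0   # arbitrary parameter values
N = 100            # truncation; the Poisson tail beyond this is negligible

def f(y):
    """ZIP pmf: extra point mass at 0, thinned Poisson elsewhere."""
    p = math.exp(-mu) * mu**y / math.factorial(y)
    return w + (1 - w) * p if y == 0 else (1 - w) * p

# First approach: raw moments. The y = 0 term is 0 in both sums,
# so starting at y = 1 is harmless.
EY  = sum(y * f(y) for y in range(1, N))
EY2 = sum(y**2 * f(y) for y in range(1, N))
var_first = EY2 - EY**2

# Second approach: central moment. Here the y = 0 term is
# (0 - EY)^2 * f(0) != 0, so it MUST be included.
var_second = sum((y - EY) ** 2 * f(y) for y in range(0, N))

print(var_first, var_second, (1 - w) * (mu + w * mu**2))  # all three agree
```

Dropping the `y = 0` term from `var_second` reproduces exactly the "much more complicated expression" the question ran into.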