Variance of conditional discrete random variables in a loss distribution model

I'm wondering how to find the variance of a conditional discrete random variable. For example, suppose an insurance company could have 0, 1, 2 or 3 independent losses in a given period with the following probabilities:

x | P(X=x)
0 | .25
1 | .30
2 | .35
3 | .10

And each loss could be 500, 1500 or 2500 dollars with the following probabilities:

y | P(Y=y)
500 | .60
1500 | .30
2500 | .10

What would be the variance of the total loss in a period?

Finding the expected value is easy since $$E[A+B] = E[A] + E[B]$$ so after finding $E[Y]=1000$, the expected total loss is $$\sum_{x=0}^3 x \cdot E[Y] \cdot P(X=x)$$

Similarly, $$Var(A+B) = Var(A) + Var(B) + 2Cov(A,B)$$ but since the losses are independent, $Cov(Y_i,Y_j)=0$ for $i \neq j$, so $$Var(Y_1+Y_2) = Var(Y_1) + Var(Y_2)$$ so I thought the total variance would be $$\sum_{x=0}^3 x \cdot Var[Y] \cdot P(X=x)$$

However, this conflicts with the result I got from simulating the same problem. I calculated an expected value of 1300 with a variance of 585,000, but the simulation had a mean of 1299 and a variance of 1,310,276. Clearly I'm doing something wrong.

There is 1 best solution below

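Your final expression gives only part of the variance: it captures the variability of the loss sizes but not the variability of the number of losses.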

Your $X$ has mean $1.3$ and variance $0.91$; your $Y$ has mean $1000$ and variance $450000$.

Writing $Z$ for the total loss in a period, the expected total loss is $E[Z]=E[X]E[Y]=1300$, as you say.
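These moments can be reproduced directly from the two probability tables in the question. The following is a minimal sketch; the helper names `disc_mean` and `disc_var` are made up for illustration and are not part of base R.

```r
# Probability tables from the question
x_vals <- c(0, 1, 2, 3)
x_prob <- c(0.25, 0.30, 0.35, 0.10)
y_vals <- c(500, 1500, 2500)
y_prob <- c(0.60, 0.30, 0.10)

# Mean and variance of a discrete distribution (hypothetical helpers)
disc_mean <- function(vals, prob) sum(vals * prob)
disc_var  <- function(vals, prob) sum(vals^2 * prob) - disc_mean(vals, prob)^2

mean_x <- disc_mean(x_vals, x_prob)  # mean number of losses
var_x  <- disc_var(x_vals, x_prob)   # variance of the number of losses
mean_y <- disc_mean(y_vals, y_prob)  # mean size of one loss
var_y  <- disc_var(y_vals, y_prob)   # variance of the size of one loss

# Expected total loss, E[X] * E[Y]
mean_z <- mean_x * mean_y
print(c(mean_x = mean_x, var_x = var_x, mean_y = mean_y, var_y = var_y,
        mean_z = mean_z))
```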

Use the law of total variance: $$\operatorname{Var}(Z)=\operatorname{E}[\operatorname{Var}(Z\mid X)] + \operatorname{Var}(\operatorname{E}[Z\mid X])$$
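To see where the two terms come from, write $Z = Y_1 + \dots + Y_X$ and assume, as the question implies, that the loss sizes $Y_i$ are identically distributed, independent of each other, and independent of the count $X$. Conditional on $X$:

$$\operatorname{E}[Z\mid X] = X\operatorname{E}[Y], \qquad \operatorname{Var}(Z\mid X) = X\operatorname{Var}(Y)$$

Taking the expectation of the second and the variance of the first gives

$$\operatorname{Var}(Z) = \operatorname{E}[X]\operatorname{Var}(Y) + \operatorname{E}[Y]^2\operatorname{Var}(X)$$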

$$\operatorname{E}[\operatorname{Var}(Z\mid X)] = 1.3 \times 450000 = 585000$$

$$\operatorname{Var}(\operatorname{E}[Z\mid X]) = 1000^2 \times 0.91 = 910000$$

$$\operatorname{Var}(Z) = 585000 + 910000 = 1495000$$ which is closer to your simulation
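As a check that does not rely on simulation, the distribution of the total loss is small enough to enumerate exactly. The sketch below assumes the loss sizes are independent of each other and of the number of losses; it builds every combination of loss sizes for each possible count and accumulates $E[Z]$ and $E[Z^2]$. The variable names are chosen here for illustration.

```r
# Probability tables from the question
x_vals <- 0:3
x_prob <- c(0.25, 0.30, 0.35, 0.10)
y_vals <- c(500, 1500, 2500)
y_prob <- c(0.60, 0.30, 0.10)

ez  <- 0  # accumulates E[Z]
ez2 <- 0  # accumulates E[Z^2]
for (i in seq_along(x_vals)) {
  n <- x_vals[i]
  if (n == 0) {
    # No losses: the total is 0 with probability 1
    sums  <- 0
    probs <- 1
  } else {
    # Every combination of n loss sizes, with its joint probability
    sums  <- rowSums(expand.grid(rep(list(y_vals), n)))
    probs <- apply(expand.grid(rep(list(y_prob), n)), 1, prod)
  }
  ez  <- ez  + x_prob[i] * sum(sums * probs)
  ez2 <- ez2 + x_prob[i] * sum(sums^2 * probs)
}

var_z <- ez2 - ez^2
print(c(mean = ez, variance = var_z))
```

Up to floating-point rounding, this should agree with the values of 1300 and 1495000 obtained from the law of total variance.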

My simulation in R gives

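set.seed(2020)
cases <- 10^6
# Number of losses in each simulated period
X <- sample(c(0,1,2,3), size=cases, replace=TRUE, prob=c(0.25,0.3,0.35,0.1))
# Three candidate loss sizes per period; only the first X of them are used
Y <- matrix(sample(c(500,1500,2500), size=cases*3, replace=TRUE,
                   prob=c(0.6,0.3,0.1)), ncol=3)
# Total loss per period: the sum of the first X loss sizes
Z <- ifelse(X==3, Y[,1]+Y[,2]+Y[,3],
            ifelse(X==2, Y[,1]+Y[,2],
                   ifelse(X==1, Y[,1], 0)))
mean(Z)
# 1299.659
var(Z)
# 1496062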

which is even closer
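If the maximum number of losses were larger, the nested `ifelse` calls would become unwieldy. One possible alternative, sketched here with a smaller sample and a different seed (both arbitrary choices, not taken from the simulation above), masks out the unused loss columns and sums each row.

```r
set.seed(1)          # arbitrary seed for this sketch
cases    <- 2e5      # fewer cases than above, to keep the sketch quick
x_vals   <- 0:3
x_prob   <- c(0.25, 0.30, 0.35, 0.10)
y_vals   <- c(500, 1500, 2500)
y_prob   <- c(0.60, 0.30, 0.10)
max_loss <- max(x_vals)

# Number of losses per period
X <- sample(x_vals, size = cases, replace = TRUE, prob = x_prob)
# One column of candidate loss sizes for each possible loss
Y <- matrix(sample(y_vals, size = cases * max_loss, replace = TRUE,
                   prob = y_prob), ncol = max_loss)

# Keep column j in row i only when j <= X[i], then sum across each row
Z <- rowSums(Y * (col(Y) <= X))

print(c(mean = mean(Z), variance = var(Z)))
```

The comparison `col(Y) <= X` recycles `X` down each column, so every row is compared against its own loss count.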