Variance of sum of random variables (ASM exam P)


This is a problem from Dr. Ostaszewski's actuarial exam P ASM manual. It states:

An automobile insurance company divides its policyholders into two groups: good drivers and bad drivers. For the good drivers, the amount of an average claim is 1400, with a variance of 40,000. For the bad drivers, the amount of an average claim is 2000, with a variance of 250,000. 60% of the policyholders are classified as good drivers. Calculate the variance of the amount of claim for a policyholder.

The solution states that the distribution of the random claim amount $X$ can be viewed as a weighted mixture of two distributions. It then finds the variance of $X$ using the formula $Var(X) = E[X^{2}] - (E[X])^{2}$.

$E[X] = 0.6\cdot E[X_{1}] + 0.4\cdot E[X_{2}] = 0.6\cdot1400 +0.4\cdot2000 = 1640$

This is fine to me. But my problem is with the next step, where the solution states

$E[X^{2}] = 0.6\cdot E[X_{1}^{2}] + 0.4\cdot E[X_{2}^{2}] $

I am confused because shouldn't $X^{2} = (0.6\cdot X_{1} +0.4\cdot X_{2})^{2}$? Why is it that in finding the second moment of X, we only square the random variables?

Thank you!

Edit: Deleted "$X = 0.6X_{1} + 0.4X_{2}$" from the solution. It is not part of the provided solution; I typed it in by accident!


There are 2 best solutions below

0
On BEST ANSWER

It's important, especially in the context of the exam, to step back and think about what it means to calculate the variance of the claim amount for a randomly selected policyholder. Note that a given policyholder can only be either "good" or "bad": there are no individual policyholders who are a mixture of $60\%$ "good" and $40\%$ "bad." Once you randomly choose a policyholder, they are a member of exactly one of the two driver categories.

Therefore, it makes no sense to define a random variable that models the claim amount as a linear combination of the random claim amounts for each group. What we actually have here is an example of a hierarchical model; that is to say, there is a second random variable, say $Y$, that models whether a randomly selected policyholder is "good" or "bad," and conditional on this random variable, the claim amount has a particular distribution. More formally, let $Y \sim \operatorname{Bernoulli}(p = 0.6)$, so that $Y = 1$ is the outcome that a randomly chosen policyholder is "good." Then, with $X$ being the claim amount as before, we have $$\operatorname{E}[X \mid Y = 1] = 1400, \quad \operatorname{Var}[X \mid Y = 1] = 40000; \\ \operatorname{E}[X \mid Y = 0] = 2000, \quad \operatorname{Var}[X \mid Y = 0] = 250000.$$ We are then asked to determine $$\operatorname{Var}[X],$$ the unconditional variance of the claim amount of a randomly selected policyholder. To this end, $$\operatorname{E}[X^2 \mid Y] = \operatorname{Var}[X \mid Y] + \operatorname{E}[X \mid Y]^2,$$ so we compute $$\operatorname{E}[X^2 \mid Y = 1] = 40000 + 1400^2 = 2 \times 10^6, \\ \operatorname{E}[X^2 \mid Y = 0] = 250000 + 2000^2 = 4.25 \times 10^6. $$ Then by the law of total expectation, $$\begin{align*} \operatorname{E}[X^2] &= \operatorname{E}[X^2 \mid Y = 1] \Pr[Y = 1] + \operatorname{E}[X^2 \mid Y = 0] \Pr[Y = 0] \\ &= (2(0.6) + 4.25(0.4)) \times 10^6 \\ &= 2.9 \times 10^6. \end{align*}$$ The same law gives $\operatorname{E}[X] = 1400(0.6) + 2000(0.4) = 1640$, so $$\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2 = 2900000 - 1640^2 = 210400.$$

Why did we have to go through the second moment rather than compute the total variance directly? This is because it is not generally true that $$\operatorname{Var}[X] = \operatorname{E}[\operatorname{Var}[X \mid Y]].$$ Instead, the law of total variance is $$\operatorname{Var}[X] = \operatorname{E}[\operatorname{Var}[X \mid Y]] + \operatorname{Var}[\operatorname{E}[X \mid Y]].$$ You could use this formula to calculate the result, but it is essentially no different than what we've done above.
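As a quick numerical check that the two routes agree, here is a minimal Python sketch. The function names and the `groups` dictionary are illustrative choices of mine, not part of the manual's solution; only the probabilities, means, and variances come from the problem statement.

```python
# Problem data: (probability, conditional mean, conditional variance) per group.
# The dictionary layout and function names are illustrative assumptions.
groups = {
    "good": (0.6, 1400.0, 40_000.0),
    "bad": (0.4, 2000.0, 250_000.0),
}


def variance_via_second_moment(groups):
    """Var[X] = E[X^2] - E[X]^2, with both moments from total expectation."""
    mean = sum(p * m for p, m, _ in groups.values())
    # E[X^2 | group] = Var[X | group] + E[X | group]^2
    second_moment = sum(p * (v + m ** 2) for p, m, v in groups.values())
    return second_moment - mean ** 2


def variance_via_total_variance(groups):
    """Var[X] = E[Var[X | Y]] + Var[E[X | Y]]."""
    mean = sum(p * m for p, m, _ in groups.values())
    expected_conditional_var = sum(p * v for p, _, v in groups.values())
    var_of_conditional_mean = sum(p * (m - mean) ** 2 for p, m, _ in groups.values())
    return expected_conditional_var + var_of_conditional_mean


if __name__ == "__main__":
    print(variance_via_second_moment(groups))
    print(variance_via_total_variance(groups))
```

Both functions should return approximately 210400, up to floating-point rounding. In the second one, the two pieces are $\operatorname{E}[\operatorname{Var}[X \mid Y]] = 124000$ and $\operatorname{Var}[\operatorname{E}[X \mid Y]] = 86400$, which shows how much of the total variance would be lost by averaging the conditional variances alone.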

1
On

No, $X$ is not a linear combination of the conditioned random variables: $X\neq 0.6 X_1 + 0.4 X_2$.

Your text is just using the notational shorthand $~\mathsf E(X_i) = \mathsf E(X\mid C_i)~$, $~\mathsf E(X^2_i)=\mathsf E(X^2\mid C_i)~$, $~\mathsf{Var}(X_i)=\mathsf{Var}(X\mid C_i)~$, where $C_i$ is the event that a driver is in category $i$; here category 1 is "good drivers" and category 2 is "bad drivers". Since every driver is in exactly one of the two categories, the Law of Total Expectation states, for any function $g$:

$$\mathsf E(g(X)) ~=~ \mathsf P(C_1)~\mathsf E(g(X)\mid C_1)+\mathsf P(C_2)~\mathsf E(g(X)\mid C_2)$$

So here we have:

$$\begin{align}\mathsf E(X) ~=~& 0.6~\mathsf E(X\mid C_1)+0.4~\mathsf E(X\mid C_2) \\[1ex] =~& 0.6~\mathsf E(X_1)+0.4~\mathsf E(X_2) \\[2ex] \mathsf E(X^2) ~=~& 0.6~\mathsf E(X^2\mid C_1)+0.4~\mathsf E(X^2\mid C_2) \\ ~=~& 0.6~\mathsf E(X_1^2)+0.4~\mathsf E(X_2^2) \\ ~=~& 0.6~\big(\mathsf {Var}(X_1)+\mathsf E(X_1)^2\big) + 0.4~\big(\mathsf {Var}(X_2)+\mathsf E(X_2)^2\big) \end{align}$$

Put them together to find $\mathsf {Var}(X)$.

$\Box$


tl;dr No, because $X$ is not a linear combination of the two variables. $X_1$ is the claim amount under the condition that the driver is good, and $X_2$ is the claim amount under the condition that the driver is bad. $X$ is the unconditional claim amount, which is not the weighted sum of the conditional ones: $X\neq 0.6 X_1 + 0.4 X_2$.

$\blacksquare$
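To see the difference between the mixture and the linear combination numerically, here is a small simulation sketch. The problem does not specify the claim distributions, so the normal distributions below are purely an assumption for illustration, as is drawing $X_1$ and $X_2$ independently; the function names are hypothetical as well.

```python
import random


def variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def simulate(n=100_000, seed=0):
    """Return (variance of the mixture, variance of 0.6*X1 + 0.4*X2)."""
    rng = random.Random(seed)
    mixture, linear = [], []
    for _ in range(n):
        # ASSUMPTION: normal claim sizes with the stated means and variances
        # (standard deviations 200 = sqrt(40000) and 500 = sqrt(250000)),
        # drawn independently of each other.
        x1 = rng.gauss(1400, 200)
        x2 = rng.gauss(2000, 500)
        # Mixture: the policyholder is good with probability 0.6, and the
        # claim comes from exactly one of the two groups.
        mixture.append(x1 if rng.random() < 0.6 else x2)
        # Linear combination: a blend of both claims, which is not the model.
        linear.append(0.6 * x1 + 0.4 * x2)
    return variance(mixture), variance(linear)


if __name__ == "__main__":
    mixture_var, linear_var = simulate()
    print(mixture_var, linear_var)
```

Under these assumptions the mixture variance should come out near 210400, while the linear combination has a much smaller variance, near $0.6^2 \cdot 40000 + 0.4^2 \cdot 250000 = 54400$. Both have mean 1640, which is why the first-moment calculation looks the same for the two models and the discrepancy only appears at the second moment.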