Expected Value and Variance for a Probability Function

The annual number of claims on a given policy follows a distribution whose probability function is:

$P(K=k) = \left(\frac{D}{1+D}\right)^k \cdot \frac{1}{1+D}$

1) Your company has sold 300 policies in total

2) 200 policies have D = 2, and 100 policies have D = 5

What is the expected number of claims for the D = 2 policies as a group? For the D = 5 policies?

What are the variances for each?

I am not sure how to start with this one. I don't know what the D's mean so this question isn't easily interpreted for me. I plugged in D = 2 and k = 200 into the equation and got a really low number.

EDIT: I know that this is discrete and that the expected value of a discrete random variable X is $\sum x f(x)$. But what is x, and how do I obtain the probability for D = 2? What goes into the function? There are two possibilities: D = 2 or 5. Shouldn't these two probabilities add up to 1?

There are 1 best solutions below

On BEST ANSWER

This is the geometric distribution with $p={1 \over 1+D}$. The Wikipedia page has many relevant details.

First, let us find the formulae for one policy, then for $m$ policies, and finally do the computations.


For one policy:

For general $D$, we have $\overline{K}=EK = \sum_{i=0}^\infty i \cdot p[K=i] = \sum_{i=0}^\infty i ({D \over 1+D})^i {1 \over 1+D } = D$.

We have $\operatorname{var} K = E(K-\overline{K})^2 = E K^2 - (E K)^2$, and $E K^2 = \sum_{i=0}^\infty i^2 ({D \over 1+D})^i {1 \over 1+D } = D(1+2D)$, and so $\operatorname{var} K = D(1+D)$.
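As a quick numerical sanity check of these closed forms, the infinite sums can be truncated and evaluated directly. This is only a sketch: the function name `geometric_moments` and the cutoff of 2000 terms are my own choices, not part of the problem.

```python
def geometric_moments(D, terms=2000):
    """Approximate E[K] and var(K) by truncating the infinite sums.

    Uses P(K = k) = (D/(1+D))**k * 1/(1+D) for k = 0, 1, 2, ...
    The cutoff `terms` is an illustrative assumption; the tail
    beyond it is negligible for the values of D used here.
    """
    p = 1.0 / (1.0 + D)   # P(K = 0)
    q = D / (1.0 + D)     # ratio between consecutive probabilities
    mean = 0.0
    second = 0.0
    prob = p
    for k in range(terms):
        mean += k * prob
        second += k * k * prob
        prob *= q
    return mean, second - mean ** 2


for D in (2, 5):
    mean, var = geometric_moments(D)
    print(f"D={D}: mean ~ {mean:.6f}, variance ~ {var:.6f}")
```

The printed values should agree with $D$ and $D(1+D)$ up to floating-point error.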


For $m$ policies:

Hence, for general $D$, if we have $m$ policies, then $E[\sum_i K_i] = m \overline{K} = m D$.

The variance for $m$ policies is slightly more complicated. To keep things simple, I am assuming that the claims on different policies are independent. In particular, this means that for $i \neq j$, we have $E[K_i K_j] = (E K_i )(E K_j)$.

Then the computation is as follows:
\begin{eqnarray}
\operatorname{var} (\sum_i K_i) &=& E(\sum_i (K_i - \overline{K}_i))^2 \\
&=& E(\sum_i K_i)^2 - (E[\sum_i K_i])^2 \\
&=& E [ (\sum_i K_i) (\sum_j K_j) ] - m^2 \overline{K}^2 \\
&=& \sum_i \sum_j E [K_i K_j] - m^2 \overline{K}^2 \\
&=& \sum_i E [K_i^2] + \sum_i \sum_{j \neq i} E [K_i K_j] - m^2 \overline{K}^2 \\
&=& m E [K^2] + \sum_i \sum_{j \neq i} E [K_i] E[ K_j] - m^2 \overline{K}^2 \\
&=& m E [K^2] + (m^2-m) \overline{K}^2 - m^2 \overline{K}^2 \\
&=& m \operatorname{var} K \\
&=& m D (1+D)
\end{eqnarray}
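The result $\operatorname{var}(\sum_i K_i) = m D(1+D)$ can also be checked without simulation, by convolving the single-policy distribution with itself (which is exactly what independence allows). The sketch below is illustrative only: the helper names, the choice $m = 3$, and the truncation at 400 terms are my assumptions, picked to keep the check fast.

```python
def pmf(D, n_terms):
    """P(K = k) for k = 0, ..., n_terms - 1."""
    p = 1.0 / (1.0 + D)
    q = D / (1.0 + D)
    return [p * q ** k for k in range(n_terms)]


def convolve(a, b):
    """Distribution of the sum of two independent counts, truncated to len(a)."""
    n = len(a)
    out = [0.0] * n
    for i, ai in enumerate(a):
        for j in range(n - i):
            out[i + j] += ai * b[j]
    return out


def sum_moments(D, m, n_terms=400):
    """Mean and variance of K_1 + ... + K_m for independent policies."""
    single = pmf(D, n_terms)
    total = single
    for _ in range(m - 1):
        total = convolve(total, single)
    mean = sum(k * pk for k, pk in enumerate(total))
    var = sum(k * k * pk for k, pk in enumerate(total)) - mean ** 2
    return mean, var


mean, var = sum_moments(D=2, m=3)
print(mean, var)  # compare with m*D = 6 and m*D*(1+D) = 18
```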


Finally, the computations:

So, for $D=2$, $m=200$, we have $E[\sum_i K_i] = 200 \cdot 2 = 400$ and $\operatorname{var} (\sum_i K_i) = 200 \cdot 2 (1+2) = 1200$.

So, for $D=5$, $m=100$, we have $E[\sum_i K_i] = 100 \cdot 5 = 500$ and $\operatorname{var} (\sum_i K_i) = 100 \cdot 5 (1+5) = 3000$.
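The arithmetic for the two groups can be reproduced in a few lines. The function name `group_claim_stats` is hypothetical; it just applies $mD$ and $mD(1+D)$ under the independence assumption made above.

```python
def group_claim_stats(m, D):
    """Expected total claims and variance for m independent policies with parameter D."""
    expected = m * D
    variance = m * D * (1 + D)
    return expected, variance


for m, D in ((200, 2), (100, 5)):
    expected, variance = group_claim_stats(m, D)
    print(f"m={m}, D={D}: expected={expected}, variance={variance}")
```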


Regarding your question about confidence intervals:

I am not sure how you want to model this one. One way is to have three sets of independent variables: $X_k$ (whose statistics correspond to $D=2$), $Y_k$ (whose statistics correspond to $D=5$) and a choice variable $\sigma_k$ that takes the value $1$ with probability $p={2 \over 3}$ (since two thirds of the policies are of type $D=2$) and the value $0$ otherwise.

Then we model the number of claims on each (of many) policies as $Z_k = \sigma_k X_k + (1-\sigma_k) Y_k$.

A quick computation shows $E Z_k = p E X_k + (1-p)E Y_k$ and $E Z_k^2 = p E X_k^2 + (1-p)E Y_k^2$.

Calculating shows $E Z_k = {2 \over 3} 2 + {1 \over 3} 5 = 3$, $E Z_k^2 = {2 \over 3} 2\cdot 5 + {1 \over 3} 5 \cdot 11 = 25$, and so $\operatorname{var} Z_k = 16$ (and hence $\sigma_{Z_k} = 4$).
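The mixture moments can be reproduced exactly with rational arithmetic. This is a small sketch; `mixture_moments` is a name I made up, and it relies on the single-policy results $EK = D$ and $EK^2 = D(1+2D)$ derived earlier.

```python
from fractions import Fraction


def mixture_moments(p, D1, D2):
    """First moment, second moment and variance of Z = sigma*X + (1 - sigma)*Y.

    p is P(sigma = 1); X has parameter D1 and Y has parameter D2.
    """
    first = p * D1 + (1 - p) * D2
    second = p * D1 * (1 + 2 * D1) + (1 - p) * D2 * (1 + 2 * D2)
    return first, second, second - first ** 2


mean, second, var = mixture_moments(Fraction(2, 3), 2, 5)
print(mean, second, var)
```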

Then, by the central limit theorem, we model ${ {1 \over n} \sum_{k=1}^n Z_k - EZ_k \over { \sigma_{Z_k} \over \sqrt{n} } }$ as approximately ${\cal N} (0,1)$.

The $95 \%$ bounds correspond to $EZ_k \pm 1.96 {\sigma_{Z_k} \over \sqrt{n} }$, which for $n = 300$ policies is roughly $(2.55,3.45)$.
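For completeness, here is a minimal sketch of the interval computation under the normal approximation. The function name `normal_interval` is hypothetical, and I am taking $n = 300$ (the total number of policies), which is the value that reproduces the interval quoted above.

```python
import math


def normal_interval(mean, sd, n, z=1.96):
    """Approximate interval for the sample mean of n draws: mean +/- z * sd / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = normal_interval(mean=3, sd=4, n=300)
print(round(low, 2), round(high, 2))
```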