Estimate variance in a nested study design

I have a "mother" bacterium. I cloned it $R$ times, measured a variable $P$ on these $R$ clones, and recorded the mean $\bar P$ and the variance $V$ among the clones.

Then I made $M$ mutants ($M$ new bacteria, each carrying a single new mutation) from the "mother" bacterium. I cloned each of these $M$ mutants $r$ times and measured the variable $P$ again. I recorded the mean $\bar P_m$ and variance $V_m$ of each mutant $m$ ($1 ≤ m ≤ M$).

The parameter I would like to estimate is the mean squared difference between the mean value of the "mother" bacterium (estimated as $\bar P$) and the mean values of the $M$ mutants (estimated as $\bar P_m$).

$\frac{\sum_m^M (\bar P - \bar P_m)^2}{M}$ would be a biased estimator, as I have to remove the variance caused by the imprecise estimation of both $\bar P_m$ and $\bar P$. I think the solution should be along the lines of

$$\frac{\sum_m^M (\bar P - \bar P_m)^2}{M} - \frac{\sum_m^M V_m}{r} - \frac{V}{R}$$

but I am not sure. Can you help me?

There is 1 best solution below

Let's say the true means are $\mu$ for the mother bacterium and $\mu_m$ for the $m^{\text{th}}$ mutant, $m=1,\dots,M$. You want to estimate $$\displaystyle\sum_{m=1}^M(\mu_m-\mu)^2=\sum_{m=1}^M\mu_m^2-2\mu\sum_{m=1}^M\mu_m+M\mu^2.$$

Let's now try to come up with unbiased estimators of each of the terms. By linearity of expectation, we will then be done. Note that for any random variable $X$, $\text{Var}(X)=\mathbb{E}(X^2)-(\mathbb{E}(X))^2.$ With this in mind we define

$$V=\dfrac{1}{R-1}\displaystyle\sum_{i=1}^R(P_i-\bar{P})^2\hspace{1cm}\text{ and }\hspace{1cm}V_m=\dfrac{1}{r-1}\displaystyle\sum_{i=1}^r(P_{m,i}-\bar{P}_m)^2$$

where $P_i$ is the $i^{\text{th}}$ measurement from the mother bacteria, $i=1,\dots,R$ and $P_{m,i}$ is the $i^{\text{th}}$ measurement from the $m^{\text{th}}$ mutant, $m=1,\dots,M$ and $i=1,\dots,r.$ Also, $\bar{P}=\dfrac{1}{R}\displaystyle\sum_{i=1}^RP_i$ and $\bar{P}_m=\dfrac{1}{r}\displaystyle\sum_{i=1}^rP_{m,i}.$

Then $\mathbb{E}(V)=\text{Var}(P_{\text{mother}}),$ and if we define the estimator $T=\dfrac{1}{R}\displaystyle\sum_{i=1}^RP_i^2-V,$ we will get $\mathbb{E}(T)=\mathbb{E}(P_\text{mother}^2)-\text{Var}(P_\text{mother})=\mathbb{E}(P_\text{mother})^2=\mu^2.$
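As a side check, it may help to see what $T$ looks like in terms of the sample mean. Expanding the sample variance gives $\sum_{i=1}^R(P_i-\bar{P})^2=\sum_{i=1}^RP_i^2-R\bar{P}^2$, so

$$\dfrac{1}{R}\displaystyle\sum_{i=1}^RP_i^2=\bar{P}^2+\dfrac{R-1}{R}V\hspace{1cm}\text{ and hence }\hspace{1cm}T=\bar{P}^2+\dfrac{R-1}{R}V-V=\bar{P}^2-\dfrac{V}{R}.$$

In other words, $T$ is the squared sample mean with the estimated sampling variance of $\bar{P}$ subtracted, which is the kind of correction the question is guessing at.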

Similarly define $T_m=\dfrac{1}{r}\displaystyle\sum_{i=1}^rP_{m,i}^2-V_m$ to get $\mathbb{E}(T_m)=\mu_m^2.$ We then define the estimator

$$T_{*}=\displaystyle\sum_{m=1}^MT_m-2\bar{P}\sum_{m=1}^M\bar{P}_m+MT.$$

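Check that, by linearity of expectation and the independence of the mother and mutant measurements (so that $\mathbb{E}(\bar{P}\,\bar{P}_m)=\mu\mu_m$), we get $\mathbb{E}(T_{*})=\sum_{m=1}^M(\mu_m-\mu)^2.$ Dividing $T_{*}$ by $M$ then gives an unbiased estimator of the mean squared difference you asked about.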
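A minimal Python sketch of the estimator, assuming the raw measurements are available as plain lists (the function names `t_star` and `t_star_simplified` are made up for illustration). It computes $\sum_m T_m-2\bar{P}\sum_m\bar{P}_m+MT$, with each $T_m$ entering linearly as the expansion of the target sum requires. It also computes the algebraically equivalent form $\sum_m(\bar{P}_m-\bar{P})^2-\sum_m V_m/r-MV/R$, which follows from $T=\bar{P}^2-V/R$ and $T_m=\bar{P}_m^2-V_m/r$.

```python
from statistics import fmean, variance


def t_star(mother, mutants):
    """Estimate sum_m (mu_m - mu)^2 from T and the T_m.

    mother  : list of R measurements on the mother clones
    mutants : list of M lists, one list of measurements per mutant
    """
    M = len(mutants)
    # statistics.variance is the sample variance (divisor n - 1).
    T = fmean(x * x for x in mother) - variance(mother)  # unbiased for mu^2
    T_m = [fmean(x * x for x in row) - variance(row) for row in mutants]
    # Mother and mutant samples are assumed independent, so the product
    # of the sample means is unbiased for mu * mu_m.
    cross = fmean(mother) * sum(fmean(row) for row in mutants)
    return sum(T_m) - 2.0 * cross + M * T


def t_star_simplified(mother, mutants):
    """Same estimator written as the naive sum minus noise corrections."""
    R = len(mother)
    M = len(mutants)
    mother_mean = fmean(mother)
    naive = sum((fmean(row) - mother_mean) ** 2 for row in mutants)
    noise_mutants = sum(variance(row) / len(row) for row in mutants)
    return naive - noise_mutants - M * variance(mother) / R


# Tiny made-up data set: R = 2 mother clones, M = 2 mutants, r = 2 clones each.
mother = [1.0, 3.0]
mutants = [[2.0, 4.0], [0.0, 2.0]]
print(t_star(mother, mutants))
print(t_star_simplified(mother, mutants))
```

On this toy data both forms work out to $-2$ by hand, which also shows that an unbiased estimator of a non-negative quantity can itself be negative when the measurement noise is large relative to the true differences. Divide the result by $M$ to get the mean rather than the sum.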