Showing the Sum of Normal Random Variables Is More than Additive


Hopefully this is a quick question. I am not a statistician, so I would like to make sure that I am approaching this problem correctly. I am working with data meant to monitor cell death, and I am trying to compare the effects of different treatments on the cells. Since the research is proprietary, I cannot give the full details here, but I will give an example.

I am monitoring cell death in response to stimuli. My control group contains 39 samples and is normally distributed with a mean death percentage of 0.05 and a variance $\sigma^2$ of 0.001. My first sample, of size 37, is given treatment X and has an approximately normal distribution with a mean death percentage of 0.1 and a variance $\sigma^2$ of 0.04. Likewise, the 43 cells given treatment Y have a mean death percentage of 0.18 and a variance of 0.09. Lastly, the sample of size 34 given both treatments X and Y has a mean death percentage of 0.54 and a variance of 0.16. My question is then: how do I show that the effect of treatments X and Y together is more than what would be expected from adding the random variables?
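For concreteness, the summary statistics described above can be collected in one place; this is only a sketch, and the group labels are illustrative rather than part of the original data:

```python
# Summary statistics as stated in the question (death percentages as
# proportions). Group labels are illustrative, not from the original data.
groups = {
    "control": {"n": 39, "mean": 0.05, "var": 0.001},
    "X":       {"n": 37, "mean": 0.10, "var": 0.04},
    "Y":       {"n": 43, "mean": 0.18, "var": 0.09},
    "X+Y":     {"n": 34, "mean": 0.54, "var": 0.16},
}

for name, g in groups.items():
    print(f"{name}: n={g['n']}, mean={g['mean']}, var={g['var']}")
```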

Since I do not know that the variables are guaranteed to be independent (in fact, their effects seem to be very much not independent), my idea was to use the formulas

$$E[X + Y] = E[X] + E[Y]$$

and

$$Var[X + Y] = Var[X] + Var[Y] + 2Cov[X,Y]$$

to obtain the normal distribution for the sum of the random variables where I believe that the covariance of samples would be computed via

$$Cov[X,Y] = \frac{\left(\sum\limits_{i = 1}^{N_x}(x_i - \mu_x)\right) \left(\sum\limits_{j=1}^{N_y}(y_j - \mu_y)\right)}{(N_x - 1)(N_y - 1)}$$

where the loss of 1 in each term of the denominator comes from the degree of freedom lost by using each sample mean. (I am not sure this is the correct formula, as I could not find a reference for two different random variables, but it would make sense to me given the situation.) Then, my idea was to construct two 95% confidence intervals, one for my combined-treatment sample and one for the computed $X + Y$ random variable, and, provided that the confidence intervals do not overlap, I believe I can reject the assumption that the effect of the two treatments is additive. Is this the correct approach, or have I lost my mind here? Thank you in advance for any help that you can give!
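As a sketch of the approach proposed above: the covariance of $X$ and $Y$ cannot be computed from unpaired summary statistics, so `cov_xy` below is a placeholder (set to zero, i.e. assuming independence, purely for illustration), and the effective sample size used for the sum's interval is likewise an assumption:

```python
import math

# Question's proposal, sketched with the stated summary statistics.
# The covariance of X and Y is NOT identifiable from unpaired summary
# statistics, so cov_xy below is a placeholder assumption.
n_x, mean_x, var_x = 37, 0.10, 0.04
n_y, mean_y, var_y = 43, 0.18, 0.09
n_xy, mean_xy, var_xy = 34, 0.54, 0.16

cov_xy = 0.0  # assumed: independence, for illustration only

# Distribution of the sum X + Y under the additive hypothesis.
mean_sum = mean_x + mean_y            # E[X+Y] = E[X] + E[Y]
var_sum = var_x + var_y + 2 * cov_xy  # Var[X+Y] including covariance

# 95% CIs for the two means (z = 1.96); the effective n for the sum is
# taken as min(n_x, n_y), itself an assumption.
z = 1.96
n_sum = min(n_x, n_y)
ci_sum = (mean_sum - z * math.sqrt(var_sum / n_sum),
          mean_sum + z * math.sqrt(var_sum / n_sum))
ci_xy = (mean_xy - z * math.sqrt(var_xy / n_xy),
         mean_xy + z * math.sqrt(var_xy / n_xy))

overlap = ci_sum[1] >= ci_xy[0] and ci_xy[1] >= ci_sum[0]
print("sum CI:", ci_sum, "X+Y CI:", ci_xy, "overlap:", overlap)
```

With these numbers the two intervals do not overlap even under the independence placeholder, but the conclusion is only as good as the assumed covariance.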

Best Answer

I am not sure if I understood your setup correctly, but I believe that you want to test the null hypothesis $\mu_Z = \mu_X + \mu_Y$ against the alternative $\mu_Z \neq \mu_X + \mu_Y$. Here $\mu_Z$ denotes the mean of the joint sample.

Let $$\begin{pmatrix} \hat\mu_X \\ \hat\mu_Y \\ \hat\mu_Z \end{pmatrix} := \begin{pmatrix} n_X^{-1}\sum_{i=1}^{n_X}X_i \\ n_Y^{-1}\sum_{i=1}^{n_Y}Y_i \\ n_Z^{-1}\sum_{i=1}^{n_Z}Z_i\end{pmatrix},$$ where $X_i$, $Y_i$, and $Z_i$ denote the $i$th value in the $X$ sample, the $Y$ sample, and the joint sample, respectively. Then, by the central limit theorem, $$\begin{pmatrix}\frac{1}{\sqrt{n_X}}\sum_{i=1}^{n_X}(X_i - \mu_X) \\ \frac{1}{\sqrt{n_Y}}\sum_{i=1}^{n_Y}(Y_i - \mu_Y)\\ \frac{1}{\sqrt{n_Z}}\sum_{i=1}^{n_Z}(Z_i - \mu_Z) \end{pmatrix}$$ is approximately normal with mean zero and variance-covariance matrix $$\begin{pmatrix} \sigma_X^2 & \sigma_{XY} & \sigma_{XZ} \\ \sigma_{XY} & \sigma_Y^2 & \sigma_{YZ} \\ \sigma_{XZ} & \sigma_{YZ} & \sigma_Z^2 \end{pmatrix}.$$

The main diagonal elements are given by $\sigma_I^2 = \tau_I^2$ for $I\in\{X,Y,Z\}$, where $\tau_I^2$ is the variance of the data in group $I$. For the off-diagonal elements $\sigma_{IJ}$ we have $$\operatorname{Cov}\left[\frac{1}{\sqrt{n_I}}\sum_{i=1}^{n_I}(I_i - \mu_I), \frac{1}{\sqrt{n_J}}\sum_{j=1}^{n_J}(J_j - \mu_J)\right] = \sqrt{n_In_J}\operatorname{Cov}[I_i - \mu_I,J_j - \mu_J] = \sqrt{n_In_J}\,\sigma_I\sigma_J\rho_{IJ}.$$

Estimating the variances on the main diagonal is no problem. To estimate the correlations in the off-diagonal elements, you either need additional information or must estimate them from the data. If that is possible, an estimator of the covariance between $I$ and $J$ is $$\frac{1}{n - 1}\sum_{i=1}^n(I_i - \mu_I)(J_i - \mu_J),$$ where $n = \min(n_I,n_J)$. But this is only possible if for each row $i$ in your data you can match the values $I_i$ and $J_i$.
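If the observations really can be matched row by row, the covariance estimator described in the answer is exactly what `np.cov` computes (it uses the $n-1$ denominator). A sketch with simulated stand-in data, since the real measurements are proprietary; the truncation to the smallest sample size mirrors the $n = \min(n_I, n_J)$ caveat:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired data: row i would hold X_i, Y_i, Z_i for the same
# unit. The simulated draws below are stand-ins for the real measurements,
# using the question's means and rough standard deviations.
x = rng.normal(0.10, 0.2, size=37)
y = rng.normal(0.18, 0.3, size=43)
z = rng.normal(0.54, 0.4, size=34)

# The three samples have different sizes, so pairing is only possible if
# rows can be matched; here we truncate to the smallest sample size.
n = min(len(x), len(y), len(z))
data = np.vstack([x[:n], y[:n], z[:n]])

# Sample variance-covariance matrix of (X, Y, Z); np.cov uses the
# (n - 1) denominator, matching the estimator in the answer.
sigma_hat = np.cov(data)
print(sigma_hat)
```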

Once you have an estimator $\hat\Sigma$ for the variance-covariance matrix, you could compare the value of $\vert\hat\mu_Z - \hat\mu_X - \hat\mu_Y\vert$ against the $(1-\alpha/2)$ quantile of a normal distribution with mean zero and variance $c'\hat\Sigma c$, where $$c = \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}.$$
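Putting the pieces together, the test might look as follows. This is a sketch using the question's summary numbers: the off-diagonal correlations are unknown and set to zero here as a placeholder, and since the statistic is built from sample means rather than scaled sums, the group variances are divided by the group sizes:

```python
import math
import numpy as np

# Question's summary statistics; correlations are unknown, so the
# off-diagonal entries of Sigma_hat are set to zero as an assumption.
n = np.array([37.0, 43.0, 34.0])     # n_X, n_Y, n_Z
mu = np.array([0.10, 0.18, 0.54])    # estimated mu_X, mu_Y, mu_Z
tau2 = np.array([0.04, 0.09, 0.16])  # group variances tau_I^2

# Variance-covariance matrix of the vector of sample means: the variance
# of each mean is tau_I^2 / n_I (zero correlations assumed).
sigma_hat = np.diag(tau2 / n)

c = np.array([-1.0, -1.0, 1.0])
stat = abs(float(c @ mu))            # |mu_Z_hat - mu_X_hat - mu_Y_hat|
var_c = float(c @ sigma_hat @ c)     # c' Sigma_hat c

# Two-sided test at level alpha = 0.05: reject if the statistic exceeds
# the 0.975 quantile of N(0, var_c), i.e. z_{0.975} * sqrt(var_c).
threshold = 1.959963984540054 * math.sqrt(var_c)
reject = stat > threshold
print(f"stat={stat:.3f}, threshold={threshold:.3f}, reject={reject}")
```

With these numbers the statistic comfortably exceeds the threshold, but the zero-correlation placeholder makes this illustrative only; a nonzero estimated correlation would change `var_c`.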