Can someone please help explain how to get the design matrix in one-way ANOVA?

Can someone please help me understand how to interpret the following? [image]

I am quite confused how the solution is obtained.

I understand that, because the design is balanced, each level has the same number of observations.

But what is the procedure used to construct that $X^{T}$ matrix?

Is it based on setting the baseline level to all 1s?

Best answer

I apologize in advance if my experimental design terminology is incorrect; it's been a year since I've worked with this material.

The notation here is extremely confusing, unfortunately. For your own reference, this so-called "contrast parametrization" is derived from the "factor effects model," which is described on page 7 of http://www.stat.purdue.edu/~ghobbs/STAT_512/Lecture_Notes/ANOVA/Topic_21.pdf.

To summarize the factor effects model, the assumption is that your variable of interest under the $j$th trial for level $i$ is given by $$Y_{ij} = \mu + \tau_i + \epsilon_{ij}$$ where $i = 1, 2, 3, 4$ for your problem, and the highest value of $j$ is the same for each $i$ (since the experiment is balanced). Note carefully that the mean of $Y_{ij}$ is allowed to vary only with the factor level $i$; it does not depend on $j$. The error $\epsilon_{ij}$ is usually assumed to be normally distributed with mean $0$ and some variance.
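To make the indexing concrete, here is a minimal Python sketch that simulates a balanced data set from the factor effects model. The values of `mu`, `tau`, `sigma`, and `n` are made up for illustration and do not come from your problem.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values, chosen only for illustration.
mu = 10.0                      # overall constant
tau = [0.0, 2.0, -1.0, 4.0]    # effects tau_1, ..., tau_4
sigma = 1.0                    # standard deviation of the errors
n = 3                          # observations per level (balanced design)

# Y[(i, j)] is the j-th observation at level i: mu + tau_i + eps_ij.
Y = {}
for i in range(1, 5):
    for j in range(1, n + 1):
        eps = random.gauss(0.0, sigma)  # error with mean 0
        Y[(i, j)] = mu + tau[i - 1] + eps

# Balanced design: every level has the same number of observations.
group_sizes = [sum(1 for (i, _) in Y if i == level) for level in range(1, 5)]
print(group_sizes)
```

Every level gets exactly `n` observations, and within a level the only thing that changes from one $j$ to the next is the error term.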

Suppose we assume $j = 1, \dots, n$. Then the ANOVA model would be $$\begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1n} \\ Y_{21} \\ \vdots \\ Y_{2n} \\ Y_{31} \\ \vdots \\ Y_{3n} \\ Y_{41} \\ \vdots\\ Y_{4n} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & 1 & 0 & 0 & 0\\ 1 & 0 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \tau_3 \\ \tau_4 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1n} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{2n} \\ \epsilon_{31} \\ \vdots \\ \epsilon_{3n} \\ \epsilon_{41} \\ \vdots\\ \epsilon_{4n} \end{bmatrix}$$ So you might think that the $4n \times 5$ matrix above, with an intercept column followed by one indicator column per level, should be the design matrix.

Unfortunately, you'd be wrong. One of the things we would like is to be able to invert $\mathbf{X}^{T}\mathbf{X}$, which requires that $\mathbf{X}$ has full column rank. This $\mathbf{X}$ has five columns but only rank $4$: notice that the first column is the sum of the other four columns. So, by convention, the second column (the indicator for level 1) is dropped, together with $\tau_1$; R's default treatment contrasts make the same choice. Thus, the actual model is $$\begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1n} \\ Y_{21} \\ \vdots \\ Y_{2n} \\ Y_{31} \\ \vdots \\ Y_{3n} \\ Y_{41} \\ \vdots\\ Y_{4n} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_2 \\ \tau_3 \\ \tau_4 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1n} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{2n} \\ \epsilon_{31} \\ \vdots \\ \epsilon_{3n} \\ \epsilon_{41} \\ \vdots\\ \epsilon_{4n} \end{bmatrix}$$ This gives you the design matrix that you have up there. Consequently, what does this mean for the means of the subgroups?

Well, since $$\mathbb{E}\left(\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1n} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{2n} \\ \epsilon_{31} \\ \vdots \\ \epsilon_{3n} \\ \epsilon_{41} \\ \vdots\\ \epsilon_{4n} \end{bmatrix} \right) = \mathbf{0}$$ the error term drops out when we take expectations, so that we have $$\mathbb{E}\left(\begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1n} \\ Y_{21} \\ \vdots \\ Y_{2n} \\ Y_{31} \\ \vdots \\ Y_{3n} \\ Y_{41} \\ \vdots\\ Y_{4n} \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_2 \\ \tau_3 \\ \tau_4 \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \\ \mu+ \tau_2 \\ \vdots \\ \mu + \tau_2 \\ \mu + \tau_3 \\ \vdots \\ \mu + \tau_3 \\ \mu + \tau_4 \\ \vdots \\ \mu + \tau_4 \end{bmatrix}$$ Thus, letting $\mu_i$ be the mean of the $i$th factor level, $\mu_1 = \mu$, $\mu_2 = \mu + \tau_2$, $\mu_3 = \mu + \tau_3$, and $\mu_4 = \mu + \tau_4$. Notice that $\mu = \beta_0$, $\tau_2 = \beta_1$, $\tau_3 = \beta_2$, and $\tau_4 = \beta_3$ in your example.
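If you want to check the rank argument yourself, here is a rough Python sketch using only the standard library. It builds the overparametrized matrix (an intercept column plus one indicator column per level, so 5 columns for 4 levels) and the reduced matrix with the level-1 indicator dropped, then computes their ranks by exact Gaussian elimination. The helper names `design_matrix` and `rank`, the choice $n = 3$, and the values in `beta` are my own assumptions for illustration.

```python
from fractions import Fraction

def design_matrix(levels, n, drop_first):
    """Intercept plus level indicators; rows ordered by level, n rows per level."""
    rows = []
    for i in range(levels):
        indicators = [1 if k == i else 0 for k in range(levels)]
        if drop_first:
            indicators = indicators[1:]  # drop the indicator of level 1 (baseline)
        for _ in range(n):
            rows.append([1] + indicators)
    return rows

def rank(matrix):
    """Rank via Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

X_full = design_matrix(levels=4, n=3, drop_first=False)
X_reduced = design_matrix(levels=4, n=3, drop_first=True)
full_rank = rank(X_full)
reduced_rank = rank(X_reduced)
print(len(X_full[0]), full_rank)        # 5 columns, rank 4: not full column rank
print(len(X_reduced[0]), reduced_rank)  # 4 columns, rank 4: full column rank

# Hypothetical coefficients (mu, tau_2, tau_3, tau_4), for illustration only.
beta = [10, 2, -1, 4]
means = [sum(x * b for x, b in zip(row, beta)) for row in X_reduced]
print(means)  # level 1 rows give mu; level k rows give mu + tau_k
```

With these made-up coefficients, the rows for level 1 have mean $10$ (just $\mu$), and the rows for levels 2, 3, and 4 have means $12$, $9$, and $14$, which matches $\mu_i = \mu + \tau_i$ with level 1 as the baseline.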

It's unfortunate that notation tends to not be consistent among texts, but I hope this clears this up.