Can someone please help me understand how to interpret the following?
I am quite confused about how the solution is obtained.
I get the part where, because the design is balanced, each level has an equal number of observations.
But what is the procedure used to construct that $X^{T}$ matrix?
Is it based on setting the baseline level to all 1s?
I apologize in advance if my experimental design terminology is incorrect; it's been a year since I've worked with this material.
The notation here is extremely confusing, unfortunately. For your own reference, this so-called "contrast parametrization" is derived from the "factor effects model," which can be found on page 7 of http://www.stat.purdue.edu/~ghobbs/STAT_512/Lecture_Notes/ANOVA/Topic_21.pdf.
To summarize the factor effects model, the assumption is that your variable of interest on the $j$th trial at level $i$ is given by $$Y_{ij} = \mu + \tau_i + \epsilon_{ij}$$ where $i = 1, 2, 3, 4$ for your problem, and the largest value of $j$ is the same for every $i$ (since the experiment is balanced). Note carefully that the mean is allowed to vary only with the factor level $i$; it does not depend on $j$. The error $\epsilon_{ij}$ is usually assumed to be normally distributed with mean $0$ and a common variance $\sigma^2$.
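As a rough illustration of the factor effects model, the sketch below simulates a balanced layout from $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$ with independent normal errors. The function name `simulate_factor_effects` and the values of $\mu$, $\tau_i$, $\sigma$, and $n$ are made-up assumptions for the example, not anything from the question.

```python
import random
from statistics import mean

def simulate_factor_effects(mu, taus, sigma, n, seed=0):
    """Simulate a balanced one-way layout from Y_ij = mu + tau_i + eps_ij.

    Hypothetical helper. Returns a dict mapping level i (1-based) to its
    list of n observations; eps_ij ~ Normal(0, sigma^2), independently.
    """
    rng = random.Random(seed)
    data = {}
    for i, tau in enumerate(taus, start=1):
        # Every level gets the same number of trials j = 1, ..., n (balanced design).
        data[i] = [mu + tau + rng.gauss(0.0, sigma) for _ in range(n)]
    return data

# Made-up parameter values, chosen only for illustration.
data = simulate_factor_effects(mu=10.0, taus=[0.0, 2.0, -1.0, 3.0], sigma=0.5, n=2000)
for level, ys in data.items():
    # Each level's sample mean should be close to mu + tau_i; j plays no role.
    print(level, len(ys), round(mean(ys), 2))
```

Each printed mean should land near $\mu + \tau_i$, i.e. near 10, 12, 9, and 13 for these assumed values.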
Suppose $j = 1, \dots, n$. Then, in matrix form, the ANOVA model would be $$\begin{align} \begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1n} \\ Y_{21} \\ \vdots \\ Y_{2n} \\ Y_{31} \\ \vdots \\ Y_{3n} \\ Y_{41} \\ \vdots\\ Y_{4n} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & 1 & 0 & 0 & 0\\ 1 & 0 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \tau_3 \\ \tau_4 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1n} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{2n} \\ \epsilon_{31} \\ \vdots \\ \epsilon_{3n} \\ \epsilon_{41} \\ \vdots\\ \epsilon_{4n} \end{bmatrix}\end{align}$$ So you might think that this $4n \times 5$ matrix of $0$s and $1$s, call it $\mathbf{X}$, should be the design matrix.
Unfortunately, you'd be wrong. We want to be able to invert $\mathbf{X}^{T}\mathbf{X}$, since the least-squares estimate is $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{Y}$, and that requires $\mathbf{X}$ to have full column rank, i.e. rank $5$. But $\mathbf{X}$ has rank $4$: the first column is the sum of the other four columns. So, by convention, the second column (the one multiplying $\tau_1$) is dropped, which amounts to setting $\tau_1 = 0$ and treating level 1 as the baseline. R does this too by default.
Thus, the actual model is $$\begin{align} \begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1n} \\ Y_{21} \\ \vdots \\ Y_{2n} \\ Y_{31} \\ \vdots \\ Y_{3n} \\ Y_{41} \\ \vdots\\ Y_{4n} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_2 \\ \tau_3 \\ \tau_4 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1n} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{2n} \\ \epsilon_{31} \\ \vdots \\ \epsilon_{3n} \\ \epsilon_{41} \\ \vdots\\ \epsilon_{4n} \end{bmatrix}\end{align}$$ This gives you the design matrix in your question.
Consequently, what does this mean for the means of the subgroups? Well, since $$\mathbb{E}\left(\begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \vdots \\ \epsilon_{1n} \\ \epsilon_{21} \\ \vdots \\ \epsilon_{2n} \\ \epsilon_{31} \\ \vdots \\ \epsilon_{3n} \\ \epsilon_{41} \\ \vdots\\ \epsilon_{4n} \end{bmatrix} \right) = \mathbf{0}$$ we can ignore that part, so that we have $$\mathbb{E}\left(\begin{bmatrix} Y_{11} \\ Y_{12} \\ \vdots \\ Y_{1n} \\ Y_{21} \\ \vdots \\ Y_{2n} \\ Y_{31} \\ \vdots \\ Y_{3n} \\ Y_{41} \\ \vdots\\ Y_{4n} \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots\\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu \\ \tau_2 \\ \tau_3 \\ \tau_4 \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \\ \mu+ \tau_2 \\ \vdots \\ \mu + \tau_2 \\ \mu + \tau_3 \\ \vdots \\ \mu + \tau_3 \\ \mu + \tau_4 \\ \vdots \\ \mu + \tau_4 \end{bmatrix}$$ Thus, letting $\mu_i$ be the mean of the $i$th factor level, $\mu_1 = \mu$, $\mu_2 = \mu + \tau_2$, $\mu_3 = \mu + \tau_3$, and $\mu_4 = \mu + \tau_4$; in other words, each $\tau_i$ ($i = 2, 3, 4$) is the difference $\mu_i - \mu_1$ between level $i$ and the baseline level 1. Notice that $\mu = \beta_0$, $\tau_2 = \beta_1$, $\tau_3 = \beta_2$, and $\tau_4 = \beta_3$ in your example. It's unfortunate that notation tends not to be consistent among texts, but I hope this clears things up.
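If you want to check the rank argument and the interpretation of the coefficients numerically, here is a small pure-Python sketch using exact `fractions.Fraction` arithmetic, so there is no rounding. The helper names `design_matrix`, `rank`, and `least_squares`, and the response values `y`, are hypothetical and made up for illustration. It builds both the overparameterized matrix and the treatment-coded one (level-1 indicator dropped), then solves the normal equations $\mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^{T}\mathbf{Y}$.

```python
from fractions import Fraction

def design_matrix(k, n, drop_first=False):
    """0/1 design matrix for a balanced one-way layout: k levels, n trials each.

    Columns: an intercept of 1s, then one indicator per level. With
    drop_first=True the level-1 indicator is omitted (tau_1 = 0, level 1 = baseline).
    """
    first = 2 if drop_first else 1
    rows = []
    for i in range(1, k + 1):
        for _ in range(n):
            rows.append([1] + [1 if level == i else 0 for level in range(first, k + 1)])
    return rows

def rank(matrix):
    """Rank via Gaussian elimination over exact fractions (counts pivot columns)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: column c depends on earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [u - f * v for u, v in zip(m[i], m[r])]
        r += 1
    return r

def least_squares(X, y):
    """Solve (X^T X) beta = X^T y by Gauss-Jordan elimination.

    Assumes X has full column rank, so X^T X is invertible.
    """
    p = len(X[0])
    xtx = [[sum(Fraction(row[a]) * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(Fraction(row[a]) * yi for row, yi in zip(X, y)) for a in range(p)]
    aug = [xtx[a] + [xty[a]] for a in range(p)]
    for c in range(p):
        pivot = next(i for i in range(c, p) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for i in range(p):
            if i != c:
                f = aug[i][c] / aug[c][c]
                aug[i] = [u - f * v for u, v in zip(aug[i], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

k, n = 4, 3
X_full = design_matrix(k, n)                  # columns: 1, indicators for levels 1..4
X_trt = design_matrix(k, n, drop_first=True)  # columns: 1, indicators for levels 2..4
print(len(X_full[0]), rank(X_full))  # prints: 5 4
print(len(X_trt[0]), rank(X_trt))    # prints: 4 4

# Made-up responses, ordered level 1 first, then levels 2, 3, 4 (n each).
y = [5, 6, 7, 8, 9, 10, 4, 5, 6, 10, 11, 12]
beta = least_squares(X_trt, y)
print([str(b) for b in beta])  # prints: ['6', '3', '-1', '5']
```

With these made-up numbers the group means are 6, 9, 5, and 11, so the fit gives $\hat\beta_0 = \bar{y}_1 = 6$ and $\hat\beta_1 = \bar{y}_2 - \bar{y}_1 = 3$, $\hat\beta_2 = \bar{y}_3 - \bar{y}_1 = -1$, $\hat\beta_3 = \bar{y}_4 - \bar{y}_1 = 5$, matching $\mu_1 = \mu$ and $\mu_i = \mu + \tau_i$. In practice you would let a linear-algebra library or R's `lm` do this; the hand-rolled elimination is only there to keep the sketch dependency-free.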