Least squares ANOVA error


$$\sum \varepsilon'\varepsilon = (y - X\beta)'(y-X\beta)$$

To estimate $\beta$ I understand how to proceed, but I am confused about what $'$ means, and why the least-squares error term equals $\sum\varepsilon'\varepsilon$. Here $y$ is the vector of observations, $X$ is an $n\times k$ design matrix, $\beta$ is the vector of parameters, and the error term is the observation minus the $\text{parameter}\times x$ term.


There are 3 best solutions below


The prime notation in this context is the vector/matrix transpose. For a vector $$ v = \left(\begin{matrix} a \\ b \\ c \end{matrix}\right) $$ we have $$ v' = \left(a\,\, b \,\,c\right). $$ Notice how the column vector has become a row vector. For a matrix $$ A = \left[\begin{matrix} a & b \\ c & d \end{matrix}\right] $$ we have $$ A' = \left[\begin{matrix} a & c\\ b & d \end{matrix}\right]. $$ Here the matrix has been reflected about the main diagonal. Try working out the $n\times m$ case instead of the $n\times n$ case above.
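If it helps to see the transpose concretely, here is a minimal sketch in NumPy (the arrays are illustrative, not from the question):

```python
import numpy as np

# A column vector (3 x 1); v.T is the row vector (1 x 3).
v = np.array([[1.0], [2.0], [3.0]])
row = v.T

# A 2 x 2 matrix; A.T swaps rows and columns (reflection about the diagonal).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
At = A.T

# Transposing twice returns the original matrix.
assert np.array_equal(At.T, A)

print(row)   # row vector
print(At)    # transposed matrix
```

The same `.T` attribute covers the rectangular $n\times m$ case as well: an $n\times m$ array transposes to an $m\times n$ one.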


It is easier to expand the brackets before differentiating. Writing $S(\beta) = \epsilon'\epsilon$, \begin{align} S(\beta) ={} & (y-X\beta)'(y-X\beta) = y'y - y'X\beta-\beta'X'y+\beta'X'X\beta\\ ={} & y'y-2\beta'X'y+\beta'X'X\beta, \end{align} since $y'X\beta$ is a scalar and therefore equals its own transpose $\beta'X'y$. Then \begin{align} \frac{\partial}{\partial \beta}S(\beta) = -2X'y+2X'X\beta. \end{align} Let's check it term by term. Clearly $y'y$ does not depend on $\beta$, so its derivative is $0$. The next term is $\beta'X'y$. Perform the matrix multiplications and you get $$ \beta'X'y = \sum_{j=0}^p\beta_j \sum_{i=1}^n y_i x_{ij},\,\,\, x_{i0}=1, \forall i. $$ This is a function of $\beta$ whose derivative you take with respect to every component of the vector $\beta$, i.e., you get a gradient whose $j$th entry is $\sum_{i=1}^n y_i x_{ij}$: $$ \left(\sum_{i=1}^n y_i, \sum_{i=1}^n y_i x_{i1},\ldots,\sum_{i=1}^n y_i x_{ip} \right)' = X'y. $$ For the last term $\beta'X'X\beta$, note that this is a real quadratic form in $\beta$: $$ \beta' X'X\beta = \sum_{i,j=0}^p(X'X)_{ij}\beta_i\beta_j. $$ Differentiating with respect to $\beta_k$ picks up $(X'X)_{kj}\beta_j$ from the terms with $i=k$ and $(X'X)_{ik}\beta_i$ from the terms with $j=k$; because $X'X$ is a symmetric matrix these two sums are equal, so the $k$th entry of the gradient is $$ 2\sum_{i=0}^p(X'X)_{ik}\beta_i. $$
Converting back to matrix form, you get the familiar and much more elegant expression $2X'X\beta$. Setting the full gradient $-2X'y+2X'X\beta$ to zero yields the normal equations $X'X\widehat\beta = X'y$.
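The gradient formula above can be sanity-checked numerically. This is a small sketch using simulated data (the sizes, seed, and variable names are made up for illustration): it compares $-2X'y + 2X'X\beta$ against a finite-difference approximation of $\partial S/\partial\beta$, then solves the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
# Design matrix with an intercept column (x_{i0} = 1 for all i).
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = rng.normal(size=n)
beta = rng.normal(size=p + 1)

def S(b):
    """Sum of squared errors (y - Xb)'(y - Xb)."""
    e = y - X @ b
    return e @ e

# Analytic gradient from the derivation: dS/dbeta = -2 X'y + 2 X'X beta.
grad = -2 * X.T @ y + 2 * X.T @ X @ beta

# Central finite differences, one component of beta at a time.
h = 1e-6
I = np.eye(p + 1)
fd = np.array([(S(beta + h * I[k]) - S(beta - h * I[k])) / (2 * h)
               for k in range(p + 1)])
assert np.allclose(grad, fd, atol=1e-3)

# Setting the gradient to zero gives the normal equations X'X beta_hat = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(X.T @ X @ beta_hat, X.T @ y)
```

Since $S(\beta)$ is quadratic, the central difference matches the analytic gradient essentially to rounding error.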


The expression $A'$ denotes the transpose of the matrix $A$.

Thus $$ \varepsilon'\varepsilon = \begin{bmatrix} \varepsilon_1 & \cdots & \varepsilon_n \end{bmatrix} \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix} = \sum_{i=1}^n \varepsilon_i^2 = \text{sum of squares of errors}. \tag 1 $$ The $\text{“}{\sum}\text{”}$ in what you posted should not be there.

Conventionally one denotes the least-squares estimates of $\beta$ by $\widehat\beta$, and then one has: \begin{align} \varepsilon & = Y - X\beta = \text{the vector of errors} \in \mathbb R^{n\times 1}, \\[8pt] \widehat\varepsilon & = Y - X\widehat\beta = \text{the vector of residuals} \in \mathbb R^{n\times 1}. \end{align} The notation in what you posted does not properly distinguish between errors and residuals. Notice that the errors may be uncorrelated and homoscedastic, but the residuals are then correlated since the vector of residuals is constrained to be orthogonal to every column of the design matrix $X$. The thing that gets minimized in least-squares estimation is not the sum of squares of errors in $(1)$ above, but rather the sum of squares of residuals: $$ \sum_{i=1}^n \widehat\varepsilon_i^2 = \text{sum of squares of residuals}. $$
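Both points above can be illustrated with simulated data (everything below is made up for the sketch): $\widehat\varepsilon\,'\widehat\varepsilon$ is already the sum of squares with no extra $\sum$, and the residual vector is orthogonal to every column of $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
# Design matrix: intercept plus two covariates.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, -2.0, 0.5])   # "true" parameters (invented)
eps = rng.normal(size=n)            # errors: uncorrelated, homoscedastic
y = X @ beta + eps

# Least-squares estimate beta_hat and the residual vector.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# resid' resid is itself the sum of squared residuals -- no extra sum sign.
assert np.isclose(resid @ resid, np.sum(resid**2))

# The residual vector is orthogonal to each column of X: X' resid = 0.
assert np.allclose(X.T @ resid, 0, atol=1e-8)
```

The errors `eps` satisfy no such constraint: `X.T @ eps` is generally nonzero, which is one concrete way to see that residuals and errors are different objects.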