Showing Residual Sum of Squares for Multiple Linear Regression is 0


Problem:

I have the linear regression model $y_i=\beta_0+\sum_{k=1}^p \beta_kx_{ik}+\epsilon_i$ where $\epsilon_i\sim N(0,\sigma^2)$, for $i = 1,2,\ldots ,n$. I want to prove that the residual sum of squares (SSR) of the model is $0$ when $p=n-1$.

Approach:

I have that SSR $= \sum_{i=1}^n e_i^2 = \sum_{i=1}^n\bigl[y_i-(\hat\beta_0+\sum_{k=1}^p \hat\beta_kx_{ik})\bigr]^2$, where the $\hat\beta_k$ are the least squares estimates. I expanded this many times to get a series of expressions, but every time it seems to lead me away from showing that it is $0$ when $p=n-1$. Is expanding it not the right approach? Should I be trying to prove this using properties of the particular model?

Thanks!

Answer:
(Since I can't add a comment, I will repost an answer I posted here.)

There is a simple proof that requires only linear algebra.

First notice that if you take $\beta=(\beta_0,\beta_1,\ldots,\beta_{n-1})^T$ and the $n\times n$ matrix $X=[\mathbb 1,x_1,x_2,\ldots,x_{n-1}]$, where $\mathbb 1=(1,1,\ldots,1)^T$ and $x_k=(x_{1k},x_{2k},\ldots,x_{nk})^T$ is the column of values of the $k$-th predictor, you can write the model equation as:

$$y=X\beta+\epsilon$$

You know that the least squares fit $\hat y$ is the orthogonal projection of $y$ onto the linear space spanned by the columns of $X$. But if $p=n-1$ and the columns of $X$ are linearly independent, then $X$ has $n$ independent columns in $\mathbb R^n$, so its column space is all of $\mathbb R^n$.

When you orthogonally project $y\in\mathbb R^n$ onto $\mathbb R^n$ itself, you get $y$ back. For this reason you have $\hat y=y$, which clearly shows that $\mathrm{SSR}=\|y-\hat y\|^2$ is equal to zero.
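You can check this argument numerically. The sketch below (with made-up random data, not from the original post) builds an $n \times n$ design matrix from an intercept column plus $p = n-1$ random predictors, fits by least squares, and confirms the fitted values reproduce $y$ exactly up to floating point error:

```python
import numpy as np

# With n observations and p = n - 1 predictors (plus an intercept column),
# the design matrix X is n x n. Random Gaussian columns are linearly
# independent with probability 1, so the column space is all of R^n.
rng = np.random.default_rng(0)
n = 6
p = n - 1

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # n x n design
y = rng.normal(size=n)

# Least squares estimate of beta, then the fitted values y_hat = X beta_hat.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# SSR = ||y - y_hat||^2, numerically zero since y_hat = y here.
ssr = np.sum((y - y_hat) ** 2)
print(ssr)
```

Since $X$ is square and invertible here, $\hat\beta = X^{-1}y$ solves the system exactly, which is just the algebraic face of "projecting onto all of $\mathbb R^n$ changes nothing."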