How to formulate ordinary least squares regression in component formalism?


Matrix formulation is straightforward:

$\mathbf{y} = \mathbf{X} \boldsymbol{\hat{\beta}} +\boldsymbol{\hat{\varepsilon}}$

cost function: $E = {\boldsymbol{\hat{\varepsilon}}}^T{\boldsymbol{\hat{\varepsilon}}} = {(\mathbf{y} - \mathbf{X}\boldsymbol{\hat{\beta}})}^T(\mathbf{y} - \mathbf{X}\boldsymbol{\hat{\beta}})$

... differentiating wrt $\boldsymbol{\hat{\beta}}$ and searching for extremum:

$\frac{\partial E}{\partial \boldsymbol{\hat{\beta}}} = 2 \mathbf{X}^T\mathbf{X} \boldsymbol{\hat{\beta}} - 2 \mathbf{X}^T \mathbf{y} = 0$

thus the OLS estimate of $\boldsymbol{\hat{\beta}}$ is: $\boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$

So, there is probably some limitation to the previous relation (e.g. $(\mathbf{X}^T\mathbf{X})^{-1}$ has to exist) ... am I right?
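For concreteness, here is a minimal numpy sketch of the matrix solution above, on made-up toy data (the shapes and values are just for illustration). It solves the normal equations directly and cross-checks against `np.linalg.lstsq`, which is the numerically preferred route in practice:

```python
import numpy as np

# Toy data: 5 observations, 2 regressors (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = rng.normal(size=5)

# Normal-equations estimate: beta = (X^T X)^{-1} X^T y.
# Solving the linear system avoids forming the inverse explicitly.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# lstsq minimizes ||y - X beta||^2 directly; both agree here.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta, beta_lstsq)
```

Note the code never inverts $\mathbf{X}^T\mathbf{X}$; it only requires that the linear system be solvable, which is exactly the invertibility caveat above.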

If I try to do the same in component notation, there is a problem in the same formula (I'll come back to this later).

In component formalism (using Einstein's summation convention):

$E = (X_{ij} \beta_j - y_i)^2 = (X_{ij}\beta_j)^2 - 2 X_{ij}\beta_j y_i + y_i^2$

$\frac{\partial E}{\partial \beta_j} = 2X_{ij} \beta_j X_{ij} - 2X_{ij} y_i = 0$

$X_{ij} \beta_j X_{ij} = X_{ij} y_i$

Now, every term is just a scalar, so it's tempting to cancel $X_{ij}$ on both sides. However, this just leads to the trivial relation: $y_i = X_{ij} \beta_j$

Can someone help me clarify this, please? Is it somehow connected to the use of only lower indices? When do I have to consider both lower and upper indices (tensors and duals)?

Thank you!


There are 2 best solutions below

BEST ANSWER

The error in your component derivation: When you differentiate wrt $\beta_j$, the index $j$ now has two roles: one as the generic index in the sum, and one as the index specifying which $\beta$ you are differentiating with respect to. The partial derivative wrt $\beta_j$ should treat the other $\beta$'s as constant, but your notation can no longer distinguish them!

Better to use a new index, say $k$, to perform the differentiation. When you do this, you find that the partial derivative will be $$\frac{\partial E}{\partial\beta_k}=2X_{ij}\beta_jX_{ik} - 2X_{ik}y_i.\tag1$$ There's still summing going on (involving $i$ and $j$, with $k$ held constant) when you set (1) to zero, so it doesn't make sense to factor out $X_{ij}$. Convert back into matrix notation and you'll get $$X^TX\beta=X^Ty.\tag2$$ Specifically, $\sum_i\sum_jX_{ij}\beta_jX_{ik}$ is the $k$th member of the vector $X^TX\beta$, while $\sum_iX_{ik}y_i$ is the $k$th member of $X^Ty$.
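The correspondence between (1) and the matrix form (2) can be checked numerically with `np.einsum`, which implements the summation convention literally (random toy shapes, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
y = rng.normal(size=6)
beta = rng.normal(size=3)

# Component form, eq. (1): 2 X_{ij} beta_j X_{ik} - 2 X_{ik} y_i
# (sum over i and j; free index k labels the gradient component).
grad_component = (2 * np.einsum('ij,j,ik->k', X, beta, X)
                  - 2 * np.einsum('ik,i->k', X, y))

# Matrix form: 2 (X^T X beta - X^T y).
grad_matrix = 2 * (X.T @ X @ beta - X.T @ y)

assert np.allclose(grad_component, grad_matrix)
```

The free index $k$ in the einsum signatures is exactly the "new index" the answer introduces for the differentiation.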

ANSWER

What you have is essentially $\boldsymbol{X}^T\boldsymbol{X}\boldsymbol{\beta} = \boldsymbol{X}^T\boldsymbol{y}$, regardless of which notation you use. As you pointed out, if you cancel $\boldsymbol{X}^T$ from both sides, you are left with $\boldsymbol{X}\boldsymbol{\beta}=\boldsymbol{y}$. However, the point of doing least squares in the first place is that $\boldsymbol{X}\boldsymbol{\beta}=\boldsymbol{y}$ cannot be solved exactly for $\boldsymbol{\beta}$, i.e., $\boldsymbol{X}$ is not invertible (typically it is not even square). By multiplying each side by $\boldsymbol{X}^T$, you get $\boldsymbol{X}^T\boldsymbol{X}$ on the left, which is square and, provided $\boldsymbol{X}$ has full column rank, invertible.
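A quick numerical sketch of that point (toy shapes, purely illustrative): a tall matrix $\boldsymbol{X}$ has no inverse, but its Gram matrix $\boldsymbol{X}^T\boldsymbol{X}$ is square and invertible when the columns are independent.

```python
import numpy as np

# X is 5x2: overdetermined, not square, so it has no inverse.
rng = np.random.default_rng(2)
X = rng.normal(size=(5, 2))

# X^T X is 2x2; with full column rank it is invertible.
XtX = X.T @ X
assert XtX.shape == (2, 2)
assert abs(np.linalg.det(XtX)) > 1e-10
```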