Matrix formulation is straightforward:
$\mathbf{y} = \mathbf{X} \boldsymbol{\hat{\beta}} +\boldsymbol{\hat{\varepsilon}}$
cost function: $E = {\boldsymbol{\hat{\varepsilon}}}^T{\boldsymbol{\hat{\varepsilon}}} = {(\mathbf{y} - \mathbf{X}\boldsymbol{\hat{\beta}})}^T(\mathbf{y} - \mathbf{X}\boldsymbol{\hat{\beta}})$
... differentiating with respect to $\boldsymbol{\hat{\beta}}$ and searching for an extremum:
$\frac{\partial E}{\partial \boldsymbol{\hat{\beta}}} = 2 \mathbf{X}^T\mathbf{X} \boldsymbol{\hat{\beta}} - 2 \mathbf{X}^T \mathbf{y} = 0$
thus the OLS estimate of $\boldsymbol{\hat{\beta}}$ is: $\boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
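The closed-form estimate above is easy to check numerically. A minimal sketch with NumPy, using made-up data (the design matrix, coefficients, and noise level are all arbitrary choices for illustration): solve the normal equations directly and compare against NumPy's least-squares routine.

```python
import numpy as np

# Hypothetical small data set to illustrate the closed-form OLS estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # design matrix: 50 observations, 3 regressors
beta_true = np.array([1.5, -2.0, 0.5])  # arbitrary "true" coefficients
y = X @ beta_true + 0.1 * rng.normal(size=50)

# Normal-equation solution: beta_hat = (X^T X)^{-1} X^T y
# (np.linalg.solve is preferred over forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # the two solutions agree
```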
So there must be some limitation on the previous relation (e.g. $(\mathbf{X}^T\mathbf{X})^{-1}$ has to exist, which requires $\mathbf{X}$ to have full column rank) ... am I right?
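One can see this limitation numerically: a toy design matrix with a duplicated (linearly dependent) column makes $\mathbf{X}^T\mathbf{X}$ rank-deficient, so the inverse does not exist. The matrix below is an arbitrary example chosen for illustration:

```python
import numpy as np

# If X has linearly dependent columns, X^T X is singular and the
# closed-form estimate breaks down. Toy example with a duplicated column:
X = np.array([[1., 2., 2.],
              [1., 3., 3.],
              [1., 4., 4.],
              [1., 5., 5.]])        # third column duplicates the second

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2 < 3, so XtX is not invertible

try:
    np.linalg.inv(XtX)
except np.linalg.LinAlgError:
    print("X^T X is singular")
```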
If I try to do the same in component notation, I run into a problem with this formula (I'll come back to this later).
In component formalism (using Einstein's summation convention):
$E = (X_{ij} \beta_j - y_i)^2 = (X_{ij}\beta_j)^2 - 2 X_{ij}\beta_j y_i + y_i^2$
$\frac{\partial E}{\partial \beta_j} = 2X_{ij} \beta_j X_{ij} - 2X_{ij} y_i = 0$
$X_{ij} \beta_j X_{ij} = X_{ij} y_i$
Now every term is just a scalar, so it's tempting to cancel $X_{ij}$ on both sides. However, this leads only to the trivial relation $y_i = X_{ij} \beta_j$.
Can someone help me clarify this, please? Is it somehow connected to the use of only lower indices? Do I need to distinguish lower and upper indices (tensors and duals)?
Thank you!
The error in your component derivation: When you differentiate wrt $\beta_j$, the index $j$ now has two roles: one as the generic index in the sum, and one as the index specifying which $\beta$ you are differentiating with respect to. The partial derivative wrt $\beta_j$ should treat the other $\beta$'s as constant, but your notation can no longer distinguish them!
Better to use a new index, say $k$, to perform the differentiation. When you do this, you find that the partial derivative will be $$\frac{\partial E}{\partial\beta_k}=2X_{ij}\beta_jX_{ik} - 2X_{ik}y_i.\tag1$$ There's still summing going on (involving $i$ and $j$, with $k$ held constant) when you set (1) to zero, so it doesn't make sense to factor out $X_{ij}$. Convert back into matrix notation and you'll get $$X^TX\beta=X^Ty.\tag2$$ Specifically, $\sum_i\sum_jX_{ij}\beta_jX_{ik}$ is the $k$th member of the vector $X^TX\beta$, while $\sum_iX_{ik}y_i$ is the $k$th member of $X^Ty$.
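The identification of the sums in (1) with the matrix products in (2) can be verified numerically. A small sketch with NumPy on arbitrary made-up data, using `np.einsum` to write the component sums with the indices made explicit:

```python
import numpy as np

# Check that sum_i sum_j X_ij * beta_j * X_ik is the k-th entry of X^T X beta,
# and that sum_i X_ik * y_i is the k-th entry of X^T y.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))   # arbitrary 6x3 design matrix
beta = rng.normal(size=3)
y = rng.normal(size=6)

# Component form: repeated indices i, j are summed; k is free.
lhs = np.einsum('ij,j,ik->k', X, beta, X)
rhs = X.T @ X @ beta          # matrix form
print(np.allclose(lhs, rhs))  # True

# Right-hand side of the normal equations, component vs matrix form.
print(np.allclose(np.einsum('ik,i->k', X, y), X.T @ y))  # True
```

Note that `einsum` makes the distinction between the summed indices ($i$, $j$) and the free index $k$ explicit, which is exactly the distinction lost in the original derivation.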