Consider the linear regression model in the case where the number of training samples $n$ is smaller than the number of parameters $d$ (overparametrized linear regression). The model is
$$y_i = \langle\beta,\mathbf{x}_i\rangle +\epsilon_i, \quad \beta\in\mathbb{R}^d,\ \epsilon_i\in\mathbb{R},\ i=1,\dots,n$$
where the $\epsilon_i$ are zero-mean noise terms. Let $\mathbf{X}\in\mathbb{R}^{n\times d}$ be the data matrix:
\begin{align} \mathbf{X} &= \begin{bmatrix} \mathbf{x}_1\\ \mathbf{x}_2\\ \vdots \\ \mathbf{x}_n \end{bmatrix} \end{align}
Now, since $n<d$, $\mathbf{X}^T\mathbf{X}\in\mathbb{R}^{d\times d}$ is not invertible, since $$\text{rank}(\mathbf{X})=\text{rank}(\mathbf{X}^T) \leq \min \{n,d\}=n\implies \text{rank}(\mathbf{X}^T\mathbf{X})\leq n<d$$
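To see this rank deficiency concretely, here is a small NumPy sketch (the Gaussian data is just an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 10  # fewer samples than parameters
X = rng.standard_normal((n, d))

G = X.T @ X  # d x d Gram matrix
# rank(G) <= n = 5 < d = 10, so the d x d matrix G is singular
print(np.linalg.matrix_rank(G))
```

With Gaussian rows the rank equals $n$ almost surely, so `G` is a $10\times 10$ matrix of rank $5$.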
There are clearly infinitely many $\beta$ such that
$$\mathbf{y}=\mathbf{X}\beta$$
I'm told that the minimum norm solution is
$$\beta^*=\mathbf{X}^{\dagger}\mathbf{y}$$
where, in this specific case $n < d$, $\mathbf{X}^{\dagger}=\mathbf{X}^T(\mathbf{X} \mathbf{X}^T)^{-1}$. Can you help me prove this in a constructive manner? I tried to premultiply by $\mathbf{X}^T\in\mathbb{R}^{d\times n}$ (which is legit since $\mathbf{y}\in\mathbb{R}^n$), but I get
$$\mathbf{X}^T\mathbf{y}=\mathbf{X}^T\mathbf{X}\beta$$
and now I'm stuck because, as noted above, $\mathbf{X}^T\mathbf{X}$ is not invertible.
EDIT: the solution
$$\beta^*=\mathbf{X}^{\dagger}\mathbf{y}$$
is valid iff $\mathbf{X}\mathbf{X}^T$ is invertible. Now, $\mathbf{X}\mathbf{X}^T\in\mathbb{R}^{n\times n}$, so $\text{rank}(\mathbf{X}\mathbf{X}^T)\leq n$, and $\mathbf{X}$ is a random matrix, meaning that its rows are random vectors sampled from an (unknown) probability distribution. For most nondegenerate probability distributions, $\mathbf{X}\mathbf{X}^T$ will therefore be full rank with probability 1.
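A quick numerical illustration of this claim, using Gaussian rows as a stand-in for the unknown distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 10
X = rng.standard_normal((n, d))  # rows drawn from a nondegenerate distribution

K = X @ X.T  # n x n
# K is full rank (= n) with probability 1, hence invertible
print(np.linalg.matrix_rank(K))
```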
EDIT2: a question was mentioned in the comments, but it isn't related to mine. I explicitly asked for a constructive proof of the fact that, when $n<d$, the minimum norm solution is
$$\beta^*=\mathbf{X}^T(\mathbf{X} \mathbf{X}^T)^{-1}\mathbf{y}$$
instead of the usual one, when $\mathbf{X}$ is full column rank
$$\beta^*=(\mathbf{X}^T \mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
Thus, answers that don't constructively derive this expression are not answers to this question.
(Note to whoever was thinking about answering my question: I'm not going to accept my own answer anytime soon, so if you were going to write an answer of yours, don't feel discouraged!)
Proof #1
By Lagrange multipliers: we're looking for
$$\min_{\beta}{\frac{1}{2}||\beta||^2} \quad \text{s.t.}\quad \mathbf{X}\beta=\mathbf{y}$$
Convert to unconstrained optimization by introducing a Lagrange multiplier $\lambda$:
$$\min_{\beta,\lambda}\frac{1}{2}||\beta||^2+\lambda^T(\mathbf{X}\beta-\mathbf{y})=\min_{\beta,\lambda}\mathcal{L}(\beta,\lambda)$$
Set gradients to 0:
$$\begin{aligned} \nabla_{\beta}{\mathcal{L}} & = \beta +\mathbf{X}^T\lambda=0 \\ \nabla_{\lambda}{\mathcal{L}} & = \mathbf{X}\beta-\mathbf{y}=0 \end{aligned}$$
From the first equation, $\beta=-\mathbf{X}^T\lambda$; substituting into the second gives $-\mathbf{X}\mathbf{X}^T\lambda=\mathbf{y}$. But since $\mathbf{X}\mathbf{X}^T$ is invertible, we get $$\lambda=-(\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{y}\implies\beta=-\mathbf{X}^T\lambda=\mathbf{X}^T(\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{y}\ \square.$$
The nice thing about this proof is that you don't have to commit the expression of the pseudo-inverse to memory.
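As a sanity check (not part of the proof), a small NumPy experiment can confirm that this formula interpolates the data and agrees with the minimum-norm solutions returned by `np.linalg.pinv` and `np.linalg.lstsq`, again with synthetic Gaussian data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 10  # n < d: overparametrized
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# beta* = X^T (X X^T)^{-1} y, computed via a linear solve for stability
beta_star = X.T @ np.linalg.solve(X @ X.T, y)

print(np.allclose(X @ beta_star, y))                      # interpolates: True
print(np.allclose(beta_star, np.linalg.pinv(X) @ y))      # matches pinv: True
print(np.allclose(beta_star,
                  np.linalg.lstsq(X, y, rcond=None)[0]))  # matches lstsq: True
```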
Proof #2
Not as satisfying. Assume $\beta^*=\mathbf{X}^T(\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{y}$: we want to prove that 1) $\beta^*$ solves $\mathbf{X}\beta=\mathbf{y}$, and 2) $\beta^*$ has minimum norm among all such solutions.
1) is trivial: $\mathbf{X}\beta^*=\mathbf{X}\mathbf{X}^T(\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{y}=\mathbf{y}$. To prove 2), it's sufficient to prove that, given any other solution $\beta$ to $\mathbf{X}\beta=\mathbf{y}$, the difference $\mathbf{u}=\beta-\beta^*$ is orthogonal to $\beta^*$. The assertion then follows immediately: $$||\beta||^2=\beta^T\beta=(\mathbf{u}+\beta^*)^T(\mathbf{u}+\beta^*)=||\mathbf{u}||^2+2\mathbf{u}^T\beta^*+||\beta^*||^2=||\mathbf{u}||^2+||\beta^*||^2\geq||\beta^*||^2$$
To prove that $(\beta^*)^T\mathbf{u}=0$, we write
$$(\mathbf{X}^T(\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{y})^T\mathbf{u}=\mathbf{y}^T(\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{u}=\mathbf{y}^T(\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}(\beta-\beta^*)=\mathbf{y}^T(\mathbf{X}\mathbf{X}^T)^{-1}(\mathbf{y}-\mathbf{y})=0$$
where we used the fact that $\mathbf{X}\mathbf{X}^T$ is symmetric and so is its inverse.
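The orthogonality argument can also be checked numerically (a hypothetical example with Gaussian data): any other solution is $\beta^*$ plus a component in the null space of $\mathbf{X}$, and that component is orthogonal to $\beta^*$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 10
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

beta_star = X.T @ np.linalg.solve(X @ X.T, y)

# Build another solution: beta = beta* + u, with u the projection
# of a random vector z onto the null space of X
z = rng.standard_normal(d)
u = z - X.T @ np.linalg.solve(X @ X.T, X @ z)
beta = beta_star + u

print(np.allclose(X @ beta, y))   # still a solution: True
print(abs(u @ beta_star) < 1e-8)  # u orthogonal to beta*: True
print(np.linalg.norm(beta) >= np.linalg.norm(beta_star))  # True
```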