I was going through the derivation of linear regression and wanted to verify that my understanding is correct.
We minimize $$ {\lVert y - Xw \rVert}^2 $$
where $y$ is an $n \times 1$ vector, $X$ is our data matrix of size $n \times d$, and $w$ is a $d \times 1$ vector.
Note: Both $y$ and $X$ come from the training set; that is, we are learning $ w_{ls} = {(X^TX)}^{-1} X^Ty $.
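As a sanity check, the closed-form solution above can be computed directly and compared against NumPy's least-squares solver (the data here is random and purely illustrative):

```python
import numpy as np

# Illustrative data: n = 50 samples, d = 3 features
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))          # data matrix, n x d
y = rng.normal(size=n)               # labels, n x 1

# Closed-form least-squares solution: w_ls = (X^T X)^{-1} X^T y
w_ls = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check against NumPy's built-in least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(w_ls, w_lstsq)
```

In practice one would use `np.linalg.lstsq` (or a QR/SVD-based solver) rather than forming $(X^TX)^{-1}$ explicitly, which is numerically less stable.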
Now we can reach every possible $y$ by some linear combination of the columns of $X$ if $X$ has $n$ linearly independent columns (i.e. rank $n$).
So we would always be able to find a weight vector $w$ that makes the total error zero when this condition holds (is this a correct deduction?)
In matrix form, suppose we have 4 data points (so $n = d = 4$):
$ \implies \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{14} \\ x_{21} & x_{22} & x_{23} & x_{24} \\ x_{31} & x_{32} & x_{33} & x_{34} \\ x_{41} & x_{42} & x_{43} & x_{44} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} $
We see that $y$ is a vector in $R^n$. Since $X$ has $n$ linearly independent columns, we can represent every point in $R^n$ as a combination of the columns of $X$. For example:
$ \implies \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = w_1 \begin{bmatrix} x_{11} \\ x_{21} \\ x_{31} \\ x_{41} \end{bmatrix} + w_2 \begin{bmatrix} x_{12} \\ x_{22} \\ x_{32} \\ x_{42} \end{bmatrix} + w_3 \begin{bmatrix} x_{13} \\ x_{23} \\ x_{33} \\ x_{43} \end{bmatrix} + w_4 \begin{bmatrix} x_{14} \\ x_{24} \\ x_{34} \\ x_{44} \end{bmatrix} $
Now our training labels for $n$ data points can be represented as a single vector in $R^n$, and any value of ${[y_1,y_2,y_3,y_4]}^T$ is a point in this space.
Since every point is reachable by a linear combination of the columns, any value of ${[y_1,y_2,y_3,y_4]}^T$ must also be reachable, and hence the error should be zero.
This essentially boils down to: we can always make the training error zero for a particular ${[y_1,y_2,y_3,y_4]}^T$ by selecting the right weights.
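The claim above can be checked numerically: with $n = d = 4$ and linearly independent columns, $X$ is invertible, so any $y$ is exactly reachable and the training error is zero (the random matrix here is illustrative; it is full rank with probability 1):

```python
import numpy as np

# With n = d = 4 and linearly independent columns, X is invertible,
# so any y is exactly reachable and the training error is zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 4))          # almost surely full rank
y = rng.normal(size=4)               # arbitrary labels

w = np.linalg.solve(X, y)            # here (X^T X)^{-1} X^T reduces to X^{-1}
residual = np.linalg.norm(y - X @ w)
assert residual < 1e-10              # training error is (numerically) zero
```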
This is only true if the entries of $y$ are exactly a linear function of the rows of $X$ (consider the case where there is some noise, in which case you won't be able to drive the error exactly to $0$).
I'm not sure what you mean by "reach some lower dimension".
You need to specify some assumptions on how $y, X$ are related. For example, assume that:
$$y_i = x_i^Tw + \epsilon$$
where $\epsilon \sim N(0, 1)$ is additive Gaussian noise. In this case, given $n>d$ samples, we can minimize the squared error by using the formula for $w$ that you've given above.
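A minimal sketch of this noisy setting (the ground-truth weights here are made up for illustration): with $n > d$ and additive noise, the least-squares fit recovers $w$ approximately, but the training error is no longer zero.

```python
import numpy as np

# Sketch of the noisy model y_i = x_i^T w + eps, eps ~ N(0, 1), with n > d
rng = np.random.default_rng(2)
n, d = 200, 3
w_true = np.array([1.0, -2.0, 0.5])  # illustrative ground-truth weights
X = rng.normal(size=(n, d))
y = X @ w_true + rng.normal(size=n)  # additive Gaussian noise

# Least-squares estimate: close to w_true, but the residual is not zero
w_ls = np.linalg.inv(X.T @ X) @ X.T @ y
train_error = np.linalg.norm(y - X @ w_ls) ** 2
```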
In contrast to solving a consistent system of equations such as $\alpha = A\beta$ for $\beta$, where $A$ is an invertible $n\times n$ matrix, when solving for a least-squares solution our $X$ is usually not square (hence not invertible), so we cannot simply compute $w = X^{-1}y$. Instead, the formula you've given computes $w$ such that $Xw$ is the orthogonal projection of $y$ onto the range of $X$. This is what you want, since the orthogonal projection onto the range of $X$ is exactly the minimizer of $\|y-u\|_2^2$ subject to $u = Xw$.
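The projection view can be verified directly: the residual $y - Xw$ is orthogonal to every column of $X$, which is just the normal equations $X^T(y - Xw) = 0$ (again with illustrative random data):

```python
import numpy as np

# X w is the orthogonal projection of y onto the range (column space) of X,
# so the residual y - X w is orthogonal to every column of X.
rng = np.random.default_rng(3)
X = rng.normal(size=(10, 3))         # tall matrix: not invertible
y = rng.normal(size=10)

w, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ w
# X^T (y - X w) = 0: the normal equations, i.e. orthogonality of the residual
assert np.allclose(X.T @ residual, 0, atol=1e-10)
```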