Regression
Suppose I have data points in a matrix $X \in \mathbb{R}^{n \times m}$ as well as labels $y \in \mathbb{R}^n$, where $n$ is the number of data points and $m$ is the number of features per data point. For a new data point $x \in \mathbb{R}^m$ I want to predict a value $\hat{y} \in \mathbb{R}$.
Linear Regression
A simple way to do so is to assume that the data is generated by a linear function:
$$y = x^T \cdot w$$
where $w \in \mathbb{R}^m$ are parameters which have to be learned from the data we've collected so far.
A standard way to learn the parameters $w$ is ordinary least squares, whose closed-form solution (assuming $X^T X$ is invertible) is
$$w = (X^T X)^{-1} X^T y$$
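To make this concrete, here is a minimal numpy sketch of the normal equation above; the toy data is made up, and `np.linalg.solve` is used instead of forming the inverse explicitly, which is numerically preferable:

```python
import numpy as np

# Made-up toy data: n = 4 points, m = 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0],
              [4.0, 2.5]])
y = np.array([5.0, 4.5, 7.0, 10.5])

# Normal equation w = (X^T X)^{-1} X^T y, solved without an explicit inverse.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Prediction for a new point x: y_hat = x^T w.
x_new = np.array([2.0, 1.0])
y_hat = x_new @ w
print(w, y_hat)
```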
Quadratic transformation sanity check
We can also add new features to the data points. For example, say we have $x \in \mathbb{R}$ and we transform the feature by $$\Phi(x) = (x, x^2)$$
Let
$$X = \begin{pmatrix}-1 \\ 0\\ 1\end{pmatrix}\;\;\; y = \begin{pmatrix}1\\0\\1\end{pmatrix}$$
and thus
$$\Phi(X) = \begin{pmatrix}-1 & 1 \\ 0 & 0\\ 1 & 1\end{pmatrix}$$
Now we can get $w$ by
$$ \begin{align} w &= (\Phi(X)^T \Phi(X))^{-1} \Phi(X)^T y\\ &= \begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix}^{-1} \begin{pmatrix}-1 & 1 \\ 0 & 0\\ 1 & 1\end{pmatrix}^T \begin{pmatrix}1\\0\\1\end{pmatrix}\\ &= \frac{1}{2} \cdot \begin{pmatrix}-1 & 0 & 1\\1 & 0 & 1\end{pmatrix} \begin{pmatrix}1\\0\\1\end{pmatrix}\\ &= \begin{pmatrix}0\\1\end{pmatrix} \end{align}$$
Hence the fitted model is
$$\hat{y} = x^2$$
which is exactly what I had in mind when I tried this example.
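For reference, a short numpy sketch that reproduces this sanity check, with the feature map written out by hand:

```python
import numpy as np

X = np.array([-1.0, 0.0, 1.0])
y = np.array([1.0, 0.0, 1.0])

# Feature map Phi(x) = (x, x^2), applied row-wise.
Phi = np.column_stack([X, X**2])

# Least-squares solution of Phi w = y; equivalent to the normal equation.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # [0. 1.]  ->  the fitted model is y_hat = x^2
```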
Transforming the labels
My first thought about the limitations of this method was that a model like $y = e^{w_1 x}$ could not be fitted. However, if we add a bijective label transformation $\Psi(y) = \log(y)$, we get the problem $\Psi(y) = w_1 x$, which, I guess, can again be solved by a linear regression model.
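A minimal sketch of that idea, assuming noiseless data generated as $y = e^{w_1 x}$ with a made-up value $w_1 = 0.7$:

```python
import numpy as np

# Made-up noiseless data from y = exp(w1 * x) with w1 = 0.7.
x = np.linspace(0.0, 2.0, 10)
y = np.exp(0.7 * x)

# Label transformation Psi(y) = log(y) turns the model into
# log(y) = w1 * x: linear regression with one feature and no intercept.
Phi = x.reshape(-1, 1)
w, *_ = np.linalg.lstsq(Phi, np.log(y), rcond=None)
print(w)  # [0.7]
```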
Question
My question is whether this always works. So, let's say the data is generated by a polynomial of degree 1337. Could I simply use a feature transformation $\Phi(x) = (1, x, x^2, \dots, x^{1337})$ and expect to recover the generating polynomial if I have enough (1338?) points?
I am pretty sure the answer is "yes" in this case, because the prediction is only a linear combination of the transformed features.
However, what about a model $y = w_1\,(1 - 2e^{w_2 x})$? Is it possible to find $\Psi$ and $\Phi$ so that one can use linear regression again?
Answer
Could I simply use a feature transformation $\Phi(x) = (1, x, x^2, \dots, x^{1337})$ and expect to recover the generating polynomial if I have enough (1338?) points?
Yes: any polynomial of degree $n$ can be written as a linear combination of $\{x^0, x^1, \dots, x^n\}$. Linear regression can learn any linear combination of its features, hence any such polynomial can be learned using the features you describe. And yes, you will need at least $n + 1$ distinct points to fit a polynomial of degree $n$ exactly.
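To illustrate with a degree-3 polynomial instead of 1337 (the coefficients below are made up): with $n + 1 = 4$ distinct points and the Vandermonde features $(1, x, x^2, x^3)$, least squares recovers the generating coefficients exactly.

```python
import numpy as np

# Made-up generating polynomial: y = 2 - x + 3x^2 + 0.5x^3.
coeffs = np.array([2.0, -1.0, 3.0, 0.5])
x = np.array([-1.0, 0.0, 1.0, 2.0])        # n + 1 = 4 distinct points
Phi = np.vander(x, N=4, increasing=True)   # columns: 1, x, x^2, x^3
y = Phi @ coeffs

# Phi is a square Vandermonde matrix, invertible for distinct points,
# so least squares recovers the generating coefficients exactly.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # [ 2.  -1.   3.   0.5]
```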
However, what about a model $y = w_1\,(1 - 2e^{w_2 x})$? Is it possible to find $\Psi$ and $\Phi$ so that one can use linear regression again?
I strongly think that this is not possible. I don't have a formal proof yet; I will edit the answer as soon as I find a neat argument ;). Intuitively, the obstacle seems to be that $w_2$ sits inside the exponential while $w_1$ scales an additive term, so a single fixed $\Psi$, chosen without knowing the parameters, cannot make the model linear in $(w_1, w_2)$.
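In the meantime, the usual way to fit a model that is nonlinear in its parameters is nonlinear least squares rather than a linearizing transformation; here is a minimal sketch using scipy.optimize.curve_fit, with made-up true values $w_1 = 2$ and $w_2 = -0.5$:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, w1, w2):
    return w1 * (1.0 - 2.0 * np.exp(w2 * x))

# Made-up noiseless data with w1 = 2, w2 = -0.5.
x = np.linspace(0.0, 5.0, 50)
y = model(x, 2.0, -0.5)

# Nonlinear least squares directly in (w1, w2); no Psi or Phi needed.
params, _ = curve_fit(model, x, y, p0=(1.0, -1.0))
print(params)  # ~[ 2.  -0.5]
```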