Combining multiple regression formulas


The normal equation for ordinary least squares regression is as follows:

$$ \hat{w} = (X^TX)^{-1}X^Ty $$

but this gives a single straight-line fit, which underfits when the underlying relationship is nonlinear. One way to counter underfitting is to use locally weighted linear regression with a Gaussian kernel. The formula goes as follows:

$$ \hat{w} = (X^TWX)^{-1}X^TWy $$

where $W$ is a diagonal matrix of weights: the closer an unknown point $x$ is to training example $i$, the higher the value of $W[i,i]$. So instead of a single straight line, the fit can bend, depending on the size of the kernel.
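The weighted formula above can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation; the function name, the toy data, and the bandwidth parameter `tau` are all illustrative choices, not anything from the question.

```python
import numpy as np

def lwlr_predict(x_query, X, y, tau=0.5):
    """Predict y at x_query with locally weighted linear regression.

    Weights each training row i by a Gaussian kernel of its distance
    to x_query, then solves the weighted normal equation
    w_hat = (X^T W X)^{-1} X^T W y.  `tau` is the kernel bandwidth:
    smaller tau -> more local, curvier fit.
    """
    d2 = np.sum((X - x_query) ** 2, axis=1)
    W = np.diag(np.exp(-d2 / (2.0 * tau ** 2)))
    w_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ w_hat

# Toy data: y = x^2 on [0, 1], with an intercept column in X.
xs = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(xs), xs])
y = xs ** 2

# A globally linear model cannot fit x^2, but the local fit at
# x = 0.5 lands near the true value 0.25.
pred = lwlr_predict(np.array([1.0, 0.5]), X, y, tau=0.1)
```

Note that the weighted normal equation is solved afresh for every query point, which is why LWLR is considerably more expensive than fitting OLS once.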

And finally, there is ridge regression, which shrinks the regression weights to avoid overfitting. The formula goes as:

$$ \hat{w} = (X^TX + \lambda I)^{-1}X^Ty $$
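As a quick sanity check of the shrinkage behaviour, the closed form above can be evaluated directly with numpy (the data here is synthetic, purely for illustration; with $\lambda = 0$ it reduces to the OLS normal equation):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

def ridge(X, y, lam):
    # w_hat = (X^T X + lambda I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

w_ols = ridge(X, y, 0.0)    # lambda = 0 recovers plain OLS
w_reg = ridge(X, y, 10.0)   # positive lambda shrinks the weights
```

Increasing $\lambda$ pulls the coefficient vector toward zero, trading a little bias for lower variance.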

My question is: can equations 2 and 3 be combined to get the best of both worlds, i.e. a fit that isn't a straight line but also doesn't overfit the data?

$$ \hat{w} = (X^TWX + \lambda I)^{-1}X^TWy $$

Best Answer

The basic model behind the first equation is

$$ y =X\beta + \epsilon $$

with $\epsilon$ drawn from a multivariate normal $\mathcal N(0,\sigma^2 I)$. Assuming this model holds, the best linear unbiased estimator (BLUE) of $\beta$ is the OLS estimator:

$$ \hat\beta = (X^tX)^{-1}X^ty $$

Now, if you believe that your error is not homoscedastic, so that for some reason $\epsilon \sim \mathcal N(0,\Sigma)$ for some covariance matrix $\Sigma$, then you can take the Cholesky decomposition $\Sigma=LL^t$ and write:

$$ y =X\beta + L \epsilon' $$

with $\epsilon'\sim \mathcal N(0,I)$. With $\Sigma$ positive definite, $L$ is invertible, and you can write

$$ L^{-1}y = L^{-1}X\beta + \epsilon' $$

Then you're back in the first scenario, with the BLUE being $\hat\beta = (X^t\Omega X)^{-1}X^t \Omega y$ with $\Omega=\Sigma^{-1}=(LL^t)^{-1}=L^{-t}L^{-1}$.
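This equivalence is easy to verify numerically: running plain OLS on the transformed data $L^{-1}X$, $L^{-1}y$ gives exactly the GLS formula. A small check on synthetic heteroscedastic data (the data and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
X = rng.normal(size=(n, p))

# Heteroscedastic noise with a known diagonal covariance Sigma.
sigmas = rng.uniform(0.1, 2.0, size=n)
Sigma = np.diag(sigmas ** 2)
y = X @ np.array([1.0, -2.0]) + sigmas * rng.normal(size=n)

# GLS closed form with Omega = Sigma^{-1}:
Omega = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Omega @ X, X.T @ Omega @ y)

# Same estimate via the Cholesky transform: Sigma = L L^t,
# then regress L^{-1} y on L^{-1} X with plain OLS.
L = np.linalg.cholesky(Sigma)
Xt = np.linalg.solve(L, X)
yt = np.linalg.solve(L, y)
beta_ols_transformed, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
```

The two estimates agree up to floating-point error, confirming that whitening by $L^{-1}$ reduces the heteroscedastic problem to ordinary least squares.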

If you can approximate the inverse of the covariance matrix (the $\Omega$), for example with a diagonal matrix, you are in the second scenario you're referring to.

Now, it's not hard to link this with ridge regression. With homoscedastic errors, you regularize with

$$ \min_\beta \|X\beta - y\|_2^2 + \lambda \|\beta\|_2^2 $$

and the answer is, as you wrote: $\hat\beta_R = (X^t X+\lambda I)^{-1}X^t y$.

Let's do exactly the same thing after transforming the data with the square root $L^{-1}$ of $\Omega=L^{-t}L^{-1}$ (or an approximation):

$$ \min_\beta \| L^{-1}X\beta - L^{-1} y\|_2^2 + \lambda\|\beta\|_2^2 $$

just rewrite $X'=L^{-1}X$ and $y'=L^{-1}y$; the problem is then the same as in the ridge regression case and therefore, indeed, the solution is the "combination":

$$\hat\beta_R' = (X^t\Omega X+\lambda I)^{-1}X^t \Omega y. $$
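The final identity can also be checked numerically: for a diagonal $\Omega = \operatorname{diag}(w)$, whitening amounts to scaling each row by $\sqrt{w_i}$, and ridge on the scaled data matches the combined closed form. A sketch with arbitrary synthetic data and weights:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, lam = 40, 3, 0.7
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.1, 1.0, size=n)   # diagonal weights, e.g. from a Gaussian kernel
Omega = np.diag(w)

# Combined closed form: (X^T Omega X + lambda I)^{-1} X^T Omega y
beta_combined = np.linalg.solve(X.T @ Omega @ X + lam * np.eye(p),
                                X.T @ Omega @ y)

# Ridge on the whitened data X' = L^{-1} X, y' = L^{-1} y, where
# Omega = (L L^t)^{-1}; for diagonal Omega, L^{-1} = diag(sqrt(w)).
Xp = np.sqrt(w)[:, None] * X
yp = np.sqrt(w) * y
beta_ridge = np.linalg.solve(Xp.T @ Xp + lam * np.eye(p), Xp.T @ yp)
```

Since $X'^tX' = X^t\Omega X$ and $X'^ty' = X^t\Omega y$, the two computations are algebraically identical, so the "combined" estimator really is just ridge regression after whitening.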