Suppose we have a homogeneous linear model $y = w^T x$ and define the following error function to minimize:
\begin{align} E(w) = \alpha||w||^2 + \frac{1}{N}||X^Tw - t||^2 \end{align}
This is just the standard squared loss with a regularization term.
In our lecture, we said that we want the Hessian of our error function to have a low condition number in order to reach a solution quickly using gradient descent. In this case, it can be shown that we can force the Hessian to be a multiple of the identity by replacing $\alpha ||w||^2$ with $w^T(I - \Sigma)w$, where $\Sigma = \frac{1}{N}XX^T$, so that the condition number is one. The error would become:
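A quick numerical sketch of the Hessian claim, assuming $\Sigma = \frac{1}{N}XX^T$ (the empirical second-moment matrix, with $X$ of shape $d \times N$, one sample per column) and hypothetical random data:

```python
import numpy as np

# Assumption: X is d x N (one sample per column), Sigma = (1/N) X X^T.
rng = np.random.default_rng(0)
d, N = 3, 50
X = rng.normal(size=(d, N))
Sigma = X @ X.T / N

# Hessian of E(w) = w^T (I - Sigma) w + (1/N) ||X^T w - t||^2:
# the regularizer contributes 2(I - Sigma), the data term (2/N) X X^T = 2 Sigma,
# so the Sigma pieces cancel and the Hessian is exactly 2I.
H = 2 * (np.eye(d) - Sigma) + 2 * Sigma

print(np.allclose(H, 2 * np.eye(d)))  # True
print(np.linalg.cond(H))              # 1.0
```

So the curvature is the same in every direction and the condition number is one, regardless of the data.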
\begin{align} E(w) &= ||w||^2 - w^T\Sigma w + \frac{1}{N}||X^Tw - t||^2 \\ &= ||w||^2 - \frac{1}{N}||X^Tw||^2 + \frac{1}{N}||X^Tw - t||^2 \end{align}
How sound is this approach? Would the resulting model be more likely to overfit, since we can no longer set a high $\alpha$ value? The error can be decreased by making $||X^Tw||^2$ large, so our $w$ might become very large when the data values are very high(?)
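To make the scaling worry concrete, here is a small experiment on hypothetical random data (names and setup are my own): minimize the modified objective by gradient descent, then repeat with the same data scaled up by a factor of 10 and compare $||w||$.

```python
import numpy as np

# Minimize E(w) = w^T (I - Sigma) w + (1/N) ||X^T w - t||^2 by gradient descent,
# with Sigma = (1/N) X X^T, and see how ||w|| reacts to scaling the data values.
rng = np.random.default_rng(1)
d, N = 3, 50
X = rng.normal(size=(d, N))
t = rng.normal(size=N)

def minimize(X, t, lr=0.1, steps=500):
    d, N = X.shape
    Sigma = X @ X.T / N
    w = np.zeros(d)
    for _ in range(steps):
        grad = 2 * (np.eye(d) - Sigma) @ w + (2 / N) * X @ (X.T @ w - t)
        w -= lr * grad
    return w

w_small = minimize(X, t)
w_big = minimize(10 * X, t)  # same data, values 10x larger

print(np.linalg.norm(w_small), np.linalg.norm(w_big))
```

In this run $||w||$ grows in proportion to the data scale (the $\Sigma$ terms cancel in the gradient, leaving the minimizer $w^* = \frac{1}{N}Xt$, which scales linearly with $X$), which seems to confirm the concern.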