Suppose we have a homogeneous linear model $y = w^T x$ and define the following error function to minimize:
\begin{align} E(w) = \alpha||w||^2 + \frac{1}{N}||X^Tw - t||^2 \end{align}
This is just the standard squared loss with a regularization term.
In our lecture, we said that we want the Hessian of our error function to have a low condition number in order to reach a solution quickly using gradient descent. In this case, it can be shown that we can force the Hessian to be a multiple of the identity by replacing $\alpha ||w||^2$ with $w^T(I - \Sigma)w$, where $\Sigma = \frac{1}{N}XX^T$, so that the condition number is one. The error would become:
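A quick numerical sketch of the Hessian claim, assuming $\Sigma = \frac{1}{N}XX^T$ (the empirical second-moment matrix, with $X$ of shape $d \times N$, one sample per column) and hypothetical random data:

```python
import numpy as np

# Assumption: X is d x N (one sample per column), Sigma = (1/N) X X^T.
rng = np.random.default_rng(0)
d, N = 3, 50
X = rng.normal(size=(d, N))
Sigma = X @ X.T / N

# Hessian of E(w) = w^T (I - Sigma) w + (1/N) ||X^T w - t||^2:
# the regularizer contributes 2(I - Sigma), the data term (2/N) X X^T = 2 Sigma,
# so the Sigma pieces cancel and the Hessian is exactly 2I.
H = 2 * (np.eye(d) - Sigma) + 2 * Sigma

print(np.allclose(H, 2 * np.eye(d)))  # True
print(np.linalg.cond(H))              # 1.0
```

So the curvature is the same in every direction and the condition number is one, regardless of the data.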
\begin{align} E(w) &= ||w||^2 - w^T\Sigma w + \frac{1}{N}||X^Tw - t||^2 \\ &= ||w||^2 - \frac{1}{N}||X^Tw||^2 + \frac{1}{N}||X^Tw - t||^2 \end{align}
How sound is this approach? Would the resulting model be more likely to overfit, since we can no longer set a high $\alpha$ value? The error can be decreased by making $||X^Tw||^2$ large, so our $w$ might become very large when the data values are very high(?)
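To make the scaling worry concrete, here is a small experiment on hypothetical random data (names and setup are my own): minimize the modified objective by gradient descent, then repeat with the same data scaled up by a factor of 10 and compare $||w||$.

```python
import numpy as np

# Minimize E(w) = w^T (I - Sigma) w + (1/N) ||X^T w - t||^2 by gradient descent,
# with Sigma = (1/N) X X^T, and see how ||w|| reacts to scaling the data values.
rng = np.random.default_rng(1)
d, N = 3, 50
X = rng.normal(size=(d, N))
t = rng.normal(size=N)

def minimize(X, t, lr=0.1, steps=500):
    d, N = X.shape
    Sigma = X @ X.T / N
    w = np.zeros(d)
    for _ in range(steps):
        grad = 2 * (np.eye(d) - Sigma) @ w + (2 / N) * X @ (X.T @ w - t)
        w -= lr * grad
    return w

w_small = minimize(X, t)
w_big = minimize(10 * X, t)  # same data, values 10x larger

print(np.linalg.norm(w_small), np.linalg.norm(w_big))
```

In this run $||w||$ grows in proportion to the data scale (the $\Sigma$ terms cancel in the gradient, leaving the minimizer $w^* = \frac{1}{N}Xt$, which scales linearly with $X$), which seems to confirm the concern.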