Eigenvalues are the principal axis for m-dimensional paraboloid?


From "Neural Networks - A Systematic Introduction" by Raul Rojas chapter 8 (p. 188):

$$E = \|Xw - y\|^2 = (Xw - y)^T(Xw - y) = w^T(X^TX)w - 2y^TXw + y^Ty$$

Since this is a quadratic function, the minimum can be found using gradient descent.

The quadratic function E can be thought of as a paraboloid in m-dimensional space. The lengths of its principal axes are determined by the magnitude of the eigenvalues of the correlation matrix $X^TX$. Gradient descent is most effective when the principal axes of the quadratic form are all of the same length. In this case the gradient vector points directly towards the minimum of the error function. When the axes of the paraboloid are of very different sizes, the gradient direction can lead to oscillations in the iteration process as shown in Figure 8.1.

How does the author reach the conclusion that "The lengths of its principal axes are determined by the magnitude of the eigenvalues of the correlation matrix $X^TX$"? Where can I read more about how this result is derived?
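For context, here is a small numerical check of the claim I'm asking about (my own sketch, not from the book): at the minimizer $w^*$, moving a distance $t$ along the $i$-th eigenvector of $X^TX$ increases $E$ by exactly $\lambda_i t^2$, so the level sets of $E$ are ellipsoids whose semi-axis along that eigenvector scales like $1/\sqrt{\lambda_i}$ (larger eigenvalue, shorter axis).

```python
import numpy as np

# Illustrative check: eigenvalues of X^T X set the curvature of E
# along each principal axis of the paraboloid.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))   # 20 samples, m = 3 parameters
y = rng.standard_normal(20)

H = X.T @ X                        # the "correlation matrix" from the quote
eigvals, eigvecs = np.linalg.eigh(H)

w_star = np.linalg.lstsq(X, y, rcond=None)[0]   # minimizer of E

def E(w):
    r = X @ w - y
    return r @ r

# At the minimum, X^T (X w* - y) = 0, so the cross term vanishes and
# E(w* + t u_i) - E(w*) = t^2 u_i^T (X^T X) u_i = lambda_i t^2.
t = 0.1
for lam, u in zip(eigvals, eigvecs.T):
    increase = E(w_star + t * u) - E(w_star)
    assert np.isclose(increase, lam * t**2)
    print(f"lambda = {lam:8.3f}, E increase along axis = {increase:8.5f}")
```

This matches the quoted statement in the sense that the level set $E = E(w^*) + c$ has semi-axis $\sqrt{c/\lambda_i}$ along the $i$-th eigenvector, so very different eigenvalues give a very elongated paraboloid.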