I was reading the following link on Gradient Descent algorithms for Machine Learning (https://www.osti.gov/servlets/purl/983240/), when I came across the "Condition Number of a Matrix" for the first time in my life:
Based on the above description, it seems that the "condition number of a matrix" is related to the eigenvalues of the matrix of interest. In particular, for an n×n matrix (called "Q"), it seems like the condition number would be the ratio of the nth (largest) eigenvalue to the first (smallest) eigenvalue of this matrix.
I was interested in knowing why this condition number is important, and how exactly it is related to the eigenvalues of the matrix.
When reading the Wikipedia article on this topic (https://en.wikipedia.org/wiki/Condition_number), it seems that a "Condition Number" can be defined for a function as well as for a matrix.
In terms of a matrix, the "Condition Number" is said to describe "how inaccurate the (numerically approximated) solution vector x will be for the system of linear equations Ax = b". A larger condition number for A suggests that the numerically approximated solution will be worse than with a lower condition number. Depending on the properties of the matrix, the formal definition of the condition number is as follows:
The above definitions seem quite clear - the condition number of a matrix A is the ratio of the absolute values of the largest and smallest eigenvalues of A.
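If it helps, here is the tiny NumPy check I wrote while reading, just to convince myself the two views agree. This is only a sketch for a symmetric positive-definite matrix (my own example `Q`), where the eigenvalue ratio coincides with the 2-norm condition number that `np.linalg.cond` computes from singular values:

```python
import numpy as np

# A symmetric positive-definite matrix (eigenvalues are real and positive,
# so the eigenvalue ratio matches the usual 2-norm condition number).
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigenvalues = np.linalg.eigvalsh(Q)   # sorted ascending for symmetric matrices
ratio = abs(eigenvalues[-1]) / abs(eigenvalues[0])

print(ratio)               # ratio of largest to smallest eigenvalue
print(np.linalg.cond(Q))   # NumPy's 2-norm condition number -- same value here
```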
This leads me to my question:
- What is the significance of the ratio of the (absolute values of the) largest and smallest eigenvalues of a matrix? How exactly does this ratio relate to the error in the (numerically approximated) solution of a linear system of equations? (I tried the small experiment below to see this numerically.)
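To make the error amplification concrete, here is a small experiment I put together using the (notoriously ill-conditioned) Hilbert matrix as my own example; my understanding is that the relative error in x can be amplified by up to cond(A) relative to the error in b, though I may be misreading the bound:

```python
import numpy as np

n = 8
# Hilbert matrix H[i, j] = 1 / (i + j + 1): a classic ill-conditioned example.
A = 1.0 / (np.arange(1, n + 1)[:, None] + np.arange(n))

x_true = np.ones(n)
b = A @ x_true

# Perturb b very slightly and see how much the solution moves.
rng = np.random.default_rng(0)
db = 1e-10 * rng.standard_normal(n)
x_perturbed = np.linalg.solve(A, b + db)

rel_error_b = np.linalg.norm(db) / np.linalg.norm(b)
rel_error_x = np.linalg.norm(x_perturbed - x_true) / np.linalg.norm(x_true)

print(np.linalg.cond(A))           # huge (~1e10) for the 8x8 Hilbert matrix
print(rel_error_x / rel_error_b)   # observed amplification, bounded by cond(A)
```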
I can loosely understand that when a "loss function" is approximately optimized by some Machine Learning model, the solution that is finally returned must carry some approximation error, and it is logical to assume that this error must have some "bounds" - but how are the bounds on the approximation error related to the ratio of the smallest/largest eigenvalues of the (matrix associated with the) optimization function?
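For context, this is the kind of toy example I had in mind from the gradient-descent side (my own setup: a quadratic loss with a diagonal Q so the eigenvalues are explicit, and the classical fixed step size 2/(λ_min + λ_max)). With this step size my understanding is that the error shrinks by roughly a factor of (κ-1)/(κ+1) per iteration, so a large condition number means slow convergence:

```python
import numpy as np

def gradient_descent_errors(Q, x0, step, n_steps):
    """Plain gradient descent on f(x) = 0.5 * x^T Q x (minimum at x = 0)."""
    x = x0.copy()
    errors = []
    for _ in range(n_steps):
        x = x - step * (Q @ x)        # gradient of f is Q x
        errors.append(np.linalg.norm(x))
    return errors

rng = np.random.default_rng(0)
x0 = rng.standard_normal(2)

for eigs in [(1.0, 2.0), (1.0, 100.0)]:   # well- vs. ill-conditioned Q
    Q = np.diag(eigs)
    kappa = eigs[1] / eigs[0]
    step = 2.0 / (eigs[0] + eigs[1])      # classical optimal fixed step size
    errs = gradient_descent_errors(Q, x0, step, 50)
    print(f"kappa = {kappa:6.1f}, error after 50 steps: {errs[-1]:.3e}")
```

Running this, the well-conditioned case (kappa = 2) converges to essentially zero in 50 steps, while the ill-conditioned case (kappa = 100) barely makes progress - which seems consistent with what the linked notes describe, but I would like to understand the connection to the error bound above.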
Can someone please comment on this?
Thanks!

