Friedberg linear algebra p.470
"Consider a case that there is an error $\delta A$ in the system $Ax=b$ (assume $A$ is invertible). It should be mentioned that, in practice, one never computes cond($A$) from its definition, for it would be an unnecessary waste of time to compute $A^{-1}$ merely to determine its norm. In fact, if a computer is used to find $A^{-1}$, the computed inverse of $A$ in all likelihood only approximates $A^{-1}$, and the error in the computed inverse is affected by the size of cond($A$). So we are caught in a vicious circle!"
Since there is an error in $A$, the equation would be $(A+\delta A)(x+\delta x)=b$.
First of all, say a computer is used to find $A^{-1}$. Then, I think, the computed matrix is $A^{-1}$ itself, NOT an approximation of $A^{-1}$. Why is it an approximation?
Second, why is the computed inverse affected by cond($A$)?
To be honest, I have no idea why the argument described in that sentence is circular.
- Note: cond$(A) \triangleq ||A||||A^{-1}||$ where $|| \bullet ||$ is an operator norm.
In layman's terms, a condition number tells us how "nice" the matrix is when it comes to computing its inverse. The bigger the condition number, the worse it is.
For starters, what would the inverse of the number $3$ be? It's, of course, $\frac{1}{3}$, but computed on a computer, you'll get $0.33\dots3$, which is not the correct answer, but an approximation of it.
Now, imagine what happens with matrices, where you don't have just one number, but $n^2$ numbers ($n$ being the order of the matrix), and the inverse is computed using all of them together (i.e., not each element by itself).
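You can see this directly in NumPy (the matrix below is just one I picked for illustration; any invertible matrix whose inverse has entries that are not exactly representable in binary will do):

```python
import numpy as np

# An arbitrary small invertible matrix (my choice, just for illustration).
# Its exact inverse is [[0.4, -0.2], [-0.2, 0.6]], whose entries are not
# exactly representable in binary floating point.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
A_inv = np.linalg.inv(A)

# In exact arithmetic, A @ A_inv would be exactly the identity matrix.
# In floating point, it only approximates the identity.
residual = np.abs(A @ A_inv - np.eye(2)).max()
print(residual)  # tiny, but the computed inverse is still an approximation
```

So the "computed $A^{-1}$" is whatever the floating-point algorithm produces, not the exact inverse; that's the sense in which the book calls it an approximation.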
For an example of ill-conditioned matrices (meaning those with a big condition number), read about the Hilbert matrix, which is a typical example of such a matrix, often used for numerical testing. The matrix from that page has one very small eigenvalue (see Wolfram|Alpha). So, a minor error here, and this eigenvalue drops to zero and the matrix becomes singular (i.e., it has no inverse). And this is just for order $5$. Fiddle with the order a bit (replace $5$ with something a bit larger) and see how fast they drop.
A bit of a side note: small eigenvalues are not the only thing that can cause trouble. This was just an example.
For a more down-to-Earth example, consider the following system of equations: $$x_1 + x_2 = 2, \quad x_1 + 1.0001x_2 = 2.0001.$$ You can easily solve it and find that $x_1 = x_2 = 1$. Now, consider this one, only slightly changed: $$x_1 + x_2 = 2, \quad x_1 + 0.9999x_2 = 2.0001.$$ The solution to this one is $x_1 = 3$, $x_2 = -1$. So, a minor change (of the magnitude $10^{-4}$, and in only one argument) has caused a huge change in the result. That system is ill-conditioned.
(This is a well-known example; read more on it here.)
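The two systems above can be checked numerically, and the large condition number of the coefficient matrix is exactly what flags the danger:

```python
import numpy as np

# The original system: x1 + x2 = 2, x1 + 1.0001*x2 = 2.0001
A1 = np.array([[1.0, 1.0],
               [1.0, 1.0001]])
b = np.array([2.0, 2.0001])

# The perturbed system: only one coefficient changed by 2e-4
A2 = np.array([[1.0, 1.0],
               [1.0, 0.9999]])

x1 = np.linalg.solve(A1, b)  # close to [1, 1]
x2 = np.linalg.solve(A2, b)  # close to [3, -1]

print(x1, x2)
print(np.linalg.cond(A1))  # on the order of 10^4: ill-conditioned
```

The condition number around $4 \times 10^4$ says that relative errors in the data can be amplified by roughly that factor in the solution, which is exactly what happened here.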
Relation of the condition number and the inverse
For example, if $A = U \Sigma V^*$, where $\Sigma = \mathop{\rm diag}(x,y)$ for some $x > y \ge 1$, then your condition number, if computed via the 2-norm (these norms are equivalent, so we have a certain freedom in choosing them), is $$\mathop{\rm cond}(A) = \|A\|\|A^{-1}\| = \frac{x}{y},$$ because $x$ is the largest singular value of $A$ and $1/y$ is the largest singular value of $A^{-1} = V \Sigma^{-1} U^*$, where $\Sigma^{-1} = \mathop{\rm diag}(1/x,1/y)$.
But, when multiplying $U$, $\Sigma$, and $V^*$, you are using these numbers together. For simplicity's sake, let's say that $x \approx 100$ and $y \approx 1$. This means that $x+y$ will lose $2$ significant digits of $y$. So, when computing $A^{-1} = V \Sigma^{-1} U^*$, you may lose that many significant digits, so your inverse may be that much less precise.
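To make the SVD picture concrete, here is a small sketch: build a matrix with prescribed singular values $x = 100$, $y = 1$ and check that the 2-norm condition number comes out as $x/y$ (the random orthogonal factors are my own construction, via QR of random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random orthogonal factors U and V (QR of a random matrix gives an
# orthogonal Q); these play the roles of U and V in A = U @ S @ V^T.
U, _ = np.linalg.qr(rng.standard_normal((2, 2)))
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))

S = np.diag([100.0, 1.0])  # singular values x = 100, y = 1
A = U @ S @ V.T

# cond(A) in the 2-norm is the ratio of the largest to the smallest
# singular value, i.e. x / y = 100 here.
print(np.linalg.cond(A, 2))
```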
This is what the condition number tells you: "that much of a disaster may occur". Your solution may still be O.K., even if the condition number is big, but you'll have no guarantee of that (at least not by checking the condition number).
It is also worth noting that computing a condition number by the above formula is never done in practice, as computing the inverse is far too expensive and the result may be very unreliable. Instead, condition numbers are estimated in various ways.
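As a sketch of the contrast (for a tiny matrix where both routes are cheap): the "definition" route forms the full inverse, while a library routine like `np.linalg.cond` gets the 2-norm condition number from the singular values without ever forming $A^{-1}$ explicitly; real estimators (e.g. LAPACK's `?gecon`) go further and only estimate it from an existing LU factorization.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

# Route 1: straight from the definition ||A|| * ||A^{-1}||.
# This is exactly what one avoids for large matrices, since it
# requires computing the full inverse.
direct = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

# Route 2: np.linalg.cond with the 2-norm works from the singular
# values of A, so no explicit inverse is formed.
lib = np.linalg.cond(A, 2)

print(direct, lib)  # the two agree for this small, benign example
```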
Keep in mind that the above is not a strict, formal proof. It is a layman's description of the motivation. The whole theory is much, much broader, but I hope this gives some insight.