I read somewhere that the NR algorithm in general (given an appropriate initial value) increases in accuracy by roughly two decimal places per iteration.
Is this something that can be proven, or is it just a general rule of thumb that happens to work for a large number of functions?
Let $x_k$ be the iterate at the $k$-th step, and let $\alpha$ be the root we are looking for.
Applying Taylor's formula with the Lagrange remainder, expanded about $x_k$ (here $\beta$ is some point between $x_k$ and $\alpha$):
$f(\alpha)=f(x_k)+(\alpha-x_k)f'(x_k)+\frac{1}{2}(\alpha-x_k)^2f''(\beta)$
Since $\alpha$ is a root, $f(\alpha)=0$, so
$f(x_k)+(\alpha-x_k)f'(x_k)+\frac{1}{2}(\alpha-x_k)^2f''(\beta)=0$
We can divide by $f'(x_k)$ to simplify things a bit:
$$\frac{f(x_k)}{f'(x_k)}+(\alpha-x_k)+\frac{(\alpha-x_k)^2f''(\beta)}{2f'(x_k)}=0$$
Since $x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)}$, we have $\frac{f(x_k)}{f'(x_k)}=x_k-x_{k+1}$, so we can rewrite the previous expression:
$$x_k-x_{k+1}+(\alpha-x_k)+\frac{(\alpha-x_k)^2f''(\beta)}{2f'(x_k)}=\alpha-x_{k+1}+\frac{(\alpha-x_k)^2f''(\beta)}{2f'(x_k)}=0$$
So...
$$x_{k+1}-\alpha=\frac{(\alpha-x_k)^2f''(\beta)}{2f'(x_k)}$$
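To make this concrete (a worked example of my own, not part of the derivation): take $f(x)=x^2-2$, so $\alpha=\sqrt{2}$, $f''\equiv 2$, and $f'(x_k)\approx 2\sqrt{2}$ for $x_k$ near the root. The recurrence then reads approximately
$$|x_{k+1}-\sqrt{2}|\approx\frac{(x_k-\sqrt{2})^2}{2\sqrt{2}},$$
so an error of about $10^{-3}$ shrinks to roughly $3.5\times 10^{-7}$ in a single step.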
Taking absolute values gives the error relation:
$$|x_{k+1}-\alpha|=\left|\frac{(\alpha-x_k)^2f''(\beta)}{2f'(x_k)}\right|$$
Near the root, the factor $\left|\frac{f''(\beta)}{2f'(x_k)}\right|$ is bounded by some constant $C$, provided $f'(x_k)$ is not zero (if it is, the process can fail and we aren't assured convergence). So the new error is at most $C$ times the *square* of the old error, which is exactly quadratic convergence.
This means that, once you are close enough, every iteration roughly DOUBLES the number of correct digits, so the "two decimal places per iteration" rule of thumb from the question is really a doubling, not a fixed increment. When it converges, it is very, very fast; the problem is that unless certain conditions hold, we can't be assured of convergence. Thankfully, those conditions are not too strict.
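Here is a quick numerical sketch (my own illustration, using $f(x)=x^2-2$, whose positive root is $\sqrt{2}$) showing the number of correct digits roughly doubling at each step:

```python
import math

def newton(f, fprime, x0, steps):
    """Run a fixed number of Newton-Raphson iterations, returning every iterate."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

# f(x) = x^2 - 2, f'(x) = 2x; starting reasonably close to the root
xs = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.5, steps=4)

for k, x in enumerate(xs):
    # the error roughly squares each step: ~1, ~3, ~6, ~12 correct digits
    print(f"k={k}  x={x:.16f}  error={abs(x - math.sqrt(2)):.2e}")
```

The printed errors shrink roughly as $e \mapsto e^2/(2\sqrt{2})$, matching the derivation above.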
Generally speaking, if you can find an interval $[a,b]$ containing $\alpha$ on which $f'$ is nonzero and $f''$ is bounded, then any $x_0 \in [a,b]$ that you choose will converge (assuming $[a,b]$ is not too large).