Relationship Between Root Finding and Optimization?


I had a professor in my engineering faculty tell us that there is a very close relationship between root finding algorithms and optimization algorithms. In particular, this is supposedly seen in the Newton-Raphson method.

The Newton-Raphson method is traditionally presented as a root-finding algorithm (i.e. finding the zeros of a function), yet at the same time the Newton-Raphson algorithm is used to estimate the parameters of a logistic regression model, and as a result it indirectly serves as an optimization algorithm. In other words, one could argue that the Newton-Raphson algorithm is attempting to estimate the "optimal" set of regression coefficients relative to the choice of model, loss function and available data.
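As a concrete illustration of this, here is a minimal sketch of Newton-Raphson applied to logistic regression maximum likelihood. The data, sample size and coefficients are made up for illustration; the update `beta - H⁻¹g` is the standard Newton step on the log-likelihood:

```python
import numpy as np

# Toy data (assumed for illustration): one feature plus an intercept.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])            # design matrix with intercept
true_beta = np.array([0.5, 1.0])                # hypothetical "true" coefficients
p_true = 1 / (1 + np.exp(-X @ true_beta))
y = rng.binomial(1, p_true)

# Newton-Raphson on the log-likelihood: beta_{k+1} = beta_k - H^{-1} g.
beta = np.zeros(2)
for _ in range(50):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)                        # gradient of the log-likelihood
    W = p * (1 - p)                             # Bernoulli variance weights
    hess = -(X * W[:, None]).T @ X              # Hessian of the log-likelihood
    step = np.linalg.solve(hess, grad)
    beta = beta - step                          # Newton-Raphson update
    if np.linalg.norm(step) < 1e-10:
        break

print(beta)   # estimated coefficients; gradient is ~0 at convergence
```

At convergence the gradient of the log-likelihood vanishes, which is exactly the root-finding view: the "root" being found is a zero of the score function.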

I decided to look further into this. It seems to me that the premise of the Newton-Raphson algorithm relies on the assumption that the second-order Taylor expansion of the loss function (e.g. for some statistical model) sufficiently approximates the original loss function in a local neighborhood. By some basic algebraic manipulation, we can show that this second-order Taylor expansion is minimized at $x_k - f'(x_k)/f''(x_k)$, i.e. the current point shifted by the ratio of the first and second derivatives of the loss function. Thus, the Newton-Raphson algorithm repeatedly approximates the loss function by a second-order Taylor expansion around the current point, jumps to the minimizer of that expansion, and continues this process until some tolerance threshold is achieved. As with all local optimization methods, it should be noted that the Newton-Raphson algorithm provides no guarantee of finding the global minimum of a non-convex function.
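This step can be checked numerically. The sketch below uses a made-up quartic as the loss: the Newton-Raphson update lands exactly where the derivative of the local quadratic model vanishes, and it is a minimum of that model:

```python
def f(x):  return x**4 - 3*x**2 + x      # toy loss (assumed for illustration)
def f1(x): return 4*x**3 - 6*x + 1       # its first derivative
def f2(x): return 12*x**2 - 6            # its second derivative

xk = 2.0

def q(x):
    """Second-order Taylor model of f around xk."""
    d = x - xk
    return f(xk) + f1(xk) * d + 0.5 * f2(xk) * d * d

# The Newton-Raphson update; here x_next = 2 - 21/42 = 1.5.
x_next = xk - f1(xk) / f2(xk)

# x_next is exactly where the model's derivative vanishes...
assert abs(f1(xk) + f2(xk) * (x_next - xk)) < 1e-12
# ...and it beats nearby candidate points on the model itself.
assert all(q(x_next) <= q(x_next + h) for h in (-0.1, -0.01, 0.01, 0.1))
print(x_next)  # → 1.5
```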

  • But I still don't understand this professor's claim - how exactly does the Newton-Raphson algorithm demonstrate the very close relationship between optimization and root finding?

Thank you!

1 Answer
For a function $f:\mathbb{R}\rightarrow \mathbb{R}$, $f\in C^{2}$, the relation is this:

  1. Finding a root of $f(x)$ using Newton-Raphson (NR): the recursion $$x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)}$$ produces a sequence of roots of first-order (linear) Taylor approximations of $f(x)$; that is, $x_{k+1}$ is the value that solves $f(x_k)+f'(x_k)(x_{k+1}-x_k)=0$, given $x_k$ and starting from an initial value $x_0$. The method is not guaranteed to converge to an actual root, but when it does converge, it converges fast.

  2. Finding an optimizer of $f(x)$ using NR: the recursion $$x_{k+1}=x_k-\frac{f'(x_k)}{f''(x_k)}$$, starting from an initial value $x_0$, produces the sequence of optimizers of the second-order (quadratic) Taylor approximations $g(x_{k+1})=f(x_k)+f'(x_k)(x_{k+1}-x_k)+\frac{1}{2} f''(x_k)(x_{k+1}-x_k)^2$, given $x_k$. That is the same as finding the root $x_{k+1}$ of the derivative of this approximation via the first-order condition: $f'(x_k)+f''(x_k)(x_{k+1}-x_k)=0$. The method is not guaranteed to converge to an actual optimizer, but when it does converge, it converges fast.
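The two cases above use literally the same recursion, just applied one derivative apart. A minimal sketch (with made-up example functions) makes this concrete: the same root-finding routine, handed $f'$ and $f''$ instead of $f$ and $f'$, becomes an optimizer:

```python
def newton_root(f, fprime, x0, tol=1e-12, max_iter=50):
    """Root finding: each step solves the linearization f(x_k) + f'(x_k)(x - x_k) = 0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# (1) Root of f(x) = x^2 - 2: converges to sqrt(2).
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)

# (2) Minimizer of g(x) = (x - 3)^2 + 1: run the SAME routine on g' and g''.
#     NR optimization is NR root finding applied to the derivative.
xmin = newton_root(lambda x: 2 * (x - 3), lambda x: 2.0, x0=0.0)

print(root, xmin)  # root ≈ 1.4142..., xmin = 3.0 (exact, since g is quadratic)
```

Note that case (2) converges in one step here because the quadratic Taylor model of a quadratic function is exact.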

In optimization problems you might think of using NR root finding to solve the first-order conditions (gradient equal to zero), but this is not a recommended strategy; see, for instance, books like Luenberger and Ye, *Linear and Nonlinear Programming*, for details. Directly applying NR to the optimization problem, based on the second-order Taylor approximation, is generally considered the better of these two approaches.