This is a question that I have always been interested in:
Suppose we have a system of equations - for example, a system of non-linear equations. From what I have seen, in most cases we tend to use numerical optimization methods to attempt to find an approximate solution. The idea is that we simply assume an analytical solution to this system of non-linear equations probably does not exist, and that some numerical approximation technique (e.g. Stochastic Gradient Descent) will likely provide a suitable answer in an acceptable amount of time.
But this brings me to my question. Consider the case of Neural Networks (Machine Learning): in Neural Networks, we are required to find an "optimal" set of weights that minimizes the Loss Function of the Neural Network. Doing this is effectively attempting to "solve" a system of non-linear equations - namely, setting the gradient of the loss with respect to each weight equal to zero.
It has become almost routine to "solve" these systems of non-linear equations using numerical optimization methods such as Stochastic Gradient Descent, because we implicitly assume that an analytical solution probably does not exist. In theory, an analytical solution might exist "if we looked hard enough" - but we assume it probably doesn't, or that if it did exist it would take too much work to uncover. So we solve the system numerically and hope that the numerical solution is "good enough".
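For concreteness, here is a minimal sketch of what I mean (a toy example, not a real Neural Network): turning a small non-linear system into a sum-of-squares loss and driving it toward zero with plain gradient descent, where the system and the step size are just ones I picked by hand.

```python
# A minimal sketch: "solve" the toy nonlinear system
#     x^2 + y - 3 = 0
#     x + y^2 - 5 = 0
# numerically, by minimizing the sum-of-squares loss
#     L(x, y) = r1^2 + r2^2.
# (x, y) = (1, 2) happens to be an exact solution of this system.

def residuals(x, y):
    return x**2 + y - 3, x + y**2 - 5

def loss(x, y):
    r1, r2 = residuals(x, y)
    return r1**2 + r2**2

def grad(x, y):
    r1, r2 = residuals(x, y)
    # Chain rule: dL/dx = 2*r1*(2x) + 2*r2,  dL/dy = 2*r1 + 2*r2*(2y)
    return 4 * x * r1 + 2 * r2, 2 * r1 + 4 * y * r2

x, y = 1.0, 1.0      # arbitrary starting point
lr = 0.01            # step size chosen by hand for this toy problem
for _ in range(5000):
    gx, gy = grad(x, y)
    x -= lr * gx
    y -= lr * gy

print(x, y, loss(x, y))   # loss is (near) zero at convergence
```

The loop never "finds" an exact formula for the root - it just iterates until the residuals are numerically negligible, which is exactly the trade I am asking about.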
But can we actually prove this: can we mathematically demonstrate that a specific system of non-linear equations does not have an analytical solution?
The closest mathematical theorem I could find that addresses this point is the Abel-Ruffini Theorem (https://en.wikipedia.org/wiki/Abel%E2%80%93Ruffini_theorem) - but I am not sure if this theorem is specifically making claims about the question I am asking.
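To make my Abel-Ruffini reference concrete (this is a standard textbook example, not something specific to Neural Networks): the quintic

$$x^5 - x - 1 = 0$$

is known to have Galois group $S_5$ over $\mathbb{Q}$, which is not a solvable group, so by Galois theory its roots cannot be written in radicals - yet a numerical method happily approximates the real root $x \approx 1.1673$ to any precision. So at least for this one equation, "no analytical solution (in radicals) exists" is a provable statement, not just an assumption.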
- Can someone please comment on this: for optimization problems in which we rely on numerical approximations - is it possible that some of these problems might actually have analytical solutions?
Thanks!
Let $f:A\to B$ be a mapping, then an “equation” can be written as $f(a)-b=0$ with $a \in A$ and $b\in B.$ From here, you can wind up in almost every branch of mathematics.
With neural networks, $A$ is usually a set of $C^\infty$ functions whose domain is a vector space (for example, the set of all possible 1080p images) and whose range is some kind of decision space (for example, the binary labels "hot dog" / "not hot dog"). $B$ would be the positive reals, representing the error between a "solution" (the neural network and its learned weights) and the true function that you are trying to model. Note that we can solve this equation numerically with any number of models; neural networks are just a popular choice. Check out Rasmussen and Williams's book *Gaussian Processes for Machine Learning* (GPML) for an alternative. Sometimes, good old linear regression works best.
Some people don't even try to find an analytical solution to whatever it is they're throwing neural networks at, because the tools have become so easy to use. Some are successful with this approach! There are even libraries that automatically identify which parts of the input space ($A$ in my notation above) matter most and how they interact - the kind of tool a physicist might use to analytically recover the function that the neural network is mimicking.
There are many kinds of equations for which an analytical solution exists. The implicit function theorem (and the many cool extensions) comes to mind as particularly fertile ground to guarantee the existence of analytical solutions to systems of equations.
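For a standard illustration of the implicit function theorem at work: take $F(x, y) = x^2 + y^2 - 1$. At any point on the unit circle where $\partial F / \partial y = 2y \neq 0$, the theorem guarantees a local analytical solution

$$y(x) = \pm\sqrt{1 - x^2},$$

so the equation $F(x, y) = 0$ is solved exactly in a neighborhood of that point - no numerics required.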
There are also classes of systems of equations for which we know no analytical solution exists - the Abel-Ruffini theorem you mention is one example: it proves that the general polynomial equation of degree five or higher has no solution in radicals.
You can analytically solve optimization problems to get solutions to systems of equations, by the way, instead of chucking them at neural networks. In my notation above, we usually try to find some function $g$ such that $|f-g|$ is minimized using gradients and Hessians and stuff like that. Check out Ruszczynski’s book “Nonlinear Optimization.”
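To show what an analytically solvable optimization problem looks like, here is a toy example (my own illustration, using an ordinary least-squares line fit): the minimizer of the squared error has a closed form via the normal equations, so no iterative solver is needed at all.

```python
# A toy optimization problem with an exact analytical solution:
# ordinary least squares. Minimizing sum_i (y_i - (a*x_i + b))^2
# over (a, b) is solved in closed form by the normal equations --
# no gradient descent required.

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1, so the fit is perfect

n = len(xs)
sx = sum(xs)
sy = sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Closed-form solution of the 2x2 normal equations:
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

print(a, b)   # 2.0 1.0
```

This is the flip side of the question above: here the gradient-equals-zero system is *linear* in the unknowns, which is precisely why an analytical solution exists.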