I'm a novice when it comes to the world of machine learning, but in attempting to understand the universal approximation theorem I've acquired the notion that a neural network can represent any function by approximating it with a weighted sum of sigmoids (I got this notion from this article). This sounds a lot to me like expressing a function in terms of orthogonal basis functions — Legendre polynomials, Hermite polynomials, Chebyshev polynomials, Fourier series/transforms — which can approximate any (suitably well-behaved) function with a weighted series of orthogonal functions. Where's the link here? I don't suppose sigmoids are orthogonal functions, but they seem to be able to accomplish the same thing.
Come to think of it, why don't we use orthogonal functions to handle this goal of approximating any continuous function on a compact set, which seems to be so crucial in ML?
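To make the comparison concrete, here's a minimal sketch of what I mean (my own toy example, not from the article): fitting the same target function once with a weighted sum of randomly placed sigmoids (weights found by least squares, a crude stand-in for training a one-hidden-layer network) and once with a Chebyshev expansion of comparable size.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
f = np.sin(3 * x)  # target function to approximate

# Weighted sum of sigmoids: features sigma(a*x + b) with random
# slopes a and shifts b; only the output weights w are fitted,
# by linear least squares.
a = rng.uniform(-10, 10, size=25)
b = rng.uniform(-10, 10, size=25)
phi = 1.0 / (1.0 + np.exp(-(np.outer(x, a) + b)))  # (200, 25) feature matrix
w, *_ = np.linalg.lstsq(phi, f, rcond=None)
sigmoid_err = np.max(np.abs(phi @ w - f))

# Chebyshev expansion with the same number of coefficients.
cheb = np.polynomial.chebyshev.Chebyshev.fit(x, f, deg=24)
cheb_err = np.max(np.abs(cheb(x) - f))

print(sigmoid_err, cheb_err)
```

Both approaches drive the error down, which is what prompts the question: the sigmoid basis isn't orthogonal, yet it gets the job done just like the orthogonal one.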