Can we put bounds on the number of stationary points over parameter space for a vanilla MLP where loss is MSE?


Suppose $f(\vec{x}; \vec{\theta}, A)$ is a fully-connected vanilla multilayer perceptron (i.e. alternating matrix multiplications and sigmoid activations) that takes a vector $\vec x$ from the feature space $\Omega$ and is parametrized by $\vec \theta \in \Theta$. The matrix $A$ is the adjacency matrix for the neurons.

Since the number of stationary points is a nonnegative integer, zero is a trivial lower bound. But I doubt that is the tightest lower bound for these models; a lower bound of one seems more plausible to me, though I do not yet have a proof.

Suppose we have a loss function $\mathcal{L}$, taken to be the mean-squared error. A stationary point $s \in \Theta$ satisfies $\nabla_{\vec \theta} \mathcal{L}(f(\vec{x}; s, A), \vec y) = 0$.

Can we put upper and lower bounds on the number of such stationary points, assuming that the feature vector $\vec X$ is not a degenerate random variable?


*488 Solutions to the XOR Problem* suggests that the number of stationary points can be quite large even for a modest-sized MLP.
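As a rough numerical illustration of that intuition (my own sketch, not part of the question), one can train a tiny 2-2-1 sigmoid MLP on XOR under MSE from many random initializations and count the distinct converged loss values. Distinct values are only a crude proxy, a loose lower bound on the number of distinct stationary points reached, since different stationary points can share a loss value, but the experiment shows that more than one exists:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(theta, X, y):
    """MSE loss and its analytic gradient for a 2-2-1 sigmoid MLP."""
    W1 = theta[:4].reshape(2, 2)   # input -> hidden weights
    b1 = theta[4:6]                # hidden biases
    W2 = theta[6:8]                # hidden -> output weights
    b2 = theta[8]                  # output bias
    h = sigmoid(X @ W1 + b1)       # hidden activations, shape (4, 2)
    out = sigmoid(h @ W2 + b2)     # network outputs, shape (4,)
    err = out - y
    L = np.mean(err ** 2)
    # Backpropagate the MSE gradient by hand.
    d_out = (2.0 / len(y)) * err * out * (1.0 - out)
    gW2 = h.T @ d_out
    gb2 = d_out.sum()
    d_h = np.outer(d_out, W2) * h * (1.0 - h)
    gW1 = X.T @ d_h
    gb1 = d_h.sum(axis=0)
    return L, np.concatenate([gW1.ravel(), gb1, gW2, [gb2]])

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

rng = np.random.default_rng(0)
final_losses = []
for _ in range(30):
    theta = rng.normal(scale=2.0, size=9)
    for _ in range(10000):
        L, g = loss_and_grad(theta, X, y)
        theta -= 1.0 * g           # plain full-batch gradient descent
    final_losses.append(round(L, 2))

# Distinct converged loss values hint at distinct stationary points.
print(sorted(set(final_losses)))
```

Runs that solve XOR end near zero loss, while others typically stall near the symmetric saddle at loss $0.25$ (constant output $0.5$), so the printed set usually has more than one element.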