How to construct a depth-$k$ approximation to a polynomial from a depth-$1$ one


Recently, I have been reading the paper "The power of deeper networks for expressing natural functions".

Consider the standard model of feedforward neural networks $N(x) = A_k\sigma(\cdots\sigma(A_1\sigma(A_0x))\cdots)$ where $A_0,A_1,\cdots,A_k$ are constant matrices and $\sigma$ denotes a scalar nonlinear function applied element-wise to vectors.

Let $m_k^{\epsilon}(p)$ be the minimum number of neurons in a depth-$k$ network that $\epsilon$-approximates $p$, meaning that $\sup_{x\in(-R,R)}|N(x)-p(x)|\leq \epsilon$. The paper gives a theorem stating that $\lim_{\epsilon\to 0} m_k^\epsilon(p)$ exists.

The proof says it suffices to show that $\lim_{\epsilon\to 0}m_1^\epsilon(p)$ exists, because an $\epsilon$-approximation to $p$ of depth $k$ can be constructed from one of depth $1$.

This is where I am stuck. It seems natural that the above deduction is correct, but I cannot come up with a rigorous proof.

Thanks for reading. Any help would be appreciated.

Best answer:

For any $\sigma$ that is differentiable at a point $x_0$ with $\sigma'(x_0) \neq 0$, you have that $$ \left|\sigma(a x + x_0) - \sigma(x_0) - \sigma'(x_0) (a x)\right| = o(a) $$ as $a \to 0$, uniformly over $|x| \leq R$. Therefore, $$ \frac{\sigma(a \cdot + x_0)-\sigma(x_0)}{\sigma'(x_0) a} \to \mathrm{Id} \quad \text{uniformly on } [-R,R] \text{ as } a \to 0. $$ Here $\mathrm{Id}$ denotes the identity map on $\mathbb{R}$.
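This convergence is easy to check numerically. Below is a small sketch (not from the paper; the choice of the sigmoid $\sigma(z) = 1/(1+e^{-z})$ and $x_0 = 0$ are my own illustrative assumptions) showing that the rescaled single neuron $\frac{\sigma(ax + x_0) - \sigma(x_0)}{\sigma'(x_0)\,a}$ approaches the identity on $[-R, R]$ as $a \to 0$:

```python
import numpy as np

# Illustrative sketch: a single sigmoid neuron, shifted and rescaled,
# approximates the identity map on [-R, R] as a -> 0. The activation
# sigma and the choice x0 = 0 are assumptions for this demo.

def sigma(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    """Derivative of the logistic sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigma(z)
    return s * (1.0 - s)

def approx_identity(x, a, x0=0.0):
    """One-neuron network (sigma(a*x + x0) - sigma(x0)) / (sigma'(x0) * a)."""
    return (sigma(a * x + x0) - sigma(x0)) / (sigma_prime(x0) * a)

R = 2.0
x = np.linspace(-R, R, 101)
for a in [0.1, 0.01, 0.001]:
    err = np.max(np.abs(approx_identity(x, a) - x))
    print(f"a = {a:>6}: sup |id_a(x) - x| on [-R, R] = {err:.2e}")
```

The sup error shrinks like $O(a^2)$ here (the leading correction comes from the third-order Taylor term of the sigmoid at $0$), consistent with the $o(a)$ bound above.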

Now observe that $x \mapsto \frac{\sigma(a x + x_0)-\sigma(x_0)}{\sigma'(x_0) a}$ is a neural network with one hidden layer containing a single neuron. You can stack $k-1$ of these one after another to get an approximation to the identity by a depth-$(k-1)$ network. Composing this approximation of the identity with the depth-$1$ approximation from the paper gives an approximation of depth $k$, whose error can be made smaller than any prescribed $\epsilon$ by taking $a$ small enough.