Universal approximation theorem that includes approximating Jacobians


I am looking for references for a result of the following form. Let $$f_L(x) = l_L \circ a \circ \dotsc \circ a \circ l_1(x)$$ be a feed-forward neural network with fully connected layers $l_i(q)=W_iq+b_i$ (affine maps) and a smooth activation function $a(\cdot)$ applied componentwise. Here $f_L$ maps $\mathbb{R}^d \to \mathbb{R}^{d_1}\to \dotsc \to \mathbb{R}^{d_1} \to \mathbb{R}^D$, i.e. all hidden layers have width $d_1$. Let $\mathcal{N}(L, d_1)$ be the class of such functions. Note that the Jacobian of $f_L(x)$ with respect to the input $x$ is easy to compute and satisfies a simple recursion via the chain rule. For example, if $L=2$ we obtain explicitly $Df_2(x) = W_2 \operatorname{diag}(a'(l_1(x)))\,W_1$, where $\operatorname{diag}(v)$ is the diagonal matrix with the vector $v$ on its diagonal and $a'$ is the derivative of the activation function, again applied componentwise.
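The explicit $L=2$ formula is easy to check numerically. Below is a minimal sketch (all dimensions, weights, and helper names are made up for illustration) that builds a random two-layer tanh network, evaluates the closed-form Jacobian $W_2\operatorname{diag}(a'(l_1(x)))W_1$, and compares it against central finite differences:

```python
import numpy as np

# Hypothetical dimensions for a two-layer network f_2 = l_2 ∘ a ∘ l_1
d, d1, D = 3, 5, 2
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((d1, d)), rng.standard_normal(d1)
W2, b2 = rng.standard_normal((D, d1)), rng.standard_normal(D)

# Activation a and its derivative a', applied componentwise
a = np.tanh
a_prime = lambda z: 1.0 - np.tanh(z) ** 2

def f2(x):
    return W2 @ a(W1 @ x + b1) + b2

def jacobian_f2(x):
    # Explicit formula: Df_2(x) = W_2 diag(a'(l_1(x))) W_1
    return W2 @ np.diag(a_prime(W1 @ x + b1)) @ W1

# Compare against central finite differences at a random point
x = rng.standard_normal(d)
h = 1e-6
J_fd = np.column_stack(
    [(f2(x + h * e) - f2(x - h * e)) / (2 * h) for e in np.eye(d)]
)
assert np.allclose(jacobian_f2(x), J_fd, atol=1e-5)
```

For deeper networks the same check applies to the chain-rule recursion $Df_L = W_L \operatorname{diag}(a'(\cdot)) \cdots \operatorname{diag}(a'(l_1(x))) W_1$.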

Question: I am looking for a universal approximation result saying that, given a continuously differentiable function $f\in C^1(\mathbb{R}^d, \mathbb{R}^D)$ and $\epsilon>0$, there exists a neural network $\hat{f} \in \mathcal{N}(L,d_1)$ such that both $$\|f-\hat{f}\|_\infty <\epsilon$$ and $$ \|Df-D\hat{f}\|_F <\epsilon.$$ The first norm is the uniform norm over a fixed compact subset of $\mathbb{R}^d$; the second, a matrix norm, can be the Frobenius norm, say, taken uniformly over the same compact set. Does such a result already exist in the literature? In other words (and somewhat loosely): we know from the UAT that neural networks can approximate functions arbitrarily well; can they simultaneously approximate the derivatives?
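Note that the second condition is not implied by the first, so the question is genuinely stronger than the classical UAT. A standard example (in one dimension, not itself a neural network) illustrating this:
$$
g_\epsilon(x) = f(x) + \epsilon \sin\!\left(\frac{x}{\epsilon^2}\right),
\qquad
\|g_\epsilon - f\|_\infty \le \epsilon,
\quad\text{but}\quad
\|g_\epsilon' - f'\|_\infty = \frac{1}{\epsilon} \to \infty .
$$
So uniform closeness of functions says nothing about closeness of derivatives; a result of the kind asked for must control both norms at once.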

Best Answer

A starting point is the paper by Hornik, Stinchcombe, and White, "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks", Neural Networks 3 (1990), 551–560.

The main result is Theorem 3.1: let $G\neq 0$ be a smooth activation function belonging to $S_1^m(\mathbb{R},\lambda)$ for some integer $m\geq0$. Then $\Sigma(G)$ is $m$-uniformly dense on compacta in $C_{\downarrow}^\infty(\mathbb{R}^r)$. The metric on the latter space takes derivatives up to order $m$ into account, so this gives simultaneous approximation of the function and its derivatives of the kind asked for. The precise definitions are given in the paper, which also has a good background section on the relevant function spaces (Sobolev spaces, $L^p$ spaces, etc.).