Interpolation Capability of Deep Neural Networks of bounded height

111 Views Asked by user355356 At 25 Mar 2026 - 7:05

The Universal Approximation Theorem shows that deep neural networks can approximate any function in $C(\mathbb{R}^d,\mathbb{R}^n)$ uniformly on compact sets. I'm curious: can the collection of neural networks of bounded height (depth) interpolate any finite set of points?

**Bumbble Comm** · Accepted Answer

Yes. If neural networks of a fixed depth are universal, then networks of that same depth can also interpolate any finite set of samples.

Theorem: Let $\varrho: \mathbb{R} \to \mathbb{R}$ be an activation function such that DNNs of depth (height) $L \in \mathbb N$ with activation $\varrho$ are universal. Then for every set of points $(x_i, y_i)_{i=1}^N \subset \mathbb R^d \times \mathbb R$ with pairwise distinct $x_i$, there exists a neural network $\Phi$ of depth $L$ such that $\Phi(x_i) = y_i$ for all $i = 1, \dots, N$.

Proof. Let $(x_i, y_i)_{i=1}^N \subset K \times \mathbb R \subset \mathbb R^d \times \mathbb R$, where $K$ is compact and the $x_i$ are pairwise distinct. By the Urysohn lemma (applied to the closed sets $\{x_i\}$ and $\{x_j : j \neq i\}$), there exist $N$ continuous functions $(f_i)_{i=1}^N\subset C(\mathbb R^d, \mathbb R)$ such that $f_i(x_j) = \delta_{ij}$ for all $i,j \in \{ 1, \dots, N\}$.

Since the set of invertible matrices is open, there exists an $\epsilon >0$ such that every matrix $(a_{i,j})_{i,j =1}^N$ with $|a_{i,j} - \delta_{i,j}| < \epsilon $, for all $i,j \in \{1, \dots, N\}$, is invertible.
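For a concrete value, $\epsilon = 1/N$ is one admissible choice (a standard Neumann series argument, added here as a worked step; the symbols $M$ and $E$ are introduced only for this note). Let $M = (a_{i,j})_{i,j=1}^N$ with $|a_{i,j} - \delta_{i,j}| < 1/N$ for all $i,j$, and set $E := M - I$. Then $$ \|E\|_\infty = \max_{i} \sum_{j=1}^N |a_{i,j} - \delta_{i,j}| < N \cdot \frac{1}{N} = 1, $$ so $M = I + E$ is invertible with $$ M^{-1} = \sum_{k=0}^\infty (-E)^k. $$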

Since, by assumption, DNNs of depth $L$ with activation function $\varrho$ are universal, each $f_i$ can be approximated uniformly on $K$ to within $\epsilon$. Hence there exist neural networks $(\Phi_{i})_{i=1}^N$ of depth $L$ such that, for all $i,j \in \{1, \dots, N\}$, $$ |\Phi_{i}(x_j) - f_i(x_j)| = |\Phi_{i}(x_j) - \delta_{ij}| < \epsilon. $$

Let $A = (a_{i,j})_{i,j =1}^N \in \mathbb R^{N \times N}$ be defined by $$ a_{i,j} := \Phi_{i}(x_j). $$ By the choice of $\epsilon$, $A^{-1}$ exists. We define a new network $$ \Phi : = [y_{1}, \dots, y_N] \cdot A^{-1} \left( \begin{array}{c} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_N\end{array} \right). $$

The network $\Phi$ has $L$ layers: each of the $\Phi_i$ has $L$ layers, and the map $[y_{1}, \dots, y_N] \cdot A^{-1}$ is linear, so it can be absorbed into the final affine layer without applying $\varrho$ again.

By definition of $A$, the vector $(\Phi_1(x_j), \dots, \Phi_N(x_j))^T$ is the $j$'th column of $A$, so $$ A^{-1} \left( \begin{array}{c} \Phi_1(x_j) \\ \Phi_2(x_j) \\ \vdots \\ \Phi_N(x_j)\end{array} \right) = e_j, $$ where $e_j$ is the $j$'th unit vector. Therefore $\Phi(x_j) = [y_{1}, \dots, y_N] \cdot e_j = y_j$, as desired.
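The linear recombination step of the proof can be checked numerically. The sketch below is only an illustration under loud assumptions: instead of trained networks, it uses Gaussian bumps centred at the $x_i$ as hypothetical stand-ins for the $\Phi_i$ (they satisfy $\Phi_i(x_j) \approx \delta_{ij}$ for well-separated points), and the sample points, targets, and the width `sigma` are made up for the example. It builds $A$ with $a_{i,j} = \Phi_i(x_j)$, forms the coefficients $[y_1, \dots, y_N] \cdot A^{-1}$, and verifies $\Phi(x_j) = y_j$.

```python
import numpy as np

# Hypothetical sample set: N = 5 pairwise distinct points in R^2 with scalar targets.
xs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])
ys = np.array([3.0, -1.0, 0.5, 2.0, -4.0])
N = len(xs)


def phis(x, sigma=0.5):
    """Evaluate the stand-ins Phi_1, ..., Phi_N at each row of x.

    Assumption: Gaussian bumps replace the depth-L networks of the proof.
    Returns an array of shape (M, N) whose entry [m, i] is Phi_i(x[m]).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    sq_dist = ((x[:, None, :] - xs[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))


# A[i, j] = Phi_i(x_j). phis(xs)[j, i] = Phi_i(x_j), hence the transpose.
A = phis(xs).T

# Largest entrywise deviation of A from the identity matrix.
max_dev = np.max(np.abs(A - np.eye(N)))

# Coefficient row vector c = [y_1, ..., y_N] A^{-1}, obtained by solving A^T c = y
# rather than forming the inverse explicitly.
c = np.linalg.solve(A.T, ys)


def Phi(x):
    """Recombined function Phi = [y_1, ..., y_N] A^{-1} (Phi_1, ..., Phi_N)^T."""
    return phis(x) @ c


print("max |a_ij - delta_ij| =", max_dev)
print("max interpolation error =", np.max(np.abs(Phi(xs) - ys)))
```

Solving the linear system instead of inverting $A$ is a numerical convenience and does not change the construction. With real networks, the rows of `phis` would come from $N$ trained depth-$L$ models, and `c` would be folded into their shared output layer.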

Remark: The result only addresses the case of scalar outputs. However, the same result holds for multivariate outputs by putting networks in parallel.
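One way to write this out, as a sketch reusing $A$ and $(\Phi_i)_{i=1}^N$ from the proof (the matrix $Y$ is notation introduced only here): for targets $y_i \in \mathbb R^n$, collect them as columns $Y := [y_1, \dots, y_N] \in \mathbb R^{n \times N}$ and set $$ \Phi := Y A^{-1} \left( \begin{array}{c} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_N\end{array} \right). $$ Then $\Phi(x_j) = Y e_j = y_j \in \mathbb R^n$, and each row of $Y$ reproduces the scalar construction for one output coordinate.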
