Can every continuous piecewise linear function $[-1,1]^k \rightarrow \mathbb{R}^n$ be written as a composition of the following building blocks:
- Affine map: $x \mapsto Ax + b$ for some matrix $A$ and vector $b$
- ReLU activation: $(x_1, x_2, \dots) \mapsto (\max(0, x_1), \max(0, x_2), \dots)$
If so, how many composition factors are needed? Can every such function be represented by a network with "only one hidden layer":
$$ \text{affine} \circ \text{relu} \circ \text{affine} \circ \text{relu} \circ \text{affine} $$
By piecewise linear, I mean that there exists a decomposition of the domain $[-1,1]^k$ into finitely many polytopes such that the restriction of the function to each polytope is affine.
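To make the building blocks concrete, here is a minimal sketch (using hypothetical weight matrices chosen by hand) of a one-hidden-layer network of the form affine ∘ ReLU ∘ affine computing the piecewise linear function $|x| = \max(0,x) + \max(0,-x)$:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def abs_via_relu(x):
    """One-hidden-layer ReLU network computing |x|."""
    A1 = np.array([[1.0], [-1.0]])   # first affine map: x -> (x, -x)
    h = relu(A1 @ np.array([x]))     # hidden layer: (relu(x), relu(-x))
    A2 = np.array([[1.0, 1.0]])      # second affine map: sum the two units
    return (A2 @ h)[0]

xs = np.linspace(-1, 1, 11)
assert all(abs(abs_via_relu(x) - abs(x)) < 1e-12 for x in xs)
```

The question is whether this pattern, possibly with more layers, suffices for every continuous piecewise linear function.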
Long version of my short comment:
First of all, not every continuous piecewise affine function can be built by a ReLU neural network with only one hidden layer. The reason is that a compactly supported piecewise affine function, such as $$ \mathbb{R}^d \ni x \mapsto \max\{0, 1 - \max_{i=1, \dots, d} |x_i| \} $$ for $d \geq 2$, cannot be represented by a sum of ReLUs of affine functions. The short reason is that this function is affine (identically zero) outside a compact set, whereas a sum of ReLUs is either globally affine or fails to be smooth along at least one entire hyperplane, and such a hyperplane extends to infinity. (This is of course something one would need to prove in more detail; a proof can be found in Theorem 4.1 of https://arxiv.org/pdf/1807.03973.pdf.)
On the other hand, the same paper, https://arxiv.org/pdf/1807.03973.pdf, shows that deep ReLU neural networks can represent linear finite elements, because the corresponding hat functions can be written as a combination of max and min operations. I can only do a worse job than the authors themselves at explaining how this is done, and their paper has a lot of nice illustrations, so I think it is best to refer directly to Chapter 3 of that work.
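The key identity behind the max/min constructions is that $\max(a,b) = a + \mathrm{relu}(b-a)$ (and $\min(a,b) = a - \mathrm{relu}(a-b)$), which costs one extra ReLU layer per pairwise max. As a rough illustration (my own sketch, not the construction from the paper), the two-dimensional pyramid $\max(0, 1 - \max(|x|,|y|))$ can be computed by a ReLU network with three hidden layers:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def max_via_relu(a, b):
    # max(a, b) = a + relu(b - a): one extra ReLU layer on top of a and b
    return a + relu(b - a)

def pyramid_net(x, y):
    # hidden layer 1: relu of (x, -x, y, -y); affine combine to get |x|, |y|
    ax = relu(x) + relu(-x)          # |x|
    ay = relu(y) + relu(-y)          # |y|
    # hidden layer 2: max(|x|, |y|) via one more ReLU
    m = max_via_relu(ax, ay)
    # hidden layer 3: clip 1 - m from below at 0
    return relu(1.0 - m)

for x, y in [(0.0, 0.0), (0.5, -0.3), (2.0, 0.1), (-0.9, 0.9)]:
    assert abs(pyramid_net(x, y) - max(0.0, 1.0 - max(abs(x), abs(y)))) < 1e-12
```

So the function that obstructs one hidden layer becomes representable once depth is allowed, which is exactly the point of the paper.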
From the construction of hat functions, it follows essentially directly that all continuous piecewise linear functions can be represented by ReLU neural networks, since every such function is a sum of hat functions. This is Theorem 5.2 of the work cited above.
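In one dimension the sum decomposition can be written down explicitly without any hat-function machinery: a continuous piecewise linear function with breakpoints $t_1 < \dots < t_m$ and slopes $s_0, \dots, s_m$ satisfies $f(x) = f(t_0) + s_0 (x - t_0) + \sum_i (s_i - s_{i-1})\,\mathrm{relu}(x - t_i)$. A small sketch with hypothetical breakpoints and slopes:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

breaks = [-0.5, 0.0, 0.5]         # hypothetical breakpoints
slopes = [1.0, -2.0, 0.5, 3.0]    # slope on each of the four pieces

def f_sum_of_relus(x, x0=-1.0, y0=0.0):
    """Evaluate the piecewise linear function anchored at (x0, y0)
    as an affine part plus a sum of ReLUs, one per breakpoint."""
    y = y0 + slopes[0] * (x - x0)
    for t, s_old, s_new in zip(breaks, slopes[:-1], slopes[1:]):
        y += (s_new - s_old) * relu(x - t)   # slope change at breakpoint t
    return y

# spot-check against values obtained by walking the pieces by hand
assert abs(f_sum_of_relus(-1.0) - 0.0) < 1e-12
assert abs(f_sum_of_relus(0.0) - (-0.5)) < 1e-12
assert abs(f_sum_of_relus(1.0) - 1.25) < 1e-12
```

This is exactly a one-hidden-layer network, which matches the fact that the depth obstruction above only appears for input dimension $d \geq 2$.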