There is a conjecture which states that the Lipschitz constant of a two-layer neural network is lower bounded by $c\sqrt{\frac{n}{k}}$ where $c$ is some constant, $n$ is the number of data points and $k$ is the number of neurons in the hidden layer.
More formally, consider a family of functions $\mathcal{F}_k(\varphi)$ parameterized by $k$, the number of neurons. Let $x \in \mathbb{R}^d$, $a_l,b_l \in \mathbb{R}$, $w_l \in \mathbb{R}^d$, and let $\varphi$ be a Lipschitz continuous function. If $f \in \mathcal{F}_k(\varphi)$, then $f$ is a function of the form $$x \mapsto \sum_{l = 1}^{k} a_l \varphi(\langle w_l, x \rangle + b_l)$$
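For concreteness, here is a minimal numerical sketch of such a function $f$; the choice of $\varphi = \tanh$ and the random parameters below are purely illustrative, any Lipschitz activation works:

```python
import numpy as np

def two_layer(x, a, W, b, phi=np.tanh):
    """Evaluate f(x) = sum_l a_l * phi(<w_l, x> + b_l).

    a: (k,) outer weights, W: (k, d) inner weights w_l stacked as rows,
    b: (k,) biases. phi is any Lipschitz activation (tanh here).
    """
    return a @ phi(W @ x + b)

# Example: a random network with k = 5 neurons on inputs in R^3.
rng = np.random.default_rng(0)
k, d = 5, 3
a, W, b = rng.normal(size=k), rng.normal(size=(k, d)), rng.normal(size=k)
x = rng.normal(size=d)
print(two_layer(x, a, W, b))
```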
It has been shown that functions from this class can be used to fit any given data set. That is, given $n$ data points $(x_i,y_i)_{i \in [n]} \in (\mathbb{R}^d \times \mathbb{R})^n$, one can find $f \in \mathcal{F}_k(\varphi)$ (for $k$ large enough) such that $$f(x_i)=y_i,\ \forall i \in [n]$$
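As a quick numerical illustration of this interpolation property (not a proof, and not necessarily the construction used in the literature): fix random inner weights $w_l, b_l$ and solve a linear system for the outer weights $a_l$. Once $k \ge n$, the feature matrix generically has full row rank and the fit is exact up to roundoff. The sizes and the tanh activation below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 20, 3, 40          # k >= n neurons, data in R^3

# Data on the unit sphere with +/-1 labels, as in the conjecture's setup.
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.choice([-1.0, 1.0], size=n)

# Fix random inner weights, then solve Phi a = y for the outer weights a.
W, b = rng.normal(size=(k, d)), rng.normal(size=k)
Phi = np.tanh(X @ W.T + b)          # (n, k) feature matrix
a, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(np.max(np.abs(Phi @ a - y)))  # should be ~0 (machine precision)
```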
The conjecture gives a lower bound on the Lipschitz constant of $f$, given $n$ data points in $\mathbb{R}^d$ and given that the neural net $f \in \mathcal{F}_k(\varphi)$ fits the data exactly. The conjecture says: $$\text{Lip}(f)=\sup_{x_1 \ne x_2}\frac{|f(x_1)-f(x_2)|}{\Vert x_1-x_2 \Vert} \ge c\sqrt{ \frac{n}{k} }$$ for some constant $c$.
The input data ($x_i$) is assumed to lie on the unit sphere $\mathbb{S}^{d-1}$ in $\mathbb{R}^d$ and the output data $y_i$ is uniformly distributed over $\{-1,1\}$.
What I am claiming is that I have a "counterexample" to this lower bound. I'm not sure whether "counterexample" is the right word; that's why I'm posting it here. What I have found is that when the data comes from $\mathbb{R}^3$, there is a lower bound of order $\sqrt{n}$ that does not depend on the function at all, and hence cannot be improved by any adjustment of the parameters of $f$.
Proof
Let us consider any two of the $n$ data points, $(x_1,y_1)$ and $(x_2,y_2)$. Since $y_1,y_2$ are uniformly distributed on $\{-1,1\}$, the only interesting case, without loss of generality, is $-1=y_1=f(x_1)$ and $1=y_2=f(x_2)$; if the labels agree, the difference quotient vanishes. We now lower bound the following fraction: $$\frac{|f(x_1)-f(x_2)|}{\Vert x_1- x_2 \Vert}=\frac{|-1 - 1|}{\Vert x_1 - x_2 \Vert}= \frac{2}{\Vert x_1 - x_2 \Vert}$$
Now we look at how we can lower bound this based on the distribution of $x_1$ and $x_2$ on $\mathbb{S}^2$.
Lemma 1 (Fejes Tóth): Among any $n$ points on the unit sphere $\mathbb{S}^2$ in $\mathbb{R}^3$, there exist two points at distance $p$ such that $$p \le \sqrt{4-\csc^2\left( \frac{n \pi}{6(n-2)} \right)}$$
We can apply this result directly to the previous expression: the lemma guarantees a pair of data points at distance at most this bound, so $$\sup_{x_1 \ne x_2} \frac{2}{\Vert x_1- x_2 \Vert} \ge \frac{2}{\sqrt{4-\csc^2\left( \frac{n \pi}{6(n-2)} \right)}}$$ where the supremum runs over pairs of data points.
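One can sanity-check Lemma 1 numerically: for any point set on $\mathbb{S}^2$, the minimum pairwise distance never exceeds the Fejes Tóth bound. The sketch below uses uniformly random points (normalized Gaussians), which is just one convenient configuration:

```python
import numpy as np

def fejes_toth_bound(n):
    """Upper bound on the minimum pairwise distance of n points on S^2."""
    return np.sqrt(4.0 - 1.0 / np.sin(n * np.pi / (6 * (n - 2))) ** 2)

rng = np.random.default_rng(2)
for n in (10, 100, 1000):
    X = rng.normal(size=(n, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # uniform points on S^2
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    min_dist = D[np.triu_indices(n, 1)].min()       # closest pair distance
    assert min_dist <= fejes_toth_bound(n)          # the lemma's guarantee
    print(n, min_dist, fejes_toth_bound(n))
```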
Lemma 2: $4-\csc^2\left( \frac{n \pi}{6(n-2)} \right)$ is on the order of $1/n$. See proof here.
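A quick numerical check of Lemma 2: a first-order expansion of $\csc^2$ around $\pi/6$ (my own computation, worth double-checking) suggests $n\left(4-\csc^2\left(\frac{n\pi}{6(n-2)}\right)\right) \to \frac{8\sqrt{3}\pi}{3} \approx 14.51$, which the script below confirms:

```python
import numpy as np

# Conjectured limit of n * (4 - csc^2(n*pi/(6(n-2)))) as n -> infinity,
# from expanding csc^2 to first order around pi/6.
limit = 8 * np.sqrt(3) * np.pi / 3   # ~14.51

for n in (1_000, 100_000, 10_000_000):
    val = n * (4.0 - 1.0 / np.sin(n * np.pi / (6 * (n - 2))) ** 2)
    print(n, val)                    # approaches `limit` as n grows

print(limit)
```

If that constant is right, the bound in the next step becomes concrete: $2\big/\sqrt{4-\csc^2(\cdot)} \approx \left(2/\sqrt{14.51}\right)\sqrt{n} \approx 0.52\sqrt{n}$.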
Now, using Lemma 2 in our lower bound, we have the final result: $$\sup_{x_1 \ne x_2} \frac{|f(x_1)-f(x_2)|}{\Vert x_1 - x_2 \Vert} \ge \frac{2}{\sqrt{ \Theta(1/n)}} = \Theta(\sqrt{n}),$$ i.e. $\text{Lip}(f) \ge c\sqrt{n}$ for some constant $c > 0$.
I find this proof pretty convincing for data on the unit sphere in 3 dimensions. What does it mean for the conjecture? The conjecture implies that increasing the number of neurons $k$ in the function $f$ should lower this lower bound. What I have shown here is that, no matter how you adjust $f$, you cannot push the lower bound below order $\sqrt{n}$ for data in 3 dimensions.
Can the conjecture still hold true somehow, or is this a real "counterexample" showing that it is wrong? Or maybe the fact is that I am wrong. Please let me know what you think.