Let $I=[a,b]$ be an interval and let $C(I)$ denote the set of continuous functions on $I$. We define the set of one-hidden-layer neural networks $$NN(H,\theta)=\left\{ f_{\theta}=\sum_{i=1}^{H}a_{i}\,\phi(w_{i} x+b_{i}) \;\middle|\; a_{i},w_{i},b_{i}\in\mathbb{R}\right\}$$ with activation function $\phi$ (e.g. a sigmoid, $\tanh$, or ReLU) and parameter vector $\theta=(a,w,b)\in \mathbb{R}^{3H}$. We define the set of activation points $$K=\left\{x_{1},\dots,x_{H} \;\middle|\; w_{i} x_{i}+b_{i}=0 \ \text{ for all } i \in \{1,\dots ,H\}\right\},$$ i.e. $x_{i}=-b_{i}/w_{i}$ whenever $w_{i}\neq 0$. Let $\|\cdot\|$ denote the supremum norm on $I$. The worst-case approximation error is $$d(H)=\sup_{f \in C(I)}\inf_{\theta \in \mathbb{R}^{3H}}\|f_{\theta}-f\|,$$ and for a fixed $f$ the error can, for example, be bounded by $$\inf_{\theta \in \mathbb{R}^{3H}}\|f_{\theta}-f\| \leq \frac{5}{2}\,\omega\!\left(f,\frac{b-a}{H}\right)=:\varepsilon \tag{2}$$ where $\omega$ is the modulus of continuity. There are other bounds.
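To make the definitions concrete, here is a minimal NumPy sketch (my own construction, not taken from any reference) of a width-$H$ ReLU network whose activation points $K$ are equally spaced nodes in $[a,b]$ and which linearly interpolates $f$ at those nodes; one unit with $w=0$ plays the role of a constant, and for smooth $f$ the sup error behaves like $\omega(f,(b-a)/H)$, consistent with bound (2):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(f, a, b, H):
    """Parameters (a_i, w_i, b_i) of a width-H ReLU network that linearly
    interpolates f at H equally spaced nodes in [a, b].

    One unit (w=0, b=1) supplies the constant f(a); the other H-1 units put
    their activation points (kinks) at the first H-1 nodes, so K sits in [a, b).
    """
    nodes = np.linspace(a, b, H)             # interpolation nodes
    y = f(nodes)
    slopes = np.diff(y) / np.diff(nodes)     # slope on each of the H-1 pieces
    # coefficient of the kink unit at node j is the change of slope there
    coeffs = np.concatenate(([slopes[0]], np.diff(slopes)))
    A = np.concatenate(([y[0]], coeffs))     # output weights a_i
    W = np.concatenate(([0.0], np.ones(H - 1)))
    B = np.concatenate(([1.0], -nodes[:-1])) # kink of unit j at x = nodes[j]
    return A, W, B

def network(A, W, B, x):
    """Evaluate f_theta(x) = sum_i A[i] * relu(W[i]*x + B[i])."""
    x = np.atleast_1d(x)
    return (A[:, None] * relu(W[:, None] * x[None, :] + B[:, None])).sum(axis=0)
```

Here $\min(K)=a$ and $\max(K)=b-\frac{b-a}{H-1}$ by construction; the question is whether an *optimal* $\theta$ is forced into a similar placement.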
My question is the following: How are $\min(K)$ and $\max(K)$ related to $a$, $b$, and $\varepsilon$?
Consider, for example, the following picture.

We can approximate a constant function with a ReLU network and place the activation points as far away from $a$ as we want. But this is rather pathological: in general we need the nonlinearity that comes with an activation point in order to fit the nonlinearity of the function. Hence I would guess that $|\min(K)-a| \approx \frac{\varepsilon}{H}$, but this is not a proof.
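The pathological constant-function case can be written down explicitly: two ReLU units whose activation points lie far outside $[a,b]$ cancel their linear parts on $I$ and leave a constant. A small sketch (the offset $10^{6}$ is an arbitrary choice to illustrate that the distance can be made as large as we like):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Target: the constant function f(x) = c on I = [a, b].
a, b, c = 0.0, 1.0, 3.0
x0 = a - 1e6   # activation point of unit 1, far to the left of a
x1 = b + 1e6   # activation point of unit 2, far to the right of b
# On [a, b] both units are in their linear regime, so
#   relu(x - x0) + relu(-x + x1) = (x - x0) + (x1 - x) = x1 - x0,
# a constant; rescale the output weights so the constant equals c.
scale = c / (x1 - x0)
xs = np.linspace(a, b, 1001)
f_theta = scale * (relu(xs - x0) + relu(-xs + x1))
assert np.allclose(f_theta, c)   # exact fit, yet min(K) is 1e6 away from a
```

So without some regularity assumption on $f$ (or an optimality assumption on $\theta$), $|\min(K)-a|$ cannot be bounded at all, which is why I only expect a relation like the one above in the generic, genuinely nonlinear case.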