The classic Universal Approximation Theorem states that a continuous function $f:\mathbb R^m\to\mathbb R$ can be approximated by a neural network with one hidden layer and a nonpolynomial activation function $\sigma:\mathbb R\to\mathbb R$; that is, for every $\varepsilon>0$ and every compact $K\subset\mathbb R^m$ there exist $k\in\mathbb N$, $A_1,\dots, A_k\in\mathbb R^{m}$, and $W,b\in\mathbb R^k$ such that \begin{align} \sup_{x\in K}\Big|\sum_{i=1}^k W_i\sigma(A_i\cdot x+b_i)-f(x)\Big|<\varepsilon. \end{align}
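As a concrete sanity check (not a proof), here is a small random-feature experiment for the $m=1$ case: the inner parameters $A_i,b_i$ are sampled at random, $\sigma$ is taken to be the ReLU (a nonpolynomial activation), and the outer weights $W$ are fitted by least squares. The target $f(x)=\sin(2\pi x)$ on $K=[0,1]$ and all ranges below are my own illustrative choices.

```python
import numpy as np

# Numerical illustration of one-hidden-layer approximation via random features.
# Target: f(x) = sin(2*pi*x) on K = [0, 1]; sigma = ReLU (nonpolynomial).
rng = np.random.default_rng(0)
k = 500                                   # number of hidden units
A = rng.uniform(-10, 10, size=k)          # inner weights A_i (random, then frozen)
b = rng.uniform(-10, 10, size=k)          # biases b_i (random, then frozen)

def sigma(z):
    """ReLU activation."""
    return np.maximum(z, 0.0)

f = lambda x: np.sin(2 * np.pi * x)
x_train = np.linspace(0.0, 1.0, 400)

# Feature matrix Phi[j, i] = sigma(A_i * x_j + b_i); solve for W by least squares.
Phi = sigma(np.outer(x_train, A) + b)
W, *_ = np.linalg.lstsq(Phi, f(x_train), rcond=None)

# Estimate the sup-norm error on a finer grid of K.
x_test = np.linspace(0.0, 1.0, 2000)
approx = sigma(np.outer(x_test, A) + b) @ W
err = np.max(np.abs(approx - f(x_test)))
print(f"estimated sup-norm error: {err:.2e}")
```

With fixed $A,b$ the fit is linear in $W$, so only a least-squares solve is needed; increasing $k$ lets the error be driven below any $\varepsilon$, matching the theorem. Note, however, that nothing here constrains the size of the fitted $W$, which is exactly the gap the bounded-parameter question below is about.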
I was wondering whether there is an analogous theorem with bounded parameters, e.g.: for every $M>0$, every compact $K\subset\mathbb R^m$, and every $\varepsilon>0$ there exist $k\in\mathbb N$, $A_1,\dots, A_k\in\mathbb R^{m}$, and $W,b\in\mathbb R^k$ with $|A_i|,|W_i|,|b_i|\le M$ for all $i$, such that $\sup_{x\in K}\Big|\sum_{i=1}^k W_i\sigma(A_i\cdot x+b_i)-f(x)\Big|<\varepsilon$ holds.
Are there any references on this?