I'm pretty sure this question has already been studied in at least one paper, but I can't figure out where. The question is the following:
Given $l$ layers with $n_i \in \mathbb{N}$ neurons each ($1 \leq i \leq l$), we can build a set of functions $\mathcal{N}(l, \{n_i \mid 1 \leq i \leq l\})$: the functions computed by any feedforward neural network with this number of layers and neurons ($n_1$ and $n_l$ are the sizes of the input and output layers, respectively).
Those functions are continuous, so $\mathcal{N}(l, n_i) \subset \mathcal{C}(\mathbb{R}^{n_1}, \mathbb{R}^{n_l})$, and as the number of layers or the number of neurons in a layer increases, we get a bigger set of functions ($\mathcal{N}(l, n_i) \subset \mathcal{N}(l', m_i)$ for all $l \leq l'$, $n_i \leq m_i$).
But what kind of functions are those sets $\mathcal{N}(l, n_i)$ approximating (let's say $n_1$ and $n_l$ are given and fixed)? I would be surprised if this set were dense in the set of continuous functions. I guess it is not, so I ask: what is the smallest named set containing those functions? Are they dense in that set? If not, which functions are missing?
Once we know that, what probability law does a uniform law on the weights "induce" on this set of functions?
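To make the last question concrete, here is a minimal Monte Carlo sketch of the induced law, evaluated at a single input point. Everything specific in it is an assumption for illustration and not part of the question: weights and biases are drawn i.i.d. uniformly on $[-1, 1]$, the hidden activation is $\tanh$, the output layer is affine, and the helper names `sample_network`, `forward` and `sample_outputs` are hypothetical.

```python
import math
import random


def sample_network(layer_sizes, rng, low=-1.0, high=1.0):
    """Draw one network: every weight and bias is i.i.d. uniform on [low, high] (assumption)."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(low, high) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(low, high) for _ in range(n_out)]
        params.append((weights, biases))
    return params


def forward(params, x):
    """Evaluate the network at x: tanh on hidden layers (assumption), affine output layer."""
    for k, (weights, biases) in enumerate(params):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        if k < len(params) - 1:
            x = [math.tanh(v) for v in x]
    return x


def sample_outputs(layer_sizes, x, n_samples, seed=0):
    """Sample n_samples random networks and return the value each one takes at x."""
    rng = random.Random(seed)
    return [forward(sample_network(layer_sizes, rng), x) for _ in range(n_samples)]


# Layer sizes n_1 = 2, n_2 = 3, n_3 = 1; empirical law of f(x) at one point x.
outputs = sample_outputs([2, 3, 1], [0.5, -0.2], n_samples=1000)
```

A histogram of `outputs` approximates the one-point marginal of the induced law on the function set. Evaluating the same sampled networks at several inputs would give finite-dimensional marginals.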
N.b.: This question is related to computer science, but since I'm looking for function spaces and probability, I thought it would be better to ask on Math Stack Exchange.
Edit: I just noticed I'm talking about density but didn't say which topology I'm using. Let's say we are interested in the answer for both the $L^2$ norm and the supremum norm.
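Your intuition is right on the money. It has been proven that feedforward neural networks (FFNNs) can approximate any continuous function on a compact subset of $\mathbb{R}^n$ to arbitrary accuracy, provided the network is allowed enough hidden neurons.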
https://en.wikipedia.org/wiki/Universal_approximation_theorem
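
As a one-dimensional illustration of that statement (not the proof of the general theorem), the sketch below builds a network with one hidden ReLU layer that reproduces the piecewise-linear interpolant of a target function on $[a, b]$, so the sup-norm error shrinks as the width grows. The ReLU activation, the target $\sin(2\pi x)$, and the helper names `build_relu_network` and `sup_error` are assumptions chosen for the example.

```python
import math


def relu(t):
    return t if t > 0.0 else 0.0


def build_relu_network(f, a, b, width):
    """One-hidden-layer ReLU network equal to the linear interpolant of f
    at width + 1 equally spaced knots on [a, b]."""
    h = (b - a) / width
    knots = [a + i * h for i in range(width + 1)]
    values = [f(x) for x in knots]
    # Slope of the interpolant on each sub-interval.
    slopes = [(values[i + 1] - values[i]) / h for i in range(width)]
    # Output weights: each hidden unit adds the change of slope at its knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, width)]
    thresholds = knots[:width]  # hidden unit i computes relu(x - knots[i])
    offset = values[0]          # output bias

    def network(x):
        return offset + sum(c * relu(x - t) for c, t in zip(coeffs, thresholds))

    return network


def sup_error(f, g, a, b, n_grid=1001):
    """Sup-norm distance between f and g, approximated on a uniform grid."""
    grid = (a + (b - a) * j / (n_grid - 1) for j in range(n_grid))
    return max(abs(f(x) - g(x)) for x in grid)


def target(x):
    return math.sin(2.0 * math.pi * x)


# Error on [0, 1] for increasing hidden-layer width.
errors = {w: sup_error(target, build_relu_network(target, 0.0, 1.0, w), 0.0, 1.0)
          for w in (4, 16, 64)}
```

Here `errors` maps each width to the approximate sup-norm error, which decreases as the width increases. For a fixed width the construction can only produce a piecewise-linear function with a bounded number of kinks, which is why the density statement needs the number of hidden neurons to be unbounded.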