While reading the paper *Deterministic Variational Inference for Robust Bayesian Neural Networks*, I came across a part that confuses me.
In contrast to the standard neural network convention, the authors apply the activation function first, followed by the linear combination, as follows:
$$(1) \quad \quad h^{(l)}=f(a^{(l-1)})$$ $$(2) \quad \quad a^{(l)} = h^{(l)}W^{(l)}+b^{(l)}$$
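To check that I'm reading equations (1) and (2) correctly, here is a small NumPy sketch of one layer in this ordering (the dimensions and the choice of ReLU for $f$ are mine for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, chosen only for illustration.
n_in, n_out = 128, 64
W = rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)  # weights W^{(l)}
b = np.zeros(n_out)                                  # bias b^{(l)}

def layer(a_prev):
    """One layer in the paper's ordering:
    h^{(l)} = f(a^{(l-1)})           -- activation first
    a^{(l)} = h^{(l)} W^{(l)} + b^{(l)} -- then the linear combination
    """
    h = np.maximum(a_prev, 0.0)  # f = ReLU, as an example nonlinearity
    return h @ W + b

a_prev = rng.normal(size=n_in)  # pre-activations from layer l-1
a = layer(a_prev)
print(a.shape)                  # prints (64,)
```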
The authors then claim that, because $a^{(l)}$ is a linear combination of many elements of $h^{(l)}$, it converges to a normal distribution by the central limit theorem (CLT):
> $a^{(l)}$ is constructed by linear combination of many distinct elements of $h^{(l)}$, and in the limit of vanishing correlation between terms in this combination, we can appeal to the central limit theorem (CLT). Under the CLT, for a large enough hidden dimension and for variational distributions with finite first and second moments, elements $a_i$ will be normally distributed regardless of the potentially complicated distribution for $h_j$ induced by $f$.
I don't understand why the CLT can be applied to a linear combination of activations in this case. The classical CLT assumes i.i.d. summands, but the terms $h_j W_{ji}$ here are neither identically distributed nor obviously independent. Could anyone elaborate on this?
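To make my confusion concrete: empirically I can reproduce the claimed effect. In this sketch (my own, not from the paper) I take $h_j = \mathrm{ReLU}(z_j)$ with standard normal $z_j$, which is clearly non-Gaussian, multiply by independent Gaussian weights, and sum. The excess kurtosis of the standardized sum is far from the Gaussian value 0 for a small hidden dimension but close to 0 for a large one, so the phenomenon holds; what I am missing is the justification.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pre_activation(hidden_dim, n_samples=20000):
    """Draw samples of a_i = sum_j h_j W_{ji} for one output unit i.
    h_j = ReLU(z_j) is non-Gaussian; W_{ji} are independent Gaussians."""
    h = np.maximum(rng.normal(size=(n_samples, hidden_dim)), 0.0)
    w = rng.normal(size=hidden_dim) / np.sqrt(hidden_dim)  # one column of W
    return h @ w

excess_kurtosis = {}
for d in (2, 1000):
    a = sample_pre_activation(d)
    a = (a - a.mean()) / a.std()          # standardize
    excess_kurtosis[d] = np.mean(a**4) - 3.0  # 0 for an exact Gaussian
    print(d, round(excess_kurtosis[d], 2))
```

Running this, the hidden dimension 2 case has large positive excess kurtosis (heavy tails), while the dimension 1000 case is nearly 0, i.e. close to Gaussian.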