Weight Initialization in an ANN - Why should you initialize with Mean $0$ and SD $1/\sqrt{n}$?

I recently read that it is a good idea to initialize a neural network's weights as Gaussian random variables with mean $0$ and standard deviation $\frac{1}{\sqrt{n}}$, where $n$ is the number of inputs to the neuron. Assuming we use the sigmoid activation function and train with gradient descent and backpropagation, why is this true?
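To make the setup concrete, here is a minimal simulation sketch of one neuron's weighted input $z = \sum_{j=1}^{n} w_j x_j$. It assumes i.i.d. standard normal inputs $x_j$ and no bias term (both assumptions for illustration only), and `simulate` is a hypothetical helper name. For independent zero-mean $w_j$ and $x_j$, $\operatorname{Var}(z) = n\,\sigma_w^2 \operatorname{Var}(x_j)$, so weights with SD $1$ give $\operatorname{SD}(z) \approx \sqrt{n}$, while SD $1/\sqrt{n}$ gives $\operatorname{SD}(z) \approx 1$. Because backpropagated gradients for a sigmoid neuron include the factor $\sigma'(z) = \sigma(z)(1-\sigma(z)) \le 0.25$, a large $|z|$ (saturation) makes that factor, and hence the gradient-descent updates, very small.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return e / (1.0 + e)


def sigmoid_prime(z: float) -> float:
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z)), maximal (0.25) at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def simulate(n: int, weight_sd: float, trials: int = 400, seed: int = 0):
    """Sample the weighted input z = sum_j w_j * x_j of a single neuron many times.

    Illustrative assumptions: inputs x_j ~ N(0, 1) i.i.d.,
    weights w_j ~ N(0, weight_sd**2) i.i.d., and no bias term.
    Returns (sample standard deviation of z, mean of sigmoid'(z)).
    """
    rng = random.Random(seed)
    zs = [
        sum(rng.gauss(0.0, weight_sd) * rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]
    mean_z = sum(zs) / trials
    std_z = math.sqrt(sum((z - mean_z) ** 2 for z in zs) / (trials - 1))
    mean_grad = sum(sigmoid_prime(z) for z in zs) / trials
    return std_z, mean_grad


if __name__ == "__main__":
    n = 500  # number of inputs to the neuron
    for label, sd in [("SD 1", 1.0), ("SD 1/sqrt(n)", 1.0 / math.sqrt(n))]:
        std_z, grad = simulate(n, sd)
        print(f"{label:>12}: std(z) = {std_z:6.2f}, mean sigmoid'(z) = {grad:.4f}")
```

Running it should show $\operatorname{SD}(z)$ close to $\sqrt{n}$ for SD-$1$ weights and close to $1$ for SD-$1/\sqrt{n}$ weights, with the average sigmoid derivative an order of magnitude smaller in the first case.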