Problem with understanding residual network variance analysis


I am trying to understand the analysis of the mean and variance of activations in a deep residual network. On the second page of the article linked below, why can we write $\mathrm{Var}(x_i^{l+1}) = \mathrm{Var}(x_i^l) + \mathrm{Var}(f_i^l(x_l))$? Are $x_i^l$ and $f_i^l(x_l)$ independent, or is there another reason? I also don't understand why the sum from 1 to fan-in in the "unnormalised network" section can be written the way it is. Could you please give me more details about it?

https://arxiv.org/pdf/2002.10444.pdf
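To make both points concrete, here is a small Monte Carlo sketch. It assumes (this is not taken from the paper) a residual branch that is a single linear map $f_i(x) = \sum_{j=1}^{\text{fan-in}} W_{ij} x_j$, with i.i.d. zero-mean Gaussian weights of variance $\sigma_w^2$ drawn independently of i.i.d. standard-normal inputs; the function `simulate` and its parameters are hypothetical names for illustration. Under those assumptions $x_i$ and $f_i(x)$ are not independent, because $f_i$ depends on $x_i$ through one term of the sum. They are, however, uncorrelated: $\mathrm{Cov}(x_i, f_i) = \sum_j \mathbb{E}[W_{ij}]\,\mathbb{E}[x_i x_j] = 0$, and zero covariance is all that variance additivity needs. The same independence of weights and inputs lets the fan-in sum split term by term, since the cross terms vanish for the same reason: $\mathrm{Var}(f_i) = \sum_j \mathrm{Var}(W_{ij} x_j) = \text{fan-in} \cdot \sigma_w^2 \cdot \mathbb{E}[x_j^2]$.

```python
import random
import statistics


def simulate(fan_in=16, trials=20000, weight_var=None, seed=0):
    """Monte Carlo check of Var(x_i + f_i(x)) = Var(x_i) + Var(f_i(x)).

    Hypothetical setup (an assumption, not the paper's code): the residual
    branch is one linear map f_i(x) = sum_{j=1}^{fan_in} W_ij * x_j with
    i.i.d. zero-mean Gaussian weights, independent of i.i.d. N(0, 1) inputs.
    """
    rng = random.Random(seed)
    if weight_var is None:
        weight_var = 1.0 / fan_in  # assumed fan-in scaling: sigma_w^2 = 1 / fan_in
    w_std = weight_var ** 0.5

    skip, branch, out = [], [], []
    for _ in range(trials):
        # Fresh input vector and fresh weight row W_i. on every trial.
        x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
        w = [rng.gauss(0.0, w_std) for _ in range(fan_in)]
        f_i = sum(w_j * x_j for w_j, x_j in zip(w, x))
        x_i = x[0]  # skip path carries coordinate i = 0, which also feeds f_i
        skip.append(x_i)
        branch.append(f_i)
        out.append(x_i + f_i)

    # Population variances of the sampled skip path, branch and block output.
    return (statistics.pvariance(skip),
            statistics.pvariance(branch),
            statistics.pvariance(out))


vx, vf, vout = simulate()
# Prediction: Var(f_i) = fan_in * weight_var * E[x_j^2] = 16 * (1/16) * 1 = 1,
# so the block output should have variance close to 1 + 1 = 2.
print(f"Var(x_i) = {vx:.3f}, Var(f_i) = {vf:.3f}, Var(x_i + f_i) = {vout:.3f}")
```

Passing `weight_var=2/16` (a He-style scale) should roughly double `Var(f_i)` to about 2, which is the fan-in sum formula at work. A real residual branch is usually more than a single linear map, so treat this only as an illustration of the two variance identities under the stated assumptions.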