According to slide 16 of this lecture, in a simple net with a linear output unit directly connected to the inputs, if one adds Gaussian noise to the inputs ($x_i + \epsilon_i$ where $\epsilon_i \sim \mathcal{N}(0, \sigma_i^2)$) this is equivalent to an L2 weight decay penalty. The proof given for this is:
\begin{align} \mathbb{E}[(y^{noisy}-t)^2] &= \mathbb{E}\Big[\big(y + \sum_i w_i \epsilon_i - t\big)^2\Big] \\ &= \mathbb{E}\Big[\big((y - t) + \sum_i w_i \epsilon_i\big)^2\Big] \\ &= (y-t)^2 + \mathbb{E}\Big[2(y-t)\sum_i w_i\epsilon_i\Big] + \mathbb{E}\Big[\big(\sum_i w_i \epsilon_i\big)^2\Big]\\ &= (y-t)^2 + \mathbb{E}\Big[\sum_i w_i^2\epsilon_i^2\Big]\\ &= (y-t)^2 + \sum_i w_i^2 \sigma_i^2, \end{align}
where $t$ is the target output, $w_i$ are the weights and $y^{noisy}$ is the output of the network.
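To convince myself the final identity is at least numerically right, I wrote a small Monte Carlo sanity check (my own sketch, not from the slides; the weights, input, and noise scales below are arbitrary):

```python
# Monte Carlo check of E[(y_noisy - t)^2] = (y - t)^2 + sum_i w_i^2 sigma_i^2
# for a linear unit y = w . x with independent input noise eps_i ~ N(0, sigma_i^2).
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.5, -1.2, 2.0])      # arbitrary weights
x = np.array([1.0, 0.3, -0.7])      # one fixed input vector
sigma = np.array([0.1, 0.2, 0.3])   # per-input noise standard deviations
t = 0.4                             # target output

y = w @ x                           # clean output
n = 200_000
eps = rng.normal(0.0, sigma, size=(n, 3))        # independent noise draws
y_noisy = (x + eps) @ w
lhs = np.mean((y_noisy - t) ** 2)                # empirical expected squared error
rhs = (y - t) ** 2 + np.sum(w ** 2 * sigma ** 2) # closed form with the L2-style penalty

print(lhs, rhs)  # the two agree up to Monte Carlo error
```

With 200k samples the two sides match to roughly two decimal places, which is consistent with the claimed equivalence to a weight decay penalty weighted by $\sigma_i^2$.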
Unfortunately, I cannot see how to get from the third line of the proof to the fourth (even though the slides explicitly state that it follows from the $\epsilon_i$ being independent of each other and of $(y-t)$). Could someone explain which property arising from the independence of these variables justifies this transition? Thank you!