According to slide 16 of this lecture, in a simple net with a linear output unit directly connected to the inputs, if one adds Gaussian noise to the inputs ($x_i + \epsilon_i$ where $\epsilon_i \sim \mathcal{N}(0, \sigma_i^2)$) this is equivalent to an L2 weight decay penalty. The proof given for this is:
\begin{align} \mathbb{E}[(y^{noisy}-t)^2] &= \mathbb{E}\Big[\big(y + \sum_i w_i \epsilon_i - t\big)^2\Big] \\ &= \mathbb{E}\Big[\big((y - t) + \sum_i w_i \epsilon_i\big)^2\Big] \\ &= (y-t)^2 + \mathbb{E}\Big[2(y-t)\sum_i w_i\epsilon_i\Big] + \mathbb{E}\Big[\big(\sum_i w_i \epsilon_i\big)^2\Big]\\ &= (y-t)^2 + \mathbb{E}\Big[\sum_i w_i^2\epsilon_i^2\Big]\\ &= (y-t)^2 + \sum_i w_i^2 \sigma_i^2, \end{align}
where $t$ is the target output, $w_i$ are the weights and $y^{noisy}$ is the output of the network.
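To convince myself the final identity is at least numerically right, I wrote a small Monte Carlo sanity check (my own sketch, not from the slides; the weights, input, and noise scales below are arbitrary):

```python
# Monte Carlo check of E[(y_noisy - t)^2] = (y - t)^2 + sum_i w_i^2 sigma_i^2
# for a linear unit y = w . x with independent input noise eps_i ~ N(0, sigma_i^2).
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.5, -1.2, 2.0])      # arbitrary weights
x = np.array([1.0, 0.3, -0.7])      # one fixed input vector
sigma = np.array([0.1, 0.2, 0.3])   # per-input noise standard deviations
t = 0.4                             # target output

y = w @ x                           # clean output
n = 200_000
eps = rng.normal(0.0, sigma, size=(n, 3))        # independent noise draws
y_noisy = (x + eps) @ w
lhs = np.mean((y_noisy - t) ** 2)                # empirical expected squared error
rhs = (y - t) ** 2 + np.sum(w ** 2 * sigma ** 2) # closed form with the L2-style penalty

print(lhs, rhs)  # the two agree up to Monte Carlo error
```

With 200k samples the two sides match to roughly two decimal places, which is consistent with the claimed equivalence to a weight decay penalty weighted by $\sigma_i^2$.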
Unfortunately, I cannot see how to get from the third line of the proof to the fourth (even though the slides explicitly state that it follows from the $\epsilon_i$ being independent of each other and of $(y-t)$). Could someone explain which property arising from the independence of these variables justifies this transition? Thank you!