Expectation of dropout


I am new to expectations and could definitely use some resources on how to work with them in deep learning/reinforcement learning. While reading an article on the math behind dropout, I came across the expectation of a derivative. I understand how the derivative is obtained, but not the expectation; I am looking for some clarity on the approach and possible resources on the matter.

Given: $$ E_D = \frac12 \Big(t - \sum_{i=1}^n \delta_i w_i I_i\Big)^2 \quad\Longrightarrow\quad \frac{\partial E_D}{\partial w_i} = -t\delta_i I_i + w_i\delta_i^2 I_i^2 + \sum_{j=1,\,j\ne i}^n \delta_i\delta_j I_i I_j w_j $$

We want to take the expectation of this gradient, which is described as:

$$ \Bbb{E}\left[\frac{\partial E_D}{\partial w_i}\right] = -tp_iI_i + w_ip^2_iI_i^2 + w_iVar(\delta_i)I_i^2+\sum_{j=1,j\ne i}^n w_jp_ip_jI_iI_j $$

Here is the article for reference; I am talking about figure (6): https://towardsdatascience.com/simplified-math-behind-dropout-in-deep-learning-6d50f3f47275

BEST ANSWER

By definition, $\delta_i \sim$ Bernoulli($p_i$), and the $\delta_i$ are independent across $i$. So, using linearity of expectation $(E[X+Y]=E[X]+E[Y])$ and independence:

1) $E[\delta_i]=p_i$,

2) $E[\delta_i^2] = \mbox{Var}(\delta_i)+E[\delta_i]^2 = p_i(1-p_i)+p_i^2,$

3) $E[\delta_i\delta_j]=E[\delta_i]E[\delta_j]=p_ip_j$.
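Substituting these three facts into the gradient term by term (the $-t\delta_iI_i$ term uses 1, the $w_i\delta_i^2I_i^2$ term uses 2, and the cross terms use 3) reproduces figure (6) of the article. If it helps build intuition, the three facts can also be checked numerically with a quick Monte Carlo sketch; the probabilities `p_i`, `p_j` and the sample count below are arbitrary choices for illustration:

```python
# Monte Carlo check of the three Bernoulli expectation facts used above.
import numpy as np

rng = np.random.default_rng(0)
p_i, p_j = 0.8, 0.5             # hypothetical retain probabilities
n = 1_000_000

d_i = rng.binomial(1, p_i, n)   # delta_i ~ Bernoulli(p_i)
d_j = rng.binomial(1, p_j, n)   # delta_j ~ Bernoulli(p_j), independent of delta_i

print(d_i.mean())               # fact 1: approx p_i
print((d_i ** 2).mean())        # fact 2: approx p_i(1-p_i) + p_i^2, which equals p_i
print((d_i * d_j).mean())       # fact 3: approx p_i * p_j, by independence
```

Note that fact 2 simplifies: since $\delta_i \in \{0,1\}$, $\delta_i^2 = \delta_i$, so $E[\delta_i^2] = p_i$ directly; the article keeps the variance decomposition because it is the $\mbox{Var}(\delta_i)$ term that survives in the final expression.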