Computing expected value of loss function for linear regression with dropout


Consider a linear regression problem with input data $X\in \mathbb{R}^{n\times d}$, weights $w\in \mathbb{R}^{d\times 1}$, and targets $y\in \mathbb{R}^{n\times 1}$. Suppose that dropout is applied to the input (each unit is dropped, i.e. set to $0$, with probability $1-p$). Let $R\in \mathbb{R}^{n\times d}$ be the dropout mask, with entries $R_{ij}\sim\text{Bern}(p)$ sampled i.i.d. from the Bernoulli distribution.

For a squared-error loss function with dropout, we then have $$L(w)=\|y-(X\odot R) w\|^2,$$ where $\odot$ denotes elementwise (Hadamard) multiplication between two matrices.

Let $\Gamma$ be the diagonal matrix with $\Gamma_{ii}=(X^\top X)_{ii}^{1/2}$. Show that the $\textit{expectation (over $R$)}$ of the loss function can be rewritten as $$\mathbb{E}[L(w)]=\|y-pXw\|^2+p(1-p)\|\Gamma w\|^2.$$

So I got the following:

$$\mathbb{E}[L(w)] = \mathbb{E}[\| y- (X \odot R) w\|^2] = \mathbb{E}[(y^T-w^T(X \odot R)^T)(y-(X \odot R)w)]\\ = y^Ty-2\sum_{ij}y_iw_jX_{ij}\,p+\sum_{ijk} w_i w_j X_{ki} X_{kj}\, p^2 = \| y-pXw\|^2$$

What did I do wrong?

BEST ANSWER

Let $M = X \odot R$. Since the entries of $R$ are i.i.d. $\text{Bern}(p)$, we have $\mathbb{E}[R_{ki}R_{kj}]=p^2$ for $i\neq j$, but $\mathbb{E}[R_{ki}^2]=\mathbb{E}[R_{ki}]=p$ for $i=j$; this diagonal case is exactly the step missed in the question. Consequently \begin{equation} \mathbb{E}[M] = pX, \qquad \mathbb{E}[M^TM]=p^2X^TX+p(1-p)\Gamma^2, \end{equation} because $\mathbb{E}[M^TM]_{ij} = \sum_k X_{ki}X_{kj}\,\mathbb{E}[R_{ki}R_{kj}]$ equals $p^2(X^TX)_{ij}$ off the diagonal and $p(X^TX)_{ii} = p^2(X^TX)_{ii}+p(1-p)\Gamma_{ii}^2$ on it. Thus \begin{equation} \begin{aligned} \mathbb{E}[L(w)] &=\mathbb{E}[(y^T-w^TM^T)(y-Mw)] = y^Ty - 2y^T\mathbb{E}[M]w +w^T\mathbb{E}[M^TM]w \\ &= y^Ty - 2p\,y^TXw + p^2w^TX^TXw + p(1-p)w^T\Gamma^2w \\ &= \|y-pXw\|^2 + p(1-p) \|\Gamma w\|^2. \end{aligned} \end{equation}
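The identity is easy to sanity-check numerically. Below is a minimal Monte Carlo sketch in NumPy (all shapes, the keep probability $p$, and the random data are illustrative assumptions, not from the question): average $\|y-(X\odot R)w\|^2$ over many sampled masks and compare against the closed form.

```python
import numpy as np

# Monte Carlo check of E[L(w)] = ||y - pXw||^2 + p(1-p)||Gamma w||^2.
# All data below is synthetic; n, d, p are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, d, p = 50, 5, 0.8
X = rng.normal(size=(n, d))
w = rng.normal(size=(d, 1))
y = rng.normal(size=(n, 1))

# Gamma is diagonal with Gamma_ii = sqrt((X^T X)_ii)
Gamma = np.diag(np.sqrt(np.diag(X.T @ X)))

# Empirical expectation over dropout masks R_ij ~ Bern(p)
trials = 20_000
losses = np.empty(trials)
for t in range(trials):
    R = rng.random(size=(n, d)) < p          # keep each entry with prob p
    losses[t] = np.sum((y - (X * R) @ w) ** 2)
empirical = losses.mean()

# Closed-form expectation from the answer above
closed = np.sum((y - p * X @ w) ** 2) + p * (1 - p) * np.sum((Gamma @ w) ** 2)

print(empirical, closed)  # the two values should agree closely
```

With 20,000 sampled masks the empirical mean typically lands within a fraction of a percent of the closed form, which also makes the asker's missing $p(1-p)\|\Gamma w\|^2$ term visible: dropping it leaves a systematic gap.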