Derivative of autoencoder


Problem

Find $\nabla_{\mathbf{W}} \mathcal{L}(\mathbf{W})$, where $$\mathcal{L}(\mathbf{W})=\frac{1}{2}\Vert \mathbf{W}^T\mathbf{Wx} - \mathbf{x}\Vert_2 ^2,$$ $\mathbf{W} \in \mathbb{R}^{m\times n}\ (m < n)$ and $\mathbf{x} \in \mathbb{R}^{n}$.

What I Have Done

$\mathcal{L}$ depends on $\mathbf{W}$ through both $\mathbf{W}$ and $\mathbf{W}^T$. Even if I compute the two gradients separately, their shapes differ ($m\times n$ and $n\times m$), so I do not know how to combine them.
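A minimal NumPy sketch of how the two contributions combine, assuming random test data. Write the loss as $f(\mathbf A,\mathbf B)=\frac12\Vert\mathbf A\mathbf B\mathbf x-\mathbf x\Vert_2^2$ and evaluate at $\mathbf A=\mathbf W^T$ ($n\times m$), $\mathbf B=\mathbf W$ ($m\times n$); the names `A`, `B`, `partial_grads` and `numerical_grad` are hypothetical. The partial gradient with respect to $\mathbf A$ is $n\times m$; transposing it gives an $m\times n$ matrix that is added to the partial gradient with respect to $\mathbf B$ (product rule), and the sum is compared with a finite-difference gradient.

```python
import numpy as np

def loss(W, x):
    """L(W) = 0.5 * ||W^T W x - x||_2^2."""
    r = W.T @ W @ x - x
    return 0.5 * r @ r

def partial_grads(W, x):
    """Gradients of f(A, B) = 0.5 * ||A B x - x||^2 at A = W^T, B = W,
    treating A and B as independent variables (hypothetical helper)."""
    A, B = W.T, W
    r = A @ B @ x - x                 # residual, shape (n,)
    grad_A = np.outer(r, B @ x)       # df/dA = r (Bx)^T, shape (n, m)
    grad_B = A.T @ np.outer(r, x)     # df/dB = A^T r x^T, shape (m, n)
    return grad_A, grad_B

def numerical_grad(W, x, eps=1e-6):
    """Central finite-difference gradient of loss with respect to W."""
    G = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (loss(W + E, x) - loss(W - E, x)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 3, 5
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)

grad_A, grad_B = partial_grads(W, x)
print(grad_A.shape, grad_B.shape)     # (5, 3) (3, 5)
total = grad_A.T + grad_B             # transpose the W^T part, then add
print(np.allclose(total, numerical_grad(W, x), atol=1e-5))  # True
```

The transpose is the whole trick: a gradient taken with respect to $\mathbf W^T$ is the transpose of the corresponding contribution to the gradient with respect to $\mathbf W$.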

Answer

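Define two new variables and their differentials.
$$\eqalign{
Y &= W^TW-I &\implies dY &= dW^TW+W^TdW \cr
z &= Yx &\implies dz &= dY\,x \cr
}$$
Write the cost function in terms of these new variables, then find its differential and gradient.
$$\eqalign{
{\mathcal L} &= \tfrac{1}{2}z:z \cr
\cr
d{\mathcal L} &= z:dz \cr
&= z:dY\,x \cr
&= zx^T:(W^TdW+dW^TW) \cr
&= (zx^T+xz^T):W^TdW \cr
&= W(zx^T+xz^T):dW \cr
\cr
\frac{\partial{\mathcal L}}{\partial W} &= W(zx^T+xz^T) \cr
&= WYxx^T + Wxx^TY \cr
&= W(W^TW-I)xx^T + Wxx^T(W^TW-I) \cr
}$$
NB: In some of the steps above, a colon is used to represent the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB)$$ The $dW^T$ term is folded into the $dW$ term using $A:dW^TW = A^T:W^TdW$, and the following step uses $A:W^TdW = WA:dW$; both follow from the cyclic and transpose properties of the trace. This is how the contributions from $W$ and $W^T$ combine into a single $m\times n$ gradient. The expanded form of the gradient uses $z = Yx$ and the symmetry $Y^T = Y$.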
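A minimal NumPy sketch, using random test data (an assumption; any $m<n$ works), that checks the final formula two ways: against the compact form $W(zx^T+xz^T)$, and against a central-difference directional derivative, which by $d\mathcal L = \frac{\partial\mathcal L}{\partial W}:dW$ should equal $G:D$ for a direction $D$. The function names `grad` and `frob` are hypothetical.

```python
import numpy as np

def loss(W, x):
    """L(W) = 0.5 * ||W^T W x - x||_2^2, W of shape (m, n), x of shape (n,)."""
    z = W.T @ W @ x - x
    return 0.5 * z @ z

def grad(W, x):
    """Closed form: W (W^T W - I) x x^T + W x x^T (W^T W - I)."""
    Y = W.T @ W - np.eye(W.shape[1])   # Y = W^T W - I, symmetric (n, n)
    xxT = np.outer(x, x)               # x x^T, shape (n, n)
    return W @ Y @ xxT + W @ xxT @ Y   # shape (m, n), same as W

def frob(A, B):
    """Frobenius product A:B = Tr(A^T B)."""
    return np.sum(A * B)

rng = np.random.default_rng(1)
m, n = 4, 7
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)
D = rng.standard_normal((m, n))        # random direction playing the role of dW

G = grad(W, x)
z = (W.T @ W - np.eye(n)) @ x          # z = Y x
G_compact = W @ (np.outer(z, x) + np.outer(x, z))  # W (z x^T + x z^T)

t = 1e-6
directional = (loss(W + t * D, x) - loss(W - t * D, x)) / (2 * t)
print(G.shape)                         # (4, 7)
print(np.allclose(G, G_compact))       # True
print(np.isclose(directional, frob(G, D), rtol=1e-5, atol=1e-6))  # True
```

The directional check needs only two loss evaluations, whereas perturbing each entry of $W$ separately needs $2mn$.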