Derivative of a matrix-vector multiplication


I'm trying to calculate the derivatives with respect to $W$ and $h$ of the function $$ f(W, h) = ||\frac{Wh}{||Wh||}-y||^2 $$ where $W$ is a matrix and $h$ and $y$ are column vectors.

I got $$ dh = 2 - \frac{2y'Wh}{||Wh||} $$ Is this correct? I'm also confused about dW. Can someone help me?


There is 1 solution below


I think you are looking for matrix calculus, which has a nice exposition by Prof. Barnes.

One way to approach such problems is by components:
\begin{align}
\frac{\partial}{\partial h_j}f &= \frac{\partial}{\partial h_j}\left|\left|\frac{Wh}{||Wh||_2}-y\,\right|\right|_2^2\\[1mm]
&= \frac{\partial}{\partial h_j}\sum_i\left[ ||Wh||_2^{-1}\sum_\ell W_{i\ell}h_\ell - y_i \right]^2\\
&= \sum_i2\left[ ||Wh||_2^{-1}\sum_\ell W_{i\ell}h_\ell - y_i \right]\frac{\partial}{\partial h_j}\left[ ||Wh||_2^{-1}\sum_\ell W_{i\ell}h_\ell - y_i \right]\\
&= \sum_i2\left[ ||Wh||_2^{-1}\sum_\ell W_{i\ell}h_\ell - y_i \right] \left( \left[\sum_\ell W_{i\ell}h_\ell\right] \frac{\partial}{\partial h_j}||Wh||_2^{-1} + ||Wh||_2^{-1}W_{ij} \right)
\end{align}
Simplifying the inner term with the chain rule (the power $-3/2$ applies to $h^TW^TWh$ itself, and the derivative of $h^TW^TWh$ multiplies it):
\begin{align}
\frac{\partial}{\partial h_j}||Wh||_2^{-1} &= \frac{\partial}{\partial h_j}(h^TW^TWh)^{-1/2}\\
&= \frac{-1}{2}(h^TW^TWh)^{-3/2}\,\frac{\partial}{\partial h_j}[h^TW^TWh] \\
&= \frac{-1}{2}||Wh||_2^{-3}\,\frac{\partial}{\partial h_j}\left[\sum_s (Wh)_s^2\right] \\
&= \frac{-1}{2}||Wh||_2^{-3}\sum_s \frac{\partial}{\partial h_j}\left(\sum_t W_{st}h_t\right)^2 \\
&= \frac{-1}{2}||Wh||_2^{-3}\sum_s 2\left(\sum_t W_{st}h_t\right) \frac{\partial}{\partial h_j}\left(\sum_t W_{st}h_t\right) \\
&= \frac{-1}{2}||Wh||_2^{-3}\sum_s 2\left(\sum_t W_{st}h_t\right) W_{sj} \\
&= -||Wh||_2^{-3}\,(W^TWh)_j
\end{align}
Similarly, we get:
\begin{align}
\frac{\partial}{\partial W_{ab}}||Wh||_2^{-1} &= \frac{-1}{2}||Wh||_2^{-3}\sum_s \frac{\partial}{\partial W_{ab}}\left(\sum_t W_{st}h_t\right)^2 \\
&= \frac{-1}{2}||Wh||_2^{-3}\sum_s 2\left(\sum_t W_{st}h_t\right) \frac{\partial}{\partial W_{ab}}\left(\sum_t W_{st}h_t\right) \\
&= \frac{-1}{2}||Wh||_2^{-3}\sum_s 2\left(\sum_t W_{st}h_t\right) \delta_{sa}h_b \\
&= -||Wh||_2^{-3}\left(\sum_t W_{at}h_t\right) h_b
\end{align}
And for the matrix derivative:
\begin{align}
\frac{\partial}{\partial W_{ab}}f &= \frac{\partial}{\partial W_{ab}}\left|\left|\frac{Wh}{||Wh||_2}-y\,\right|\right|_2^2\\[1mm]
&= \frac{\partial}{\partial W_{ab}}\sum_i\left[ ||Wh||_2^{-1}\sum_\ell W_{i\ell}h_\ell - y_i \right]^2\\
&= \sum_i2\left[ ||Wh||_2^{-1}\sum_\ell W_{i\ell}h_\ell - y_i \right]\frac{\partial}{\partial W_{ab}}\left[ ||Wh||_2^{-1}\sum_\ell W_{i\ell}h_\ell - y_i \right]\\
&= \sum_i2\left[ ||Wh||_2^{-1}\sum_\ell W_{i\ell}h_\ell - y_i \right] \left( \left[\sum_\ell W_{i\ell}h_\ell\right] \frac{\partial}{\partial W_{ab}}||Wh||_2^{-1} + ||Wh||_2^{-1} h_b\delta_{ia} \right)
\end{align}
where $\delta_{ia}$ is the Kronecker delta (equal to 1 if $i = a$ and 0 otherwise).
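As a sketch of how these component sums collapse into a compact form, assume $Wh \neq 0$ and introduce the shorthand $u = Wh$, $v = u/||u||_2$ and $g$ (these names are used here for illustration only and are not part of the original notation). Using $\frac{\partial}{\partial u_k}||u||_2^{-1} = -u_k\,||u||_2^{-3}$ and $v^Tv = 1$, the expressions reduce to

$$ g = \frac{2}{||Wh||_2}\left[(v^Ty)\,v - y\right], \qquad \frac{\partial f}{\partial h_j} = \sum_i W_{ij}\,g_i, \qquad \frac{\partial f}{\partial W_{ab}} = g_a\,h_b $$

that is, $\partial f/\partial h = W^Tg$ and $\partial f/\partial W = g\,h^T$.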

So you have the components of your desired derivatives. For $W \in \mathbb{R}^{m\times n}$ and $h \in \mathbb{R}^n$: $$ \frac{\partial f}{\partial h} = \left(\frac{\partial f}{\partial h_1},\ldots,\frac{\partial f}{\partial h_n}\right) $$ $$ \frac{\partial f}{\partial W} = \begin{bmatrix} \frac{\partial f}{\partial W_{11}} & \cdots & \frac{\partial f}{\partial W_{1n}}\\ \vdots &\ddots &\vdots \\ \frac{\partial f}{\partial W_{m1}} & \cdots & \frac{\partial f}{\partial W_{mn}} \end{bmatrix} $$
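A quick numerical sanity check can help catch index or chain-rule slips in derivations like this one. The following is a minimal sketch in plain Python, not part of the original answer. It evaluates the component formulas with $\frac{\partial}{\partial h_j}||Wh||_2^{-1} = -||Wh||_2^{-3}(W^TWh)_j$ and $\frac{\partial}{\partial W_{ab}}||Wh||_2^{-1} = -||Wh||_2^{-3}(Wh)_a h_b$, and compares them against central finite differences. The function names (`matvec`, `f`, `grads`, `numeric_grads`) and the sample values of `W`, `h`, `y` are illustrative assumptions.

```python
import math


def matvec(W, h):
    """Return the matrix-vector product Wh for W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def f(W, h, y):
    """f(W, h) = || Wh / ||Wh|| - y ||^2 (assumes Wh != 0)."""
    u = matvec(W, h)
    n = math.sqrt(sum(x * x for x in u))
    return sum((ui / n - yi) ** 2 for ui, yi in zip(u, y))


def grads(W, h, y):
    """Evaluate the component formulas; returns (df/dh, df/dW)."""
    u = matvec(W, h)
    n = math.sqrt(sum(x * x for x in u))
    r = [ui / n - yi for ui, yi in zip(u, y)]  # residual Wh/||Wh|| - y
    rows, cols = len(W), len(h)
    dh = []
    for j in range(cols):
        # d(||Wh||^-1)/dh_j = -||Wh||^-3 * (W^T W h)_j
        dninv = -sum(u[s] * W[s][j] for s in range(rows)) / n**3
        dh.append(sum(2 * r[i] * (u[i] * dninv + W[i][j] / n)
                      for i in range(rows)))
    dW = []
    for a in range(rows):
        row = []
        for b in range(cols):
            # d(||Wh||^-1)/dW_ab = -||Wh||^-3 * (Wh)_a * h_b
            dninv = -u[a] * h[b] / n**3
            row.append(sum(2 * r[i] * (u[i] * dninv
                                       + (h[b] / n if i == a else 0.0))
                           for i in range(rows)))
        dW.append(row)
    return dh, dW


def numeric_grads(W, h, y, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    dh = []
    for j in range(len(h)):
        hp, hm = list(h), list(h)
        hp[j] += eps
        hm[j] -= eps
        dh.append((f(W, hp, y) - f(W, hm, y)) / (2 * eps))
    dW = []
    for a in range(len(W)):
        row = []
        for b in range(len(h)):
            Wp = [list(r) for r in W]
            Wm = [list(r) for r in W]
            Wp[a][b] += eps
            Wm[a][b] -= eps
            row.append((f(Wp, h, y) - f(Wm, h, y)) / (2 * eps))
        dW.append(row)
    return dh, dW


# Arbitrary sample data (hypothetical values): W is 2x3, h has 3 entries, y has 2.
W = [[1.0, 2.0, -1.0], [0.5, -0.3, 2.0]]
h = [0.7, -1.2, 0.4]
y = [0.6, 0.8]

dh, dW = grads(W, h, y)
ndh, ndW = numeric_grads(W, h, y)
err_h = max(abs(p - q) for p, q in zip(dh, ndh))
err_W = max(abs(dW[a][b] - ndW[a][b])
            for a in range(len(W)) for b in range(len(h)))
print("max |analytic - numeric| for df/dh:", err_h)
print("max |analytic - numeric| for df/dW:", err_W)
```

If both printed errors are tiny relative to the gradient entries, the analytic formulas agree with the finite-difference estimate at this point. That is evidence of correctness, not a proof.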