Derivative of the Hadamard Product of Two Matrix-Vector Products


I am stuck on computing the derivative of the following Hadamard product:

$$ \frac{\partial}{\partial W} \left[ (\vec{a}*W) \circ (\vec{b}*W) \right] $$

Here $W$ is a randomly initialized matrix. I read the post Derivative of Hadamard product, but I still don't understand how to perform the calculation. Do I just have to use the product rule here?

2 Answers

Denote by $\vec{a}*W$ the usual product of the vector $\vec{a}$ (written as a row matrix) and the matrix $W=[w_1,\,w_2,\,\ldots,\,w_n]$, in which each $w_i$ is a column vector. Then $$ \vec{a}*W=[\vec{a}*w_1,\,\vec{a}*w_2,\,\ldots,\,\vec{a}*w_n],$$ in which each $\vec{a}*w_i$ is a number. It follows that $$f(W)= (\vec{a}*W) \circ (\vec{b}*W)=[(\vec{a}*w_1)(\vec{b}*w_1),\,(\vec{a}*w_2)(\vec{b}*w_2),\,\ldots,\,(\vec{a}*w_n)(\vec{b}*w_n)]$$ is a row matrix (i.e. a vector). We can see that $$\frac{\partial (\vec{a}*w_j)(\vec{b}*w_j)}{\partial w_i} =\left\{\begin{array}{rr}(\vec{a}*w_j)\vec{b}+(\vec{b}*w_j)\vec{a},&\quad i=j\\0,&\quad i\neq j\end{array}\right.,$$ and you can view $$\frac{\partial\, (\vec{a}*W) \circ (\vec{b}*W)}{\partial W}$$ as the Jacobian matrix of $f(W)$, for instance when you identify $W$ with $\operatorname{vec}(W)$ (the vectorization of $W$).
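
For a quick sanity check, here is a minimal NumPy sketch (not part of the original answer; the sizes and names are illustrative) that compares the componentwise formula above against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 5
a, b = rng.normal(size=m), rng.normal(size=m)
W = rng.normal(size=(m, n))

# f(W) is the row vector with entries (a . w_j)(b . w_j)
def f(W):
    return (a @ W) * (b @ W)

# Analytic gradient of f_j with respect to the column w_j:
# (a . w_j) b + (b . w_j) a   (zero with respect to w_i for i != j)
j = 2
analytic = (a @ W[:, j]) * b + (b @ W[:, j]) * a

# Finite-difference check: perturb each entry of column w_j
eps = 1e-6
numeric = np.empty(m)
for k in range(m):
    Wp = W.copy()
    Wp[k, j] += eps
    numeric[k] = (f(Wp)[j] - f(W)[j]) / eps

print(np.allclose(analytic, numeric, atol=1e-4))  # expect: True
```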

You can also see that $$f(W+H)=f(W)+\left((\vec{a}*H) \circ (\vec{b}*W)+(\vec{a}*W) \circ (\vec{b}*H)\right)+(\vec{a}*H) \circ (\vec{b}*H).$$ Since the last term is quadratic in $H$, this implies that $$\frac{\partial\, (\vec{a}*W) \circ (\vec{b}*W)}{\partial W}H=(\vec{a}*H) \circ (\vec{b}*W)+(\vec{a}*W) \circ (\vec{b}*H),$$ when you view $$\frac{\partial\, (\vec{a}*W) \circ (\vec{b}*W)}{\partial W}$$ as a linear transformation.
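
As another illustrative check (again not from the original answer; the names are arbitrary), the directional-derivative form can be verified numerically by comparing it with a first-order difference quotient:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 5
a, b = rng.normal(size=m), rng.normal(size=m)
W = rng.normal(size=(m, n))
H = rng.normal(size=(m, n))  # an arbitrary direction

def f(W):
    return (a @ W) * (b @ W)

# Directional derivative Df(W)[H] from the product rule
Df = (a @ H) * (b @ W) + (a @ W) * (b @ H)

# The difference quotient differs from Df by t * (a @ H) * (b @ H),
# which vanishes as t -> 0
t = 1e-6
numeric = (f(W + t * H) - f(W)) / t
print(np.allclose(Df, numeric, atol=1e-4))  # expect: True
```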

$ \def\bbR#1{{\mathbb R}^{#1}} \def\e{\varepsilon}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Assume that $W\in\bbR{m\times n}$ and that $\{e_k,\e_k\}$ are the standard basis vectors for $\{\bbR{m},\bbR{n}\}$ respectively. The dimensions of the other vectors in the problem are $$\eqalign{ a,b\in\bbR{m} \qquad \LR{W^Ta}\in\bbR{n} }$$

Now consider the gradient of a small matrix with respect to one of its components. $$\eqalign{ W &= \m{p&q&r\\x&y&z} \in\bbR{2\times 3} \\ \grad{W}{W_{21}} &= \grad{W}{x} = \m{0&0&0\\1&0&0} = \m{0\\1}\bullet\m{1&0&0} = e_2\e_1^T \\ }$$ It shouldn't be surprising that in the general case $$\eqalign{ \grad{W}{W_{ij}} = e_i\e_j^T \qiq \grad{W^T}{W_{ij}} = \e_j e_i^T \\ }$$ This result can be applied to differentiate (the transpose of) the current problem. $$\eqalign{ f &= \c{W^T}b\circ W^Ta &=\; \c{W^T}a\circ W^Tb \\ \grad{f}{W_{ij}} &= \c{\e_je_i^T}b\circ W^Ta \;&+\;\c{\e_je_i^T}a \circ W^Tb \\ }$$ Recall the trick for replacing a Hadamard product with a diagonal matrix $$\eqalign{ a\circ b = \Diag{a}b }$$ and apply it to the gradient expression $$\eqalign{ \grad{f}{W_{ij}} &= \Diag{\e_j}\,W^T\!\LR{ab^T+ba^T}e_i \\ }$$ The final step is to isolate the $k^{th}$ component of the gradient $$\eqalign{ \grad{f_k}{W_{ij}} &= \e_k^T\gradLR{f}{W_{ij}} &= \e_k^T\Diag{\e_j}\,W^T\!\LR{ab^T+ba^T}e_i \\ }$$ Since this is a scalar, it can be transposed, yielding the nice expression $$\eqalign{ \grad{f_k}{W_{ij}} &= e_i^T\LR{ab^T+ba^T}W\,\Diag{\e_j}\,\e_k \\ }$$
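
To make the result concrete, here is a small NumPy sketch (illustrative only, not part of the original derivation) that checks $\partial f_k/\partial W_{ij} = e_i^T(ab^T+ba^T)\,W\operatorname{Diag}(\varepsilon_j)\,\varepsilon_k$ against finite differences over all index triples:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 5
a, b = rng.normal(size=m), rng.normal(size=m)
W = rng.normal(size=(m, n))
I = np.eye(n)  # rows are the standard basis vectors eps_j of R^n

def f(W):
    return (W.T @ a) * (W.T @ b)  # f lives in R^n

M = np.outer(a, b) + np.outer(b, a)  # ab^T + ba^T
eps = 1e-6
ok = True
for i in range(m):
    for j in range(n):
        for k in range(n):
            # e_i^T (ab^T + ba^T) W Diag(eps_j) eps_k;
            # note Diag(eps_j) eps_k is the elementwise product I[j] * I[k]
            analytic = M[i] @ W @ (I[j] * I[k])
            Wp = W.copy()
            Wp[i, j] += eps
            numeric = (f(Wp)[k] - f(W)[k]) / eps
            ok = ok and abs(analytic - numeric) < 1e-4
print(ok)  # expect: True
```

The loop also makes the sparsity visible: since $\operatorname{Diag}(\varepsilon_j)\,\varepsilon_k = 0$ for $j \neq k$, the gradient component is nonzero only when $j = k$, matching the case split in the first answer.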