Partial derivative with matrices

I have reformulated my problem of computing some quantities $\mathbf{a}\in R^{m}$ from $\mathbf{b}\in R^{n}$ in matrix form:

$$\mathbf{b} = (\mathbf{C}\odot(\mathbf{1}_{n}\cdot \mathbf{a}^{T}))\cdot \mathbf{1}_{m}$$

where $\mathbf{C}\in R^{n\times m}$ and $\odot$ is the Hadamard (element-wise) product.

Now I'd like to compute the derivatives with respect to my quantities in $\mathbf{a}$ (imagine using them in an update rule for a gradient descent optimization step), and I'd like to derive them from my matrix formulation. So what I am trying to do is compute $\frac{\partial \mathbf{b}}{\partial \mathbf{a}}$ (even though this abuses notation).

Following the Matrix Cookbook I am doing this:

$$\begin{aligned} \frac{\partial \mathbf{b}}{\partial \mathbf{a}} &= \partial(\mathbf{C}\odot(\mathbf{1}_{n}\cdot \mathbf{a}^{T}))\cdot \mathbf{1}_{m} \\ &= \mathbf{C}\odot \partial(\mathbf{1}_{n}\cdot \mathbf{a}^{T})\cdot \mathbf{1}_{m} \\ &= \mathbf{C}\odot (\mathbf{1}_{n}\cdot \mathbf{1}_{m}^{T})\cdot \mathbf{1}_{m} \\ &= \mathbf{C}\cdot \mathbf{1}_{m} \end{aligned}$$

But this does not feel right at all, since the result is in $R^{n}$ while it should be in $R^{m}$, like $\mathbf{a}$ is.

In the end I would expect it to be $\mathbf{C}$, since $\mathbf{b}=\mathbf{C}\cdot\mathbf{a}$, but still... I guess that not only am I making some mistake, but I am also missing something theoretically.

There is 1 best solution below

Your intuition is indeed correct: $b = C\cdot a$.

To prove it, I'll need the 3rd order tensor $\beta_{ijk}$ whose components are unity whenever $i=j=k$ and zero otherwise.

Using $\beta$ you can express Hadamard products as: $$ \eqalign { a\circ b &= a\cdot\beta\cdot b \cr C\circ(a\cdot b') &= a\cdot\beta\cdot C\cdot\beta\cdot b \cr } $$ One more useful property is that $I = \beta\cdot 1$
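These identities can be checked numerically. The following is a minimal NumPy sketch, not part of the proof: the helper `diag_tensor` and the variable names `u`, `v`, `w` are made up for illustration, and two tensors of different sizes are used so that $C$ may be rectangular.

```python
import numpy as np

def diag_tensor(k):
    """Third-order tensor with beta[i, j, l] = 1 when i == j == l, else 0."""
    beta = np.zeros((k, k, k))
    idx = np.arange(k)
    beta[idx, idx, idx] = 1.0
    return beta

n, m = 4, 3
beta_n, beta_m = diag_tensor(n), diag_tensor(m)

rng = np.random.default_rng(0)
u = rng.standard_normal(n)       # plays the role of a in the identities
v = rng.standard_normal(n)       # plays the role of b in a∘b
w = rng.standard_normal(m)       # plays the role of b in C∘(a b')
C = rng.standard_normal((n, m))

# a∘b = a·β·b: contract the first index with u and the last with v
hadamard_vec = np.einsum("i,ijk,k->j", u, beta_n, v)

# C∘(a b') = a·β·C·β·b
hadamard_mat = np.einsum("i,ijk,kl,lmn,n->jm", u, beta_n, C, beta_m, w)

# I = β·1
identity = np.einsum("ijk,k->ij", beta_n, np.ones(n))

print(np.allclose(hadamard_vec, u * v))
print(np.allclose(hadamard_mat, C * np.outer(u, w)))
print(np.allclose(identity, np.eye(n)))
```

Each contraction is written as an explicit `einsum` so that the index pattern matches the dot products in the formulas one for one.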

Now we're ready to attack your problem $$ \eqalign { (C\circ(1\cdot a'))\cdot 1 &= (1\cdot\beta\cdot C\cdot\beta\cdot a)\cdot 1 \cr &= (I\cdot C\cdot\beta\cdot a)\cdot 1 \cr &= (C\cdot\beta\cdot a)\cdot 1 \cr &= C\cdot\beta: (a\cdot 1') \cr &= C\cdot\beta: (1\cdot a') \cr &= (C\cdot\beta\cdot 1)\cdot a \cr &= (C\cdot I)\cdot a \cr &= C\cdot a \cr } $$ Those middle steps are allowed because $\beta$ has a valence of 3, and is symmetric in all of its indices.
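As a sanity check on the chain of equalities, the two ends (and the intermediate form $C\cdot{\rm Diag}(a)\cdot 1$, which is what $(C\cdot\beta\cdot a)\cdot 1$ amounts to) can be compared on random data. This is only a sketch with arbitrarily chosen sizes and a fixed seed, not a substitute for the derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
C = rng.standard_normal((n, m))
a = rng.standard_normal(m)

ones_n = np.ones((n, 1))
ones_m = np.ones((m, 1))

# Left-hand side as written in the question: (C ⊙ (1_n a^T)) 1_m
lhs = (C * (ones_n @ a[None, :])) @ ones_m   # shape (n, 1)

# Intermediate step: C · Diag(a) · 1_m
mid = C @ np.diag(a) @ ones_m                # shape (n, 1)

# Right-hand side: plain matrix-vector product
rhs = C @ a                                  # shape (n,)

print(np.allclose(lhs.ravel(), rhs), np.allclose(mid.ravel(), rhs))
```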

Another way to think of $\beta$ is in terms of diagonal operations, i.e. converting a vector into a diagonal matrix or converting the diagonal of a matrix into a vector: $$ \eqalign { \beta\cdot a &= {\rm Diag}(a) = A \cr \beta:B &= {\rm diag}(B) = b \cr } $$

As for the derivative:
$$ \eqalign { b &= C\cdot a \cr \frac {\partial b} {\partial a^T} &= C \cr } $$
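The result can also be confirmed by finite differences. The sketch below evaluates the map exactly as written in the question and estimates the Jacobian $J_{ik} = \partial b_i/\partial a_k$ column by column; the function names `forward` and `numerical_jacobian` are made up here. The last lines use a scalar loss $L = \tfrac12\|b - t\|^2$ with a random target `t`, which is purely an assumption for illustration, since the question does not say which loss is being minimized.

```python
import numpy as np

def forward(C, a):
    """b = (C ⊙ (1_n a^T)) 1_m, written as in the question."""
    n, m = C.shape
    return (C * np.outer(np.ones(n), a)) @ np.ones(m)

def numerical_jacobian(f, a, eps=1e-6):
    """Central-difference estimate of J[i, k] = d f(a)[i] / d a[k]."""
    a = np.asarray(a, dtype=float)
    cols = []
    for k in range(a.size):
        step = np.zeros_like(a)
        step[k] = eps
        cols.append((f(a + step) - f(a - step)) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(2)
n, m = 5, 3
C = rng.standard_normal((n, m))
a = rng.standard_normal(m)

J = numerical_jacobian(lambda x: forward(C, x), a)
print(J.shape, np.allclose(J, C, atol=1e-6))

# Hypothetical loss (an assumption, not from the question):
# L(a) = 0.5 * ||b(a) - t||^2, so dL/da = J^T (b - t) by the chain rule.
t = rng.standard_normal(n)
grad_a = J.T @ (forward(C, a) - t)
print(grad_a.shape)
```

The Jacobian has shape $n\times m$, one row per component of $b$ and one column per component of $a$, so it is neither in $R^n$ nor in $R^m$. A quantity in $R^m$ only appears once a scalar loss is differentiated: by the chain rule its gradient with respect to $a$ is $C^T$ times the gradient with respect to $b$, and that is the vector a gradient descent update on $a$ would use.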