Derivative of a function of matrices


I need help taking a derivative with respect to a matrix; I'd appreciate any help.

Suppose $X \in \mathbb R^{m\times n}$ and $a,b \in \mathbb R^{m \times 1}$. Let the function $f$ be \begin{equation} f(X)=(a^T X X^T b -c)^2 \end{equation} where $c$ is a scalar constant. What is $\partial f / \partial X$?

My second question is more complex.

Now let the function $f$ be

\begin{equation} f(X)=(g(X^Ta)^T g(X^Tb) -c)^2 \end{equation}

where $g : \mathbb R^{n\times 1} \rightarrow \mathbb R^{n \times 1}$ is a differentiable function. Again, what is $\partial f / \partial X$?


There are 3 best solutions below

BEST ANSWER

Let me try to answer your second question, since no one else has.

First, I'll assume that your $g$ function is a scalar function applied elementwise, since the result has the same shape as the argument.

I'll also assume that this scalar function has a known derivative $$ g^\prime(s) = \frac {dg(s)} {ds} $$ Next, I'll generalize from the vector arguments in your question to matrix arguments, and define the symbols $$ \eqalign { g_A &= g(X^T\cdot A) \cr g_B &= g(X^T\cdot B) \cr g^\prime_A &= g^\prime(X^T\cdot A) \cr g^\prime_B &= g^\prime(X^T\cdot B) \cr h &= g_A:g_B - c \cr } $$ Finally, let's denote the Frobenius and Hadamard products of matrices $A,B$ by $(A:B)$ and $(A\circ B)$ respectively.
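A quick numerical sketch of the two products just defined, and of the rearrangement rules the derivation that follows relies on: $C:(A\circ B) = (C\circ A):B$, $M:(NP) = (MP^T):N$, and $A:B = A^T:B^T$. It uses Python with NumPy (an assumed choice) and random matrices chosen only for illustration; `frob` and `hadamard` are hypothetical helper names.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = sum_ij A_ij B_ij = trace(A^T B)."""
    return float(np.sum(A * B))

def hadamard(A, B):
    """Hadamard (elementwise) product A o B."""
    return A * B

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 2)) for _ in range(3))
M = rng.standard_normal((3, 2))   # n x k
N = rng.standard_normal((3, 4))   # n x m
P = rng.standard_normal((4, 2))   # m x k

checks = {
    "A:B = tr(A^T B)": np.isclose(frob(A, B), np.trace(A.T @ B)),
    "C:(A o B) = (C o A):B": np.isclose(frob(C, hadamard(A, B)), frob(hadamard(C, A), B)),
    "M:(N P) = (M P^T):N": np.isclose(frob(M, N @ P), frob(M @ P.T, N)),
    "A:B = A^T:B^T": np.isclose(frob(A, B), frob(A.T, B.T)),
}
for name, ok in checks.items():
    print(name, bool(ok))
```

The second and third rules are exactly the steps that move $g^\prime_A$ and $A$ out of the differential in the expansion.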

Now it's just a matter of taking the differential and expanding $$ \eqalign { df &= dh^2 \cr &= 2 h (dh) \cr &= 2 h (g_B:dg_A + g_A:dg_B) \cr &= 2 h (g_B:g^\prime_A\circ d(X^T\cdot A) + g_A:g^\prime_B\circ d(X^T\cdot B)) \cr &= 2 h (g_B\circ g^\prime_A:d(X^T\cdot A) + g_A\circ g^\prime_B:d(X^T\cdot B)) \cr &= 2 h (g_B\circ g^\prime_A\cdot A^T:dX^T + g_A\circ g^\prime_B\cdot B^T:dX^T) \cr &= 2 h (g_B\circ g^\prime_A\cdot A^T + g_A\circ g^\prime_B\cdot B^T) : dX^T \cr &= 2 h (g_B\circ g^\prime_A\cdot A^T + g_A\circ g^\prime_B\cdot B^T)^T : dX \cr } $$ So the derivative is $$ \eqalign { \frac {\partial f} {\partial X} &= 2 h (g_B\circ g^\prime_A\cdot A^T + g_A\circ g^\prime_B\cdot B^T)^T \cr &= 2 h A\cdot(g_B\circ g_A^{\prime})^T + 2 h B\cdot(g_A\circ g_B^{\prime})^T \cr } $$ Your first question uses the identity function, $g(s) = s$, whose derivative is $g^\prime(s) = 1$.
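The closed form can be checked against central finite differences. A minimal sketch in Python with NumPy, assuming $g = \tanh$ purely for illustration (so $g^\prime(s) = 1 - \tanh^2 s$) and random $m\times n$ $X$ and $m\times k$ $A, B$; the helper names `grad_f` and `numerical_grad` are hypothetical.

```python
import numpy as np

# Assumption for illustration only: g = tanh, applied elementwise.
def g(S):
    return np.tanh(S)

def g_prime(S):
    return 1.0 - np.tanh(S) ** 2

def f(X, A, B, c):
    """f(X) = (g(X^T A) : g(X^T B) - c)^2."""
    gA, gB = g(X.T @ A), g(X.T @ B)
    return (np.sum(gA * gB) - c) ** 2

def grad_f(X, A, B, c):
    """Closed form 2h [A (g_B o g'_A)^T + B (g_A o g'_B)^T]."""
    SA, SB = X.T @ A, X.T @ B
    gA, gB = g(SA), g(SB)
    h = np.sum(gA * gB) - c                  # h = g_A : g_B - c
    return 2 * h * (A @ (gB * g_prime(SA)).T + B @ (gA * g_prime(SB)).T)

def numerical_grad(fun, X, eps=1e-6):
    """Central finite differences, one entry X_ij at a time."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (fun(X + E) - fun(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
m, n, k = 4, 3, 2
X = rng.standard_normal((m, n))
A = rng.standard_normal((m, k))
B = rng.standard_normal((m, k))
c = 0.5

G_closed = grad_f(X, A, B, c)
G_num = numerical_grad(lambda Y: f(Y, A, B, c), X)
print(G_closed.shape, np.allclose(G_closed, G_num, atol=1e-6))
```

Any other elementwise $g$ with a known derivative can be swapped into `g` and `g_prime`.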

Since a matrix of all ones acts as the identity for the Hadamard product, the derivative reduces to $$ \eqalign { \frac {\partial f} {\partial X} &= 2 h A\cdot(g_B)^T + 2 h B\cdot(g_A)^T \cr &= 2 h A\cdot(X^T\cdot B)^T + 2 h B\cdot(X^T\cdot A)^T \cr &= 2 h A\cdot B^T\cdot X + 2 h B\cdot A^T\cdot X \cr } $$
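With the vectors $a, b$ from the first question in place of $A, B$, this reads $\partial f/\partial X = 2(a^TXX^Tb - c)(ab^T + ba^T)X$. A small NumPy sketch checking that against finite differences, assuming random test data (helper names are hypothetical):

```python
import numpy as np

def f(X, a, b, c):
    return (a @ X @ X.T @ b - c) ** 2

def grad_f(X, a, b, c):
    """2 (a^T X X^T b - c) (a b^T + b a^T) X, the identity-g special case."""
    h = a @ X @ X.T @ b - c
    return 2 * h * (np.outer(a, b) + np.outer(b, a)) @ X

rng = np.random.default_rng(2)
m, n = 4, 3
X = rng.standard_normal((m, n))
a = rng.standard_normal(m)
b = rng.standard_normal(m)
c = 1.0

# Central finite differences over every entry X_ij
eps = 1e-6
G_num = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        G_num[i, j] = (f(X + E, a, b, c) - f(X - E, a, b, c)) / (2 * eps)

print(np.allclose(grad_f(X, a, b, c), G_num, atol=1e-6))
```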

ANSWER

Apparently what we are looking for is the differential $Df(X) : \mathbb R^{m\times n}\to \mathbb R$, which does exist since $f$ is $C^\infty$; it is represented by the matrix $G=(g_{ij})\in\mathbb R^{m\times n}$, where $$ g_{ij}=\frac{\partial f}{\partial X_{ij}}. $$ Now $$ f(X)=\left(\sum_{i,j=1}^m\sum_{k=1}^n a_iX_{ik}X_{jk}b_j-c\right)^2, $$ and hence $$ \begin{aligned} \frac{\partial f}{\partial X_{rs}} &= 2\left(\sum_{i,j=1}^m\sum_{k=1}^n a_iX_{ik}X_{jk}b_j-c\right)\left(\sum_{j=1}^m a_rX_{js}b_j+\sum_{i=1}^m a_iX_{is}b_r\right) \\ &= 2\left(\sum_{i,j=1}^m\sum_{k=1}^n a_iX_{ik}X_{jk}b_j-c\right)\left(a_r(X^Tb)_s+b_r(a^TX)_s\right), \end{aligned} $$ and finally $$ G=2(a^TXX^Tb-c)\,\big(a(b^TX)+b(a^TX)\big). $$
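The index formula and the matrix formula can be compared entry by entry. A NumPy sketch, assuming random test data; `np.einsum` evaluates the triple sum $\sum_{i,j}\sum_k a_iX_{ik}X_{jk}b_j$ directly:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 3
X = rng.standard_normal((m, n))
a = rng.standard_normal(m)
b = rng.standard_normal(m)
c = 0.7

# h = sum_{i,j=1}^m sum_{k=1}^n a_i X_ik X_jk b_j - c
h = np.einsum("i,ik,jk,j->", a, X, X, b) - c

Xtb = X.T @ b   # (X^T b)_s
atX = a @ X     # (a^T X)_s

# Entry (r, s) from the index formula
G_index = np.empty((m, n))
for r in range(m):
    for s in range(n):
        G_index[r, s] = 2 * h * (a[r] * Xtb[s] + b[r] * atX[s])

# Matrix form G = 2 (a^T X X^T b - c) (a (b^T X) + b (a^T X))
G_matrix = 2 * h * (np.outer(a, b @ X) + np.outer(b, a @ X))

print(np.isclose(h, a @ X @ X.T @ b - c))  # True
print(np.allclose(G_index, G_matrix))      # True
```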

ANSWER

Just take the derivative with respect to each element separately and stick them into a matrix.

For instance, if $e_i$ is the $i$-th unit vector (of length $m$ or $n$, as appropriate), then the derivative of $f$ with respect to $X_{ij}$ is

$$2e_i'(ba'X + ab'X) e_j (a'XX'b-c),$$ where $'$ denotes the transpose.

So you're right, although whether you take the matrix you mention or its transpose as the derivative depends on your notation convention.
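A sketch of this element-by-element recipe, assuming random test data: each entry is computed with unit vectors from the formula above, the entries are stacked into an $m\times n$ matrix, and the result is compared with $2(a'XX'b-c)(ab'+ba')X$. The helper `partial_ij` is a hypothetical name; taking `G.T` instead gives the transposed ($n\times m$) convention.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 3
X = rng.standard_normal((m, n))
a = rng.standard_normal(m)
b = rng.standard_normal(m)
c = 0.3

def partial_ij(X, a, b, c, i, j):
    """df/dX_ij = 2 e_i'(b a'X + a b'X) e_j (a'XX'b - c)."""
    rows, cols = X.shape
    e_i = np.eye(rows)[i]       # unit vector in R^m
    e_j = np.eye(cols)[j]       # unit vector in R^n
    M = np.outer(b, a) @ X + np.outer(a, b) @ X
    return 2 * (e_i @ M @ e_j) * (a @ X @ X.T @ b - c)

# Compute each element separately and stack the results into an m x n matrix.
G = np.array([[partial_ij(X, a, b, c, i, j) for j in range(n)] for i in range(m)])

h = a @ X @ X.T @ b - c
G_closed = 2 * h * (np.outer(a, b) + np.outer(b, a)) @ X
print(G.shape, np.allclose(G, G_closed))  # (4, 3) True
```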

As an aside, it can be preferable to vectorize a matrix before taking derivatives, especially if higher-order derivatives are desired.
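To illustrate the vectorization remark with this $f$ (a worked sketch, not taken from the answer): with column-major $x = \operatorname{vec}(X)$ and $x_k$ the $k$-th column of $X$, we have $a^TXX^Tb = \sum_k x_k^T a b^T x_k = \tfrac12 x^T P x$ with $P = I_n \otimes (ab^T + ba^T)$. Hence $\nabla_x f = 2hPx$, whose un-vectorized form is $2h(ab^T+ba^T)X$, and the Hessian is the $mn\times mn$ matrix $\nabla_x^2 f = 2(Px)(Px)^T + 2hP$, with $h = a^TXX^Tb - c$. The NumPy sketch below checks both by finite differences; the column-major convention, random data, and helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 2
X = rng.standard_normal((m, n))
a = rng.standard_normal(m)
b = rng.standard_normal(m)
c = 0.4

def vec(M):
    """Column-major vectorization: stack the columns of M (assumed convention)."""
    return M.reshape(-1, order="F")

def unvec(v, rows, cols):
    """Inverse of vec."""
    return v.reshape((rows, cols), order="F")

# P = I_n kron (a b^T + b a^T), a symmetric (mn x mn) matrix
P = np.kron(np.eye(n), np.outer(a, b) + np.outer(b, a))

def f_vec(x):
    """f as a function of x = vec(X)."""
    Y = unvec(x, m, n)
    return (a @ Y @ Y.T @ b - c) ** 2

def grad_vec(x):
    h = 0.5 * x @ P @ x - c          # 0.5 x^T P x equals a^T X X^T b
    return 2 * h * (P @ x)

def hess_vec(x):
    h = 0.5 * x @ P @ x - c
    Px = P @ x
    return 2 * np.outer(Px, Px) + 2 * h * P

x = vec(X)
E = np.eye(x.size)
eps = 1e-6
# Central differences: of f for the gradient, of the gradient for the Hessian
grad_fd = np.array([(f_vec(x + eps * e) - f_vec(x - eps * e)) / (2 * eps) for e in E])
hess_fd = np.column_stack([(grad_vec(x + eps * e) - grad_vec(x - eps * e)) / (2 * eps) for e in E])

print(np.allclose(grad_vec(x), grad_fd, atol=1e-6))
print(np.allclose(hess_vec(x), hess_fd, atol=1e-6))

# Un-vectorizing the gradient recovers the matrix form 2h (ab^T + ba^T) X
h = a @ X @ X.T @ b - c
print(np.allclose(unvec(grad_vec(x), m, n), 2 * h * (np.outer(a, b) + np.outer(b, a)) @ X))
```

In matrix form the second derivative would be a fourth-order object; after vectorizing it is an ordinary symmetric matrix.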