Matrix derivative of $g(Ax)$ with respect to $A$


Consider a differentiable function $g: \mathbb{R}^n \to \mathbb{R}$ and a fixed vector $x \in \mathbb{R}^n$. We want to find the matrix derivative \begin{align} \frac{\partial g(Ax) }{\partial A} \end{align} where $A$ is an $n \times n$ matrix.

This is a question about the chain rule for matrix derivatives. I worked out a two-by-two example, and I think the answer is

\begin{align} \frac{\partial g(Ax) }{\partial A} = g'(Ax)x^T \end{align} where $g'(x) = \nabla_x g(x)$ is the gradient of $g$, so $g'(Ax)$ is that gradient evaluated at $Ax$. Throughout, I use the column-vector convention.
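One way to sanity-check the conjectured formula is to compare it entry by entry with central finite differences in each $a_{ij}$. The sketch below assumes a concrete choice of $g$, namely log-sum-exp $g(y) = \log\sum_k e^{y_k}$, whose gradient is the softmax; the helper names (`g`, `grad_g`, `matvec`, `formula_derivative`, `fd_derivative`) are hypothetical and only for illustration.

```python
import math


def g(y):
    """Assumed example: g(y) = log(sum_k exp(y_k)) (log-sum-exp)."""
    return math.log(sum(math.exp(v) for v in y))


def grad_g(y):
    """Gradient of log-sum-exp: the softmax of y, stored as a list (column vector)."""
    s = sum(math.exp(v) for v in y)
    return [math.exp(v) / s for v in y]


def matvec(M, v):
    """Matrix-vector product M v for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def formula_derivative(A, x):
    """Conjectured formula: entry (i, j) is g_{y_i}(Ax) * x_j, i.e. grad g(Ax) x^T."""
    gy = grad_g(matvec(A, x))
    return [[gi * xj for xj in x] for gi in gy]


def fd_derivative(A, x, h=1e-6):
    """Central finite difference of g(Ax) with respect to each entry a_ij."""
    rows, cols = len(A), len(A[0])
    D = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[i][j] += h
            Am[i][j] -= h
            D[i][j] = (g(matvec(Ap, x)) - g(matvec(Am, x))) / (2 * h)
    return D


# A two-by-two example, in the spirit of the one mentioned in the question.
A = [[1.0, 2.0], [-0.5, 0.3]]
x = [0.7, -1.2]
exact = formula_derivative(A, x)
approx = fd_derivative(A, x)
max_err = max(abs(e - a) for er, ar in zip(exact, approx) for e, a in zip(er, ar))
print(max_err)  # only finite-difference error remains, so this is tiny
```

Central differences are used because their truncation error is $O(h^2)$, so any real discrepancy in the formula would stand out clearly against the numerical noise.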


We can compute the derivative directly using the chain rule. Write $y = Ax$ and let $g_{y_k}$ denote $\partial g/\partial y_k$. Then \begin{align} \frac{\partial}{\partial a_{ij}}g(Ax) &= \sum_{k}g_{y_k}(Ax)\frac{\partial}{\partial a_{ij}}\sum_{\ell}a_{k\ell}x_{\ell} \\ &= \sum_{k}g_{y_k}(Ax)\delta_{ik}x_{j} \\ &= g_{y_i}(Ax)x_j, \end{align} where $\delta_{ik}$ is the Kronecker delta. Alternatively, let $f(A) = g(Ax)$. Then by the chain rule, $$Df(A)H = Dg(Ax)Hx$$ for every $n \times n$ matrix $H$, so taking $H = e_ie_j^T$ gives $$\frac{\partial}{\partial a_{ij}}f(A) = Df(A)e_ie_j^T = Dg(Ax)e_ie_j^Tx = g_{y_i}(Ax)x_j.$$ So the matrix of partial derivatives is $\nabla g(Ax)x^T$, which confirms the formula in the question.
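The second argument can be checked numerically: for a random direction $H$, the chain-rule value $Dg(Ax)Hx = \nabla g(Ax)^T(Hx)$ should match a central difference of $f(A) = g(Ax)$ along $H$, and choosing $H = e_ie_j^T$ should pick out $g_{y_i}(Ax)x_j$. This is a sketch assuming the example $g(y) = \|y\|^4/4$ (gradient $\|y\|^2 y$); all function names are hypothetical.

```python
import random


def g(y):
    """Assumed example: g(y) = ||y||^4 / 4."""
    s = sum(v * v for v in y)
    return s * s / 4


def grad_g(y):
    """Gradient of ||y||^4 / 4, which is ||y||^2 * y."""
    s = sum(v * v for v in y)
    return [s * v for v in y]


def matvec(M, v):
    """Matrix-vector product M v for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def directional_fd(A, H, x, t=1e-6):
    """Central difference of f(A) = g(Ax) along the matrix direction H."""
    Ap = [[a + t * b for a, b in zip(ra, rh)] for ra, rh in zip(A, H)]
    Am = [[a - t * b for a, b in zip(ra, rh)] for ra, rh in zip(A, H)]
    return (g(matvec(Ap, x)) - g(matvec(Am, x))) / (2 * t)


def directional_chain_rule(A, H, x):
    """Chain rule: Df(A)H = Dg(Ax) H x = grad g(Ax) . (H x)."""
    gy = grad_g(matvec(A, x))
    return sum(a * b for a, b in zip(gy, matvec(H, x)))


random.seed(0)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
H = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]
print(directional_fd(A, H, x), directional_chain_rule(A, H, x))

# Taking H = e_i e_j^T picks out the (i, j) partial derivative g_{y_i}(Ax) x_j.
i, j = 1, 2
E = [[1.0 if (r, c) == (i, j) else 0.0 for c in range(n)] for r in range(n)]
partial_ij = directional_chain_rule(A, E, x)
print(partial_ij, grad_g(matvec(A, x))[i] * x[j])
```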

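You can also do this without coordinates. Since $g(Ax)$ is scalar-valued, its differential is a scalar, and we can write

$$ d\left(g(Ax)\right) = \nabla g(Ax)^\intercal\,dA\,x = \operatorname{tr}\!\left(x\,\nabla g(Ax)^\intercal\,dA\right) = \left\langle \nabla g(Ax)x^\intercal, dA\right\rangle, $$

where $\langle B, C\rangle = \operatorname{tr}(B^\intercal C)$ is the Frobenius inner product and the middle step uses the cyclic property of the trace.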

This shows that $\partial g(Ax)/\partial A = \nabla g(Ax)x^\intercal,$ and all we had to do was rearrange a scalar product.
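The coordinate-free argument can also be checked numerically. The sketch below verifies the rearrangement $\nabla g(Ax)^\intercal\,dA\,x = \langle \nabla g(Ax)x^\intercal, dA\rangle$, with $\langle B, C\rangle = \operatorname{tr}(B^\intercal C) = \sum_{ij} B_{ij}C_{ij}$, and that $g((A+dA)x) - g(Ax)$ agrees with this inner product up to second-order terms for a small $dA$. It assumes the example $g(y) = \sum_k \cos y_k$; the helper names are hypothetical.

```python
import math
import random


def g(y):
    """Assumed example: g(y) = sum_k cos(y_k)."""
    return sum(math.cos(v) for v in y)


def grad_g(y):
    """Gradient of sum_k cos(y_k) is (-sin(y_k))_k."""
    return [-math.sin(v) for v in y]


def matvec(M, v):
    """Matrix-vector product M v for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def frobenius(B, C):
    """Frobenius inner product <B, C> = tr(B^T C) = sum_ij B_ij C_ij."""
    return sum(b * c for rb, rc in zip(B, C) for b, c in zip(rb, rc))


random.seed(1)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]
eps = 1e-5
dA = [[eps * random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

gy = grad_g(matvec(A, x))
G = [[gi * xj for xj in x] for gi in gy]  # candidate derivative: grad g(Ax) x^T

# The scalar-product rearrangement: grad g(Ax)^T dA x equals <grad g(Ax) x^T, dA>.
lhs = sum(a * b for a, b in zip(gy, matvec(dA, x)))
rhs = frobenius(G, dA)

# First-order check: g((A + dA) x) - g(A x) matches <G, dA> up to O(eps^2).
A_plus = [[a + d for a, d in zip(ra, rd)] for ra, rd in zip(A, dA)]
actual_change = g(matvec(A_plus, x)) - g(matvec(A, x))
print(lhs, rhs, actual_change)
```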