Matrix derivative formula using the matrix chain rule


Let $X \in \mathbb{C}^{m \times n}$ be a matrix. Let $F(X) \in \mathbb{C}^{m \times m}$ be a matrix-valued function of $X$, e.g. $F(X) = I_m + X X^{\dagger}$, where $^\dagger$ denotes the conjugate transpose and $I_m$ is the $m \times m$ identity matrix. Finally, let $\mathbf{g}(X)$ be a column-vector-valued function of $X$, e.g. $\mathbf{g}(X) = u - Xv$, with $u \in \mathbb{C}^m$ and $v \in \mathbb{C}^n$ column vectors. Then $$ Q(X) = \mathbf{g}(X)^\dagger F(X) \mathbf{g}(X) $$ is a scalar. What I want to find is a formula for $$ \frac{\partial \mathbf{g}(X)^\dagger F(X) \mathbf{g}(X)}{\partial X} = \ ? $$


Edit: By Leibniz's rule, $$ \frac{\partial Q(X)}{\partial X} = \frac{\partial \mathbf{g}^{\dagger}(X)}{\partial X} F(X) \mathbf{g}(X) + \mathbf{g}^{\dagger}(X) \frac{\partial F(X)}{\partial X} \mathbf{g}(X) + \mathbf{g}^{\dagger}(X) F(X) \frac{\partial \mathbf{g}(X)}{\partial X} $$


There are 3 best solutions below

10
On BEST ANSWER

To begin with, as discussed in the comments, one should understand what $\partial X$ in the denominator means. The space of matrices is a vector space, so all the maps in question are multivariable maps. Hence every map from the space of matrices to another space has a differential, which can be thought of as a collection of partial derivatives. In other words, describing the differential of such a map is equivalent to specifying all of its partial derivatives.

So, let $\{e_i\}$ be a basis of the space of matrices, and let $\frac{\partial}{\partial x^i}$ denote the directional derivative in the $e_i$ direction. By the Leibniz rule, $$\frac{\partial}{\partial x^i}(f_1\cdot\ldots\cdot f_k)=\frac{\partial f_1}{\partial x^i}f_2\ldots f_k+\ldots+f_1\ldots f_{k-1}\frac{\partial f_k}{\partial x^i}.$$ Note that if the $f$'s are matrix-valued (and they are in your example), then you can't change the order of the factors in the above equation, as $AB\neq BA$ for general matrices. Taking the transpose and/or complex conjugate commutes with differentiating, and so the transpose and $\dagger$ simply carry through.
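As a quick sanity check of the order-preserving Leibniz rule, here is a minimal NumPy sketch for the product $XX^\dagger$ from the question. The sizes, the random seed, the chosen basis direction `E`, and the step `t` are all illustrative assumptions. The step is taken real because $X \mapsto XX^\dagger$ is not complex-differentiable.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2  # illustrative sizes, not from the question
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

def F(X):
    """F(X) = I_m + X X^dagger, the example from the question."""
    return np.eye(X.shape[0]) + X @ X.conj().T

# One basis direction e_i: a single 1 in entry (0, 1), zeros elsewhere.
E = np.zeros((m, n), dtype=complex)
E[0, 1] = 1.0

# Central finite difference along E with a real step t.
t = 1e-5
numeric = (F(X + t * E) - F(X - t * E)) / (2 * t)

# Leibniz rule with the factor order preserved:
# d(X X^dagger) = E X^dagger + X E^dagger.
leibniz = E @ X.conj().T + X @ E.conj().T

max_err = np.max(np.abs(numeric - leibniz))
print(max_err)  # expected to be tiny (floating-point round-off only)
```

Swapping the factors, e.g. using `X.conj().T @ E`, would not even produce an $m \times m$ matrix here, which is a concrete reminder of why the order matters.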

0
On

First, let's find the differentials of the intermediate variables $$\eqalign{ g &= (u-Xv) &\implies dg = -dX\,v\cr F &= I+XX^\dagger &\implies dF = dX\,X^\dagger+X\,dX^\dagger \cr }$$ Then write the function in terms of the double-contraction product, i.e. $$A:B={\rm tr}(A^TB)$$ and find its differential $$\eqalign{ Q &= F:g^*g^T \cr dQ &= (g^*g^T):dF + F:d(g^*g^T) \cr &= (g^*g^T):(dX\,X^\dagger+X\,dX^\dagger) + F:(dg^*\,g^T+g^*\,dg^T) \cr &= (g^*g^T):(dX\,X^\dagger+X\,dX^\dagger) - F:(dX^*\,v^*g^T + g^*v^T\,dX^T) \cr &= g^*g^TX^*:dX + X^Tg^*g^T:dX^\dagger - Fgv^\dagger:dX^* - vg^\dagger F:dX^T \cr &= g^*g^TX^*:dX + X^Tg^*g^T:dX^\dagger - v^*g^TF^T:dX^\dagger - F^Tg^*v^T:dX \cr &= (g^*g^TX^* - F^Tg^*v^T):dX + (X^Tg^*g^T - v^*g^TF^T):dX^\dagger \cr }$$ Treating $X$ and $X^\dagger$ as independent variables, we obtain the gradient with respect to each $$\eqalign{ \frac{\partial Q}{\partial X} &= g^*g^TX^* - F^Tg^*v^T \cr\cr \frac{\partial Q}{\partial X^\dagger} &= X^Tg^*g^T - v^*g^TF^T \cr\cr }$$
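The two gradients above can be checked numerically. The following is a sketch under stated assumptions: the sizes, seed, helper name `crandn`, and perturbation scale are made up for illustration, `^*` is read as elementwise complex conjugation, and the contraction $A:B = {\rm tr}(A^TB)$ is computed as the sum of elementwise products. It compares the first-order prediction $dQ = \frac{\partial Q}{\partial X}:dX + \frac{\partial Q}{\partial X^\dagger}:dX^\dagger$ against a central finite difference of $Q$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4  # illustrative sizes

def crandn(*shape):
    """Random complex array (hypothetical helper for this sketch)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

X, u, v = crandn(m, n), crandn(m, 1), crandn(n, 1)

def Q(X):
    """Q(X) = g^dagger F g with g = u - X v and F = I + X X^dagger."""
    g = u - X @ v
    F = np.eye(m) + X @ X.conj().T
    return (g.conj().T @ F @ g).item().real  # Q is real since F is Hermitian

g = u - X @ v
F = np.eye(m) + X @ X.conj().T
gc = g.conj()

# Gradients from the derivation above.
grad_X = gc @ g.T @ X.conj() - F.T @ gc @ v.T     # shape (m, n)
grad_Xd = X.T @ gc @ g.T - v.conj() @ g.T @ F.T   # shape (n, m)

# Small random perturbation and central finite difference of Q.
dX = 1e-5 * crandn(m, n)
dQ_numeric = (Q(X + dX) - Q(X - dX)) / 2

# A:B = tr(A^T B) equals the sum of elementwise products.
dQ_formula = np.sum(grad_X * dX) + np.sum(grad_Xd * dX.conj().T)

print(dQ_numeric, dQ_formula)
```

Since $F$ is Hermitian, the second gradient is the conjugate transpose of the first, so `dQ_formula` is real up to round-off and equals twice the real part of `np.sum(grad_X * dX)`.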

0
On

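There is a nice answer in Seber (2008), page 360, result 17.25, for differentiation with respect to a vector.

Suppose $y = w'Az$ is a scalar, where $w$ is $m \times 1$, $A$ is $m \times n$, $z$ is $n \times 1$, and all are functions of a vector $x$. Let $\otimes$ denote the Kronecker product and $'$ the transpose. To find $\partial y/\partial x'$, note that $$y = \operatorname{vec}(y) = (z' \otimes w')\operatorname{vec}(A) = (z \otimes w)'\operatorname{vec}(A) = [\operatorname{vec}(wz')]'\operatorname{vec}(A).$$ Since $y = y'$, we also have $w'Az = z'A'w$. Hence $$\frac{\partial y}{\partial x'} = z'A'\frac{\partial w}{\partial x'} + [\operatorname{vec}(wz')]'\frac{\partial \operatorname{vec}(A)}{\partial x'} + w'A\frac{\partial z}{\partial x'}.$$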
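A small real-valued check of this result, as a sketch only: the particular choices $w(x) = Wx$, $z(x) = Zx$, and $A(x) = A_0 + \sum_k x_k A_k$, along with all sizes and names below, are assumptions made for illustration and are not taken from Seber. Here `vec` stacks columns, which matches the identity $\operatorname{vec}(wz') = z \otimes w$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 3, 4, 5  # illustrative sizes; x has length p
W = rng.standard_normal((m, p))
Z = rng.standard_normal((n, p))
A0 = rng.standard_normal((m, n))
Ak = rng.standard_normal((p, m, n))

def vec(M):
    """Column-stacking vec operator."""
    return M.flatten(order="F")

def w(x): return W @ x                                # dw/dx' = W
def z(x): return Z @ x                                # dz/dx' = Z
def A(x): return A0 + np.tensordot(x, Ak, axes=1)     # dvec(A)/dx' = JA
def y(x): return w(x) @ A(x) @ z(x)                   # scalar w'Az

x = rng.standard_normal(p)
JA = np.column_stack([vec(Ak[k]) for k in range(p)])  # shape (m*n, p)

# The three terms of the quoted formula for dy/dx' (a row vector of length p).
row = (z(x) @ A(x).T @ W
       + vec(np.outer(w(x), z(x))) @ JA
       + w(x) @ A(x) @ Z)

# Central finite differences along each coordinate of x.
h = 1e-5
fd = np.array([(y(x + h * e) - y(x - h * e)) / (2 * h) for e in np.eye(p)])

print(np.max(np.abs(row - fd)))
```

The identity $\operatorname{vec}(wz') = z \otimes w$ used in the quoted derivation can be confirmed in the same session with `np.allclose(vec(np.outer(w(x), z(x))), np.kron(z(x), w(x)))`.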

Seber, G. A. F. A Matrix Handbook for Statisticians. Wiley, 2008.