Derivative of matrix-vector products w.r.t. the matrix


Given the function $$F(X,Y,Z) = \alpha^TXYZ$$ where $X, Y, Z$ are $n \times n$ matrices and $\alpha$ is an $n \times 1$ vector, how do I compute the derivative of $F$ with respect to $Y$?

I found some related questions, but they did not help.

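Edit: if the function has the form $F(X,Y,Z) = \alpha^TXYZ\beta$, then according to the Matrix Cookbook the derivative with respect to $Y$ is $F' = (\alpha^T X)^T (Z\beta)^T$. Without $\beta$, however, $F$ is a $1 \times n$ row vector rather than a scalar, so that formula no longer applies and the dimensions do not match.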

Thank you,


There are 2 best solutions below

Best answer

Write $a$ for the vector $\alpha$ in the question, and let ${\mathcal E}$ be the 4th-order tensor with components $$\eqalign{ {\mathcal E}_{ijkl} &= \delta_{ik}\,\delta_{jl} \cr }$$ Using this tensor, we can calculate the differential and gradient of the function as $$\eqalign{ f &= a^TXYZ \cr \cr df &= a^T(X\,dY\,Z) \cr &= a^T(X\,{\mathcal E}\,Z^T):dY \cr \cr \frac{\partial f}{\partial Y}&= a^TX\,{\mathcal E}\,Z^T \cr }$$ As expected, the gradient of a vector with respect to a matrix is a 3rd-order tensor.
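For readers who prefer indices, the same gradient can be written out componentwise. This is only a sketch read off directly from $df = a^TX\,dY\,Z$; the index names $j, k, l$ and the basis vector $e_j$ are chosen here for illustration and do not appear in the question. $$\eqalign{ df_j &= \sum_{k,l} (a^TX)_k\,dY_{kl}\,Z_{lj} \cr \cr \frac{\partial f_j}{\partial Y_{kl}} &= (a^TX)_k\,Z_{lj} \cr }$$ For a fixed $j$ this is the $n \times n$ matrix $(a^TX)^T(Ze_j)^T$, which is the Matrix Cookbook formula from the question with $\beta = e_j$, the $j$-th standard basis vector. Stacking these $n$ matrices gives the 3rd-order tensor.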

If you would rather avoid tensors, you can vectorize the differential. Writing $y = {\rm vec}(Y)$, $$\eqalign{ {\rm vec}(df) &= {\rm vec}(a^TX\,dY\,Z) \cr &= (Z^T\otimes a^TX)\,{\rm vec}(dY) \cr &= (Z^T\otimes a^TX)\,dy \cr \cr \frac{\partial\,{\rm vec}(f)}{\partial y}&= Z^T\otimes a^TX \cr }$$ which is an ordinary matrix quantity of size $n \times n^2$.

This is equivalent to the previous result if you swap the order of the factors and replace the Kronecker product symbol with the ${\mathcal E}$ tensor.
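The Kronecker-product formula above is easy to sanity-check numerically. The following is a minimal NumPy sketch, not part of the original answer: the helper names `vec` and `jacobian_wrt_Y` are hypothetical, the data are random, and `vec` is assumed to stack columns (column-major order), which is the convention the identity ${\rm vec}(ABC) = (C^T\otimes A)\,{\rm vec}(B)$ relies on.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

def jacobian_wrt_Y(a, X, Z):
    """Hypothetical helper: Z^T kron (a^T X), an n x n^2 matrix."""
    row = (a @ X)[None, :]          # a^T X as a 1 x n row
    return np.kron(Z.T, row)

rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
X, Z, dY = (rng.standard_normal((n, n)) for _ in range(3))

J = jacobian_wrt_Y(a, X, Z)

# df = a^T X dY Z should equal J @ vec(dY).
df = a @ X @ dY @ Z
assert np.allclose(J @ vec(dY), df)

# Unfold the n x n^2 Jacobian into a 3rd-order tensor G[j, k, l] = d f_j / d Y_kl.
# Column k + n*l of J corresponds to Y_kl under column-major vec.
G = J.reshape(n, n, n, order="F")
assert np.allclose(G, np.einsum("k,lj->jkl", a @ X, Z))
```

The last two lines show the sense in which the matrix and tensor forms carry the same numbers: the Jacobian is just the 3rd-order gradient with its two $Y$ indices flattened into one.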

Answer

Let

$$\rm f (X, Y, Z) := a^{\top} X Y Z$$

Since $\rm f$ is linear in $\rm Y$, for any direction $\rm V$ and any $h \neq 0$,

$$\frac{\mathrm f (\mathrm X, \mathrm Y + h \mathrm V, \mathrm Z) - \mathrm f (\mathrm X, \mathrm Y, \mathrm Z)}{h} = \rm a^{\top} X V Z$$

Vectorizing,

$$\rm \mbox{vec} (a^{\top} X V Z) = \left( \color{blue}{Z^{\top} \otimes a^{\top} X} \right) \mbox{vec} (V)$$

where $\rm Z^{\top} \otimes a^{\top} X$ is the $n \times n^2$ Jacobian matrix of $\rm f$ with respect to $\mbox{vec} (\mathrm Y)$.
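The difference quotient above can also be checked numerically. This is a small NumPy sketch under assumed random data; the step size `h` and the variable names are illustrative choices, not part of the answer. Because $\rm f$ is linear in $\rm Y$, the quotient matches $\rm a^{\top} X V Z$ up to floating-point rounding for any nonzero `h`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
a = rng.standard_normal(n)
X, Y, Z, V = (rng.standard_normal((n, n)) for _ in range(4))

def f(X, Y, Z):
    """f(X, Y, Z) = a^T X Y Z, returned as a length-n array."""
    return a @ X @ Y @ Z

h = 1e-3  # any nonzero step works, since f is linear in Y
quotient = (f(X, Y + h * V, Z) - f(X, Y, Z)) / h
directional = a @ X @ V @ Z

# Jacobian Z^T kron (a^T X) applied to the column-stacked direction V.
jacobian = np.kron(Z.T, (a @ X)[None, :])
via_jacobian = jacobian @ V.reshape(-1, order="F")

assert np.allclose(quotient, directional)
assert np.allclose(via_jacobian, directional)
```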