How do I compute the matrix gradient with respect to a submatrix by applying the chain rule?


Let $f:\mathbb{R}^{2a \times 2a}\rightarrow \mathbb{R}$ and let $X \in \mathbb{R}^{b \times a}$ and $M \in \mathbb{R}^{a \times b}$, for $a, b \in \mathbb{Z}_{[0,\infty)}$. Define $$Y(X)=\begin{bmatrix}I_a&0_{a \times a}\\MX&I_a\end{bmatrix}\,.$$ How do I express the gradient $$\nabla_Xf\left(Y(X)\right)$$ in terms of $\nabla_Yf(Y)$? In other words, how do I apply the chain rule in this specific case?



Best answer

Let's use $\{E_1, E_2\}$ to denote block-matrix analogs of the standard basis vectors $\{e_1, e_2\}$: $$\eqalign{ E_1 &= \begin{bmatrix}I_a\\0\end{bmatrix},\,\,\, E_2 &= \begin{bmatrix}0\\I_a\end{bmatrix} \cr }$$ where each block is $a \times a$, so that $E_1, E_2 \in \mathbb{R}^{2a \times a}$. Then the matrix of interest and its differential can be written as $$\eqalign{ Y &= E_1I_aE_1^T + E_2I_aE_2^T + E_2MXE_1^T \cr dY &= E_2M\,dX\,E_1^T \cr }$$ Assume that the gradient of $f$ with respect to $Y$ is known to be $G=\frac{\partial f}{\partial Y}$.
We can use this to write the differential of $f$ and then find its gradient with respect to $X$: $$\eqalign{ df &= G:dY \cr &= G:E_2M\,dX\,E_1^T \cr &= M^TE_2^TGE_1:dX \cr \frac{\partial f}{\partial X} &= M^TE_2^TGE_1 \cr }$$ where a colon denotes the trace/Frobenius product, i.e. $$A:B={\rm tr}(A^TB)$$ Note that $E_2^TGE_1$ is just the lower-left $a \times a$ block of $G$, so the gradient is $M^T$ times that block. There are many rules for rearranging the terms in a Frobenius product, all of which follow from the cyclic property of the trace.

For example, all of the following are equivalent: $$\eqalign{ A:BC &= BC:A \cr &= A^T:(BC)^T \cr &= AC^T:B \cr &= B^TA:C \cr }$$
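
As a sanity check, both the rearrangement rules and the gradient formula $M^TE_2^TGE_1$ can be tested numerically. The sketch below uses NumPy and is only an illustration: the test function `f` (with its known gradient `grad_f_Y`), the sizes `a = 3`, `b = 2`, and all helper names are assumptions made for the example, not part of the original question. It compares the formula against central finite differences.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = tr(A^T B)."""
    return np.trace(A.T @ B)

def build_Y(M, X):
    """Assemble Y(X) = [[I, 0], [M X, I]] with a x a blocks."""
    a = M.shape[0]
    return np.block([[np.eye(a), np.zeros((a, a))], [M @ X, np.eye(a)]])

def f(Y):
    # Hypothetical smooth test function, chosen only because its gradient is easy.
    return np.sum(np.sin(Y)) + 0.5 * np.sum(Y ** 2)

def grad_f_Y(Y):
    # G = df/dY for the test function above.
    return np.cos(Y) + Y

def grad_f_X(M, X):
    """Gradient from the derivation: M^T E_2^T G E_1."""
    a = M.shape[0]
    E1 = np.vstack([np.eye(a), np.zeros((a, a))])
    E2 = np.vstack([np.zeros((a, a)), np.eye(a)])
    G = grad_f_Y(build_Y(M, X))
    return M.T @ E2.T @ G @ E1

def numeric_grad_X(M, X, h=1e-6):
    """Central finite differences of X -> f(Y(X)), entry by entry."""
    g = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            D = np.zeros_like(X)
            D[i, j] = h
            g[i, j] = (f(build_Y(M, X + D)) - f(build_Y(M, X - D))) / (2 * h)
    return g

rng = np.random.default_rng(0)

# 1) The rearrangement rules, on random matrices of compatible shapes.
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))
values = [
    frob(A, B @ C),
    frob(B @ C, A),
    frob(A.T, (B @ C).T),
    frob(A @ C.T, B),
    frob(B.T @ A, C),
]
rules_agree = np.allclose(values, values[0])

# 2) The gradient formula versus finite differences.
a, b = 3, 2
M = rng.standard_normal((a, b))
X = rng.standard_normal((b, a))
analytic = grad_f_X(M, X)
numeric = numeric_grad_X(M, X)
gradients_agree = np.allclose(analytic, numeric, atol=1e-5)

print(rules_agree, gradients_agree)
```

In practice there is no need to form $E_1$ and $E_2$ explicitly: slicing out the lower-left block, `M.T @ G[a:, :a]`, gives the same matrix.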