Chain Rule for Matrix-Valued Functions


Suppose I have three matrix-valued functions $M_1, M_2, M_3 \colon \mathbb{R}^n \to \mathbb{R}^{n\times n}$, each a function of $x\in \mathbb{R}^n$, and let $y \in \mathbb{R}^n$.

Let $f \colon \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ be given by \begin{equation} f(x,y) = M_1(x)M_2(x)M_3(x)y. \end{equation} How do I compute the gradient of $f$ with respect to $x$, $\frac{\partial}{\partial x}f(x,y)$, in terms of the gradients of $M_1$, $M_2$, and $M_3$?

My tensor calculus is a bit weak, so a detailed explanation would be much appreciated!


There is 1 best solution below


$ \def\e{\varepsilon} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\g#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\GLR#1#2{\LR{\g{#1}{#2}}} $Let $\e_k$ denote the $k^{th}$ Cartesian basis vector. Then the $k^{th}$ row of $M_1$ is $$\eqalign{ m_k^T = \e_k^TM_1 \qiq m_k = M_1^T\e_k \quad \big({\rm as\;a\;column\;vector}\big) \qquad\qquad }$$ Use index notation to calculate the desired gradient, where $f_k$ is the $k^{th}$ component of $f$ and the last line uses the fact that a scalar equals its own transpose $$\eqalign{ f &= M_1M_2M_3y \\ f_k &= m_k^TM_2M_3y \\ \g{f_k}{x_i} &= \GLR{m_k^T}{x_i}M_2M_3y + m_k^T\GLR{M_2}{x_i}M_3y + m_k^TM_2\GLR{M_3}{x_i}y \\ &= \LR{M_2M_3y}^T\GLR{m_k}{x_i} + m_k^T\GLR{M_2}{x_i}M_3y + m_k^TM_2\GLR{M_3}{x_i}y }$$
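The product-rule expression for $\partial f_k/\partial x_i$ can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the matrix functions are hypothetical test cases of the form $M(x) = A + \sum_i \sin(x_i)\,B_i$ with random $A$ and $B_i$ (they are not part of the question), and the helper names `make_matrix_function`, `jacobian_product_rule`, and `jacobian_finite_difference` are made up for this example. It assembles the matrix with entries $J_{ki} = \partial f_k/\partial x_i$ from the three product-rule terms and compares it against central finite differences.

```python
import numpy as np


def make_matrix_function(rng, n):
    """Return a hypothetical smooth M(x) and a function giving its partials.

    Assumed test form (not from the question): M(x) = A + sum_i sin(x_i) * B[i].
    """
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n, n))  # B[i] is the matrix multiplying sin(x_i)

    def M(x):
        return A + np.einsum("i,ijk->jk", np.sin(x), B)

    def dM(x):
        # dM(x)[i] is the n x n matrix dM/dx_i = cos(x_i) * B[i]
        return np.cos(x)[:, None, None] * B

    return M, dM


def jacobian_product_rule(Ms, dMs, x, y):
    """J[k, i] = df_k/dx_i assembled from the three product-rule terms."""
    M1, M2, M3 = (M(x) for M in Ms)
    d1, d2, d3 = (dM(x) for dM in dMs)
    n = x.size
    J = np.empty((n, n))
    for i in range(n):
        # Row k of this vector is
        # (dm_k^T/dx_i) M2 M3 y + m_k^T (dM2/dx_i) M3 y + m_k^T M2 (dM3/dx_i) y
        J[:, i] = (d1[i] @ M2 @ M3 + M1 @ d2[i] @ M3 + M1 @ M2 @ d3[i]) @ y
    return J


def jacobian_finite_difference(Ms, x, y, h=1e-6):
    """Central-difference approximation of the same Jacobian."""

    def f(z):
        M1, M2, M3 = (M(z) for M in Ms)
        return M1 @ M2 @ M3 @ y

    n = x.size
    J = np.empty((n, n))
    for i in range(n):
        step = np.zeros(n)
        step[i] = h
        J[:, i] = (f(x + step) - f(x - step)) / (2 * h)
    return J


rng = np.random.default_rng(0)
n = 3
pairs = [make_matrix_function(rng, n) for _ in range(3)]
Ms = [p[0] for p in pairs]
dMs = [p[1] for p in pairs]
x = rng.standard_normal(n)
y = rng.standard_normal(n)

J_rule = jacobian_product_rule(Ms, dMs, x, y)
J_fd = jacobian_finite_difference(Ms, x, y)
print(np.allclose(J_rule, J_fd, atol=1e-5))
```

Column $i$ of `J_rule` stacks $\partial f_k/\partial x_i$ over all $k$, so the loop over `i` mirrors the index $i$ in the derivation, while the row index $k$ is handled implicitly by the matrix-vector products.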