Gradient of a function w.r.t. matrices


Let $ \varphi : \mathbb{R}^m \times \mathbb{R}^{m \times n} \times \mathbb{R}^{n \times m} \to \mathbb{R} $ be defined by

$$ \varphi(x, A, B) = \left \| \left( x^T A B \right)^T - x \right \|^2 $$

What are the gradients of $\varphi$ with respect to $A$ and $B$, $\nabla_A \varphi(x, A, B) $ and $\nabla_B \varphi(x, A, B)$?


I have arrived at the following. Are they correct?

$$ \nabla_A \varphi(x, A, B) = 2 ((x^TAB)^T - x)x^TB^T $$

$$ \nabla_B \varphi(x, A, B) = 2 ((x^TAB)^T - x)x^TA^T $$


There are 2 best solutions below


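This is almost correct! Define $\mathbf{C}=\mathbf{A}\mathbf{B}$ and $\mathbf{u}=\mathbf{C}^T\mathbf{x} - \mathbf{x}$.

Then $$ \varphi = \left \| \mathbf{u} \right \|^2 $$ and, writing $\mathbf{X}:\mathbf{Y} = \text{trace}(\mathbf{X}^T\mathbf{Y})$ for the Frobenius inner product, $$ d\varphi = 2 \mathbf{u} :d\mathbf{u} = 2 \mathbf{u} : (d\mathbf{C}^T \mathbf{x}) = 2 \mathbf{x}\mathbf{u}^T :d\mathbf{C} $$ Substituting $d\mathbf{C} = d\mathbf{A}\,\mathbf{B} + \mathbf{A}\,d\mathbf{B}$, you find $$ \frac{\partial \varphi}{\partial \mathbf{A}} = 2 \mathbf{x} (\mathbf{Bu})^T , \qquad \frac{\partial \varphi}{\partial \mathbf{B}} = 2 (\mathbf{A}^T\mathbf{x}) \mathbf{u}^T $$ which are both matrices of the correct dimensions ($m \times n$ and $n \times m$, respectively).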
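If you want to convince yourself numerically, a finite-difference check is a quick way to do it. The following is only a minimal sketch, assuming NumPy is available; the function names (`phi`, `grad_A`, `grad_B`, `numerical_grad`), the sizes $m=4$, $n=3$ and the random test data are my own choices for illustration and not part of the question.

```python
import numpy as np

def phi(x, A, B):
    """phi(x, A, B) = ||(x^T A B)^T - x||^2 for a 1-D vector x."""
    u = B.T @ A.T @ x - x
    return float(u @ u)

def grad_A(x, A, B):
    u = B.T @ A.T @ x - x
    return 2.0 * np.outer(x, B @ u)       # 2 x (B u)^T, shape (m, n)

def grad_B(x, A, B):
    u = B.T @ A.T @ x - x
    return 2.0 * np.outer(A.T @ x, u)     # 2 (A^T x) u^T, shape (n, m)

def numerical_grad(f, M, eps=1e-6):
    """Central finite differences of a scalar function f, entry by entry."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2.0 * eps)
    return G

# Arbitrary test data (an assumption made for this check only).
rng = np.random.default_rng(0)
m, n = 4, 3
x = rng.standard_normal(m)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

# Largest entrywise gap between the closed forms and the finite differences.
err_A = np.max(np.abs(grad_A(x, A, B) - numerical_grad(lambda M: phi(x, M, B), A)))
err_B = np.max(np.abs(grad_B(x, A, B) - numerical_grad(lambda M: phi(x, A, M), B)))
print(err_A, err_B)
```

Both printed errors should be tiny compared with the size of the gradient entries. Since $\varphi$ is quadratic in $A$ (and in $B$) separately, the central difference is exact up to rounding, so any visible gap would point to a mistake in the formula.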


An alternative way is to expand the difference $\varphi(x, A+H, B) - \varphi(x, A, B) $ and write it as the sum of a term that is linear in $H$ and a remainder that is of higher order in $H$. The linear term is equal to the trace inner product of $\nabla_{A}\varphi(x, A, B)$ and $H$. More precisely, $$ \varphi(x, A+H, B) - \varphi(x, A, B) = \text{trace}(\nabla_{A}\varphi(x, A, B)\cdot H^T) +\rho(H) \quad \mbox{ and } \quad \lim_{H\to 0}\dfrac{\rho(H)}{\|H\|}=0 $$

Note that
\begin{align*}
\varphi(x, A, B) &= \|x^T A B - x^T\|^2_F \\
&= \text{trace}\left((x^T A B - x^T)(x^T A B - x^T)^T\right) \\
&= \text{trace}\left(x^T A B B^T A^T x - x^T A B x - x^T B^T A^T x + x^T x\right) \\
&= \text{trace}\left(x^T A B B^T A^T x\right) - 2\,\text{trace}\left(x^T A B x\right) + \text{trace}\left(x^T x\right)
\end{align*}
After replacing $A$ by $A+H$ and subtracting, you will find
\begin{align*}
\varphi(x, A+H, B) - \varphi(x, A, B) ={}& \text{trace}\left(x^T A B B^T H^T x\right) \\
&+ \text{trace}\left(x^T H B B^T A^T x\right) \\
&- 2\,\text{trace}\left(x^T H B x\right) \\
&+ \text{trace}\left(x^T H B B^T H^T x\right)
\end{align*}
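The expansion above is an exact identity, not an approximation, so it can be checked directly for an arbitrary $H$. Here is a small sketch, assuming NumPy; the random data, the sizes, and the name `phi` are illustrative assumptions. I store $x$ as an $m \times 1$ column so that every product inside a trace is a genuine $1 \times 1$ matrix.

```python
import numpy as np

# Arbitrary test data (an assumption made for this check only).
rng = np.random.default_rng(1)
m, n = 4, 3
x = rng.standard_normal((m, 1))   # column vector
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))
H = rng.standard_normal((m, n))   # perturbation of A, not necessarily small

def phi(x, A, B):
    r = x.T @ A @ B - x.T          # 1 x m row vector
    return np.trace(r @ r.T)       # squared Frobenius norm of r

lhs = phi(x, A + H, B) - phi(x, A, B)
rhs = (np.trace(x.T @ A @ B @ B.T @ H.T @ x)
       + np.trace(x.T @ H @ B @ B.T @ A.T @ x)
       - 2.0 * np.trace(x.T @ H @ B @ x)
       + np.trace(x.T @ H @ B @ B.T @ H.T @ x))
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding, even though `H` is not small here.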

Let $M_1, M_2, \ldots, M_k$ be a sequence of matrices that are compatible for multiplication, meaning the number of columns of matrix $M_\ell$ is equal to the number of rows of matrix $M_{\ell+1}$ for $\ell = 1, 2, \ldots, k-1$, and the number of columns of matrix $M_k$ is equal to the number of rows of matrix $M_1$. Then,

  • $\text{trace}(M_1 M_2 \ldots M_k) = \text{trace}(M_2 M_3 \ldots M_k M_1) = \cdots = \text{trace}(M_k M_1 M_2 \ldots M_{k-1})$,
  • $\text{trace}(M_\ell)=\text{trace}(M_\ell^T)$,
  • $(M_{1}\cdot M_{2})^{T}=M_{2}^{T}\cdot M_{1}^{T}$, $(M_{1}\cdot M_{2}\cdot M_{3})^{T}=M_{3}^{T}\cdot M_{2}^{T}\cdot M_{1}^{T}$ and $(M_1\cdot M_2\cdot \ldots \cdot M_k)^{T}= M_k^T\cdot M_{k-1}^T\cdot \ldots \cdot M_1^T $
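These three properties are easy to sanity-check on rectangular matrices. The snippet below is just an illustrative sketch, assuming NumPy; the shapes $2 \times 3$, $3 \times 4$, $4 \times 2$ are an arbitrary choice that satisfies the compatibility condition.

```python
import numpy as np

rng = np.random.default_rng(2)
M1 = rng.standard_normal((2, 3))
M2 = rng.standard_normal((3, 4))
M3 = rng.standard_normal((4, 2))   # columns of M3 match rows of M1

# Cyclic property: the three products have different sizes but equal traces.
t1 = np.trace(M1 @ M2 @ M3)        # trace of a 2 x 2 matrix
t2 = np.trace(M2 @ M3 @ M1)        # trace of a 3 x 3 matrix
t3 = np.trace(M3 @ M1 @ M2)        # trace of a 4 x 4 matrix

# Trace is invariant under transposition (checked on a square product).
S = M1 @ M2 @ M3
transpose_trace_ok = np.isclose(np.trace(S), np.trace(S.T))

# Transposing a product reverses the order of the factors.
reverse_ok = np.allclose((M1 @ M2 @ M3).T, M3.T @ M2.T @ M1.T)

print(t1, t2, t3, transpose_trace_ok, reverse_ok)
```

Note that the cyclic property only allows rotating the factors; swapping two neighbouring factors in general changes the trace.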

Using the three properties above, you will get $$ \nabla_{A}\varphi(x,A,B)= 2x(B(B^TA^Tx-x))^T $$
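In case the intermediate steps are not obvious, here is one possible way to carry them out; the grouping of terms is my own, and other orderings work equally well. The goal is to bring each term that is linear in $H$ into the form $\text{trace}(G \cdot H^T)$:

\begin{align*}
\text{trace}\left(x^T A B B^T H^T x\right) &= \text{trace}\left(x x^T A B B^T H^T\right) && \text{(cyclic property)} \\
\text{trace}\left(x^T H B B^T A^T x\right) &= \text{trace}\left(x^T A B B^T H^T x\right) = \text{trace}\left(x x^T A B B^T H^T\right) && \text{(transpose, then cyclic)} \\
-2\,\text{trace}\left(x^T H B x\right) &= -2\,\text{trace}\left(x^T B^T H^T x\right) = -2\,\text{trace}\left(x x^T B^T H^T\right) && \text{(transpose, then cyclic)}
\end{align*}

Adding the three lines, the linear part is
$$ \text{trace}\left(2\left(x x^T A B B^T - x x^T B^T\right) H^T\right) = \text{trace}\left(2x\left(x^T A B - x^T\right)B^T \cdot H^T\right), $$
and $\left(x^T A B - x^T\right)B^T = \left(B(B^T A^T x - x)\right)^T$, which gives the stated gradient. The remaining term is $\rho(H) = \text{trace}\left(x^T H B B^T H^T x\right) = \|B^T H^T x\|^2 \le \|B\|^2 \|x\|^2 \|H\|^2$, so $\rho(H)/\|H\| \to 0$ as $H \to 0$, as required.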