Derivative of $Y \mapsto Y^T Y$


Is this derivative done correctly? I did not find the solution in the Matrix Cookbook, but I followed its similar examples:

$$\frac{\delta Y^TY}{\delta Y} = ?$$

$$X=Y^TY$$

$$\delta X=(\delta Y^T)Y + Y^T(\delta Y)$$

$$\mathrm{vec}(\delta X)=\mathrm{vec}(\delta Y^TY) + \mathrm{vec}(Y^T\delta Y)$$

$$\mathrm{vec}(\delta X)=(I\otimes Y)\,\mathrm{vec}(\delta Y^T) + (I \otimes Y^T)\,\mathrm{vec}(\delta Y)$$

$$\frac{\delta X}{\delta Y} = 2(I \otimes Y^T)$$

Accepted answer:

What you have computed is really $\frac{\delta \operatorname{vec}(Y^TY)}{\delta \operatorname{vec}(Y)}$; I will assume this is what you're really after. I will also assume that you are using the column-major vectorization operator. To correct your mistake, we have the following: $$ \begin{align} \delta\operatorname{vec}(Y^TY) &=\operatorname{vec}(\delta Y^TY) + \operatorname{vec}(Y^T\delta Y) \\ &= (Y^T \otimes I)\operatorname{vec}(\delta Y^T) + (I \otimes Y^T)\operatorname{vec}(\delta Y) \\ &= (Y^T \otimes I)K\operatorname{vec}(\delta Y) + (I \otimes Y^T )\operatorname{vec}(\delta Y) \\ &= [(Y^T \otimes I)K + (I \otimes Y^T)]\operatorname{vec}(\delta Y), \end{align} $$ where $K$ is the commutation matrix of the correct size. With that, we find that $$ \frac{\delta \operatorname{vec}(Y^TY)}{\delta \operatorname{vec}(Y)}= (Y^T \otimes I)K + (I \otimes Y^T). $$
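The corrected Jacobian can be sanity-checked numerically. The following NumPy sketch (the `commutation_matrix` helper and the shapes `m, n` are my own choices, not from the answer) builds $(Y^T \otimes I)K + (I \otimes Y^T)$ with column-major `vec` and compares it against central finite differences of $\operatorname{vec}(Y^TY)$:

```python
import numpy as np

def commutation_matrix(m, n):
    """K @ vec(A) = vec(A.T) for A of shape (m, n), column-major vec."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # vec(A) index of A[i, j] is i + j*m; vec(A.T) index is j + i*n
            K[j + i * n, i + j * m] = 1.0
    return K

rng = np.random.default_rng(0)
m, n = 4, 3                       # Y is m x n, so Y^T Y is n x n
Y = rng.standard_normal((m, n))

I_n = np.eye(n)
K = commutation_matrix(m, n)
J = np.kron(Y.T, I_n) @ K + np.kron(I_n, Y.T)   # (Y^T x I)K + (I x Y^T)

# central finite differences of vec(Y^T Y) w.r.t. vec(Y)
eps = 1e-6
J_fd = np.zeros((n * n, m * n))
for b in range(m * n):
    dY = np.zeros(m * n)
    dY[b] = eps
    Yp = Y + dY.reshape(m, n, order="F")
    Ym = Y - dY.reshape(m, n, order="F")
    J_fd[:, b] = ((Yp.T @ Yp) - (Ym.T @ Ym)).ravel(order="F") / (2 * eps)

print(np.allclose(J, J_fd, atol=1e-5))  # True
```

Note that `order="F"` is essential throughout: the derivation assumes the column-major vectorization operator, and NumPy defaults to row-major.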

Another answer:

If you absolutely need the tensor-valued gradient, then you have several options.

Perhaps the simplest approach is to take Ben Grossmann's matrix-valued gradient
$$G_{\alpha\beta} = \frac{\partial x_\alpha}{\partial y_\beta}
\quad\iff\quad
G = \frac{\partial x}{\partial y} = \frac{\partial \operatorname{vec}(X)}{\partial \operatorname{vec}(Y)}$$
and reverse the Kronecker-vec indexing
$$\begin{align}
x &\in \mathbb{R}^{n^2\times 1} \implies X \in \mathbb{R}^{n\times n} \\
x_\alpha &= X_{ij} \\
\alpha &= i+(j-1)\,n \\
i &= 1+(\alpha-1)\bmod n \\
j &= 1+(\alpha-1)\operatorname{div} n \\[2ex]
y &\in \mathbb{R}^{mn\times 1} \implies Y \in \mathbb{R}^{m\times n} \\
y_\beta &= Y_{k\ell} \\
\beta &= k+(\ell-1)\,m \\
k &= 1+(\beta-1)\bmod m \\
\ell &= 1+(\beta-1)\operatorname{div} m
\end{align}$$
to recover the tensor-valued gradient
$$\begin{align}
G &\in \mathbb{R}^{n^2\times mn} \implies \Gamma \in \mathbb{R}^{n\times n\times m\times n} \\
G_{\alpha\beta} &= \Gamma_{ijk\ell} = \frac{\partial X_{ij}}{\partial Y_{k\ell}}
\end{align}$$
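The index reversal above amounts to nothing more than a column-major reshape. The NumPy sketch below (the shapes `m, n` and the finite-difference construction of $G$ are my own illustration, not part of the answer) builds the tensor $\Gamma$ both ways, once with the explicit mod/div formulas and once with a single Fortran-order `reshape`:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3                       # Y is m x n, X = Y^T Y is n x n
Y = rng.standard_normal((m, n))

# matrix-valued gradient G = d vec(Y^T Y) / d vec(Y), here via central differences
eps = 1e-6
G = np.zeros((n * n, m * n))
for beta in range(m * n):
    dY = np.zeros(m * n)
    dY[beta] = eps
    Yp = Y + dY.reshape(m, n, order="F")
    Ym = Y - dY.reshape(m, n, order="F")
    G[:, beta] = ((Yp.T @ Yp) - (Ym.T @ Ym)).ravel(order="F") / (2 * eps)

# explicit reversal of the column-major vec indexing (0-based mod/div formulas)
Gamma = np.zeros((n, n, m, n))
for alpha in range(n * n):
    i, j = alpha % n, alpha // n          # alpha -> (i, j)
    for beta in range(m * n):
        k, l = beta % m, beta // m        # beta  -> (k, l)
        Gamma[i, j, k, l] = G[alpha, beta]

# equivalently: one reshape in Fortran (column-major) order
Gamma2 = G.reshape(n, n, m, n, order="F")
print(np.allclose(Gamma, Gamma2))  # True
```

As a cross-check, the tensor entries agree with the componentwise formula $\Gamma_{ijk\ell} = \partial X_{ij}/\partial Y_{k\ell} = \delta_{i\ell}\,Y_{kj} + \delta_{j\ell}\,Y_{ki}$, which follows directly from $X_{ij} = \sum_k Y_{ki}Y_{kj}$.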