Derivative of function with respect to matrix

79 Views Asked by At

Problem

I would like to compute the derivatives with respect to $U$ and $V$ of this function $$ f(U, V) = \|Y - XUV^\top\|_F^2 + \lambda\|UV^\top\|_F^2. $$

Notation

Let $Y\in\mathbb{R}^{n\times q}$, $X\in\mathbb{R}^{n\times p}$, $U\in\mathbb{R}^{p\times d}$, $V\in\mathbb{R}^{q\times d}$ and $\lambda > 0$.

Attempted Solution for $U$

Using the matrix cookbook I get for the first term $$ \begin{align} \nabla_U \|Y - XUV^\top||_F^2 &= \nabla_U \text{Tr}((Y - XUV^\top)^\top(Y-XUV^\top)) \\ &= \text{Tr}(Y^\top Y -Y^\top XUV^\top - VU^\top X^\top Y - V U^\top X^\top X U V^\top) \\ &= -2X^\top Y V + \nabla_U\text{Tr}(VU^\top X^\top X UV^\top) \\ &= -2X^\top Y V + 2 X^\top X U V^\top V \end{align} $$ Using equations 101, 102 and 116 of the Matrix Cookbook. Using again equation 116 for the second term we have $$ \nabla_U \lambda \|UV^\top\|_F^2 = \lambda \nabla_U \text{Tr}(VU^\top U V^\top) = 2\lambda U V^\top V $$ so overall we have $$ \nabla_U f(U, V) = -2X^\top Y V + 2 X^\top X U V^\top V + 2\lambda U V^\top V $$

Attempted Solution for V

2

There are 2 best solutions below

1
On BEST ANSWER

A useful computer algebra tool for these kinds of problems is www.matrixcalculus.org.

Simply enter norm2(Y-X*U*V')^2 + c*norm2(U*V')^2 and you get:

$$\begin{aligned} \text{(I)}&&\frac{\partial}{\partial U} \left( \|Y-X U V^\top \|_2^{2}+c \|U V^\top \|_2^{2} \right) &= 2 c U V^\top V-2 X^\top \cdot (Y-X U V^\top ) V \\ \text{(II)}&& \frac{\partial}{\partial V} \left( \|Y-X U V^\top \|_2^{2}+c \|U V^\top \|_2^{2} \right) &= 2 c V U^\top U-2 (Y^\top -V U^\top X^\top ) X U \end{aligned}$$


Now, if you want to derive this stuff by hand, two simple rules you can remember are:

  1. $AXB^⊤ = (A⊗B)⋅X$. In particular, $\frac{∂ AXB^⊤}{∂X} = A⊗B$
    • Note1: $A⊗B≔(A_{ik}B_{jl})_{ij, kl}$ is a rank-4 tensor, and $(A⊗B)⋅X$ denotes the tensor-contraction $∑_{kl}(A⊗B)_{ij, kl} X_{kl}$ (convince yourself that this is the same as $AXB^⊤$!)
    • Note2: matrixcalculus.org follows the different convention $AXB^⊤ = (B⊗A)⋅X$, the results are the same anyway!
  2. $\frac{∂ \frac{1}{2}\|⋅X -Y\|^2}{∂X} = ^⊤(⋅X-Y)$. In particular:

$$\begin{aligned} \frac{∂ \frac{1}{2}\|AXB^⊤\|^2}{∂X} &= \frac{∂ \frac{1}{2}\|(A⊗B)⋅X\|^2}{∂X} \\&= (A⊗B)^⊤(A⊗B)⋅X \\&= (A^⊤⊗B^⊤)(A⊗B)⋅X \\&= (A^⊤A⊗B^⊤B)⋅X \\&= A^⊤A X B^⊤B \end{aligned}$$

So for instance we get

$$\begin{aligned} \frac{∂½\|Y-X U V^⊤ \|^2}{∂V} &=\frac{∂½\|Y-X U V^⊤ \|^2}{∂V^\top}\frac{∂V^⊤}{∂V} \\&=\frac{∂½\|(X U⊗)⋅ V^⊤\|^2}{∂V^\top}\frac{∂V^⊤}{∂V} \\&=(X U⊗)^⊤((X U⊗)⋅V^⊤-Y)⋅ \\&=\big((U^⊤X^⊤XU⊗)V^⊤ - (U^⊤X^⊤⊗)Y\big)⋅ \\&=(U^⊤X^⊤XUV -U^⊤X^⊤Y) ⋅ \\&=VU^⊤X^⊤XU - Y^⊤ XU \end{aligned}$$

Which you will recognize as the second term in (II). Here, $$ is the so-called transpose tensor which satisfies $⋅X = X^⊤$.

0
On

$ \def\l{\lambda}\def\o{{\tt1}}\def\p{\partial} \def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\g#1#2{\frac{\p #1}{\p #2}} $Introduce the matrix variable $$Z=UV^T\qiq dZ=dU\,V^T + U\,dV^T$$ and the matrix inner product $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ Rewrite the function in terms of these and calculate its differential. $$\eqalign{ f &= \LR{XZ-Y}:\LR{XZ-Y} + \l\LR{Z:Z} \\ df &= 2\LR{XZ-Y}:\LR{X\,dZ} + 2\l\LR{Z:dZ} \\ &= \BR{2X^T\LR{XZ-Y} + 2\l Z}:dZ \\ &= G:dZ \\ }$$ where $G$ is the gradient with respect to $Z$.

If $V$ is constant, then $\,dV=0\;$ and $$\eqalign{ dZ &= dU\,V^T \qiq df = G:dU\,V^T = GV:dU \qiq \g{f}{U} = GV \\ }$$ while if $U$ is constant $$\eqalign{ dZ &= U\,dV^T \qiq df = G:U\,dV^T = G^TU:dV \qiq \g{f}{V} = G^TU \\\\ }$$


NB: $\,$ The properties of the underlying trace function allow terms in the matrix inner product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\ }$$