Gradient of a function with respect to a matrix


How can I compute the gradient of the following function with respect to $X$,

$$g(X) = \frac{1}{2}\|y-AX\|^2$$

where $X\in\mathbb{R}^{n\times n}$, $y\in\mathbb{R}^m$, and $A:\mathbb{R}^{n\times n}\to \mathbb{R}^m$ is linear. We can assume that $A$ acts as

$$AX = \begin{pmatrix}\langle X, A_1\rangle\\\vdots\\\langle X, A_m\rangle\end{pmatrix}$$

where $A_1,\ldots,A_m$ are $n\times n$ real matrices and the inner product is the Frobenius inner product.

Edit: my attempt at finding the gradient,

$$\begin{aligned} g(X+H) &= \frac{1}{2}\langle y-A(X+H),\, y-A(X+H)\rangle\\ &= \frac{1}{2}\langle y-AX-AH,\, y-AX-AH\rangle\\ &= \frac{1}{2}\left(\langle y-AX,\, y-AX\rangle - 2\langle y-AX,\, AH\rangle + \|AH\|^2\right)\\ &= g(X) - \langle y-AX,\, AH\rangle + o(\|H\|)\\ &= g(X) - \langle A^*(y-AX),\, H\rangle + o(\|H\|), \end{aligned}$$

where the two cross terms were combined using the symmetry of the real inner product, and $\|AH\|^2 = o(\|H\|)$. Hence $\nabla g(X) = -A^*\left(y-AX\right)$.

Now I must compute the adjoint operator $A^*$ of $A$.

To find $A^*$, we compute

$$\langle y, AX\rangle = \sum\limits_{i=1}^m y_i\langle X, A_i\rangle=\sum\limits_{i=1}^m \langle X, y_iA_i\rangle = \langle X, \sum\limits_{i=1}^my_iA_i\rangle$$

to see that $A^*y = \sum\limits_{i=1}^m y_iA_i$. Applying this to the expression we found above gives,

$$\nabla_Xg(X) = -A^*(y-AX) = -\sum\limits_{i=1}^m\left(y_i-\mbox{tr}(X^TA_i)\right)A_i.$$
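As a sanity check, here is a minimal numerical verification of this formula. All data below (the random $A_i$, $y$, $X$ and the helper names) are hypothetical, purely for illustration: the sketch confirms the adjoint identity $\langle y, AX\rangle = \langle X, A^*y\rangle$ and compares the closed-form gradient against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 6
A = rng.standard_normal((m, n, n))   # stack of A_1, ..., A_m
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))

def A_op(M):
    # The linear map A: M -> (<M, A_i>)_i, Frobenius inner product
    return np.einsum('ijk,jk->i', A, M)

def g(M):
    r = y - A_op(M)
    return 0.5 * r @ r

# Adjoint identity: <y, A(X)> == <X, A*(y)> with A*(y) = sum_i y_i A_i
Astar_y = np.einsum('i,ijk->jk', y, A)
print(np.isclose(y @ A_op(X), np.sum(X * Astar_y)))   # True

# Closed-form gradient vs. central finite differences
grad = -np.einsum('i,ijk->jk', y - A_op(X), A)
eps = 1e-6
fd = np.zeros_like(X)
for j in range(n):
    for k in range(n):
        E = np.zeros_like(X)
        E[j, k] = eps
        fd[j, k] = (g(X + E) - g(X - E)) / (2 * eps)
print(np.allclose(grad, fd, atol=1e-6))   # True
```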

3 Answers

Best Answer

The derivative of $g(X)=\frac{1}{2}(y-A(X))^T(y-A(X))$ is

$$Dg_X: Y\in M_n \mapsto -(A(Y))^T(y-A(X)).$$

Expanding,

$$(A(Y))^T(y-A(X)) = \begin{bmatrix} \operatorname{tr}(Y^TA_1) & \cdots & \operatorname{tr}(Y^TA_m)\end{bmatrix}\begin{bmatrix} y_1-\operatorname{tr}(X^TA_1) \\ \vdots \\ y_m-\operatorname{tr}(X^TA_m)\end{bmatrix} = \sum_{i=1}^m \operatorname{tr}(Y^TA_i)\left(y_i-\operatorname{tr}(X^TA_i)\right) = \operatorname{tr}\!\left(Y^T\sum_{i=1}^m \left(y_i-\operatorname{tr}(X^TA_i)\right)A_i\right).$$

Conclusion: the gradient of $g$ is

$$\nabla g(X)=-\sum_{i=1}^m \left(y_i-\operatorname{tr}(X^TA_i)\right)A_i.$$
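A quick numerical check of this directional-derivative formulation, again a sketch on hypothetical random data: $Dg_X(Y)$ should match both $\operatorname{tr}(Y^T\,\nabla g(X))$ and the difference quotient $\left(g(X+tY)-g(X-tY)\right)/(2t)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 6
A = rng.standard_normal((m, n, n))   # stack of A_1, ..., A_m
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))      # direction

A_op = lambda M: np.einsum('ijk,jk->i', A, M)   # M -> (tr(M^T A_i))_i
g = lambda M: 0.5 * np.sum((y - A_op(M)) ** 2)

Dg = -A_op(Y) @ (y - A_op(X))                    # Dg_X(Y)
grad = -np.einsum('i,ijk->jk', y - A_op(X), A)   # closed-form gradient
t = 1e-6
print(np.isclose(Dg, np.trace(Y.T @ grad)))                   # True
print(np.isclose(Dg, (g(X + t*Y) - g(X - t*Y)) / (2*t)))      # True
```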

Answer

Reshape the matrix variables into their corresponding vectors

$$x = \operatorname{vec}(X) \iff X = \operatorname{unvec}(x), \qquad a_k = \operatorname{vec}(A_k) \iff A_k = \operatorname{unvec}(a_k).$$

Then construct the matrix

$$B = \begin{bmatrix} a_1 & a_2 & \ldots & a_m \end{bmatrix}^T \quad\implies\quad A(X) = Bx.$$

This results in a vector problem with a well-known solution:

$$\phi = \frac{1}{2}\|Bx - y\|^2 \quad\implies\quad \frac{\partial\phi}{\partial x} = B^T(Bx - y).$$

The only remaining task is to reshape this into a matrix-valued gradient:

$$\frac{\partial\phi}{\partial X} = \operatorname{unvec}\left(\frac{\partial\phi}{\partial x}\right).$$
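Here is a short sketch of this vectorization approach on hypothetical random data. One caveat: NumPy's `reshape` is row-major, while $\operatorname{vec}$ is conventionally column-stacking; either convention works as long as `vec` and `unvec` are inverses of each other.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 6
A = rng.standard_normal((m, n, n))   # stack of A_1, ..., A_m
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))

vec = lambda M: M.reshape(-1)             # unvec is v.reshape(n, n)
B = np.stack([vec(Ak) for Ak in A])       # rows of B are vec(A_k)^T
x = vec(X)

# A(X) = B x, so the vector gradient is B^T (B x - y); unvec back to a matrix
grad = (B.T @ (B @ x - y)).reshape(n, n)

# Agrees with the sum formula from the other answers
grad_sum = -np.einsum('i,ijk->jk', y - B @ x, A)
print(np.allclose(grad, grad_sum))   # True
```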

Answer

Write the cost function as $\phi = \frac{1}{2} \sum_i z_i^2 $ with $z_i= \mathrm{tr} \left( \mathbf{A}_i^T \mathbf{X} \right) -y_i $.

Then, taking a differential approach, we obtain $$ d\phi = \sum_i z_i \,\mathrm{tr} \left( \mathbf{A}_i^T \, d\mathbf{X} \right) = \mathrm{tr} \left( \Big(\sum_i z_i \mathbf{A}_i\Big)^{\!T} d\mathbf{X} \right), $$ so the gradient is $\nabla\phi = \sum_i z_i \mathbf{A}_i$. Since $z_i = \mathrm{tr}(\mathbf{X}^T\mathbf{A}_i) - y_i$, this matches the expression found in the other answers.
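In code, this answer's recipe is just two contractions; a minimal sketch on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 6
A = rng.standard_normal((m, n, n))   # stack of A_1, ..., A_m
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))

z = np.einsum('ijk,jk->i', A, X) - y     # z_i = tr(A_i^T X) - y_i
grad = np.einsum('i,ijk->jk', z, A)      # gradient = sum_i z_i A_i
```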