I want to compute the partial derivative of a real-valued function that takes matrices as arguments.
The function has the form $$F(x,y,z) = ||g(x) \odot (S \cdot y) - z||,$$ where $x, y, z, S \in \mathbb{R}^{M \times N}$ and $g$ is a function $\mathbb{R}^{M \times N} \to \mathbb{R}^{M \times N}$.
So my question is: what is the gradient $\nabla_x F$?
My attempt: 1. I defined a new function $$f(x,y,z) = ||x \odot (S \cdot y) - z||$$ so that $$F(x,y,z) = f(g(x),y,z).$$
To compute the derivative of F, I tried to compute the derivatives of f and g and then apply the chain rule. But I am not sure whether the chain rule holds in this case.
I computed the derivative of f: $$ \nabla_x f(x,y,z) = \left[ ( \cdot ) \odot S_i(y) \right]^* \left( x \odot S_i (y) - z_{ij} \right) $$ $$ = \overline{S_i(y)} \odot \left( x \odot S_i (y) - z_{ij} \right) \\ $$ $$ =S_i(\overline{y} \odot y) \odot x - S_i (\overline{y}) \odot z_{ij}\\ $$
Then, with the chain rule, $$\nabla_{x_{rs}} F(x,y,z) = \nabla_x f(g(x),y,z) \cdot \nabla_{x_{rs}} g(x)$$ $$= \left( S_i(\overline{y} \odot y) \odot g(x) - S_i (\overline{y}) \odot z \right) \cdot \nabla_{x_{rs}}g(x).$$ But here I am unsure whether the chain rule holds for matrices and, if so, whether the derivative of g should be combined with the derivative of f by a matrix multiplication or by the Hadamard product.
2. Another idea would be to write $g(x) = M \cdot x$ for some matrix $M$ and then compute the derivative of F directly. But I don't know whether a matrix-valued function of a matrix argument can always be written as multiplication by a matrix, or what this would look like in my case. The function $g(x)$ is $$g(x) = \mathbb{F}^{-1}(\mathbb{F}(x) \odot Q), $$ where $x, Q$ are matrices and $\mathbb{F}$ is the 2D discrete Fourier transform, defined as $$\mathbb{F}: \mathbb{C}^{M \times N} \to \mathbb{C}^{M \times N}, \, X \mapsto Y$$ with $$Y_{p,q} = \sum_{m = 0}^{M-1} \sum_{n=0}^{N-1} \exp \left( -2 \pi i \left(\frac{mp}{M} + \frac{n q}{N}\right) \right) \cdot X_{m,n}$$ for $p=0,\dots,M-1$ and $q = 0,\dots,N-1$.
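A quick NumPy sketch of the definitions above (NumPy and the helper names `dft2_direct`, `g`, `g_as_matrix` are illustrative assumptions). It checks that the double-sum formula matches `numpy.fft.fft2`, which uses the same unnormalized sign convention. Because $g$ is linear, it also builds an explicit $MN\times MN$ matrix $K$ with $\operatorname{vec}(g(x)) = K\operatorname{vec}(x)$, where vec here stacks rows (NumPy's `ravel` order) rather than columns.

```python
import numpy as np


def dft2_direct(X):
    """2D DFT evaluated literally from the double-sum definition (slow, for checking only)."""
    M, N = X.shape
    Y = np.zeros((M, N), dtype=complex)
    for p in range(M):
        for q in range(N):
            for m in range(M):
                for n in range(N):
                    Y[p, q] += np.exp(-2j * np.pi * (m * p / M + n * q / N)) * X[m, n]
    return Y


def g(X, Q):
    """Fourier filter g(X) = F^{-1}(F(X) * Q), where * is the elementwise product."""
    return np.fft.ifft2(np.fft.fft2(X) * Q)


def g_as_matrix(Q):
    """Matrix K with g(X).ravel() == K @ X.ravel(); column k is g of the k-th basis matrix."""
    M, N = Q.shape
    K = np.zeros((M * N, M * N), dtype=complex)
    for k in range(M * N):
        E = np.zeros(M * N)
        E[k] = 1.0
        K[:, k] = g(E.reshape(M, N), Q).ravel()
    return K


rng = np.random.default_rng(0)
M, N = 3, 4
X = rng.standard_normal((M, N))
Q = rng.standard_normal((M, N))

print(np.allclose(dft2_direct(X), np.fft.fft2(X)))  # True
K = g_as_matrix(Q)
print(np.allclose(K @ X.ravel(), g(X, Q).ravel()))  # True
```

Building $K$ this way costs $MN$ FFT pairs and $O((MN)^2)$ memory, so it is only practical for small sizes. For the gradient itself, working with the FFT form directly is usually preferable.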
Any help is appreciated! Thank you!
$ \def\R#1{{\mathbb R}^{#1}} \def\k{\otimes} \def\h{\odot} \def\o{{\tt1}} \def\bR#1{\Big(#1\Big)} \def\BR#1{\left[#1\right]} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\BR{\frac{#1}{#2}}} $I'll use a naming convention wherein Latin upper/lower case letters are matrices/vectors and Greek letters are scalars. This leads to renaming the problem variables $(F,x,y,z)\to(\varphi,X,Y,Z)$
Since the matrix $X$ has a rectangular shape, I'll assume that the function $g()$ is applied elementwise and that its derivative $g'()$ is known. The differential of such an elementwise matrix function is given by $$\eqalign{ G = g(X),\quad G' = g'(X) \qiq \c{dG = G'\h dX} \\ }$$ For typing convenience, introduce the matrix $$\eqalign{ W &= G\h SY - Z \\ }$$ and the extremely versatile $\sf Frobenius\;Product$ $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n \LR{A\h B}_{ij} \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \\ B:B &= \frob{B}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \;=\; \trace{A^TB} \\ \LR{PQ}:B &= P:\LR{BQ^T} \;=\; Q:\LR{P^TB} \\ \LR{A\h B}:C &= A:\LR{B\h C} \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\ }$$
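The Frobenius product identities above are easy to spot-check numerically. This is a minimal NumPy sketch with random real matrices. The helper name `frob` and the shapes are illustrative choices, and `Q` here is just a generic $k\times n$ matrix, not the Fourier filter.

```python
import numpy as np


def frob(A, B):
    """Frobenius product A:B = sum_ij A_ij * B_ij."""
    return np.sum(A * B)


rng = np.random.default_rng(1)
m, n, k = 3, 4, 5
A, B, C = (rng.standard_normal((m, n)) for _ in range(3))
P = rng.standard_normal((m, k))  # P is m x k
Q = rng.standard_normal((k, n))  # Q is k x n, so P @ Q has the shape of B

checks = {
    "A:B = Tr(A^T B)": np.isclose(frob(A, B), np.trace(A.T @ B)),
    "B:B = ||B||_F^2": np.isclose(frob(B, B), np.linalg.norm(B, "fro") ** 2),
    "A:B = B^T:A^T": np.isclose(frob(A, B), frob(B.T, A.T)),
    "(PQ):B = P:(BQ^T)": np.isclose(frob(P @ Q, B), frob(P, B @ Q.T)),
    "(PQ):B = Q:(P^T B)": np.isclose(frob(P @ Q, B), frob(Q, P.T @ B)),
    "(A*B):C = A:(B*C)": np.isclose(frob(A * B, C), frob(A, B * C)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```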
Then square the objective function and calculate its gradient $$\eqalign{ \def\P{\frac{\o}{\varphi}\:} \varphi &= \frob{W} \\ \varphi^2 &= W:W \\ 2\varphi\:d\varphi &= 2W:dW \\ d\varphi &= \P W:dW \\ &= \P W:\LR{dG\h SY} \\ &= \P\LR{W\h SY}:\c{dG} \\ &= \P\LR{W\h SY}:\CLR{G'\h dX} \\ &= \P\LR{G'\h W\h SY}:dX \\ \grad{\varphi}{X} &= \P\LR{G'\h W\h SY} \\ }$$ This result can be rewritten in terms of the original variables $$\eqalign{ \grad Fx &= g'(x)\h\fracLR{g(x)\h Sy-z}F\h Sy \\ \\ }$$
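A finite-difference check is a cheap way to confirm a closed-form gradient like this one. The NumPy sketch below assumes a hypothetical elementwise $g=\tanh$ (so $g'=1-\tanh^2$). It also takes $S$ to be $M\times M$ so that $SY$ has the same $M\times N$ shape as $X$, and the helper names are illustrative.

```python
import numpy as np

# Hypothetical elementwise g chosen only for illustration: g = tanh, so g' = 1 - tanh^2
g = np.tanh


def g_prime(X):
    return 1.0 - np.tanh(X) ** 2


def phi(X, S, Y, Z):
    """phi = ||g(X) * (S @ Y) - Z||_F, with * the elementwise product."""
    return np.linalg.norm(g(X) * (S @ Y) - Z)


def grad_phi(X, S, Y, Z):
    """Closed form (1/phi) g'(X) * W * SY, with W = g(X) * SY - Z."""
    SY = S @ Y
    W = g(X) * SY - Z
    return g_prime(X) * W * SY / np.linalg.norm(W)


def grad_fd(f, X, h=1e-6):
    """Central finite differences, perturbing one entry of X at a time."""
    D = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        D[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return D


rng = np.random.default_rng(2)
M, N = 4, 3
X = rng.standard_normal((M, N))
S = rng.standard_normal((M, M))  # assumed M x M so that S @ Y is M x N
Y = rng.standard_normal((M, N))
Z = rng.standard_normal((M, N))

analytic = grad_phi(X, S, Y, Z)
numeric = grad_fd(lambda X_: phi(X_, S, Y, Z), X)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```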
Update
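For the second part of the question, $g$ is the Fourier filter rather than an elementwise function. Since both $\mathbb F$ and $\mathbb F^{-1}$ are linear, $$\eqalign{ \def\F{{\mathbb F}} \def\Fi{\F^{-1}} G &= \Fi\bR{\,Q\h\F\LR{X}} \qiq \c{dG = \Fi\bR{\,Q\h\F\LR{dX}}} \\ }$$ The following DFT/Frobenius product rules are useful; they hold because the one-dimensional DFT matrices (and their inverses) are symmetric $$\eqalign{ \F\!\LR{A}:B &= A:\F\!\LR{B} \\ \Fi\!\LR{A}:B &= A:\Fi\!\LR{B} \\ }$$ Substituting this new $\c{dG}$ into the differential and applying these rules yields $$\eqalign{ d\varphi &= \P\LR{W\h SY}:\c{dG} \\ &= \P\LR{W\h SY}:\Fi\bR{\,Q\h\F\LR{dX}} \\ &= \P\bR{Q\h\Fi\!\LR{W\h SY}}:\F\LR{dX} \\ &= \P\:\F\bR{\,Q\h\Fi\!\LR{W\h SY}}:dX \\ \grad{\varphi}{X} &= \P\:\F\bR{\,Q\h\Fi\!\LR{W\h SY}} \\ }$$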
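The same finite-difference check works for the DFT case. This sketch assumes NumPy's `fft2`/`ifft2`, which match the unnormalized $\mathbb F$ and its inverse. It also assumes a real filter $Q$ with $Q_{p,q}=Q_{-p \bmod M,\,-q \bmod N}$, so that $G$ stays real for real $X$; the derivation above uses the real, unconjugated Frobenius product. Under those assumptions the closed form should agree with the numerical gradient, and its imaginary part should be pure round-off. As before, $S$ is taken to be $M\times M$ and the helper names are illustrative.

```python
import numpy as np


def g(X, Q):
    """Fourier filter G = F^{-1}(Q * F(X)); real-valued when Q has the index symmetry built below."""
    return np.fft.ifft2(Q * np.fft.fft2(X)).real  # drop round-off imaginary parts


def phi(X, S, Y, Z, Q):
    """phi = ||G * (S @ Y) - Z||_F with G = g(X, Q)."""
    return np.linalg.norm(g(X, Q) * (S @ Y) - Z)


def grad_phi(X, S, Y, Z, Q):
    """Closed form (1/phi) F(Q * F^{-1}(W * SY)), with W = G * SY - Z."""
    SY = S @ Y
    W = g(X, Q) * SY - Z
    return np.fft.fft2(Q * np.fft.ifft2(W * SY)) / np.linalg.norm(W)


def grad_fd(f, X, h=1e-6):
    """Central finite differences, perturbing one entry of X at a time."""
    D = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        D[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return D


rng = np.random.default_rng(3)
M, N = 4, 6
X = rng.standard_normal((M, N))
S = rng.standard_normal((M, M))  # assumed M x M so that S @ Y is M x N
Y = rng.standard_normal((M, N))
Z = rng.standard_normal((M, N))

# Assumed filter: real Q with Q[p, q] == Q[-p mod M, -q mod N], which keeps G real for real X
R = rng.standard_normal((M, N))
Q = R + R[np.ix_(-np.arange(M) % M, -np.arange(N) % N)]

analytic = grad_phi(X, S, Y, Z, Q)
numeric = grad_fd(lambda X_: phi(X_, S, Y, Z, Q), X)
print(np.allclose(analytic.real, numeric, atol=1e-6))  # True
print(np.max(np.abs(analytic.imag)))  # round-off level
```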