Derivative of $\|\operatorname{diag}(X^T A X) - y\|^2_2$ with respect to $A$?

How do I take the derivative of $\|\operatorname{diag}(X^T A X) - y\|^2_2$ with respect to $A$, where $X \in \mathbb{R}^{n \times m}$, $A \in \mathbb{R}^{n \times n}$, and $y \in \mathbb{R}^m$?

Best answer

For typing convenience, define the vector
$$w={\rm diag}(X^TAX)-y$$
Write the function in terms of this new variable, then find its differential and gradient.
$$\eqalign{
\phi &= w:w \cr
d\phi &= 2w:dw \cr
&= 2w:{\rm diag}(X^T\,dA\,X) \cr
&= 2\,{\rm Diag}(w):(X^T\,dA\,X) \cr
&= 2X\,{\rm Diag}(w)\,X^T:dA \cr
\frac{\partial\phi}{\partial A} &= 2X\,{\rm Diag}(w)\,X^T \cr
}$$
where a colon denotes the trace/Frobenius product, i.e. $A:B={\rm tr}(A^TB)$, which for vectors reduces to the ordinary dot product. The fourth line uses $w:{\rm diag}(M)=\sum_k w_kM_{kk}={\rm Diag}(w):M$, and the fifth uses the cyclic property of the trace, $B:(X^TCX)=(XBX^T):C$.
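
As a cross-check, the same result can be sketched component-wise. The symbol $x_k$ (the $k$-th column of $X$) is introduced here only for illustration and is not part of the original answer.
$$\eqalign{
w_k &= x_k^TAx_k - y_k = \sum_{i,j} X_{ik}A_{ij}X_{jk} - y_k \cr
\phi &= \sum_{k=1}^m w_k^2 \cr
\frac{\partial\phi}{\partial A_{ij}} &= 2\sum_{k=1}^m w_k\,X_{ik}X_{jk} = \big(2X\,{\rm Diag}(w)\,X^T\big)_{ij} \cr
}$$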

The function $\,{\rm Diag}(a)\,$ creates a diagonal matrix from the input vector, while $\,{\rm diag}(A)\,$ does the opposite, i.e. it creates a vector from the diagonal of the input matrix.
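
The closed-form gradient can also be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `phi`, `grad_phi`, and `numerical_grad`, the random test data, and the step size `eps` are all illustrative choices, not part of the original answer. It compares $2X\,{\rm Diag}(w)\,X^T$ against a central finite-difference approximation.

```python
import numpy as np

def phi(A, X, y):
    """Objective: squared 2-norm of diag(X^T A X) - y."""
    # einsum computes (X^T A X)_{kk} = sum_{i,j} X[i,k] A[i,j] X[j,k] for each k
    w = np.einsum("ik,ij,jk->k", X, A, X) - y
    return w @ w

def grad_phi(A, X, y):
    """Closed-form gradient 2 X Diag(w) X^T."""
    w = np.einsum("ik,ij,jk->k", X, A, X) - y
    # X * w scales column k of X by w_k, which equals X @ np.diag(w)
    return 2.0 * (X * w) @ X.T

def numerical_grad(A, X, y, eps=1e-6):
    """Central finite differences, one entry of A at a time."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (phi(A + E, X, y) - phi(A - E, X, y)) / (2 * eps)
    return G

# Small random instance (sizes chosen arbitrarily for the check)
rng = np.random.default_rng(0)
n, m = 4, 3
X = rng.standard_normal((n, m))
A = rng.standard_normal((n, n))
y = rng.standard_normal(m)

max_err = np.max(np.abs(grad_phi(A, X, y) - numerical_grad(A, X, y)))
print("max abs difference:", max_err)
```

Because $\phi$ is quadratic in $A$, the central difference has no truncation error, so the reported difference should be at the level of floating-point rounding. Note that the gradient is a symmetric matrix even when $A$ is not.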