I am trying to find the gradients of the following matrix function $$F(X,A,B) = X^T(XA+B),$$ where $X,A,B$ are some matrices. $$\nabla_A F = X^TX \quad ?$$ $$\nabla_B F = X^T \quad ?$$ $$\nabla_X F = 2XA+B \quad ?$$
Is this right? I have doubts because of the $X^T$. I would be grateful for clarification.
Following Greg's answer, for example: $$A=\begin{pmatrix} 1 & 0.5 \\ 2 & 0.1 \end{pmatrix}$$ $$X=\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix}$$ $$X^TX=\begin{pmatrix} 5 & 2 \\ 2 & 4 \end{pmatrix}$$ $$\dfrac{\partial (X^TXA)}{\partial A_{11}} = X^TX E_{11}= X^TX \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 2 & 0 \end{pmatrix}$$ So it turns out that each component-wise gradient is itself a matrix, not a number.
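The worked example above can be confirmed with a finite difference (a numerical sketch, not part of the original derivation):

```python
import numpy as np

# Numerical check of the example above: perturb only A_11 and
# compare against X^T X E_11.
X = np.array([[1.0, 2.0], [2.0, 0.0]])
A = np.array([[1.0, 0.5], [2.0, 0.1]])

F = lambda A_: X.T @ X @ A_   # F(A) = X^T X A with X held fixed

h = 1e-6
E11 = np.zeros((2, 2)); E11[0, 0] = 1.0
dF = (F(A + h * E11) - F(A)) / h   # finite difference in the A_11 direction

print(np.round(dF))   # matches X^T X E_11 = [[5, 0], [2, 0]]
```

Since $F$ is linear in $A$, the finite difference reproduces $X^TX E_{11}$ exactly (up to floating point).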
ADDITIONAL QUESTION
Suppose we have $F(X,A) = X^T X A$, where $X,A$ are some matrices (not vectors), and we want to calculate: $$F(X_0,A_0) + J(X_0,A_0) \begin{bmatrix} X-X_0 \\ A-A_0 \end{bmatrix} = \; ? $$ $$J(X_0,A_0) \begin{bmatrix} X-X_0 \\ A-A_0 \end{bmatrix} = \begin{bmatrix} \dfrac{\partial F(X_0,A_0)}{\partial X_0} & \dfrac{\partial F(X_0,A_0)}{\partial A_0} \end{bmatrix} \begin{bmatrix} X-X_0 \\ A-A_0 \end{bmatrix} =$$ $$=\begin{bmatrix} T_1 & T_2 \end{bmatrix} \begin{bmatrix} X-X_0 \\ A-A_0 \end{bmatrix}= T_1 \cdot (X-X_0) + T_2 \cdot (A-A_0)$$ where $T_1,T_2$ are tensors.
- How to compute this product $T_1 \cdot (X-X_0)$?
- Will the result be a matrix?
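To make the question concrete, here is how such a tensor-matrix contraction could be evaluated numerically, assuming the fourth-order tensor $T_1$ is stored as a 4-index array with $T_1[k,l,i,j] = \partial F_{kl}/\partial X_{ij}$ (the shapes below are arbitrary illustrative choices):

```python
import numpy as np

# Sketch of the contraction T1 . (X - X0), assuming T1[k,l,i,j] = dF_kl/dX_ij.
rng = np.random.default_rng(2)
m, n, p = 3, 4, 2
T1 = rng.standard_normal((n, p, m, n))   # dF/dX: F is (n,p), X is (m,n)
dX = rng.standard_normal((m, n))         # plays the role of X - X0

# Contract the two trailing (derivative) indices of T1 against dX;
# the two leading (matrix) indices of F remain.
out = np.einsum('klij,ij->kl', T1, dX)
print(out.shape)   # (4, 2): a matrix of the same shape as F
```

So under this index convention the product is a double contraction, and the result is indeed a matrix of the same shape as $F$.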
$ \def\A{A_{ij}}\def\B{B_{ij}}\def\X{X_{ij}} \def\a{\alpha}\def\f{\phi}\def\o{{\tt1}} \def\E{E_{ij}} \def\e{\varepsilon_{k}}\def\x{x_{k}}\def\b{b_{k}} \def\L{\left}\def\R{\right}\def\LR#1{\L(#1\R)} \def\p{\partial}\def\grad#1#2{\frac{\p #1}{\p #2}} $The gradient of a matrix with respect to another matrix is a fourth-order tensor, so none of the listed results can be correct since all of them are matrices and not tensors.
Perhaps the simplest approach is to calculate matrix-valued component-wise gradients $$\eqalign{ F &= X^TXA +X^TB \\ \grad{F}{\A} &= X^TX\E \\ \grad{F}{\B} &= X^T\E \\ \grad{F}{\X} &= \E^TXA + X^T\E A + \E^TB \\ }$$ where $\E$ is the single-entry matrix whose elements are all $0$ except for a $\tt1$ in the $(i,j)$ position.
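These component-wise formulas can be checked numerically; here is a sketch for the $\partial F/\partial X_{ij}$ case (shapes and test matrices are arbitrary choices):

```python
import numpy as np

# Finite-difference check of dF/dX_ij = E^T X A + X^T E A + E^T B.
rng = np.random.default_rng(0)
m, n, p = 3, 4, 2
X = rng.standard_normal((m, n))
A = rng.standard_normal((n, p))
B = rng.standard_normal((m, p))

F = lambda X_, A_, B_: X_.T @ (X_ @ A_ + B_)

h = 1e-6
i, j = 1, 2   # an arbitrary component of X
Eij = np.zeros((m, n)); Eij[i, j] = 1.0

num = (F(X + h * Eij, A, B) - F(X, A, B)) / h          # numerical gradient
ana = Eij.T @ X @ A + X.T @ Eij @ A + Eij.T @ B        # formula from above

print(np.allclose(num, ana, atol=1e-4))   # True
```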
Simply substitute each occurrence of the independent variable with a single-entry matrix in the gradient.
The $\E$ also act as the standard matrix basis, so the full gradient tensor can be constructed by forming elementwise dyadic products $(\star)$ and summing over both indices, e.g. $$\eqalign{ \nabla_AF = \LR{\grad{F}{A}} &= \sum_{i=1}^m\sum_{j=1}^n \LR{X^TX\E}\star\E \\ }$$ This is analogous to the way that a matrix can be constructed by summing over the standard basis vectors $(\e)$ and the columns $(a_k)$ of the matrix $$\eqalign{ A &= \sum_{k=1}^n \;a_k\star\e \\ }$$
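As an illustrative sketch, the dyadic-product sum for $\nabla_A F$ can be assembled numerically and compared against its closed form $(\nabla_A F)_{klij} = (X^TX)_{ki}\,\delta_{lj}$ (which follows from $(X^TXE_{ij})_{kl} = (X^TX)_{ki}\,\delta_{lj}$):

```python
import numpy as np

# Build the fourth-order tensor nabla_A F by summing dyadic products
# (X^T X Eij) star Eij over all (i,j), as described above.
rng = np.random.default_rng(1)
m, n, p = 3, 4, 2      # X is (m,n), A is (n,p)
X = rng.standard_normal((m, n))

G = np.zeros((n, p, n, p))          # G[k,l,i,j] = dF_kl / dA_ij
for i in range(n):
    for j in range(p):
        Eij = np.zeros((n, p)); Eij[i, j] = 1.0
        # each (i,j) contributes one dyadic "slab" to the tensor
        G += np.einsum('kl,ij->klij', X.T @ X @ Eij, Eij)

# Closed form: G[k,l,i,j] = (X^T X)[k,i] * delta[l,j]
closed = np.einsum('ki,lj->klij', X.T @ X, np.eye(p))
print(np.allclose(G, closed))   # True
```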
Update
If we replace the matrices $(B,X)$ by vectors $(b,x),\,$ then the basis matrix $\E$ in the gradient gets replaced by the basis vector $\e$.
Similarly, replacing the matrices $(A,F)$ by scalars $(\a,\f),\,$ requires the scalar basis (i.e. the number $\o$) in the gradient formula. $$\eqalign{ \f &= x^Tx\a +x^Tb \\ \grad{\f}{\a} &= x^Tx\o = \|x\|^2 \\ \grad{\f}{\b} &= x^T\e = \x \\ \grad{\f}{\x} &= \e^Tx\a + x^T\e \a + \e^Tb = 2\a\x + \b \\ }$$ Summing the last two results over the $\e$ basis yields $$\eqalign{ \grad{\f}{b} &= \sum_{k=1}^n \x\star\e \;=\; x \\ \grad{\f}{x} &= \sum_{k=1}^n \LR{2\a\x\star\e + \b\star\e} \;=\; 2\a x+b \\ }$$
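The vector/scalar results can also be verified with a quick finite-difference sketch (test values are arbitrary):

```python
import numpy as np

# Check grad_x phi = 2*alpha*x + b for phi = alpha x^T x + x^T b.
rng = np.random.default_rng(3)
n = 5
x = rng.standard_normal(n)
b = rng.standard_normal(n)
alpha = 0.7

phi = lambda x_, b_: alpha * (x_ @ x_) + x_ @ b_

# Perturb one component at a time along the basis vectors e_k.
h = 1e-6
grad_x = np.array([(phi(x + h * np.eye(n)[k], b) - phi(x, b)) / h
                   for k in range(n)])

print(np.allclose(grad_x, 2 * alpha * x + b, atol=1e-4))   # True
```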