How can I compute the gradient of the following function with respect to $X$,
$$g(X) = \frac{1}{2}\|y-AX\|^2$$
where $X\in\mathbb{R}^{n\times n}$, $y\in\mathbb{R}^m$, and $A:\mathbb{R}^{n\times n}\to \mathbb{R}^m$ is linear. We can assume that $AX$ is of the form
$$AX = \begin{pmatrix}\langle X, A_1\rangle\\\vdots\\\langle X, A_m\rangle\end{pmatrix}$$
where $A_1,\ldots,A_m$ are $n\times n$ real matrices and the inner product is the Frobenius inner product.
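To experiment numerically, one possible setup (a minimal NumPy sketch, assuming the matrices $A_1,\dots,A_m$ are stored in a single array `As` of shape `(m, n, n)`; the names `A_op` and `g` are illustrative, not from the question) implements $X\mapsto AX$ and $g$ directly from the definition:

```python
import numpy as np

def A_op(As, X):
    """Apply X -> (<X, A_1>, ..., <X, A_m>).

    As has shape (m, n, n) and stores A_1, ..., A_m. <X, A_i> is the
    Frobenius inner product sum_{j,k} X[j,k] * A_i[j,k] = tr(X^T A_i).
    """
    return np.einsum('kij,ij->k', As, X)

def g(As, y, X):
    """g(X) = 0.5 * ||y - AX||^2 with the Euclidean norm on R^m."""
    r = y - A_op(As, X)
    return 0.5 * r @ r

rng = np.random.default_rng(0)
m, n = 5, 3
As = rng.standard_normal((m, n, n))
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))

# Each component of AX agrees with the trace form tr(X^T A_i).
print(np.allclose(A_op(As, X), [np.trace(X.T @ Ai) for Ai in As]))  # True
```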
Edit: my attempt at finding the gradient,
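$$\begin{aligned}
g(X+H) &= \frac{1}{2}\langle y-A(X+H),\, y-A(X+H)\rangle\\
&= \frac{1}{2} \langle y-AX-AH,\, y-AX-AH\rangle\\
&= \frac{1}{2} \left(\langle y-AX, y-AX\rangle -\langle y-AX,AH\rangle -\langle AH, y-AX\rangle +\|AH\|^2\right)\\
&= g(X) - \langle y-AX, AH\rangle + o(\|H\|)\\
&= g(X)-\langle A^*\left(y-AX\right),H\rangle + o(\|H\|),
\end{aligned}$$
where the two cross terms are equal by symmetry of the real inner product, and $\frac{1}{2}\|AH\|^2 \le \frac{1}{2}\|A\|^2\|H\|^2 = o(\|H\|)$ because $A$ is linear on a finite-dimensional space, hence bounded. Therefore
$$\nabla g(X) = -A^*\left(y-AX\right).$$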
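Because $g$ is quadratic, the remainder in this expansion is exactly $\frac{1}{2}\|AH\|^2$. A quick sanity check of that (a sketch with random data; `A_op` and `g` are hypothetical helper names) confirms the exact identity and that the remainder divided by $t$ goes to zero along $X+tH$:

```python
import numpy as np

def A_op(As, X):
    # A(X)_i = <X, A_i>_F, with As[i] holding A_i
    return np.einsum('kij,ij->k', As, X)

def g(As, y, X):
    r = y - A_op(As, X)
    return 0.5 * r @ r

rng = np.random.default_rng(1)
m, n = 4, 3
As = rng.standard_normal((m, n, n))
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

r = y - A_op(As, X)
AH = A_op(As, H)

# Exact expansion: g(X+H) = g(X) - <y - AX, AH> + 0.5 * ||AH||^2
lhs = g(As, y, X + H)
rhs = g(As, y, X) - r @ AH + 0.5 * AH @ AH
print(np.isclose(lhs, rhs))  # True

# The remainder along X + tH equals 0.5 * t^2 * ||AH||^2, so rem / t -> 0.
for t in [1e-1, 1e-2, 1e-3]:
    rem = g(As, y, X + t * H) - g(As, y, X) + t * (r @ AH)
    print(t, rem / t)
```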
Now I must compute the adjoint operator $A^*$ of $A$.
To find $A^*$ we do the following,
$$\langle y, AX\rangle = \sum\limits_{i=1}^m y_i\langle X, A_i\rangle=\sum\limits_{i=1}^m \langle X, y_iA_i\rangle = \langle X, \sum\limits_{i=1}^my_iA_i\rangle$$
to see that $A^*y = \sum\limits_{i=1}^m y_iA_i$. Applying this to the expression we found above gives,
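The defining identity $\langle y, AX\rangle = \langle X, A^*y\rangle$ can be checked numerically. This sketch assumes the same `(m, n, n)` storage for the $A_i$; `A_adj` is an illustrative name for $y\mapsto\sum_i y_iA_i$:

```python
import numpy as np

def A_op(As, X):
    # Forward map: A(X)_i = <X, A_i>_F
    return np.einsum('kij,ij->k', As, X)

def A_adj(As, y):
    # Adjoint from the derivation: A^*(y) = sum_i y_i A_i
    return np.einsum('k,kij->ij', y, As)

rng = np.random.default_rng(2)
m, n = 6, 4
As = rng.standard_normal((m, n, n))
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))

lhs = y @ A_op(As, X)            # <y, AX> in R^m
rhs = np.sum(X * A_adj(As, y))   # <X, A^* y>, Frobenius inner product
print(np.isclose(lhs, rhs))  # True
```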
$$\nabla_Xg(X) = -A^*(y-AX) = -\sum\limits_{i=1}^m\left(y_i-\mbox{tr}(X^TA_i)\right)A_i.$$
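A common way to validate a closed-form gradient is to compare it with central finite differences. The sketch below (random data; `grad_g` and `fd_grad` are hypothetical names) implements $-A^*(y-AX)$ and checks it entrywise. For a quadratic like $g$, central differences are exact up to rounding, so a small tolerance suffices:

```python
import numpy as np

def A_op(As, X):
    return np.einsum('kij,ij->k', As, X)

def g(As, y, X):
    r = y - A_op(As, X)
    return 0.5 * r @ r

def grad_g(As, y, X):
    # -A^*(y - AX) = -sum_i (y_i - tr(X^T A_i)) A_i
    r = y - A_op(As, X)
    return -np.einsum('k,kij->ij', r, As)

def fd_grad(f, X, eps=1e-6):
    # Central differences, perturbing one entry of X at a time
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(3)
m, n = 5, 3
As = rng.standard_normal((m, n, n))
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))

G = grad_g(As, y, X)
G_fd = fd_grad(lambda Z: g(As, y, Z), X)
print(np.allclose(G, G_fd, atol=1e-6))  # True
```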
The derivative of $g(X)=\frac{1}{2}(y-A(X))^T(y-A(X))$ is the linear map
$Dg_X:Y\in M_n\mapsto -(A(Y))^T(y-A(X))$, where $M_n$ denotes the real $n\times n$ matrices.
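$$\begin{aligned}
(A(Y))^T(y-A(X)) &= \left[\operatorname{tr}(Y^TA_1),\cdots,\operatorname{tr}(Y^TA_m)\right]\left[y_1-\operatorname{tr}(X^TA_1),\cdots,y_m-\operatorname{tr}(X^TA_m)\right]^T\\
&= \sum_{i\leq m}\operatorname{tr}(Y^TA_i)\left(y_i-\operatorname{tr}(X^TA_i)\right)\\
&= \operatorname{tr}\Big(Y^T\sum_{i\leq m}\left(y_i-\operatorname{tr}(X^TA_i)\right)A_i\Big).
\end{aligned}$$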
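The directional-derivative computation can be checked the same way. This sketch (random data, hypothetical names) evaluates $Dg_X(Y)$ three ways: as $-(A(Y))^T(y-A(X))$, as the trace form $-\operatorname{tr}\big(Y^T\sum_i(y_i-\operatorname{tr}(X^TA_i))A_i\big)$, and by a central difference along $Y$:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 3
As = rng.standard_normal((m, n, n))
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))  # direction of differentiation

def A_op(Z):
    # A(Z)_i = tr(Z^T A_i)
    return np.array([np.trace(Z.T @ Ai) for Ai in As])

def g(Z):
    r = y - A_op(Z)
    return 0.5 * r @ r

# Dg_X(Y) = -(A(Y))^T (y - A(X))
Dg = -A_op(Y) @ (y - A_op(X))

# Same number as a trace: -tr(Y^T sum_i (y_i - tr(X^T A_i)) A_i)
S = sum((y[i] - np.trace(X.T @ As[i])) * As[i] for i in range(m))
Dg_trace = -np.trace(Y.T @ S)

# Central difference along Y (exact up to rounding since g is quadratic)
t = 1e-6
Dg_fd = (g(X + t * Y) - g(X - t * Y)) / (2 * t)

print(np.isclose(Dg, Dg_trace), np.isclose(Dg, Dg_fd))  # True True
```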
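Conclusion. Since $Dg_X(Y)=\operatorname{tr}\big(Y^T\nabla g(X)\big)=\langle Y,\nabla g(X)\rangle$ for every $Y\in M_n$, the gradient of $g$ is
$\nabla g(X)=-\sum_{i\leq m}\left(y_i-\operatorname{tr}(X^TA_i)\right)A_i$.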
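An equivalent way to see the same formula, offered as a sketch rather than as part of either derivation above: flattening each $A_i$ into row $i$ of a matrix $M\in\mathbb{R}^{m\times n^2}$ gives $AX = M\,\mathrm{vec}(X)$, so $g$ becomes an ordinary least-squares objective whose gradient is $-M^T(y-M\,\mathrm{vec}(X))$ reshaped to $n\times n$. The code assumes row-major flattening (NumPy's default) for both $M$ and $X$; `grad_vec` and `grad_sum` are illustrative names:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 3
As = rng.standard_normal((m, n, n))
y = rng.standard_normal(m)
X = rng.standard_normal((n, n))

# Row i of M is A_i flattened row-major, so A(X) = M @ X.ravel()
M = As.reshape(m, n * n)

def grad_vec(X):
    # Least-squares gradient -M^T (y - M x), reshaped back to n x n
    x = X.ravel()
    return -(M.T @ (y - M @ x)).reshape(n, n)

def grad_sum(X):
    # The formula from the answer: -sum_i (y_i - tr(X^T A_i)) A_i
    return -sum((y[i] - np.trace(X.T @ As[i])) * As[i] for i in range(m))

print(np.allclose(grad_vec(X), grad_sum(X)))  # True

# At a least-squares minimizer the gradient vanishes (up to rounding).
x_star = np.linalg.lstsq(M, y, rcond=None)[0]
print(np.allclose(grad_vec(x_star.reshape(n, n)), 0, atol=1e-10))  # True
```

Setting the gradient to zero in this form recovers the normal equations $M^TM\,\mathrm{vec}(X) = M^Ty$.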