Consider a function of the form
$$ f(x) = \frac{1}{2}\left\lVert Ax - y \right\rVert_2^2. $$
If $A \in \mathbb{R}^{n \times m}$, $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, we can find the minimizer by differentiating:
$$ \nabla_x f = A^TAx - A^Ty. $$
Setting this to $0$ leads to the linear system
$$ A^TAx = A^Ty. $$
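As a quick numerical illustration of the normal equations above, here is a minimal sketch assuming NumPy. The sizes, the random data, and the variable names are illustrative assumptions, not part of the problem. It assumes $A$ has full column rank so that $A^TA$ is invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3                      # assumed sizes with n >= m
A = rng.standard_normal((n, m))  # a random Gaussian A has full column rank almost surely
y = rng.standard_normal(n)

# Solve the normal equations A^T A x = A^T y directly.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Reference solution from NumPy's least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# The gradient A^T A x - A^T y should vanish at the minimizer.
grad = A.T @ A @ x_normal - A.T @ y

print(np.allclose(x_normal, x_lstsq))
print(np.linalg.norm(grad))
```

For ill-conditioned $A$, forming $A^TA$ explicitly squares the condition number, so a least-squares routine is usually the safer choice in practice.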
Suppose now that, instead of $x$, the unknown is $A$. In this case we have
$$ f(A) = \frac{1}{2} \left\lVert Ax - y \right\rVert_2^2 $$
To calculate the gradient with respect to $A$, I look for a linear map $T(A)$ on $\mathbb{R}^{n \times m}$ that satisfies the following (using the 2-norm as the norm for $A$):
$$ \lim_{E \to 0} \frac{\left| f(A + E) - f(A) - T(A)E \right|}{\left\lVert E \right\rVert_2} = 0. $$
With a little bit of calculation we can show that $$ f(A + E) - f(A) = \left(x^TA^T - y^T\right)Ex + \frac{1}{2}\left\lVert Ex \right\rVert_2^2 $$
Substituting this into the limit and using the squeeze theorem I get
$$ \begin{aligned}
0 &\leq \lim_{E \to 0} \frac{\left| \left(x^TA^T - y^T\right)Ex + \frac{1}{2}\left\lVert Ex \right\rVert_2^2 - T(A)E \right|}{\left\lVert E \right\rVert_2} \\
&\leq \lim_{E \to 0} \frac{\left| \left(x^TA^T - y^T\right)Ex - T(A)E \right| + \frac{1}{2}\left\lVert Ex \right\rVert_2^2 }{\left\lVert E \right\rVert_2} \\
&\leq \lim_{E \to 0} \frac{\left\lVert \left(x^TA^T - y^T\right)(\cdot)x - T(A) \right\rVert_{(\mathbb{R}^{n \times m})^*} \left\lVert E \right\rVert_2 + \frac{1}{2}\left\lVert Ex \right\rVert_2^2 }{\left\lVert E \right\rVert_2} \\
&= \lim_{E \to 0} \left\lVert \left(x^TA^T - y^T\right)(\cdot)x - T(A) \right\rVert_{(\mathbb{R}^{n \times m})^*}
\end{aligned} $$
The final equality holds because $\left\lVert Ex \right\rVert_2^2 \leq \left\lVert E \right\rVert_2^2 \left\lVert x \right\rVert_2^2$, so the quadratic term divided by $\left\lVert E \right\rVert_2$ vanishes as $E \to 0$.
The last limit is equal to 0 iff $$ T(A) = \left(x^TA^T - y^T\right)(\cdot)x $$
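A numerical sanity check of this candidate differential (not a proof) can be sketched as follows, assuming NumPy. The helper names `f` and `T`, the sizes, and the random data are illustrative assumptions. The matrix 2-norm is taken to be the spectral norm here, which is also an assumption; in finite dimensions any norm gives the same limit.

```python
import numpy as np

def f(A, x, y):
    """f(A) = 1/2 * ||A x - y||_2^2."""
    r = A @ x - y
    return 0.5 * (r @ r)

def T(A, x, y, E):
    """Candidate differential applied to E: (x^T A^T - y^T) E x."""
    return (A @ x - y) @ (E @ x)

rng = np.random.default_rng(1)
n, m = 4, 3                       # assumed sizes
A = rng.standard_normal((n, m))
x = rng.standard_normal(m)
y = rng.standard_normal(n)
E = rng.standard_normal((n, m))   # a fixed random direction

# Shrink the perturbation t*E and track |f(A+tE) - f(A) - T(A)(tE)| / ||tE||_2.
# If T(A) is the differential, this ratio should tend to 0 as t -> 0.
ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    step = t * E
    remainder = abs(f(A + step, x, y) - f(A, x, y) - T(A, x, y, step))
    ratios.append(remainder / np.linalg.norm(step, 2))

print(ratios)
```

Because the remainder is quadratic in $E$, the ratio should shrink roughly in proportion to the step size.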
Question 1: Is my calculation of the differential correct? Assuming it is, I was trying to characterize $T(A)$ using the standard basis $E_{ij} = e_i e_j^T$ (the matrix with a $1$ in entry $(i,j)$ and zeros elsewhere). Doing this I get
$$ T(A)E_{ij} = \left(x^TA^T - y^T\right)E_{ij}x = \left(x^TA^T - y^T\right)x_j e_i = x_j \left(x^TA^T - y^T\right) e_i = x_j \left(x^TA^T e_i - y^T e_i \right) = x_j \left(x^TA^T e_i - y_i \right) $$
Question 2: Is this correct? Setting $T(A)E_{ij} = 0$ for all $i, j$ should then give me a set of equations that I can solve for $A$?
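The entrywise formula $T(A)E_{ij} = x_j\left(x^TA^Te_i - y_i\right)$ can be compared against central finite differences of $f$ along each basis matrix $E_{ij}$. This is only a sketch assuming NumPy, with illustrative sizes, random data, and step size `h`; it is a sanity check rather than a proof.

```python
import numpy as np

def f(A, x, y):
    """f(A) = 1/2 * ||A x - y||_2^2."""
    r = A @ x - y
    return 0.5 * (r @ r)

rng = np.random.default_rng(2)
n, m = 4, 3                       # assumed sizes
A = rng.standard_normal((n, m))
x = rng.standard_normal(m)
y = rng.standard_normal(n)

h = 1e-6                          # finite-difference step (assumed)
formula = np.empty((n, m))
finite_diff = np.empty((n, m))
for i in range(n):
    for j in range(m):
        # Basis matrix E_ij: a single 1 in entry (i, j).
        E_ij = np.zeros((n, m))
        E_ij[i, j] = 1.0
        # Entrywise formula x_j * (x^T A^T e_i - y_i); A[i] @ x equals (A x)_i.
        formula[i, j] = x[j] * (A[i] @ x - y[i])
        # Central difference of f in the direction E_ij.
        finite_diff[i, j] = (f(A + h * E_ij, x, y) - f(A - h * E_ij, x, y)) / (2 * h)

print(np.max(np.abs(formula - finite_diff)))
```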