I want to differentiate the function $S(x) = \|a-x\|^2$, where $a, x \in \mathbb R^{1 \times n}$ are $n$-dimensional row vectors. The norm is the ordinary Euclidean norm.
We also know that $x = hW+b$ where $h,b \in \mathbb R^{1 \times n}$, $W \in \mathbb R^{n \times n}$.
Specifically, I want to find $\frac{\partial S}{\partial W}$.
From the chain rule we have that $\frac{\partial S}{\partial W} = \frac{\partial S}{\partial x}\frac{\partial x}{\partial W}$.
If I am not mistaken, $\frac{\partial S}{\partial x}$ is simply $-2(a-x) \in \mathbb R^{1 \times n}$.
So overall, we have $\frac{\partial S}{\partial W} = -2(a-x)\frac{\partial x}{\partial W}$. And herein lies the problem.
$W$ is $n$ by $n$, and so $\frac{\partial S}{\partial W}$ should also be $n$ by $n$. But $\frac{\partial S}{\partial W} = -2(a-x)\frac{\partial x}{\partial W}$ and $-2(a-x) \in \mathbb R^{1 \times n}$.
There is no way we can multiply a $1$ by $n$ vector by something on the right side, and get an $n$ by $n$ matrix. Where is my mistake here?
According to https://en.wikipedia.org/wiki/Matrix_calculus#Identities: "The chain rule applies in some of the cases, but unfortunately does not apply in matrix-by-scalar derivatives or scalar-by-matrix derivatives (in the latter case, mostly involving the trace operator applied to matrices)".
If you scroll down that section, you will see a formula for your type of problem involving the transpose and the trace. I think it'll work out if you use the transpose of $-2(a-x).$
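
To make the shapes concrete, here is how I believe it works out if you use differentials instead of the matrix chain rule, assuming the convention that $\frac{\partial S}{\partial W}$ has the same shape as $W$. Since $dx = h\,dW$, we get $dS = -2(a-x)\,dx^T = -2(a-x)\,dW^T h^T = \operatorname{tr}\left(-2h^T(a-x)\,dW^T\right)$, which suggests $\frac{\partial S}{\partial W} = -2h^T(a-x) \in \mathbb R^{n \times n}$, an outer product. Componentwise this is $\frac{\partial S}{\partial W_{ij}} = -2h_i(a_j - x_j)$. Below is a minimal NumPy sketch that checks this candidate formula against central finite differences. The size `n = 4`, the random seed, and the names `S`, `grad_closed` and `grad_fd` are my own choices for illustration, not anything from the question.

```python
import numpy as np

# Illustrative setup (assumed values): random row vectors and a random matrix.
rng = np.random.default_rng(0)
n = 4
a = rng.normal(size=(1, n))
h = rng.normal(size=(1, n))
b = rng.normal(size=(1, n))
W = rng.normal(size=(n, n))


def S(W):
    """Squared Euclidean distance ||a - x||^2 with x = h W + b."""
    x = h @ W + b
    r = a - x
    return float(np.sum(r ** 2))


# Candidate closed form: outer product of h^T (n x 1) with -2 (a - x) (1 x n).
x = h @ W + b
grad_closed = -2.0 * h.T @ (a - x)

# Central finite differences, one entry of W at a time.
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(W)
        E[i, j] = eps
        grad_fd[i, j] = (S(W + E) - S(W - E)) / (2 * eps)

print(grad_closed.shape)
print(np.allclose(grad_closed, grad_fd, atol=1e-5))
```

If the two arrays agree, that supports reading the "missing" factor as $h^T$ multiplying from the left, rather than something multiplying $-2(a-x)$ from the right.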