I'm trying to work out the partial derivatives of a function $L$ with respect to the components $x_i$ of $x$:
$$ A \in \mathbb{R}^{m \times n}, \quad b \in \mathbb{R}^m, \quad x \in \mathbb{R}^n $$
$$\begin{aligned} L(x) &= \left\|{Ax - b }\right\|^2 \\&= (Ax-b)^T(Ax-b) \\ &= x^TA^TAx - b^TAx - x^TA^Tb + b^Tb\end{aligned}$$
All four of these terms are scalars, so I think I can transpose the third term ($x^TA^Tb = b^TAx$) to get:
$$L(x) = x^TA^TAx - 2b^TAx + b^Tb$$
I'm a little stuck on how to transform $x^TA^TAx$ further.
Calculating the partial derivatives: the final term $b^Tb$ is constant, so it goes to zero. The second term is linear in $x$ with coefficient vector $2b^TA$, so we just drop the $x$. It's again the first term I'm stuck on:
$$\frac{\partial{L(x)}}{\partial x} = ??? -2b^TA$$
What's the derivative of $x^TA^TAx$ with respect to $x$? How do you work it out?
Update:
After looking at the potential duplicate, I think I'm mostly covered - however it's not immediately obvious to me why:
$$\frac {\partial(x^TMx)} {\partial x}=(M+M^T)x$$
How is this rule derived?
I'll answer your last question for the term $f(x)=x^\top A^\top A x$.
You can find it by expanding $f$ around $x$ with a small perturbation $dx$:
$$\begin{aligned} f(x+dx) &= (x + dx)^\top A^\top A (x+dx) \\ &= x^\top A^\top A x + 2x^\top A^\top A\, dx + dx^\top A^\top A\, dx \\ &= f(x) + 2x^\top A^\top A\, dx + O(\|dx\|^2) \end{aligned}$$
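The two cross terms $x^\top A^\top A\, dx$ and $dx^\top A^\top A x$ are equal because each is a scalar and $A^\top A$ is symmetric, which is where the factor of 2 comes from. The coefficient of $dx$ in the linear term, $2x^\top A^\top A$, is the transpose of the gradient.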
So $$\frac{\partial (x^\top A^\top A x)}{\partial x}= 2 A^\top A x$$
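The same expansion appears to cover the general rule from the update. For an arbitrary square $M$, $f(x+dx) = x^\top M x + x^\top M\, dx + dx^\top M x + O(\|dx\|^2)$, and since $dx^\top M x = x^\top M^\top dx$ (a scalar equals its transpose), the linear term is $x^\top (M + M^\top)\, dx$, giving the gradient $(M+M^\top)x$. With $M = A^\top A$ this reduces to $2A^\top A x$. If you want to convince yourself numerically, here is a minimal sketch in plain Python that compares both formulas against central finite differences. The helper names (`matvec`, `quad_form`, `analytic_grad`, `numeric_grad`) and the random test data are my own illustration, not part of the question.

```python
import random

def matvec(M, x):
    """Matrix-vector product M x, with M stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def transpose(M):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix product P Q for list-of-rows matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def quad_form(M, x):
    """The scalar x^T M x."""
    return sum(a * b for a, b in zip(x, matvec(M, x)))

def analytic_grad(M, x):
    """Gradient predicted by the rule: (M + M^T) x."""
    return [a + b for a, b in zip(matvec(M, x), matvec(transpose(M), x))]

def numeric_grad(M, x, h=1e-6):
    """Central finite differences of x^T M x, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((quad_form(M, xp) - quad_form(M, xm)) / (2 * h))
    return grad

random.seed(0)  # fixed seed so the run is repeatable
m, n = 3, 4

# General (non-symmetric) M: check d(x^T M x)/dx = (M + M^T) x.
M = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]
max_err = max(abs(a - b) for a, b in zip(analytic_grad(M, x), numeric_grad(M, x)))

# Symmetric case M = A^T A: check d(x^T A^T A x)/dx = 2 A^T A x.
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
AtA = matmul(transpose(A), A)
two_AtA_x = [2 * v for v in matvec(AtA, x)]
max_err_sym = max(abs(a - b) for a, b in zip(two_AtA_x, numeric_grad(AtA, x)))

print("general M, max abs error:", max_err)
print("M = A^T A, max abs error:", max_err_sym)
```

Both errors should come out tiny (limited only by floating-point rounding), because central differences are exact for a quadratic up to rounding. Putting it together with the linear term, the full gradient of $L$ as a column vector would be $2A^\top A x - 2A^\top b$.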