I am trying to take the derivative of $(Ax - b)^T(Ax-b)$ and set it to zero without expanding the product, using only matrix calculus. I knew that the derivative of $x^Tx$ with respect to $x$ is $2x$, I took the derivative of $Ax - b$ to be $A^T$, and by applying the chain rule $(f \circ g)' = (f' \circ g)\, g'$ I obtained
$$ 2(Ax-b)A^T = 0 $$
or
$$ (2Ax-2b)A^T = 0 $$
However, the dimensions do not line up here. If I had
$$ 2A^T(Ax-b) = 0 $$
it would be fine. How can I get this result? Is there a rule I skipped that allows the statement above to be the derivative?
Note:
From this link I found out
$$ \frac{\partial}{\partial t} f(g(t)) = \nabla f(g(t))^T \frac{\partial g}{\partial t} $$
If I had $g(x) = Ax-b$ and $f(z) = z^Tz$ then I could have
$$ \frac{\partial}{\partial x} f(g(x)) = 2z^TA = 2(Ax-b)^TA = 0 $$
Taking the transpose of both sides gives
$$ 2A^T(Ax-b) = 0 $$
So I guess my original chain rule was wrong. Does this look correct? If yes, does anyone know where I can find the derivation of the correct Chain Rule?
No. With vectors and matrices the order of multiplication matters. One way of seeing what's going on is to write
$$Ax = \left[ \begin{array}{ccc} a_1 & \cdots & a_n \end{array} \right] \left[ \begin{array}{c} x_1 \\ \vdots \\ x_n \end{array} \right],$$
where $a_j$ is the $j$-th column of $A$, so that $\partial Ax / \partial x_1 = a_1$. Thus,
$$\frac{\partial (Ax-b)}{\partial x'} = A.$$
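A quick numerical sanity check of this Jacobian can be sketched as follows. This is only an illustration using the standard library: the helper names `g` and `numerical_jacobian` and the example values of `A`, `b`, and `x` are assumptions chosen for the demo, not part of the argument. It approximates each column of the Jacobian of $Ax-b$ by central differences and compares it with the corresponding column of $A$.

```python
# Finite-difference check that the Jacobian of g(x) = Ax - b is A (not A^T).

def g(A, x, b):
    """Return Ax - b as a list with one entry per row of A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
            for row, b_i in zip(A, b)]

def numerical_jacobian(A, x, b, h=1e-6):
    """Approximate d(Ax - b)/dx' column by column with central differences."""
    m, n = len(A), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        g_plus, g_minus = g(A, x_plus, b), g(A, x_minus, b)
        for i in range(m):
            J[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return J

# Arbitrary example values (assumed for illustration): A is 3x2.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]

J = numerical_jacobian(A, x, b)
max_error = max(abs(J[i][j] - A[i][j]) for i in range(3) for j in range(2))
print(max_error < 1e-6)
```

The Jacobian comes out with the same shape as $A$ (here 3 by 2), which is why $A$ rather than its transpose appears on the right of the chain-rule product.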
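Here a prime denotes the transpose. Applying the chain rule with this derivative gives the row vector $2(Ax-b)'A$, and transposing it yields the result you're looking for: $2 A' (Ax-b)$.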
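If in doubt about the order of the factors, the formula can be checked numerically. The following is a minimal sketch using only the standard library; the function names and the example values of `A`, `b`, and `x` are assumptions made for the demo. It compares $2A'(Ax-b)$ with a central-difference gradient of $(Ax-b)'(Ax-b)$.

```python
# Finite-difference check of the gradient 2 A' (Ax - b).

def residual(A, x, b):
    """Return r = Ax - b as a list."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
            for row, b_i in zip(A, b)]

def objective(A, x, b):
    """Return the scalar (Ax - b)'(Ax - b)."""
    return sum(r_i * r_i for r_i in residual(A, x, b))

def gradient(A, x, b):
    """Return 2 A' (Ax - b): one entry per component of x."""
    r = residual(A, x, b)
    return [2.0 * sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(x))]

def numerical_gradient(A, x, b, h=1e-6):
    """Central differences of the objective, one coordinate at a time."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        grad.append((objective(A, x_plus, b) - objective(A, x_minus, b)) / (2 * h))
    return grad

# Arbitrary example values (assumed for illustration): A is 3x2.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]

analytic = gradient(A, x, b)
numeric = numerical_gradient(A, x, b)
print(all(abs(a - n) < 1e-5 for a, n in zip(analytic, numeric)))
```

Note that the gradient has as many entries as $x$ (two here), while $Ax-b$ has as many entries as $b$ (three here), so only the ordering $A'(Ax-b)$ is dimensionally consistent when $A$ is not square.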