I am trying to work out the derivative of a scalar ($J$) with respect to a vector ($x$) via the chain rule. I have laid out the intermediate steps below. The matrix shapes only check out when I treat $x$ as a row vector. The details are labeled as follows.
Red is the shape of a vector, and blue is the shape of the derivative matrix.
I think I still don't understand the Jacobian, or the derivative of a vector w.r.t. a vector, well enough. Please correct me.
Answer: the RHS of the second equation should be transposed, since the LHS has been.


If you compare the RHS of your two expressions, they are identical; the LHS of the second one, however, has been transposed. When you transpose a vector $$ z=Ax $$ the transpose is $$ z^T=(Ax)^T=x^TA^T, $$ so the order of the factors reverses and each factor is transposed. Correspondingly, the RHS of your second equation must change as well.
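The transposition rule above can be checked numerically. This is only a minimal sketch with NumPy; the sizes (a $3\times 4$ matrix `A`) and the random values are assumptions chosen for illustration, not taken from the question.

```python
import numpy as np

# Illustrative sizes only (assumption): A maps R^4 to R^3.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
x = rng.standard_normal((4, 1))   # column vector, shape (4, 1)

z = A @ x                         # column vector, shape (3, 1)
z_row = x.T @ A.T                 # row vector, shape (1, 3)

# Transposing the LHS (z -> z^T) forces the RHS to become x^T A^T.
shapes = (z.shape, z_row.shape)
same_values = np.allclose(z.T, z_row)
print(shapes, same_values)
```

Note that `A.T @ x.T` would not even be a valid product here (shapes $(4,3)$ and $(1,4)$), which is the same kind of shape mismatch you ran into.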
P.S. The first line is more natural and easier to understand; the second line is a burden inherited from vector analysis, where the gradient is required to be a column vector rather than a row.
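To make the two conventions concrete, here is a small worked example. The setup is an assumption made for illustration and is not your actual expression: take $J=f(z)$ with $z=Ax$, $A\in\mathbb{R}^{m\times n}$, $x\in\mathbb{R}^{n}$. With derivatives written as rows (the Jacobian convention), the chain rule multiplies in the natural order: $$ \underbrace{\frac{\partial J}{\partial x}}_{1\times n}=\underbrace{\frac{\partial J}{\partial z}}_{1\times m}\,\underbrace{\frac{\partial z}{\partial x}}_{m\times n}=\frac{\partial J}{\partial z}\,A. $$ If you instead want the gradient as a column, you transpose both sides, and the order of the factors on the RHS reverses: $$ \underbrace{\nabla_x J}_{n\times 1}=\left(\frac{\partial J}{\partial x}\right)^T=\underbrace{A^T}_{n\times m}\,\underbrace{\left(\frac{\partial J}{\partial z}\right)^T}_{m\times 1}=A^T\,\nabla_z J. $$ In both lines the shapes are consistent; they only fail to match when one side is transposed and the other is not.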