I'm working on deriving a gradient-based algorithm for computing tensor decompositions. This requires delving into matrix and vector calculus, and consulting different sources, papers, and textbooks has left me rather confused. More specifically, I keep running into different representations of the same gradients when differentiating with respect to vectors/matrices. In this context I was wondering: is much of matrix/vector calculus merely a matter of convention, or is there something else going on?
For example:
In these notes, on page 13, it is stated that $\dfrac{dAx}{dx} = A$ and $\dfrac{dx^TAx}{dx} = x^T(A + A^T)$, while the Matrix Cookbook and these notes indicate that $\dfrac{dx^TA}{dx} = A$ (and thus $\dfrac{dAx}{dx} = A^T$?) and $\dfrac{dx^TAx}{dx} = 2Ax$ (which assumes $A$ is symmetric) or even $\dfrac{dx^TAx}{dx} = (A + A^T)x$, which is different again.
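As a quick numerical sanity check of the two quadratic-form expressions (a sketch in NumPy; the random $A$ and $x$ here are just illustrative), one can compare a finite-difference gradient of $f(x) = x^TAx$ against both the "row" form $x^T(A + A^T)$ and the "column" form $(A + A^T)x$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # a generic (non-symmetric) matrix
x = rng.standard_normal(n)

def f(x):
    # quadratic form f(x) = x^T A x
    return x @ A @ x

# Central finite-difference approximation of the gradient of f at x
eps = 1e-6
I = np.eye(n)
grad_fd = np.array([(f(x + eps * I[i]) - f(x - eps * I[i])) / (2 * eps)
                    for i in range(n)])

# "Column" (denominator-layout) form: (A + A^T) x
grad_col = (A + A.T) @ x
# "Row" (numerator-layout) form: x^T (A + A^T); as a 1-D array it holds
# the same numbers, just conceptually laid out as a row vector
grad_row = x @ (A + A.T)

assert np.allclose(grad_fd, grad_col, atol=1e-4)
assert np.allclose(grad_row, grad_col)

# 2Ax only matches the finite-difference gradient when A is symmetric:
S = 0.5 * (A + A.T)                 # symmetrize A
xSx_grad = 2 * S @ x
assert np.allclose(xSx_grad, (S + S.T) @ x)
```

The check suggests that the row and column expressions contain the same entries (they differ only in whether the result is treated as a row or a column), whereas $2Ax$ agrees with the others only in the symmetric case.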
Question: Are there real differences or motivations behind these different representations, or are they more or less a matter of convention?