I am learning matrix calculus and I would like to understand how the derivative of the following function: $$ \mathit{f}(\mathbf{x}) = \mathbf{x}^T \mathbf{Ax} $$ is calculated.
I am able to derive the differentials up to this point: $$ \mathrm{d}\mathit{f} = \mathrm{d}\mathbf{x}^T \mathbf{A} \mathbf{x} + \mathbf{x}^T \mathbf{A} \mathrm{d}\mathbf{x} $$ In my book it is further simplified to: $$ \mathrm{d}\mathit{f} = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}\mathrm{d}\mathbf{x} $$ According to what rules have they simplified this? How did they decide on what to transpose and in what order? I guess there is a reasoning behind this other than "I want to have $\mathrm{d}\mathbf{x}$ on the right side and the dimensions must match".
Let $f=x^TAx$. We can write $f$ as
$$f=\sum_{i}\sum_j x_iA_{ij}x_j$$
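For example, with $n=2$ the double sum expands to
$$f=A_{11}x_1^2+(A_{12}+A_{21})x_1x_2+A_{22}x_2^2$$
The off-diagonal entries enter only through the sum $A_{12}+A_{21}$, which is where the combination $A+A^T$ eventually comes from.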
Then, from the product rule, we have
$$df=\sum_{i}\sum_j \left(dx_iA_{ij}x_j+x_iA_{ij}dx_j\right)=(dx)^TAx+x^TA(dx)$$
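Each term here is a scalar, and a scalar equals its own transpose, so we may transpose the first term without changing its value:
$$\begin{align} (dx)^TAx+x^TA(dx)&=((dx)^TAx)^T+x^TA(dx)\\ &=x^TA^Tdx+x^TA(dx)\\ &=x^T(A^T+A)dx\\ &=((A^T+A)^Tx)^Tdx\\ &=((A+A^T)x)^Tdx \end{align}$$
Strictly, the result is $df=((A+A^T)x)^Tdx$, so the gradient is $\nabla f=(A+A^T)x$. The book's $(A+A^T)x\,dx$ is the same statement with the transpose left implicit: read it as the inner product of $(A+A^T)x$ with $dx$. This is what was to be shown!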
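If you want to convince yourself numerically, here is a minimal sketch that compares the gradient $(A+A^T)x$ with a central finite-difference approximation. It assumes NumPy is available; the helper names `f` and `numerical_gradient`, the size 4, and the random seed are arbitrary choices for illustration only.

```python
import numpy as np

def f(x, A):
    # Quadratic form f(x) = x^T A x, returned as a scalar.
    return x @ A @ x

def numerical_gradient(x, A, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step, A) - f(x - step, A)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # deliberately not symmetric
x = rng.standard_normal(4)

analytic = (A + A.T) @ x         # gradient from the derivation
numeric = numerical_gradient(x, A)
max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)             # should be tiny (rounding error only)
```

Using a non-symmetric $A$ matters here: for symmetric $A$ the gradient collapses to $2Ax$, which would hide the role of the transpose.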