Let $A \in \mathbb{R}^{n \times n}$ and let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be given by $f(x) = x^T A x$. I will write $\nabla_x$ or $\nabla$ for the gradient with respect to a vector-valued variable, and $\nabla^2$ or $H$ for the Hessian.
The lecturer stated that $\nabla f(x) = 2 A x$ and that $\nabla^2 f(x) = 2A$.
It's not immediately clear to me that this is true. My only idea was that $\nabla f(x)$ is always a column vector, so it should be $2Ax$. But this feels more like a trick for remembering it than a proof.
How does one derive $\nabla_x x^T A x = 2 A x$? Why can't it be $2 (x^T A)^T = 2A^T x$?
You need $A$ to be symmetric for that.
$x^TAx=\langle Ax, x\rangle=:f(x).$
$f(x+h)=\langle A(x+h), x+h \rangle=\langle Ax, x \rangle + \langle Ax,h \rangle +\langle x,Ah \rangle+ \langle Ah,h \rangle.$
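The expansion above uses only bilinearity of the inner product, not symmetry of $A$. A minimal Python sketch (the language and the helper names `matvec`, `inner`, `f`, `expansion_terms` are my own, not part of the answer) checks it on a deliberately non-symmetric integer example, so there is no rounding:

```python
def matvec(A, v):
    """Return A v for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inner(u, v):
    """Standard inner product <u, v> on R^n."""
    return sum(a * b for a, b in zip(u, v))

def f(A, x):
    """f(x) = x^T A x = <Ax, x>."""
    return inner(matvec(A, x), x)

def expansion_terms(A, x, h):
    """The four terms <Ax,x>, <Ax,h>, <x,Ah>, <Ah,h> of f(x + h)."""
    Ax, Ah = matvec(A, x), matvec(A, h)
    return inner(Ax, x), inner(Ax, h), inner(x, Ah), inner(Ah, h)

A = [[1, 2], [0, 3]]  # not symmetric: the expansion does not need symmetry
x = [1, -1]
h = [2, 5]
x_plus_h = [a + b for a, b in zip(x, h)]

lhs = f(A, x_plus_h)
rhs = sum(expansion_terms(A, x, h))
print(lhs, rhs)  # prints: 81 81
```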
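Since $|\langle Ah,h \rangle| \le \|A\|\,\|h\|^2$, the last term is $o(\|h\|)$, so the derivative (the part of $f(x+h)-f(x)$ that is linear in $h$) is $f'_x=\langle Ax, \cdot \rangle +\langle x, A \cdot \rangle.$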
Because $A$ is symmetric, (*) $$f'_x=\langle 2Ax, \cdot \rangle.$$
By definition, $\nabla_x f$ is the vector such that $f'_x= \langle \nabla_x f, \cdot \rangle$ (it exists and is unique by the Riesz representation theorem), so $$\nabla_x f= 2Ax.$$
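As a numerical sanity check of $\nabla_x f = 2Ax$ for a symmetric $A$, here is a hypothetical Python sketch (helper names are my own). It uses central differences with step 1: for a quadratic form, $f(x+e_i) - f(x-e_i) = 2\langle (A+A^T)x, e_i\rangle$ because the $\langle Ah,h\rangle$ terms cancel, so with integer entries the result is exact rather than approximate.

```python
def matvec(A, v):
    """Return A v for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inner(u, v):
    """Standard inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def f(A, x):
    """f(x) = x^T A x = <Ax, x>."""
    return inner(matvec(A, x), x)

def central_diff_gradient(A, x):
    """Gradient of f at x by central differences with step 1.

    For a quadratic form, f(x + e_i) - f(x - e_i) = 2 <(A + A^T) x, e_i>
    exactly, so with integer entries the division by 2 is exact.
    """
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += 1
        x_minus[i] -= 1
        grad.append((f(A, x_plus) - f(A, x_minus)) // 2)
    return grad

A = [[2, 1], [1, 3]]  # symmetric
x = [1, 2]
grad = central_diff_gradient(A, x)
two_Ax = [2 * v for v in matvec(A, x)]
print(grad, two_Ax)  # prints: [8, 14] [8, 14]
```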
(*) Note that $f'_x=\langle Ax, \cdot \rangle +\langle x, A \cdot \rangle=\langle (A+A^T)x, \cdot\rangle .$ Therefore, $f'_x=\langle 2Ax, \cdot \rangle$ holds for every $x$ if and only if $A+A^T=2A$, that is, if and only if $A$ is symmetric. In general, $\nabla_x f = (A+A^T)x$.
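The same kind of check, again a hypothetical Python sketch with integer entries, illustrates (*) for a non-symmetric $A$: central differences recover $(A+A^T)x$, which matches neither $2Ax$ nor $2A^Tx$. It also computes second differences $f(x+e_i+e_j)-f(x+e_i)-f(x+e_j)+f(x)$, which for a quadratic form equal $(A+A^T)_{ij}$ exactly, so the Hessian is $A+A^T$ and coincides with $2A$ only when $A$ is symmetric.

```python
def matvec(A, v):
    """Return A v for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inner(u, v):
    """Standard inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def f(A, x):
    """f(x) = x^T A x."""
    return inner(matvec(A, x), x)

def shift(x, *indices):
    """Copy of x with 1 added at each listed index (repeats allowed)."""
    y = list(x)
    for k in indices:
        y[k] += 1
    return y

def central_diff_gradient(A, x):
    """Exact gradient of the quadratic form via central differences, step 1."""
    grad = []
    for i in range(len(x)):
        x_minus = list(x)
        x_minus[i] -= 1
        grad.append((f(A, shift(x, i)) - f(A, x_minus)) // 2)
    return grad

def second_diff_hessian(A, x):
    """H[i][j] = f(x+e_i+e_j) - f(x+e_i) - f(x+e_j) + f(x), exact for quadratics."""
    n = len(x)
    return [[f(A, shift(x, i, j)) - f(A, shift(x, i)) - f(A, shift(x, j)) + f(A, x)
             for j in range(n)] for i in range(n)]

A = [[1, 2], [0, 3]]                 # not symmetric
x = [1, -1]
At = [list(col) for col in zip(*A)]  # A^T
A_plus_At = [[a + b for a, b in zip(r, s)] for r, s in zip(A, At)]

grad = central_diff_gradient(A, x)
hess = second_diff_hessian(A, x)
two_Ax = [2 * v for v in matvec(A, x)]
two_Atx = [2 * v for v in matvec(At, x)]

print(grad, matvec(A_plus_At, x), two_Ax, two_Atx)  # prints: [0, -4] [0, -4] [-2, -6] [2, -2]
print(hess, A_plus_At)  # prints: [[2, 2], [2, 6]] [[2, 2], [2, 6]]
```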