I'm reading about derivatives of multivariable functions:
First, the lecture notes prove that the gradient of $f_{2}(x)=\frac{1}{2}\langle Q x, x\rangle \quad \forall x \in \mathbb{R}^{n}$ is $$\nabla f_{2}(x)=\frac{1}{2}\left(Q+Q^{\top}\right) x \quad \forall x \in \mathbb{R}^{n}$$
Then they use this result and the chain rule to compute the derivative of $f_3:\mathbb R^n \to \mathbb R, \quad x \mapsto \|x\|$.
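For reference, here is how I believe the chain-rule step goes. This is my own reconstruction, not copied from the notes: the auxiliary names $h$ and $g$ are mine, I take $Q = I$ in the formula for $\nabla f_2$, and I assume $x \neq 0$. Write $f_3 = g \circ h$ with $h(x) = \|x\|^2 = 2 f_2(x)$ and $g(t) = \sqrt{t}$. Then
$$\nabla h(x) = 2 \cdot \frac{1}{2}\left(I + I^{\top}\right) x = 2x, \qquad g'(t) = \frac{1}{2\sqrt{t}},$$
$$D f_3(x) = g'(h(x))\, D h(x) = \frac{1}{2\|x\|}\,(2x)^{\top} = \frac{x^{\top}}{\|x\|}.$$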
My questions:
Since $\nabla \|x\|^2 = \nabla \langle x,x \rangle = 2x$, I do not understand why the lecture notes give $D \|x\|^2 = (2x)^{\top}$ and conclude that $D f_3(x) = \frac{x^{\top}}{\|x\|}$ and $\nabla f_{3}(x)=\frac{x}{\|x\|}$. How does the transpose operator come into play?
From my understanding, $D f_3 (x) \in \mathcal L(\mathbb R^n, \mathbb R)$. As such, should it be $D f_3 (x) = \langle \nabla f_{3}(x), \cdot\rangle$ rather than $\frac{x^{\top}}{\|x\|}$? Similarly, should it be $D \|x\|^2 = \langle 2x, \cdot\rangle$ rather than $2x^{\top}$?
I understand that $D f_3 (x)$ is a continuous linear map in $\mathcal L(\mathbb R^n,\mathbb R)$ such that $D f_3 (x) (\cdot) = \langle \nabla f_3(x), \cdot \rangle$. This kind of notation makes sense to me. As such, $Df_3(x)(v) = \langle \nabla f_3(x), v \rangle = \left \langle \dfrac{x}{\|x\|}, v \right\rangle = \dfrac{x^\top}{\|x\|} v$. On the other hand, writing $Df_3(x) = \dfrac{x^\top}{\|x\|}$ is very confusing to me. It suggests to me that $Df_3(x) (v) = \dfrac{x^\top}{\|x\|}$ for any vector $v$.
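To make the two readings concrete, here is a small numerical sketch I tried. It assumes NumPy, and the helper names `f3`, `grad_f3`, and `D_f3` are my own, not from the notes. It treats $\dfrac{x^\top}{\|x\|}$ as a $1 \times n$ matrix acting on $v$ by matrix multiplication, and compares that with the inner product $\langle \nabla f_3(x), v \rangle$ and with a finite-difference directional derivative.

```python
import numpy as np

def f3(x):
    # f_3(x) = ||x|| (Euclidean norm)
    return np.linalg.norm(x)

def grad_f3(x):
    # Gradient as a vector in R^n: x / ||x||, valid only for x != 0
    return x / np.linalg.norm(x)

def D_f3(x):
    # Derivative read as a 1 x n matrix (row vector): x^T / ||x||
    return grad_f3(x).reshape(1, -1)

rng = np.random.default_rng(0)
x = rng.normal(size=4)  # arbitrary nonzero point
v = rng.normal(size=4)  # arbitrary direction

# Row vector applied to v by matrix multiplication
via_matrix = (D_f3(x) @ v).item()

# Inner product <grad f_3(x), v>
via_inner = float(np.dot(grad_f3(x), v))

# Central finite-difference approximation of the directional derivative
h = 1e-6
finite_diff = (f3(x + h * v) - f3(x - h * v)) / (2 * h)

print(via_matrix, via_inner, finite_diff)
```

If the three printed numbers agree up to rounding, then under this matrix reading $\dfrac{x^\top}{\|x\|}$ and $\langle \nabla f_3(x), \cdot \rangle$ describe the same linear map, but I am not sure whether this is the convention the notes intend.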
It seems to me that the gradient and the derivative are mixed up in these lecture notes.
Could you please help me clear up this confusion? Thank you for your help!


