I just cannot resolve this issue, so I'd highly appreciate your help. Let $a_1, \ldots, a_n \in \mathbb{R}^k$, with $k$ a positive integer. If we consider the function $$ W: \mathbb{R}^k \to \mathbb{R}, \quad x \mapsto \sum_{i=1}^n (x-a_i)\cdot (x-a_i)$$ ($\cdot$ being the standard inner product), then we can write $W = \sum_{i=1}^n N \circ T_{a_i}$, where $$N(x) := x \cdot x, \quad T_{a_i}(x) := x - a_i,$$ giving $$D_x N(h) = 2 (x \cdot h), \quad D_x T_{a_i}(h) = h.$$ With the linearity of the total differential and the chain rule ($D_p (f\circ g) = D_{g(p)}f \circ D_p g$) we obtain $$D_xW(h) = \sum_{i=1}^n (D_x(N\circ T_{a_i}))(h) = \sum_{i=1}^n (D_{T_{a_i}(x)} N \circ D_x T_{a_i}) (h) = \sum_{i=1}^n D_{T_{a_i}(x)} N(h) = \sum_{i=1}^n 2 (x-a_i) \cdot h$$
Now you can supposedly take the second derivative: $$D(D_xW(h))(g) = \sum_{i=1}^n 2D_x [(T_{a_i} (g)) \cdot h] = 2 \sum_{i=1}^n g \cdot h$$
I don't understand (1) how the first equality above follows, and (2) why this doesn't simply equal zero: for a function $f:\mathbb{R}^k \to \mathbb{R}$, evaluating the total derivative $D_pf$ just gives a number $D_pf(x) \in \mathbb{R}$ for $p, x \in \mathbb{R}^k$, so that $D(D_xW(h))(g)= (D_xW(h))'(g) = 0$.
If there is a better way to calculate the second derivative, please share! (For context, this came up when showing that $W$ has a minimum at $x_0 := \frac 1 n \sum_{i=1}^n a_i$; supposedly $W$ arises naturally in the context of the method of least squares.)
I hope everything is clear and again, I appreciate your efforts.
Edit: I hope this warrants the "functional analysis"/"Fréchet derivative" tags. If I understand the Wikipedia article correctly, those are concerned with infinite-dimensional vector spaces, but in the finite-dimensional case at hand the Fréchet derivative just seems to be the derivative that I've learned.
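Let me write $x$ for the column vector $(x_1,\ldots,x_k)$. To keep the notation light I will treat a single point $a \in \mathbb{R}^k$ (the case $n=1$), also written as a column vector; the general case is a sum of $n$ such terms.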
It might provide clarity to keep track of the domain and range of the differentials.
For example, the first derivative at any point $x$ is a linear map $D_xW: \mathbb{R}^k \rightarrow \mathbb{R}$ given by the formula you posted, $h\mapsto 2(x-a)^T h$.
Then if we think of $DW$ as a map of $x$, it is: $$ D_{(\cdot)}W: \mathbb{R}^k \rightarrow \operatorname{Hom}(\mathbb{R}^k,\mathbb{R}) $$ where the last space can be identified with $\mathbb{R}^k$ (each linear map $h \mapsto v^T h$ corresponds to the vector $v$). So the function you differentiate a second time, what you called $f$, should be a map $\mathbb{R}^k\rightarrow \mathbb{R}^k$, not $\mathbb{R}^k \rightarrow \mathbb{R}$. It is given by: $$ f(x) = 2(x-a) $$
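For the $n$ points $a_1,\ldots,a_n$ of the question, the same identification can be written out term by term. This is only a sketch, assuming $\operatorname{Hom}(\mathbb{R}^k,\mathbb{R})$ is identified with $\mathbb{R}^k$ through the standard inner product; the name $f$ is reused here for the $n$-point version of the map: $$ D_xW(h) = \sum_{i=1}^n 2(x-a_i)\cdot h = f(x)\cdot h, \qquad f(x) = 2\sum_{i=1}^n (x-a_i) = 2\Big(n\,x - \sum_{i=1}^n a_i\Big). $$ In particular $f(x) = 0$ exactly when $x = \frac 1 n \sum_{i=1}^n a_i$, which is the point $x_0$ from the question.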
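Now the derivative of $f$ should be a $k\times k$ matrix, and it turns out to be just $2I$: $$ D_xf = 2I $$ (with all $n$ points the same computation gives $2nI$, which is the $2\sum_{i=1}^n g \cdot h$ in your last line). That should take you straight from the first term to the third term in your last line. However, you should be clear about what you mean by "$D$" and where you are taking it: the second derivative differentiates the map $x \mapsto D_xW$ with respect to the base point $x$, not the number $D_xW(h)$ for a fixed $x$, which is why it is not zero.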
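If it helps to see the numbers, here is a minimal sanity check in plain Python, not part of the argument itself. It approximates the gradient and the matrix $D_xf$ by central differences for the $n$-point function $W$ from the question. The sample points, the step size `eps`, and the helper names (`shifted`, `gradient`, `hessian`) are all made up for illustration.

```python
# Finite-difference check for W(x) = sum_i (x - a_i) . (x - a_i).
# The points and helper names below are illustrative assumptions.

def W(x, points):
    """Sum of squared Euclidean distances from x to each point."""
    return sum(sum((xj - aj) ** 2 for xj, aj in zip(x, a)) for a in points)

def shifted(x, j, step):
    """Copy of x with coordinate j moved by step."""
    y = list(x)
    y[j] += step
    return y

def gradient(x, points, eps=1e-3):
    """Central-difference approximation of the vector f(x) representing D_x W."""
    return [
        (W(shifted(x, j, eps), points) - W(shifted(x, j, -eps), points)) / (2 * eps)
        for j in range(len(x))
    ]

def hessian(x, points, eps=1e-3):
    """Central-difference approximation of D_x f: differentiate x -> f(x)."""
    k = len(x)
    # cols[j][i] approximates the partial derivative of f_i with respect to x_j.
    cols = [
        [(gp - gm) / (2 * eps)
         for gp, gm in zip(gradient(shifted(x, j, eps), points, eps),
                           gradient(shifted(x, j, -eps), points, eps))]
        for j in range(k)
    ]
    # Transpose so that row i, column j holds d f_i / d x_j.
    return [[cols[j][i] for j in range(k)] for i in range(k)]

points = [[1.0, 2.0], [3.0, -1.0], [0.0, 5.0]]  # hypothetical a_1, a_2, a_3 in R^2
n, k = len(points), len(points[0])
mean = [sum(a[j] for a in points) / n for j in range(k)]

print(gradient(mean, points))        # should be close to the zero vector
print(hessian([0.7, -0.3], points))  # should be close to 2*n*I, here 6*I
```

Because $W$ is quadratic, central differences are exact up to rounding, so the printed matrix should agree with $2nI$ at any base point, not just at the mean.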