A couple of related questions:
Suppose we want to calculate the gradient $\nabla_{\eta} (\exp{\{\eta^{T}{\bf{u(x)}}\}})$ (as Muphrid suggested, $\nabla_{\eta}$ denotes the gradient with respect to the variable $\eta$). Obviously, we would have something like this:
$$\nabla_{\eta} (\exp{\{\eta^{T}{\bf{u(x)}}\}}) = \exp{\{\eta^{T}{\bf{u(x)}}\}}{\bf{u(x)}}$$ where $\eta$ and $u(x)$ are vectors. In the equation above, what operation is implied by the gradient? Inspecting the equation, I guess it is straightforward to assume that since $\eta^{T}{\bf{u(x)}}$ produces a scalar and gradients produce vectors (or higher dimensional quantities), then this operation is unambiguous. But what about the following:
$$\nabla_{\eta} \left(\exp{\{\eta^{T}{\bf{u(x)}}\}}{\bf{u(x)}}\right)=\exp{\{\eta^{T}{\bf{u(x)}}\}}{\bf{u(x)}}{\bf{u(x)}}$$
Applying the same logic as before, ${\bf{u(x)}}{\bf{u(x)}}$ should be a vector or something else, but definitely not a scalar. So the simplest choice is to assume that it is a matrix. Therefore, ${\bf{u(x)}}{\bf{u(x)}}$ means the outer product ${\bf{u(x)}} {\bf{u(x)}}^{T}$, which as far as I know is correct (and makes sense: the gradient of a vector is a matrix).
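For what it's worth, both formulas can be sanity-checked numerically with central finite differences. This is a minimal sketch, not part of any derivation; the concrete values of $\eta$ and ${\bf{u}}$ below are made-up stand-ins for $\eta$ and ${\bf{u(x)}}$ at a fixed $x$:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = rng.normal(size=3)
u = rng.normal(size=3)   # stands in for u(x) at a fixed x
h = 1e-6
I = np.eye(3)

# f(eta) = exp(eta . u) is a scalar; its gradient should be exp(eta . u) * u
f = lambda e: np.exp(e @ u)
grad_fd = np.array([(f(eta + h * I[i]) - f(eta - h * I[i])) / (2 * h)
                    for i in range(3)])
assert np.allclose(grad_fd, f(eta) * u, atol=1e-4)

# g(eta) = exp(eta . u) * u is a vector; its Jacobian should be
# exp(eta . u) * outer(u, u), i.e. exp{eta.u} u u^T
g = lambda e: np.exp(e @ u) * u
jac_fd = np.column_stack([(g(eta + h * I[i]) - g(eta - h * I[i])) / (2 * h)
                          for i in range(3)])
assert np.allclose(jac_fd, f(eta) * np.outer(u, u), atol=1e-4)
```

The Jacobian check also shows why the outer-product reading is the consistent one: the finite-difference matrix comes out symmetric, exactly as $e^{\eta^{T}u}\,{\bf u}{\bf u}^{T}$ predicts.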
But what is a principled way of dealing with these kinds of operations?
UPDATE:
I have just seen another example that is a bit more confusing than the previous ones. What should be the result of $\nabla \left(\frac{1}{2}{\bf{w}}^{T}{\bf{w}}\right)$? ${\bf{w}}^{T}$ or $\bf{w}$? According to Brady Trainor's answer, it should be a contravariant tensor of rank 1, which means the resulting vector is a column vector. Is that right? The minimization of the equation
$$\frac{1}{2}\sum_{n=1}^{N}\{t_{n} - {\bf{w}}^{T}\phi(x_{n})\}^{2}+\frac{\lambda}{2}{\bf{w}}^{T}{\bf{w}}$$
resulting in
$${\bf{w}} = (\lambda I + \Phi^{T}\Phi)^{-1}\Phi^{T}{\bf{t}}$$
leads me to believe that the correct answer is the row vector ${\bf{w}}^{T}$.
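As a quick numerical sanity check (a sketch with made-up data; `Phi`, `t`, and `lam` below are arbitrary stand-ins for $\Phi$, ${\bf t}$, and $\lambda$), the closed-form solution does zero the gradient $-\Phi^{T}({\bf t}-\Phi{\bf w})+\lambda{\bf w}$ of the regularized objective:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 20, 4
Phi = rng.normal(size=(N, M))   # rows are phi(x_n)^T
t = rng.normal(size=N)
lam = 0.5

# Closed-form minimizer: w = (lam I + Phi^T Phi)^{-1} Phi^T t
w = np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# Gradient of (1/2) sum_n (t_n - w.phi(x_n))^2 + (lam/2) w.w
grad = -Phi.T @ (t - Phi @ w) + lam * w
assert np.allclose(grad, 0, atol=1e-10)
```

Note that numpy's 1-D arrays do not distinguish row from column vectors, so the check only confirms the components; whether those components are arranged as a row or a column is exactly the convention being asked about.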
Thanks in advance
Tensor products can be hard to interpret sometimes. There is a related notion called the geometric product of vectors. The geometric product of $a$ and $b$ is denoted $ab$ and equals $a \cdot b + a \wedge b$, where $a \wedge b$ is a member of the exterior algebra, called a bivector, and interpreted as an oriented plane. Bivectors can be represented by skew-symmetric matrices, while the dot product part can be put on the diagonal. This is one way of getting a matrix out of the geometric product, but for the most part, actually using those matrices is unnecessary.
So we're getting a little ways away from matrix calculus, but we're all covering the same ground.
Let's talk about differentiation. The vector derivative with respect to a vector $\eta$ is written $\nabla_\eta = e^1 \frac{\partial}{\partial \eta^1} + \ldots$. Strictly speaking, $\nabla$ is a covector (a row vector, a cotangent vector, etc.), and you have to account for this accordingly. The vector derivative can also be combined with vectors and such using the geometric product, combining divergence and curl into one operation.
Now then, let's look at your first problem. This is made easier using the chain rule. Let $a, b$ be a vector and $f, g$ be vector fields. This is the chain rule:
$$b \cdot \nabla_a (f \circ g)(a) = [b \cdot \nabla_a g(a)] \cdot \nabla_g f(g)$$
We can apply this to our problem at hand.
$$b \cdot \nabla_\eta (u \exp [u \cdot \eta]) =u (b \cdot \nabla_\eta [u \cdot \eta]) \frac{d}{d(u \cdot \eta)} \exp (u \cdot \eta)$$
The result is $u (b \cdot u) \exp (u \cdot \eta)$. Differentiating with $\nabla_b$ gives us the originally desired formula, yielding $(u \cdot u) \exp(\eta \cdot u)$. This is a scalar; the curl parts are identically zero.
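The directional-derivative step can be verified numerically. A minimal sketch with made-up vectors $u$, $\eta$, $b$, checking that $b \cdot \nabla_\eta \left(u\,e^{u \cdot \eta}\right) = u\,(b \cdot u)\,e^{u \cdot \eta}$ via central differences along $b$:

```python
import numpy as np

rng = np.random.default_rng(2)
u, eta, b = (rng.normal(size=3) for _ in range(3))
h = 1e-6

# The vector field F(eta) = u exp(u . eta)
F = lambda e: u * np.exp(u @ e)

# Directional derivative b . grad_eta F, by central differences along b
dir_fd = (F(eta + h * b) - F(eta - h * b)) / (2 * h)
assert np.allclose(dir_fd, u * (b @ u) * np.exp(u @ eta), atol=1e-4)
```

The final $\nabla_b$ step, where $\nabla_b \left[(b \cdot u)\,u\right] = u u = u \cdot u$, is the genuinely geometric-algebraic part and has no direct finite-difference analogue, since it uses the geometric product of $u$ with itself.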
Let's look at your second problem: $\frac{1}{2} \nabla_w (w \cdot w)$. We can attack this by symmetry, holding one of the $w$'s constant, since $\nabla_w (w \cdot a) = a$ for any constant $a$. This clearly gives $w$, but as a row vector. In a metric space there's no real difference; you can convert between vectors and covectors freely, but the derivative of a scalar should always give a row vector.
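Both identities used here can be checked numerically. A minimal sketch with made-up $w$ and $a$ and a small finite-difference gradient helper (`grad` below is an ad hoc name, not a library function):

```python
import numpy as np

rng = np.random.default_rng(3)
w, a = rng.normal(size=4), rng.normal(size=4)
h = 1e-6
I = np.eye(4)

# Central-difference gradient of a scalar function f at x
grad = lambda f, x: np.array([(f(x + h * I[i]) - f(x - h * I[i])) / (2 * h)
                              for i in range(4)])

# grad_w (w . a) = a, for constant a
assert np.allclose(grad(lambda x: x @ a, w), a, atol=1e-6)

# grad_w (w . w / 2) = w
assert np.allclose(grad(lambda x: 0.5 * x @ x, w), w, atol=1e-6)
```

Again, a numpy 1-D array carries no row-versus-column information, so the check confirms the components of the gradient while the row-vector reading remains a matter of convention.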