$$\vec{\nabla} = \left(\frac{\partial}{\partial x_1}, \cdots, \frac{\partial}{\partial x_n}\right)$$
If $\vec{\nabla}$ is a $1\times n$ vector, then how can
$$\vec{\nabla}f = \operatorname{grad}f = \left(\frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_n}\right)^T$$
be an $n\times1$ vector?
If I take $$\vec{x} = \begin{pmatrix}x_1 \\x_2\\x_3\end{pmatrix}$$ then $$\quad\vec{x} \cdot a= \begin{pmatrix}a \cdot x_1 \\a \cdot x_2\\a \cdot x_3\end{pmatrix}$$
and $\vec{x} \cdot a = \left(a \cdot x_1, a \cdot x_2, a \cdot x_3\right)$.
Why does the vector change from $1\times n$ to $n\times 1$? Or, more specifically, why was $\vec{\nabla} = \left(\frac{\partial}{\partial x_1}, \cdots, \frac{\partial}{\partial x_n}\right)^T$ marked as wrong when I was asked for the definition of the nabla operator?
Update
${\bf 1.\ }$In the first place, $\left({\partial\over\partial x},{\partial\over\partial y},{\partial\over\partial z}\right)$ is not a vector, i.e. an element of some vector space, but a mnemonic device.
${\bf 2.\ }$There is no single consistent notation for first derivatives of multivariate functions and related objects that is in use worldwide today.
${\bf 3.\ }$In the following I describe how I see things.
Given a differentiable function $f:\>{\mathbb R}^3\to{\mathbb R}$ and a point $p\in{\rm dom}(f)$ one has $$f(p+X)-f(p)=df(p).X+o(|X|)\qquad(X\to0)\ .$$ The differential map $df(p):\>T_p\to{\mathbb R}$ is a linear functional on the tangent space $T_p\simeq{\mathbb R}^3$. In terms of matrix calculus the data specifying $df(p)$ would then be collected in a row vector: $$[df(p)]=[\matrix{f_{.1}(p)&f_{.2}(p)&f_{.3}(p)\cr}]\ ,$$ where $f_{.k}$ denotes the partial derivative of $f$ with respect to the $k$-th variable. In this way the evaluation $df(p).X$ becomes an ordinary matrix product: $$df(p).X=[\matrix{f_{.1}(p)&f_{.2}(p)&f_{.3}(p)\cr}]\left[\matrix{X_1\cr X_2\cr X_3\cr}\right]\ .$$
Now in ${\mathbb R}^n$ (here $n=3$) we have an additional structure element, namely the standard scalar product $\cdot\>$. When the points $x\in{\mathbb R}^n$ are considered as $n\times1$ column vectors $[x]$, the scalar product can be written as a matrix product: $$x\cdot y=[x]^\top [y]\ .$$ The scalar product allows one to identify the space $T_p^*$ of linear functionals on $T_p$ with the space $T_p$ of "geometrical" tangent vectors itself. This means that there is a certain vector $a\in T_p$ representing $df(p)$.
This vector $a$ is called the gradient of $f$ at $p$, and is denoted by $\nabla f(p)$. One then has $$df(p).X=\nabla f(p)\cdot X\ .\tag{1}$$ In terms of coordinates the gradient is of course given by $$\nabla f(p)=\bigl(f_{.1}(p),f_{.2}(p),f_{.3}(p)\bigr)\ ,$$ and, being an element of $T_p$, this gradient can be conceived as a column vector, as are the increment vectors $X\in T_p$.
Now, if you insist on computing scalar products in terms of matrix algebra, the vector $\nabla f(p)\in T_p$ has to be transposed into a row vector, so that $(1)$ assumes the form $$df(p).X=\bigl[\nabla f(p)\bigr]^\top[X]\ .$$
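To make the row/column bookkeeping concrete, here is a minimal numerical sketch in Python with NumPy. The function `f`, its hand-computed partials, the point `p` and the increment `X` are all made up for illustration and do not come from the question. It only checks that the $1\times3$ row vector $[df(p)]$ and the transposed $3\times1$ gradient give the same number when applied to $[X]$, and that this number approximates $f(p+X)-f(p)$.

```python
import numpy as np

def f(x):
    # Hypothetical example function f(x1, x2, x3) = x1^2 + 3*x2*x3
    return x[0] ** 2 + 3.0 * x[1] * x[2]

def partials(p):
    # Partial derivatives of the example f at p, computed by hand
    return np.array([2.0 * p[0], 3.0 * p[2], 3.0 * p[1]])

p = np.array([1.0, 2.0, -1.0])      # base point (illustrative)
X = np.array([1e-3, -2e-3, 5e-4])   # small increment in T_p (illustrative)

df_row = partials(p).reshape(1, 3)    # [df(p)]: 1x3 row vector, a linear functional
grad_col = partials(p).reshape(3, 1)  # [grad f(p)]: 3x1 column vector in T_p
X_col = X.reshape(3, 1)               # [X]: 3x1 column vector

via_differential = (df_row @ X_col).item()   # df(p).X as a matrix product
via_gradient = (grad_col.T @ X_col).item()   # [grad f(p)]^T [X], i.e. formula (1)
actual_increment = f(p + X) - f(p)           # differs from the above by o(|X|)

print(df_row.shape, grad_col.shape)
print(via_differential, via_gradient, actual_increment)
```

The two shapes hold the same three numbers; which one you write down depends only on whether you mean the functional $df(p)$ or the vector $\nabla f(p)$.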