I am reading "A Student's Guide to Vectors and Tensors" by Daniel Fleisch, and I am struggling to follow his reasoning for why the components of the gradient vector transform covariantly. On page 129, using the chain rule in index notation, he shows that the gradient transforms from the unbarred system to the barred system according to
\begin{equation*} \frac{\partial f}{\partial\bar{x}^{i}} = \frac{\partial x^{j}}{\partial\bar{x}^{i}}\frac{\partial f}{\partial x^{j}}. \end{equation*}
Then he says
But in this case the elements of the transformation matrix $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ are the inverse of those in the transformation of the differential length elements (which are $\frac{\partial\bar{x}^{i}}{\partial x^{j}}$). And just as in that case the $\frac{\partial\bar{x}^{i}}{\partial x^{j}}$ terms represent the components of vectors that point along the original coordinate axes, in this case the $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ terms represent the components of vectors that are perpendicular to the original coordinate surfaces. Hence in this case the weighting factors are the components of the (contravariant) dual basis vectors, which means that the components of the gradient vector transform as covariant components.
How does he know that the $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ terms represent the components of vectors that are perpendicular to the original coordinate surfaces?
I am aware that one can write $\frac{\partial}{\partial x^{i}}$ as the basis vector $\mathbf{e}_{i}$ in the unbarred system, and that the chain rule then expresses each unbarred basis vector in terms of the barred ones, so that the $\frac{\partial\bar{x}^{i}}{\partial x^{j}}$ terms are the (barred) components of the vectors parallel to the original coordinate axes:
\begin{align*} \mathbf{e}_{1} &= \frac{\partial\bar{x}^{1}}{\partial x^{1}}\bar{\mathbf{e}}_{1} + \frac{\partial\bar{x}^{2}}{\partial x^{1}}\bar{\mathbf{e}}_{2} \\ \mathbf{e}_{2} &= \frac{\partial\bar{x}^{1}}{\partial x^{2}}\bar{\mathbf{e}}_{1} + \frac{\partial\bar{x}^{2}}{\partial x^{2}}\bar{\mathbf{e}}_{2} \end{align*}
in the case of two dimensions. So, following this reasoning, would I then construct a similar pair of equations with the $\frac{\partial x^{j}}{\partial\bar{x}^{i}}$ terms and the dual basis vectors $\mathbf{e}^{1}$ and $\mathbf{e}^{2}$?
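To convince myself that the two sets of weighting factors really are inverse matrices of each other, I ran a quick sympy check (using 2-D polar coordinates as the barred system; that concrete choice is mine, not Fleisch's):

```python
import sympy as sp

r, th, X, Y = sp.symbols('r theta x y', positive=True)

# forward map: Cartesian x^j as functions of the barred coordinates (r, theta)
fwd = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])
# inverse map: barred coordinates as functions of the Cartesian x^j
inv = sp.Matrix([sp.sqrt(X**2 + Y**2), sp.atan2(Y, X)])

B = fwd.jacobian([r, th])                               # B[j,i] = dx^j/dxbar^i
A = inv.jacobian([X, Y]).subs({X: fwd[0], Y: fwd[1]})   # A[i,j] = dxbar^i/dx^j

assert sp.simplify(A * B) == sp.eye(2)   # the two Jacobians are inverses
```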
$ \newcommand\PD[2]{\frac{\partial#1}{\partial#2}} $
Vectors are geometric entities independent of any coordinatization. Hence it makes sense to define coordinate functions which map a vector $v$ to its coordinates, $x^1(v), x^2(v), \dotsc$. We also have the inverse, where given coordinate values we can determine a vector $v(x^1,x^2,\dotsc)$, and this allows us to define the basis vectors associated with the coordinate system as $e_i = \partial v(x^1,x^2,\dotsc)/\partial x^i$. Given an inner product, there is also a unique reciprocal basis $\{e^i\}$ defined such that $e^i\cdot e_j = \delta^i_j$. By this definition it is evident that $$ \sum_j e_j(e^j\cdot v) = v = \sum_j e^j(e_j\cdot v). $$
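These defining relations are easy to check directly; here is a small numpy sketch (with a deliberately non-orthogonal basis of my own choosing, so the reciprocal basis genuinely differs from the original):

```python
import numpy as np

# rows of E are the basis vectors e_1, e_2 (deliberately non-orthogonal)
E = np.array([[1.0, 0.0],
              [1.0, 1.0]])

# rows of R are the reciprocal vectors e^1, e^2, fixed by e^i . e_j = delta^i_j
R = np.linalg.inv(E).T

assert np.allclose(R @ E.T, np.eye(2))   # e^i . e_j = delta^i_j

# both expansions reproduce an arbitrary vector v
v = np.array([0.3, -1.7])
assert np.allclose(sum(E[j] * (R[j] @ v) for j in range(2)), v)  # sum_j e_j (e^j . v)
assert np.allclose(sum(R[j] * (E[j] @ v) for j in range(2)), v)  # sum_j e^j (e_j . v)
```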
The gradient with respect to $v$ is then essentially defined by $\nabla_v x^i(v) = e^i$. We can always, in any coordinate system, express the gradient as $$ \nabla_v = e^1\PD{}{x^1} + e^2\PD{}{x^2} + \cdots = e_1\PD{}{x_1} + e_2\PD{}{x_2} + \cdots. $$ It's worth noting that when $\{e_i\}$ is an orthonormal basis then it is its own reciprocal $e^i = e_i$.
Here's an example; in particular we see it isn't necessarily true that $x^i = e^i\cdot v$. (Though this is not the most illustrative example, since the resulting basis is orthogonal.) In 2D polar coordinates, $$ v(r,\theta) = (r\cos\theta)e_1 + (r\sin\theta)e_2, $$$$ r(v) = |v|,\quad \theta(v) = \arctan\frac{y}{x}, $$ where $\{e_1,e_2\}$ is the standard basis, we are using the standard inner product, and $x := v\cdot e_1$ and $y := v\cdot e_2$. Then the basis vectors are $$ e_r = \PD{v}{r} = (\cos\theta)e_1 + (\sin\theta)e_2, $$$$ e_\theta = \PD{v}{\theta} = -(r\sin\theta)e_1 + (r\cos\theta)e_2, $$$$ e^r = \nabla_v r(v) = \frac v{|v|} = e_r, $$$$ e^\theta = \nabla_v \theta(v) = \frac{x^2}{x^2 + y^2}\left(-\frac{y}{x^2}e_1 + \frac1{x}e_2\right). $$ We can find that $e^\theta$ simplifies to $$ e^\theta = \frac1r[-(\sin\theta)e_1 + (\cos\theta)e_2] = \frac1{r^2}e_\theta, $$ and consequently $$ e^\theta\cdot v = e^\theta\cdot(re_r) = 0. $$ However, these basis vectors depend on the coordinates. If we instead look at $v(r',\theta') = r'e_{r'}$, we get $$ e^\theta\cdot v(r',\theta') = \frac{r'}{r^2}\,e_\theta\cdot e_{r'} = \frac{r'}{r}\left[(-\sin\theta)(\cos\theta') + (\cos\theta)(\sin\theta')\right] = \frac{r'}{r}\sin(\theta'-\theta). $$
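For what it's worth, every identity in this polar example can be machine-checked; here is a sympy sketch of the computations (my own verification, not part of the derivation):

```python
import sympy as sp

r, th, rp, thp = sp.symbols('r theta rp thetap', positive=True)
v  = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])        # v(r, theta)
vp = sp.Matrix([rp*sp.cos(thp), rp*sp.sin(thp)])    # v(r', theta')

e_r, e_th = v.diff(r), v.diff(th)                   # coordinate basis vectors

# reciprocal basis from the gradients of r(v) and theta(v)
x, y = sp.symbols('x y')
er_up, eth_up = [sp.Matrix([g.diff(x), g.diff(y)]).subs({x: v[0], y: v[1]})
                 for g in (sp.sqrt(x**2 + y**2), sp.atan2(y, x))]

assert sp.simplify(er_up - e_r) == sp.zeros(2, 1)          # e^r = e_r
assert sp.simplify(eth_up - e_th/r**2) == sp.zeros(2, 1)   # e^theta = e_theta / r^2
assert sp.simplify(eth_up.dot(v)) == 0                     # e^theta . v = 0
# e^theta . v(r', theta') = (r'/r) sin(theta' - theta)
assert sp.simplify(sp.expand_trig(eth_up.dot(vp) - (rp/r)*sp.sin(thp - th))) == 0
```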
Now recall that the gradient of a function is perpendicular to its level sets. Coordinate surfaces are just the level sets of the coordinate functions $x^i(v)$, i.e. the set of all $v$ such that $x^i(v) = c$ for some constant $c$, and as just discussed $\nabla_v x^i(v) = e^i$, so the reciprocal vectors are perpendicular to the corresponding coordinate surfaces. All that is left, then, is to see which weighting factors express $\{e^i\}$ in the barred system.
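A one-line check of that perpendicularity claim in the polar example (again my own sketch): the tangent to the level curve $r(v) = \text{const}$ is $\partial v/\partial\theta$, and it is orthogonal to $\nabla_v r(v)$:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
v = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])

tangent = v.diff(th)                 # tangent to the level curve r = const
grad_r  = v / sp.sqrt(v.dot(v))      # e^r = nabla_v r(v) = v/|v|

assert sp.simplify(tangent.dot(grad_r)) == 0   # gradient is perpendicular to the level curve
```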
Consider a new set of coordinates $\{\bar x^i\}$ with basis $\{\bar e_i\}$. By the chain rule $$ e_i = \PD v{x^i} = \sum_j \PD{\bar x^j}{x^i}\PD v{\bar x^j} = \sum_j \PD{\bar x^j}{x^i}\bar e_j, $$ which establishes $\partial\bar x^j/\partial x^i$ as the transformation matrix between $\{e_i\}$ and $\{\bar e_j\}$. We then consider that $$ e^i = \nabla_v x^i(v) = \sum_j \bar e^j(\bar e_j\cdot\nabla_v)x^i(v) = \sum_j \bar e^j\PD{x^i}{\bar x^j}. $$ This is what Fleisch is talking about. $e^i$ is perpendicular to the $x^i$ coordinate surface, and hence the quantities $\partial{x^i}/\partial{\bar x^j}$ (the entries of the inverse of the matrix $\partial{\bar x^i}/\partial{x^j}$) are the coordinates, in the $\{\bar e^j\}$ basis, of vectors perpendicular to the original coordinate surfaces. If we want something using the $\{\bar e_i\}$ basis, then instead we would get $$ e^i = \nabla_v x^i(v) = \sum_j \bar e_j(\bar e^j\cdot\nabla_v)x^i(v). $$
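Finally, the covariant transformation law itself, $\partial f/\partial\bar x^i = (\partial x^j/\partial\bar x^i)\,\partial f/\partial x^j$, can be verified for a concrete scalar field (the field $f = x^2 y$ below is an arbitrary choice of mine):

```python
import sympy as sp

r, th, x, y = sp.symbols('r theta x y', positive=True)
f = x**2 * y                                     # arbitrary scalar field

grad_cart = sp.Matrix([f.diff(x), f.diff(y)])    # df/dx^j in the unbarred system

# barred system: polar coordinates, x^j = x^j(r, theta)
sub = {x: r*sp.cos(th), y: r*sp.sin(th)}
f_bar = f.subs(sub)
grad_bar = sp.Matrix([f_bar.diff(r), f_bar.diff(th)])   # df/dxbar^i

# Jacobian B[j,i] = dx^j/dxbar^i; the covariant law says grad_bar = B^T grad_cart
B = sp.Matrix([r*sp.cos(th), r*sp.sin(th)]).jacobian([r, th])
assert sp.simplify(grad_bar - B.T * grad_cart.subs(sub)) == sp.zeros(2, 1)
```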