I'm a mathematician (with little knowledge of differential geometry) trying to study physics. One of the greatest problems is the language regarding coordinate transformations. I tend to think of such transformations as functions (diffeomorphisms), whereas physicists just rename the arguments, e.g. $f(x,y,z)=f(\rho,\varphi,z)$.
I've gotten used to that and for some things (e.g. integration) it works just fine. But I can't get my head around the transformation (?) of differential operators. For example:
Let $f:\mathbb{R}^3\backslash\big(\{0\}\times\{0\}\times\mathbb{R}\big)\rightarrow\mathbb{R}$ be smooth. I have before me the statement that "in cylindrical coordinates" $$\nabla f=\vec{\mathbf{e}}_\rho\frac{\partial f}{\partial\rho}+\vec{\mathbf{e}}_\varphi\frac{1}{\rho}\frac{\partial f}{\partial\varphi}+\vec{\mathbf{e}}_z\frac{\partial f}{\partial z}.$$
Now you probably consider me a pedant, but I can only try to understand that by introducing the mapping $\theta:]0,\infty[\times\mathbb{R}\times\mathbb{R}\rightarrow\mathbb{R}^3$, $$\theta(\rho,\varphi,z)=(\rho\cos\varphi,\rho\sin\varphi,z).$$
My understanding is that the above equation is actually $$\nabla(f\circ\theta)=\vec{\mathbf{e}}_\rho\partial_1(f\circ\theta)+\vec{\mathbf{e}}_\varphi\frac{1}{\rho}\partial_2(f\circ\theta)+\vec{\mathbf{e}}_z\partial_3(f\circ\theta).$$ Or is it $\left(\nabla f\right)\circ\theta$ instead of $\nabla(f\circ\theta)$?
And what are these vectors $\vec{\mathbf{e}}_\rho,\vec{\mathbf{e}}_\varphi,\vec{\mathbf{e}}_z$? I read that $\vec{\mathbf{e}}_\varphi$ is the "unit vector in $\varphi$-direction". But what does that even mean? How can one express these "unit vectors" using $\theta$?
There is a lot in the literature about how to derive such equations, but I can't really use it because I don't understand the meaning behind the symbols and I don't really understand the point of it all.
$\newcommand{\Reals}{\mathbf{R}}\newcommand{\Basis}{\mathbf{e}}\newcommand{\dd}{\partial}$Your understanding is perfectly correct, and although it may be overly careful for everyday use, it's a good idea to be certain what symbols mean! :)
(If you can find a library copy, Jan J. Koenderink's Solid Shape is a highly worthwhile if somewhat idiosyncratic read, a profoundly geometric introduction to differential geometry geared toward engineers and physicists, in the aesthetic spirit of Hilbert and Cohn-Vossen.)
If $x = (x_{1}, \dots, x_{n})$ is a coordinate system (which formally should be viewed as Cartesian, since "differential calculus looks the same in arbitrary coordinates"), the standard basis vector fields are viewed as partial differentiation operators, $$ \Basis_{j} \leftrightarrow \frac{\dd}{\dd x_{j}}, $$ via their action as directional derivatives on functions: $$ \Basis_{j}(x) f = \lim_{t \to 0} \frac{f(x + t\Basis_{j}) - f(x)}{t} = \frac{\dd f}{\dd x_{j}}(x) = \frac{\dd}{\dd x_{j}}(x) f. $$
Briefly, the issues in your question boil down to the chain rule, which enters as soon as you "compare" derivatives with respect to two different coordinate systems.
If $\theta:\Reals^{n} \to \Reals^{n}$ is a continuously-differentiable change of coordinates, and if we write $y = \theta(x)$, then in "classical" notation, $$ \frac{\dd}{\dd x_{j}} = \frac{\dd y_{1}}{\dd x_{j}}\, \frac{\dd}{\dd y_{1}} + \dots + \frac{\dd y_{n}}{\dd x_{j}}\, \frac{\dd}{\dd y_{n}}. \tag{1} $$
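As a sanity check on (1), here is a minimal numerical sketch, assuming the cylindrical map $\theta$ from the question and an arbitrary test function `f` chosen only for illustration. The helper names (`partial`, `jacobian_theta`, `grad_f`) are hypothetical and not part of the original discussion. It compares finite-difference partials of $f\circ\theta$ with the right-hand side of (1) applied to $f$.

```python
import math

def theta(rho, phi, z):
    """Cylindrical-to-Cartesian map from the question."""
    return (rho * math.cos(phi), rho * math.sin(phi), z)

def f(x, y, z):
    """Arbitrary smooth test function (an assumption for illustration)."""
    return x * x * y + math.sin(z) * y

def grad_f(x, y, z):
    """Hand-computed Cartesian partial derivatives of f."""
    return (2 * x * y, x * x + math.sin(z), math.cos(z) * y)

def partial(g, p, j, h=1e-6):
    """Central-difference approximation of the j-th partial of g at p."""
    up = list(p)
    dn = list(p)
    up[j] += h
    dn[j] -= h
    return (g(*up) - g(*dn)) / (2 * h)

def jacobian_theta(rho, phi, z):
    """Entry [i][j] is dy_i/dx_j for y = theta(x), x = (rho, phi, z)."""
    return (
        (math.cos(phi), -rho * math.sin(phi), 0.0),
        (math.sin(phi), rho * math.cos(phi), 0.0),
        (0.0, 0.0, 1.0),
    )

p = (1.5, 0.4, 0.3)                      # sample point (rho, phi, z)
g = lambda rho, phi, z: f(*theta(rho, phi, z))
J = jacobian_theta(*p)
df = grad_f(*theta(*p))                  # Cartesian partials of f at theta(p)

# Left side of (1) applied to f: partials of f o theta in (rho, phi, z).
lhs = [partial(g, p, j) for j in range(3)]
# Right side of (1) applied to f: sum over i of (dy_i/dx_j) * (df/dy_i).
rhs = [sum(J[i][j] * df[i] for i in range(3)) for j in range(3)]

agree = all(abs(a - b) < 1e-6 for a, b in zip(lhs, rhs))
print(agree)
```

The comparison uses a tolerance because the left side is only a finite-difference approximation.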
In particular, if $$ (x, y, z) = \theta(\rho, \varphi, z) = (\rho\cos\varphi, \rho\sin\varphi, z), $$ then (pardon the use of $z$ in two conceptually distinct but logically identical (!) roles) \begin{alignat*}{3} \Basis_{\rho} &:= \frac{\dd}{\dd\rho} &&= \frac{\dd x}{\dd\rho}\, \frac{\dd}{\dd x} + \frac{\dd y}{\dd\rho}\, \frac{\dd}{\dd y} + \frac{\dd z}{\dd\rho}\, \frac{\dd}{\dd z} &&= \cos\varphi\, \Basis_{1} + \sin\varphi\, \Basis_{2}; \\ \Basis_{\varphi} &:= \frac{\dd}{\dd\varphi} &&= \frac{\dd x}{\dd\varphi}\, \frac{\dd}{\dd x} + \frac{\dd y}{\dd\varphi}\, \frac{\dd}{\dd y} + \frac{\dd z}{\dd\varphi}\, \frac{\dd}{\dd z} &&= -\rho\sin\varphi\, \Basis_{1} + \rho\cos\varphi\, \Basis_{2}; \\ \Basis_{z} &:= \frac{\dd}{\dd z} &&= \frac{\dd x}{\dd z}\, \frac{\dd}{\dd x} + \frac{\dd y}{\dd z}\, \frac{\dd}{\dd y} + \frac{\dd z}{\dd z}\, \frac{\dd}{\dd z} &&= \Basis_{3}. \end{alignat*} To express the Cartesian frame $(\Basis_{1}, \Basis_{2}, \Basis_{3})$ in terms of the cylindrical frame $(\Basis_{\rho}, \Basis_{\varphi}, \Basis_{z})$, one either inverts the preceding system with linear algebra, or else (locally) inverts the change of coordinates $\theta$ itself, and computes the corresponding partial derivatives for (1). (The results agree by the chain rule.)
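To tie this back to the formula in the question, note that the vectors computed above are coordinate vectors, not unit vectors: $\partial/\partial\varphi$ has Euclidean length $\rho$. The following is a sketch under two assumptions: that the physicists' $\vec{\mathbf{e}}_\rho, \vec{\mathbf{e}}_\varphi, \vec{\mathbf{e}}_z$ are the normalized coordinate vectors (written with hats here, a notation introduced only for this computation), and that $g := f\circ\theta$ (also a name introduced here). With $\langle\cdot,\cdot\rangle$ the Euclidean inner product, the chain rule gives
$$
\begin{aligned}
\hat{\mathbf{e}}_\rho &= \frac{\partial_1\theta}{\|\partial_1\theta\|} = (\cos\varphi, \sin\varphi, 0), \\
\hat{\mathbf{e}}_\varphi &= \frac{\partial_2\theta}{\|\partial_2\theta\|} = \frac{1}{\rho}(-\rho\sin\varphi, \rho\cos\varphi, 0) = (-\sin\varphi, \cos\varphi, 0), \\
\hat{\mathbf{e}}_z &= \partial_3\theta = (0, 0, 1); \\
\partial_1 g &= \bigl\langle (\nabla f)\circ\theta,\ \partial_1\theta \bigr\rangle = \bigl\langle (\nabla f)\circ\theta,\ \hat{\mathbf{e}}_\rho \bigr\rangle, \\
\partial_2 g &= \bigl\langle (\nabla f)\circ\theta,\ \partial_2\theta \bigr\rangle = \rho\, \bigl\langle (\nabla f)\circ\theta,\ \hat{\mathbf{e}}_\varphi \bigr\rangle, \\
\partial_3 g &= \bigl\langle (\nabla f)\circ\theta,\ \partial_3\theta \bigr\rangle = \bigl\langle (\nabla f)\circ\theta,\ \hat{\mathbf{e}}_z \bigr\rangle.
\end{aligned}
$$
Since $\hat{\mathbf{e}}_\rho, \hat{\mathbf{e}}_\varphi, \hat{\mathbf{e}}_z$ are orthonormal at each point, expanding $(\nabla f)\circ\theta$ in this frame yields
$$
(\nabla f)\circ\theta = \hat{\mathbf{e}}_\rho\, \partial_1 g + \hat{\mathbf{e}}_\varphi\, \frac{1}{\rho}\, \partial_2 g + \hat{\mathbf{e}}_z\, \partial_3 g.
$$
On this reading, the left-hand side of the physicists' formula is $(\nabla f)\circ\theta$ rather than $\nabla(f\circ\theta)$, and the factor $1/\rho$ is exactly the normalization of $\partial_2\theta$.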
The "modern" viewpoint is that each coordinate domain ($U$ and $V$, say) is a smooth ($3$-)manifold, and each coordinate system defines a trivialization of the respective tangent bundle via its coordinate vectors. The coordinate change $\theta:U \to V$ induces an isomorphism $\theta_{*}:TU \to TV$, defined by $$ \theta_{*}(x, v) = \bigl(\theta(x), D\theta(x)(v)\bigr). $$ Consequently, there are two frames for $TV$: the "native" coordinate frame in $V$ (the $\dd/\dd y_{i}$ in (1)), and the "transplanted" image of the coordinate frame from $U$ (the $\dd/\dd x_{j}$ in (1), which properly speaking are $\theta_{*}\dd/\dd x_{j}$). The chain rule expresses the latter as linear combinations of the former.
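For a concrete feel for $\theta_{*}$, the following minimal sketch approximates $D\theta(x)(v)$ by a central difference for the cylindrical map and compares the transplanted coordinate vectors with the closed-form expressions. The helper names `pushforward` and `close`, the step size, and the sample point are assumptions made for illustration only.

```python
import math

def theta(rho, phi, z):
    """Cylindrical-to-Cartesian map from the question."""
    return (rho * math.cos(phi), rho * math.sin(phi), z)

def pushforward(p, v, h=1e-6):
    """Approximate D(theta)(p)(v) by a central difference along v."""
    plus = theta(*(pi + h * vi for pi, vi in zip(p, v)))
    minus = theta(*(pi - h * vi for pi, vi in zip(p, v)))
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

def close(u, w, tol=1e-6):
    """Componentwise comparison up to a tolerance."""
    return all(abs(a - b) < tol for a, b in zip(u, w))

p = (2.0, 0.7, -1.0)          # sample point (rho, phi, z)
rho, phi, _ = p

# Images under theta_* of the coordinate vectors d/drho, d/dphi, d/dz.
e_rho = pushforward(p, (1.0, 0.0, 0.0))
e_phi = pushforward(p, (0.0, 1.0, 0.0))
e_z = pushforward(p, (0.0, 0.0, 1.0))

# Closed-form components in the Cartesian frame (e_1, e_2, e_3).
expected_e_rho = (math.cos(phi), math.sin(phi), 0.0)
expected_e_phi = (-rho * math.sin(phi), rho * math.cos(phi), 0.0)
expected_e_z = (0.0, 0.0, 1.0)

# Euclidean length of the transplanted d/dphi; it should be close to rho.
norm_e_phi = math.sqrt(sum(c * c for c in e_phi))

print(close(e_rho, expected_e_rho),
      close(e_phi, expected_e_phi),
      close(e_z, expected_e_z))
```

The value `norm_e_phi` is included to show that the transplanted $\partial/\partial\varphi$ has length $\rho$ rather than $1$, which is why formulas written in a unit frame carry a factor $1/\rho$.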
If your primary interest is computation, it's best to get comfortable with the classical (abuses of) notation. :)