I'm TA'ing multivariable this semester, and I just noticed that we always tend to normalize all our basis vectors when using polar coordinates. This is in stark contrast to what I'm used to in differential geometry, where we'd prefer our coordinate basis to transform by the law \begin{align*} \frac{\partial}{\partial x}&=\frac{\partial r}{\partial x}\frac{\partial}{\partial r}+\frac{\partial \theta}{\partial x}\frac{\partial}{\partial \theta}\\ &=\cos\theta\frac{\partial}{\partial r}-\frac{\sin\theta}{r}\frac{\partial}{\partial \theta}, \end{align*} and similarly for $\frac{\partial}{\partial y}$. This $\frac{1}{r}$ factor makes up for the fact that if we travel in the angular direction, we cover more ground the further we are from the origin. So, for example, the gradient in these "geometric" polar coordinates would take the form $$\nabla f |_{(r,\theta)}=\left(\frac{\partial f}{\partial r},\frac{1}{r^2}\frac{\partial f}{\partial \theta}\right)$$ which agrees with the usual local definition of the gradient, $\nabla f=g^{ij}(\partial_if)\partial_j$. This is in opposition to the more common $\frac{1}{r}$ factor, which comes from using the normalized polar coordinate system. So why are we normalizing these coordinates? If you insist on working in an orthonormal frame, why not call it a polar frame instead of polar coordinates to avoid bad practices in the future?
Edit: Let me put in an explicit computation with the "geometric" basis (which I learned is called holonomic). Consider $$f(x,y)=\frac{x}{x^2+y^2},$$ so that in polar coordinates, $$f(r,\theta)=\frac{\cos\theta}{r}.$$ One sees: \begin{align*} \nabla f&=\frac{\partial f}{\partial x}\bigg\vert_{(r,\theta)}\frac{\partial}{\partial x}+\frac{\partial f}{\partial y}\bigg\vert_{(r,\theta)}\frac{\partial}{\partial y}\\ &=\frac{\sin^2\theta-\cos^2\theta}{r^2}\left(\cos\theta\frac{\partial}{\partial r}-\frac{\sin\theta}{r}\frac{\partial}{\partial \theta}\right)-\frac{2\cos\theta\sin\theta}{r^2}\left(\sin\theta\frac{\partial}{\partial r}+\frac{\cos\theta}{r}\frac{\partial}{\partial \theta}\right)\\ &=\frac{\sin^2\theta\cos\theta-\cos^3\theta-2\cos\theta\sin^2\theta}{r^2}\frac{\partial}{\partial r}+\frac{-\sin^3\theta+\cos^2\theta\sin\theta-2\cos^2\theta\sin\theta}{r^3}\frac{\partial}{\partial \theta}\\ &=-\frac{\cos\theta}{r^2}\frac{\partial}{\partial r}-\frac{\sin\theta}{r^3}\frac{\partial}{\partial \theta}\\ &=\frac{\partial f}{\partial r}\frac{\partial }{\partial r}+\frac{1}{r^2}\frac{\partial f}{\partial \theta}\frac{\partial}{\partial \theta}. \end{align*}
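As a sanity check, the same computation can be done numerically. This is only a rough sketch in plain Python: the function names, the finite-difference step, and the sample point are my own choices for illustration. It takes the Cartesian gradient of $f$ by central differences, rewrites it in the holonomic basis using the transformation law for $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$, and compares the result with $\left(\frac{\partial f}{\partial r},\frac{1}{r^2}\frac{\partial f}{\partial\theta}\right)$.

```python
import math

def f(x, y):
    # The example function f(x, y) = x / (x^2 + y^2).
    return x / (x**2 + y**2)

def cartesian_gradient(x, y, h=1e-6):
    # Central finite differences (h is an arbitrary illustrative step size).
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

def holonomic_components(r, theta):
    # Components of grad f along d/dr and d/dtheta at the point (r, theta).
    x, y = r * math.cos(theta), r * math.sin(theta)
    fx, fy = cartesian_gradient(x, y)
    # Substitute d/dx = cos(t) d/dr - (sin(t)/r) d/dtheta
    # and        d/dy = sin(t) d/dr + (cos(t)/r) d/dtheta.
    v_r = fx * math.cos(theta) + fy * math.sin(theta)
    v_theta = (-fx * math.sin(theta) + fy * math.cos(theta)) / r
    return v_r, v_theta

def closed_form(r, theta):
    # (df/dr, (1/r^2) df/dtheta) for f(r, theta) = cos(theta) / r.
    return -math.cos(theta) / r**2, -math.sin(theta) / r**3

r, theta = 2.0, 0.7  # arbitrary sample point away from the origin
print(holonomic_components(r, theta))
print(closed_form(r, theta))
```

The two printed pairs should agree up to finite-difference error, which is what the last line of the derivation predicts.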

My naïve guess is simply that people like orthonormal bases so that they can apply a Pythagoras-like formula to get lengths $$\|a e_r + b e_\theta \| =\sqrt{a^2+b^2}$$ and so avoid introducing the components of the metric tensor. Projection formulas are also easier: for example, the component of $v$ in the angular direction is $(v\cdot e_\theta)e_\theta$. In a non-normalized basis this last point can be remedied by using the dual basis of the dual space, but again most people avoid talking about linear functionals by using their Riesz representatives.
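To make the comparison concrete, here is a short worked sketch, assuming the standard Euclidean metric in polar coordinates, $g = dr^2 + r^2\,d\theta^2$, and writing $e_r = \frac{\partial}{\partial r}$, $e_\theta = \frac{1}{r}\frac{\partial}{\partial\theta}$ for the normalized frame (my notation for the two bases in the question). Since $\left\|\frac{\partial}{\partial\theta}\right\| = r$, lengths in the holonomic basis pick up a metric factor, and the $\frac{1}{r^2}$ in the holonomic gradient becomes the familiar $\frac{1}{r}$ once $\frac{\partial}{\partial\theta}$ is replaced by $r\,e_\theta$: \begin{align*} \left\|a\frac{\partial}{\partial r}+b\frac{\partial}{\partial\theta}\right\| &=\sqrt{a^2+r^2b^2},\\ \nabla f&=\frac{\partial f}{\partial r}\frac{\partial}{\partial r}+\frac{1}{r^2}\frac{\partial f}{\partial\theta}\frac{\partial}{\partial\theta} =\frac{\partial f}{\partial r}\,e_r+\frac{1}{r}\frac{\partial f}{\partial\theta}\,e_\theta. \end{align*} So both expressions describe the same vector field; only the basis in which the components are read off differs.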
Finally, the vector calculus formalism is not designed to work nicely in arbitrary coordinates. An example of this is how the vector Laplacian has to be defined using the curl of the curl, via $\nabla^2 A=\nabla(\nabla\cdot A)-\nabla\times(\nabla\times A)$, rather than componentwise. People using vector calculus tend not to care about coordinate invariance of their expressions, so they are happy (or at least won't complain as much as a differential geometer would) having different expressions in different coordinate systems.