I understand that $\theta $ basis vector have variable length proportional to the radius.
Some change in $\theta$ trace longer curve the further it away from the center. So the same $\Delta\theta$ make bigger impact on the function change the larger the current $r$ is.
What I don't understand intuitively is why this behaviour shouldn't propagate to the gradient as well.
The correct formula for gradient in polar coordinates $$\nabla f = \frac{\partial f}{\partial r}\hat{r} + \frac{1}{r}\frac{\partial f}{\partial \theta}\hat{\theta}$$ explicitly compensates for this behaviour.
But why $$\nabla f = \frac{\partial f}{\partial r}\hat{r} + \frac{\partial f}{\partial \theta}\hat{\theta}$$ wouldn't be the "direction and rate of fastest increase" in polar coordinates?
I've seen the derivations, and I understand why they are correct, but I have the problem that incorrect formula still seems more natural for me intuitively as "direction and rate of fastest increase" in polar coordinates.
Make an explicit example. Consider the function $f(r, \theta)=\theta$. It is clear that the gradient $\nabla f$ must have the direction $e_\theta$ everywhere. But what would its magnitude be, as the correct formula says $1/r$ while yours predicts $1$?
Just integrate it. You must have the fundamental formula $$ f(p_2)-f(p_1)=\int_\gamma \nabla f\cdot \vec{ds}, $$ where $\gamma$ is a line that connects the two points $p_2$ and $p_1$. Suppose these points have angular coordinates $\theta_2$ and $\theta_1$ respectively, so the left--hand side of the latter equation reads $\theta_2-\theta_1$. Suppose these points are joined by an arc of circumference $\gamma$ of radius $r$, so the line element reads $\vec{ds}= e_\theta\, rd\theta$. Then, plugging in your formula for $\nabla f$, the right--hand integral reads $$ \int_{\theta_1}^{\theta_2} r\,dr=r(\theta_2-\theta_1), $$ which is off by a factor of $r$ from the correct one. This $r$ is precisely the factor you missed in your formula.