Let $(x,y)$ be a coordinate system and $f:\mathbb R^2\to \mathbb R$ a differentiable function. Then $$\nabla f(x,y)=\left(\partial _xf(x,y),\partial _yf(x,y)\right).$$
If I'm in $(u,v)$, then $$\nabla f(u,v)=\left(\partial _uf(u,v),\partial _vf(u,v)\right).$$
So, when I'm in $(r,\theta )$, why is the gradient different? I.e., why is it $$\nabla f(r,\theta )=\left(\cos\theta\, \partial_rf-\frac{\sin\theta}{r} \partial _\theta f,\ \sin\theta\, \partial _rf+\frac{\cos\theta}{r} \partial _\theta f\right)$$
and not $(\partial _rf,\partial _\theta f)$ as previously? This looks mysterious to me.
There is a common confusion with how functions are described.
A function of two variables can be thought of in two different ways: as a function taking two numbers as input, or as a function taking a point in the plane. Usually these two concepts are conflated. But for the purposes of geometry and calculus, a function should be thought of as taking a point in the plane. Therefore, when you see a definition of a function, you should understand it as implicitly defining a function on the plane. This implicit assumption depends on context: for example, $f(x,y)$ should be understood as the value of $f$ at the point with Cartesian coordinates $(x,y)$, but $f(r,\theta)$ would be understood as the value of $f$ at the point with polar coordinates $(r,\theta)$.
This is why one writes $f(r,\theta)=f(r\cos\theta,r\sin\theta)=f(x,y)$, something that looks nonsensical at first but is not. Here $f(r,\theta)$ is the value of $f$ at the point with polar coordinates $(r,\theta)$, $f(r\cos\theta,r\sin\theta)$ is the value of $f$ at the point with Cartesian coordinates $(r\cos\theta,r\sin\theta)$, and $f(x,y)$ is the value of $f$ at the point with Cartesian coordinates $(x,y)$, where numerically $x=r\cos\theta$, $y=r\sin\theta$.
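As a concrete illustration (the function is my own choice for the example, not one taken from the question), take the function whose value at a point is the squared distance from that point to the origin. Its two descriptions are
$$f(x,y)=x^2+y^2,\qquad f(r,\theta)=r^2.$$
The formulas look different, but they describe the same function on the plane: the point with Cartesian coordinates $(0,2)$ has polar coordinates $(2,\pi/2)$, and both formulas give the value $4$ there.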
So remember this: when it comes down to it, these functions are defined on points of the plane, and $f(x,y)$ and $f(r,\theta)$ are just different ways of describing the same function using coordinates on the plane, because it's impossible to specify a point without coordinates.
Next, the issue is whether you are using Cartesian or polar coordinates to describe the gradient. It looks like you used Cartesian, which results in a mess.
Now, the gradient is the vector field $u$ such that, for any point $p$ and any vector $v$, $D_{v}f(p)=u(p)\cdot v$, where $\cdot$ is the geometric dot product. I specifically say "geometric dot product" to refer to the definition of the dot product as "product of the lengths times the cosine of the angle in between".
You can describe the vector $v$ in terms of coordinates, and the directional derivative comes out the same whether you use polar or Cartesian coordinates, as expected. What actually changes is the formula for the dot product.
In Cartesian coordinates, the dot product is very simple: $(a,b)\cdot(c,d)=ac+bd$. The geometric dot product can be computed directly from the coordinates just like that, and you don't even need to care where these vectors are based. As a result, the gradient has a very simple form.
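A short sketch of why, assuming $f$ is differentiable and writing $v=(a,b)$ in Cartesian components at the point $p=(x,y)$ (notation introduced just for this sketch): by the chain rule,
$$D_vf(p)=\frac{d}{dt}\Big|_{t=0}f(x+ta,\,y+tb)=a\,\partial_xf+b\,\partial_yf=(\partial_xf,\partial_yf)\cdot(a,b).$$
Comparing with the defining property of the gradient, the vector field that works for every $v$ is $(\partial_xf,\partial_yf)$.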
But in polar coordinates, the dot product is distorted when written in terms of coordinates. The reason is that when you change $\theta$ at rate $1$, you don't move at rate $1$ anymore but at rate $r$, where $r$ is the distance from the origin to the point these vectors are based at. Hence a factor is needed to account for this distortion. So the formula for the dot product in polar coordinates is less nice: if $(a,\alpha)$ and $(b,\beta)$ are the rates of change of $(r,\theta)$ along two vectors, then $(a,\alpha)\cdot(b,\beta)=ab+(r\alpha)(r\beta)$. Notice the extra factor of $r$ on each angular component to account for the length distortion. As a result, the gradient is messier: matching $D_vf=a\,\partial_rf+\alpha\,\partial_\theta f$ against this dot product, you get the gradient to be $\frac{\partial f}{\partial r}R+\frac{1}{r}\frac{\partial f}{\partial\theta}\Theta$ (where $R,\Theta$ are the unit vectors of polar coordinates). Note that this formula immediately tells you the components of the gradient with respect to the polar unit vectors, but not the Cartesian ones. To get back the Cartesian components, you need to account for the rotation, which is how you get that mess.
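To make that last step concrete, here is a sketch of the conversion. It assumes the usual expressions for the polar unit vectors in Cartesian components, which are not spelled out above:
$$R=(\cos\theta,\sin\theta),\qquad \Theta=(-\sin\theta,\cos\theta).$$
Substituting these into the polar form of the gradient gives
$$\frac{\partial f}{\partial r}R+\frac{1}{r}\frac{\partial f}{\partial\theta}\Theta=\left(\cos\theta\,\partial_rf-\frac{\sin\theta}{r}\,\partial_\theta f,\ \ \sin\theta\,\partial_rf+\frac{\cos\theta}{r}\,\partial_\theta f\right).$$
As a sanity check with a function chosen for the example, take $f=x=r\cos\theta$, so $\partial_rf=\cos\theta$ and $\partial_\theta f=-r\sin\theta$. The right-hand side becomes $(\cos^2\theta+\sin^2\theta,\ \sin\theta\cos\theta-\cos\theta\sin\theta)=(1,0)$, which agrees with $(\partial_xf,\partial_yf)$ computed directly in Cartesian coordinates.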