I'm trying to derive the gradient in polar coordinates using the chain rule.
So the idea is that when we have a function $f(x,y)$ and we switch to polar coordinates, we're really composing $f$ with $P(r,\theta) = (r\cos(\theta),r\sin(\theta))$. So then the gradient of $f$ in polar coordinates should just be $\nabla (f\circ P)(r,\theta)$.
From my last question I know that $$\begin{aligned}\nabla(f\circ P)(r,\theta) &= \begin{bmatrix}(\partial_1f\circ P)(r,\theta) & (\partial_2f\circ P)(r,\theta)\end{bmatrix}\begin{bmatrix}\partial_1P_1(r,\theta) & \partial_2P_1(r,\theta) \\ \partial_1P_2(r,\theta) & \partial_2P_2(r,\theta)\end{bmatrix} \\ &= \begin{bmatrix} (\partial_1f\circ P)(r,\theta)\cdot\partial_1P_1(r,\theta) + (\partial_2f\circ P)(r,\theta)\cdot\partial_1P_2(r,\theta) \\ (\partial_1f\circ P)(r,\theta)\cdot\partial_2P_1(r,\theta) +(\partial_2f\circ P)(r,\theta)\cdot\partial_2P_2(r,\theta)\end{bmatrix}^T\end{aligned}$$
But I also know that $$\begin{aligned}\partial_1(f\circ P)(r,\theta) &= (\partial_1f\circ P)(r,\theta)\cdot\partial_1P_1(r,\theta) + (\partial_2f\circ P)(r,\theta)\cdot\partial_1P_2(r,\theta) \\ \partial_2(f\circ P)(r,\theta) &= (\partial_1f\circ P)(r,\theta)\cdot\partial_2P_1(r,\theta) +(\partial_2f\circ P)(r,\theta)\cdot\partial_2P_2(r,\theta)\end{aligned}$$
by the regular chain rule.
So, putting that together I get $$\nabla(f\circ P)(r,\theta) = \begin{bmatrix}\partial_1(f\circ P)(r,\theta) & \partial_2(f\circ P)(r,\theta)\end{bmatrix}$$
i.e. $$\nabla(f\circ P) = \frac{\partial (f\circ P)}{\partial r}\mathbf {\hat r} + \frac{\partial (f\circ P)}{\partial \theta}\mathbf {\hat \theta}$$
If I were to then write this out in more traditional notation, where $f$ and $f\circ P$ are not distinguished, it should look like $$\nabla f = \frac{\partial f}{\partial r}\mathbf {\hat r} + \frac{\partial f}{\partial \theta}\mathbf {\hat \theta}$$
But comparing this to the correct formula for the gradient in polar coordinates, which for reference is usually written as $$\nabla f = \frac{\partial f}{\partial r}\mathbf {\hat r} + \frac 1r\frac{\partial f}{\partial \theta}\mathbf {\hat \theta},$$ I see that I'm missing a factor of $\frac 1r$ on the second term. Where does that come from?
Edit: I notice something interesting, though it may have nothing to do with the problem I'm having. The matrix $$\begin{bmatrix}\partial_1P_1(r,\theta) & \partial_2P_1(r,\theta) \\ \partial_1P_2(r,\theta) & \partial_2P_2(r,\theta)\end{bmatrix} = \begin{bmatrix}\cos(\theta) & -r\sin(\theta) \\ \sin(\theta) & r\cos(\theta)\end{bmatrix}$$ would be orthogonal (in fact, a rotation) if we multiplied the second column by $\frac 1r$. And that is exactly the column that goes into $\partial_2(f\circ P)(r,\theta)$. So if I normalized that column, my formula would have come out correctly, with the $\frac 1r$. But I see no reason why I should do that. Is this just a weird coincidence?
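To make that observation concrete, here is a short check of the two columns of the matrix. The labels $c_1$ and $c_2$ are introduced here only for illustration and are not standard notation: $$\begin{aligned} c_1 &= \begin{bmatrix}\cos(\theta) \\ \sin(\theta)\end{bmatrix}, \qquad c_2 = \begin{bmatrix}-r\sin(\theta) \\ r\cos(\theta)\end{bmatrix}, \\ c_1\cdot c_2 &= -r\cos(\theta)\sin(\theta) + r\sin(\theta)\cos(\theta) = 0, \\ \lVert c_1\rVert &= \sqrt{\cos^2(\theta)+\sin^2(\theta)} = 1, \qquad \lVert c_2\rVert = \sqrt{r^2\sin^2(\theta)+r^2\cos^2(\theta)} = r \quad (r>0). \end{aligned}$$ So the columns are already orthogonal, and only the second one fails to have unit length, by exactly the factor $r$.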
Polar coordinates: $$ \begin{cases} x = r\cos(\theta) \\ y = r \sin(\theta) \end{cases} \quad \Longrightarrow \quad \begin{cases} \mathbf {\hat r} = \begin{bmatrix} \cos(\theta) \\ \sin(\theta) \end{bmatrix} \\ \mathbf {\hat \theta} = \begin{bmatrix} -\sin(\theta) \\ \cos(\theta) \end{bmatrix} \end{cases} $$
Chain rules: $$ \begin{aligned} \frac{\partial f}{\partial r} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r} = \cos(\theta) \frac{\partial f}{\partial x} + \sin(\theta) \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial \theta} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta} = - r \sin(\theta) \frac{\partial f}{\partial x} + r \cos(\theta) \frac{\partial f}{\partial y} \end{aligned} $$
Matrix format (after dividing the second equation by $r$, assuming $r \neq 0$): $$ \begin{bmatrix} \dfrac{\partial f}{\partial r} \\ \dfrac{1}{r} \dfrac{\partial f}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ - \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{bmatrix} $$
Inverse transform (the matrix is a rotation, so its inverse is its transpose): $$ \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} \cos(\theta) & - \sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} \dfrac{\partial f}{\partial r} \\ \dfrac{1}{r} \dfrac{\partial f}{\partial \theta} \end{bmatrix} = \frac{\partial f}{\partial r} \begin{bmatrix} \cos(\theta) \\ \sin(\theta) \end{bmatrix} + \frac{1}{r} \frac{\partial f}{\partial \theta} \begin{bmatrix} - \sin(\theta) \\ \cos(\theta) \end{bmatrix} $$
Conclusion (the left-hand side of the inverse transform is the Cartesian gradient $\nabla f$, and the two column vectors on the right are $\mathbf{\hat r}$ and $\mathbf{\hat\theta}$): $$ \nabla f = \frac{\partial f}{\partial r}\mathbf {\hat r} + \frac 1r\frac{\partial f}{\partial \theta}\mathbf {\hat \theta} $$
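As a sanity check, the formula can be tried on a concrete function. The choice $f(x,y) = xy$ below is only an illustrative assumption and does not come from the question. Its Cartesian gradient is $\nabla f = \begin{bmatrix} y & x \end{bmatrix}^T$, and in polar coordinates $f = r^2\cos(\theta)\sin(\theta) = \tfrac12 r^2\sin(2\theta)$, so: $$ \begin{aligned} \frac{\partial f}{\partial r} &= r\sin(2\theta), \qquad \frac 1r\frac{\partial f}{\partial \theta} = \frac 1r\, r^2\cos(2\theta) = r\cos(2\theta), \\ \frac{\partial f}{\partial r}\mathbf{\hat r} + \frac 1r\frac{\partial f}{\partial \theta}\mathbf{\hat\theta} &= r\sin(2\theta)\begin{bmatrix}\cos(\theta) \\ \sin(\theta)\end{bmatrix} + r\cos(2\theta)\begin{bmatrix}-\sin(\theta) \\ \cos(\theta)\end{bmatrix} \\ &= \begin{bmatrix} r\,\big(\sin(2\theta)\cos(\theta) - \cos(2\theta)\sin(\theta)\big) \\ r\,\big(\sin(2\theta)\sin(\theta) + \cos(2\theta)\cos(\theta)\big) \end{bmatrix} = \begin{bmatrix} r\sin(\theta) \\ r\cos(\theta) \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}. \end{aligned} $$ The last simplification uses the angle-difference identities $\sin(2\theta - \theta) = \sin(\theta)$ and $\cos(2\theta - \theta) = \cos(\theta)$. Without the factor $\frac 1r$, the second term would be $r^2\cos(2\theta)\,\mathbf{\hat\theta}$, and the result would agree with $\begin{bmatrix} y & x \end{bmatrix}^T$ only where $r = 1$ or $\cos(2\theta) = 0$.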