If $z=f(x,y)$ where $x=g(r,\theta), y=h(r, \theta)$, then can you give me a good reason why
$$\frac{\partial^2 z}{\partial r^2} \neq \frac{\partial}{\partial x}\frac{\partial z}{\partial r}\frac{\partial x}{\partial r} + \frac{\partial}{\partial y}\frac{\partial z}{\partial r}\frac{\partial y}{\partial r} \qquad \qquad ?$$
I thought that I was just using the usual chain rule that one uses for a function ($\frac{\partial z}{\partial r}$ in this case) of two other functions ($x$ and $y$ in this case). How do I know that I am supposed to evaluate $\frac{\partial}{\partial r}\frac{\partial z}{\partial r}$ first before using the chain rule?
Whenever I doubt something like this, what I try to do is write it out in absolute detail. $\newcommand{\par}[2]{\displaystyle\frac{\partial #1}{\partial #2}}$ $\newcommand{\spar}[2]{\displaystyle\frac{\partial^2 #1}{\partial #2^2}}$ $\newcommand{\cpar}[3]{\displaystyle\frac{\partial^2 #1}{\partial #2\partial #3}}$
We have three functions $$\begin{align} &{\,\,\,f:\Bbb R^2\to \Bbb R }\atop {(x,y) \mapsto z} \\[2ex] &{g:\Bbb R^2\to\Bbb R}\atop {(r,\theta)\mapsto g(r,\theta)} \\[2ex] &{h:\Bbb R^2\to\Bbb R}\atop {(r,\theta)\mapsto h(r,\theta) }\end{align}$$
It's now convenient to introduce a new function $\varphi:\Bbb R^2\to\Bbb R^2$ defined by $$\varphi(r,\theta)= \begin{pmatrix}g(r,\theta) & h(r,\theta)\end{pmatrix}$$ and we set $z=f\circ\varphi$.
The chain rule is thus simply $$D_pz = D_{\varphi(p)}f\circ D_p \varphi$$ That is, the derivative of a composition (at $p\in\Bbb R^2$) is the composition of the derivatives (at appropriate points). For the partial derivatives, this is just matrix multiplication: $$\begin{pmatrix}\par zr & \par z\theta \end{pmatrix}_p=\begin{pmatrix}\par fx & \par fy\end{pmatrix}_{\varphi(p)}\cdot\begin{pmatrix}\par gr &\par g\theta \\[1ex] \par hr &\par h\theta \end{pmatrix}_p$$ Therefore, writing $p=(r,\theta)$, $$\par zr(p) = \par fx(g(r,\theta),h(r,\theta))\cdot \par gr(r,\theta) + \par fy(g(r,\theta),h(r,\theta))\cdot \par hr(r,\theta)$$
But this is terribly hard on the eyes. Therefore we allow $g,h$ to be replaced by $x,y$, respectively (this looks like a mere convenience, but it can be justified if we think of the coordinates of the plane as functions), and we stop specifying the points at which we evaluate (this is nothing more than lazy writing). We end up with $$\begin{align} &\par zr = \par fx \par xr + \par fy \par yr & (1)\end{align}$$
and now we realize that we did need to write where we were evaluating, if we are to differentiate again, because we have to apply the chain rule (and the product/Leibniz rule) to the expression. Consider the first term: $$\left(\par fx\circ \varphi\right)\cdot\par xr$$
By the way, it's always good to verify here that everything in the above expression makes "dimensional sense" (i.e. track what each factor returns when applied to $(r,\theta)$). It's a good way to detect errors! But anyway, we will again take the total derivative. Since the above is a product of scalar functions, apply the Leibniz rule: $$D_p(\cdots) = D_p\left(\par fx\circ \varphi\right)\par xr+\left(\par fx\circ \varphi\right) D_p\par xr$$
Using the chain rule on the first summand we have, in matrix form: $$\begin{pmatrix}\spar fx &\cpar fxy\end{pmatrix}_{\varphi(p)}\cdot\begin{pmatrix}\par xr &\par x\theta \\[1ex] \par yr &\par y\theta\end{pmatrix}_p\par xr + \left(\par fx\circ \varphi\right)\begin{pmatrix}\spar xr & \cpar xr\theta \end{pmatrix}_p$$
The first coordinate of the above is $$\spar fx(\varphi(p)) \left(\par xr\right)^2(p) + \left(\cpar fxy\right)(\varphi(p)) \left(\par yr \par xr\right)(p) + \par fx(\varphi(p))\spar xr(p) $$
Again simplifying the notation, this is $$\spar fx \left(\par xr\right)^2 + \cpar fxy\par yr \par xr + \par fx\spar xr$$
If we go back and do the same for the second term of $(1)$, the result is (for the first coordinate) $$\cpar fyx \par xr\par yr+ \spar fy\left(\par yr\right)^2 + \par fy\spar yr$$
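Therefore the correct expression is $$\begin{align}\spar zr = &\spar fx\left(\par xr\right)^2 + \spar fy\left(\par yr\right)^2 + \left(\cpar fxy +\cpar fyx\right)\par xr \par yr \\[1ex] &+ \par fx\spar xr +\par fy\spar yr\end{align}$$ If $f$ is twice continuously differentiable, the two mixed partials agree and the middle term becomes $2\,\cpar fxy \par xr \par yr$.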
After a lot of these, you'll do the matrix part in your head, but at least for me it was useful to write out the chain rule in full the first couple of times. Also, the chain rule is always valid; you just need to make sure you're using it correctly.
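If you want to convince yourself that the five-term formula for $\frac{\partial^2 z}{\partial r^2}$ is right, a numerical spot check is one option. The sketch below is only an illustration: the particular $f$, $g$, $h$ are hypothetical choices of mine (not from the question), picked so that no term vanishes, and I assume $f$ is smooth so that the two mixed partials coincide. It compares the formula against a central finite difference of $z=f\circ\varphi$ in $r$.

```python
import math

# Hypothetical example functions (not from the question), chosen so that
# every term of the second-order chain rule is nonzero.
def f(x, y):
    return x**2 * y + y**3

def g(r, t):          # plays the role of x(r, theta)
    return r**2 * math.cos(t)

def h(r, t):          # plays the role of y(r, theta)
    return r**3 * math.sin(t)

def z(r, t):          # z = f o phi
    return f(g(r, t), h(r, t))

def z_rr_formula(r, t):
    """Second-order chain rule, with partials of f evaluated at phi(r, t)."""
    x, y = g(r, t), h(r, t)
    # Hand-computed partial derivatives of f at (x, y).
    f_x, f_y = 2 * x * y, x**2 + 3 * y**2
    f_xx, f_xy, f_yy = 2 * y, 2 * x, 6 * y
    # Hand-computed partial derivatives of g and h with respect to r.
    x_r, y_r = 2 * r * math.cos(t), 3 * r**2 * math.sin(t)
    x_rr, y_rr = 2 * math.cos(t), 6 * r * math.sin(t)
    # Assumes f is C^2, so f_xy == f_yx and the mixed term is doubled.
    return (f_xx * x_r**2 + f_yy * y_r**2 + 2 * f_xy * x_r * y_r
            + f_x * x_rr + f_y * y_rr)

def z_rr_numeric(r, t, eps=1e-4):
    """Central finite-difference approximation of d^2 z / dr^2."""
    return (z(r + eps, t) - 2 * z(r, t) + z(r - eps, t)) / eps**2

r0, t0 = 1.3, 0.7
print("formula:", z_rr_formula(r0, t0))
print("numeric:", z_rr_numeric(r0, t0))
```

The two printed values should agree up to finite-difference error. If you delete the `f_x * x_rr + f_y * y_rr` terms, which come from the Leibniz rule, they no longer match for these functions.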