Chapter 1 of Griffiths' Electrodynamics is called "Vector Analysis". There is a problem in that chapter that I would like to understand in detail. A question about this particular problem has been asked before, but it doesn't go into the types of details I will go through and ask about in the current question.
Problem 1.14 Suppose that $f$ is a function of two variables $y$ and $z$ only. Show that the gradient $\nabla f=(\partial f/\partial y)\hat{y}+(\partial f/\partial z)\hat{z}$ transforms as a vector under rotations, Eq. 1.29.
Hint: $(\partial f/\partial\bar{y}) = (\partial f/\partial y)(\partial y/\partial \bar{y})+(\partial f/\partial z)(\partial z/\partial \bar{y})$, and the analogous formula for $\partial f/\partial \bar{z}$. We know that $\bar{y}=y\cos{\phi}+z\sin{\phi}$ and $\bar{z}=-y\sin{\phi}+z\cos{\phi}$; "solve" these equations for $y$ and $z$ (as functions of $\bar{y}$ and $\bar{z}$), and compute the needed derivatives $\partial y/\partial\bar{y}$, $\partial z/\partial\bar{y}$, etc.
The cited equation 1.29 is a matrix equation for transforming coordinates in one set of axes to coordinates in another set of axes that is rotated by $\phi$ radians relative to the first coordinates.
$$\begin{pmatrix} \overline{y}\\ \overline{z} \end{pmatrix}=\begin{pmatrix} \cos{\phi} & \sin{\phi} \\ -\sin{\phi} & \cos{\phi} \end{pmatrix}\begin{pmatrix} y \\ z \end{pmatrix}\tag{1}$$
From the hint, it seems we are asked to figure out what $\partial f/\partial\bar{y}$ and $\partial f/\partial\bar{z}$ are, and to verify that they satisfy the relationship
$$\begin{pmatrix} \partial f/\partial\bar{y}\\ \partial f/\partial\bar{z} \end{pmatrix}=\begin{pmatrix} \cos{\phi} & \sin{\phi} \\ -\sin{\phi} & \cos{\phi} \end{pmatrix}\begin{pmatrix} \partial f/\partial y\\ \partial f/\partial z \end{pmatrix}\tag{2}$$
That is,
$$\overline{\nabla f}=\begin{pmatrix} \cos{\phi} & \sin{\phi} \\ -\sin{\phi} & \cos{\phi} \end{pmatrix}\nabla f\tag{3}$$
If these two vectors do satisfy this relationship, then it means that they behave as expected under the rotation transformation.
But what is happening at the linear algebra level here?
My first question is: what do $\frac{\partial f}{\partial\overline{y}}$ and $\frac{\partial f}{\partial\overline{z}}$ mean exactly?
The square matrix, let's call it $R$, in (1) is the transformation. The columns are the coordinates of the transformed standard basis vectors $\hat{i}$ and $\hat{j}$, namely $R\hat{i}$ and $R\hat{j}$, and these form a basis for the range of the transformation.
If we stick the gradient vector on the right hand side of (1) we are transforming the vector and obtaining coordinates in the new basis. This happens by taking the same linear combination used with the old basis, but now with the new basis.
As far as I can tell $\frac{\partial f}{\partial\overline{y}}$ and $\frac{\partial f}{\partial\overline{z}}$ are technically the partial derivatives of $f$ under the new basis (that is, if we were to figure out what $f(\overline{y},\overline{z})$ is).
Thus, what (3) says is that this new gradient turns out to have the same coordinates under the new basis as under the old basis, which is how all vectors behave under a linear transformation (is this correct?). That is, by showing (3) we are showing that gradients are just regular ol' vectors like any others in their vector space.
My second question is: why, ex ante, would there be a possibility that these two vectors would not satisfy this relationship?
Here are the calculations I did to accomplish this task.
Let
$$g(\overline{y}, \overline{z})=f(y(\overline{y}, \overline{z}),z(\overline{y}, \overline{z}))$$
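That is, $g$ is $f$ expressed as a function of the coordinates in the new basis. What the problem writes as $\partial f/\partial\overline{y}$ and $\partial f/\partial\overline{z}$ are then the partial derivatives of $g$: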
$$\frac{\partial g}{\partial \overline{y}}=\frac{\partial f}{\partial \overline{y}}\tag{4}$$
$$\frac{\partial g}{\partial \overline{z}}=\frac{\partial f}{\partial \overline{z}}\tag{5}$$
Therefore, if we can find these partial derivatives of $g$ we will have found $\overline{\nabla f}$, and we hope that this is an expression in terms of $\nabla f$.
Using the chain rule, we have
$$\frac{\partial g}{\partial \overline{y}}=\nabla f(y(\overline{y},\overline{z}),z(\overline{y},\overline{z}))\cdot \left ( \frac{\partial y}{\partial\overline{y}}\hat{i} +\frac{\partial z}{\partial\overline{y}}\hat{j} \right )\tag{6}$$
$$=\frac{\partial f}{\partial y}\frac{\partial y}{\partial \overline{y}}+\frac{\partial f}{\partial z}\frac{\partial z}{\partial \overline{y}}\tag{7}$$
Similarly,
$$\frac{\partial g}{\partial \overline{z}}=\nabla f(y(\overline{y},\overline{z}),z(\overline{y},\overline{z}))\cdot \left ( \frac{\partial y}{\partial\overline{z}}\hat{i} +\frac{\partial z}{\partial\overline{z}}\hat{j} \right )\tag{8}$$
$$=\frac{\partial f}{\partial y}\frac{\partial y}{\partial \overline{z}}+\frac{\partial f}{\partial z}\frac{\partial z}{\partial \overline{z}}\tag{9}$$
Note that we know $\overline{y}(y,z)$ and $\overline{z}(y,z)$ because this is given by (1), but we don't know $y(\overline{y},\overline{z})$ or $z(\overline{y},\overline{z})$.
This is why we need to solve for $y$ and $z$ in (1), and when we do this we obtain
$$y=-\overline{z}\sin{\phi}+\overline{y}\cos{\phi}$$ $$z=\overline{y}\sin{\phi}+\overline{z}\cos{\phi}$$
From which we can compute
$$\frac{\partial y}{\partial\overline{y}}=\cos{\phi}$$ $$\frac{\partial y}{\partial\overline{z}}=-\sin{\phi}$$ $$\frac{\partial z}{\partial\overline{y}}=\sin{\phi}$$ $$\frac{\partial z}{\partial\overline{z}}=\cos{\phi}$$
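The inversion and the four derivatives above can be cross-checked symbolically. The following is only a sketch using SymPy; the symbol names `ybar` and `zbar` and the variable names `y_of`, `z_of`, `jac` are my own choices for illustration and do not come from the book.

```python
import sympy as sp

y, z, yb, zb, phi = sp.symbols("y z ybar zbar phi", real=True)

# Eq. (1) written as two scalar equations, solved for the unbarred coordinates.
solution = sp.solve(
    [sp.Eq(yb, y * sp.cos(phi) + z * sp.sin(phi)),
     sp.Eq(zb, -y * sp.sin(phi) + z * sp.cos(phi))],
    [y, z],
    dict=True,
)[0]

# simplify() is only cosmetic here (it tidies up sin^2 + cos^2 factors).
y_of = sp.simplify(solution[y])
z_of = sp.simplify(solution[z])

# Matrix of the derivatives d(y, z)/d(ybar, zbar):
# row 1 holds dy/dybar, dy/dzbar; row 2 holds dz/dybar, dz/dzbar.
jac = sp.Matrix([y_of, z_of]).jacobian([yb, zb])

print(y_of)
print(z_of)
print(jac)
```

The entries of `jac` are the derivatives listed above, arranged as a matrix; it is the transpose of the rotation matrix in (1), which is what one expects since the inverse of a rotation matrix is its transpose.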
and plugging these into (7) and (9) we obtain
$$\frac{\partial f}{\partial\overline{y}}=\frac{\partial f}{\partial y}\cos{\phi}+\frac{\partial f}{\partial z}\sin{\phi}\tag{10}$$
$$\frac{\partial f}{\partial\overline{z}}=\frac{\partial f}{\partial y}(-\sin{\phi})+\frac{\partial f}{\partial z}\cos{\phi}\tag{11}$$
Thus, we have shown by direct computation of the partial derivatives that (2) and (3) are true.
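As a sanity check on this computation, relation (2) can be tested on a concrete function. This is a sketch only: the test function `f` below is an arbitrary example I picked (it is not part of the problem), and the check is done numerically at one sample point rather than proved symbolically.

```python
import sympy as sp

y, z, yb, zb, phi = sp.symbols("y z ybar zbar phi", real=True)

# Hypothetical test function; any differentiable f(y, z) could be used instead.
f = y**2 * z + sp.sin(z)

# Rotation matrix R from Eq. (1).
R = sp.Matrix([[sp.cos(phi), sp.sin(phi)],
               [-sp.sin(phi), sp.cos(phi)]])

# Inverse relations y(ybar, zbar), z(ybar, zbar), using R^{-1} = R^T.
old_coords = R.T * sp.Matrix([yb, zb])

# g(ybar, zbar) = f(y(ybar, zbar), z(ybar, zbar))
g = f.subs({y: old_coords[0], z: old_coords[1]}, simultaneous=True)

# Gradient of g with respect to the barred coordinates, then rewritten
# in terms of (y, z) so it can be compared with R * grad f at the same point.
new_coords = R * sp.Matrix([y, z])
grad_g = sp.Matrix([sp.diff(g, yb), sp.diff(g, zb)]).subs(
    {yb: new_coords[0], zb: new_coords[1]}, simultaneous=True
)

grad_f = sp.Matrix([sp.diff(f, y), sp.diff(f, z)])
residual = grad_g - R * grad_f

# Evaluate the residual at an arbitrary sample point and angle.
sample = {y: 0.3, z: -1.2, phi: 0.7}
max_residual = max(abs(float(v)) for v in residual.subs(sample).evalf())
print(max_residual)
```

If (2) holds, `max_residual` should be zero up to floating-point rounding. Trying other sample points or other choices of `f` is an easy way to build confidence in the general result.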
If you want to avoid partial derivatives, you can also use the differential of $f$ (assuming $f$ is $\mathcal{C}^1$). When it exists, $df(x)$ is defined as the unique linear map such that $f(x + h) - f(x) = df(x)h + \mathrm{o}(|h|)$ as $h \rightarrow 0$ (where $|h|$ is the norm of $h$). When you have a scalar product, the gradient of $f$ at $x$ is the unique vector such that $\left<\nabla f(x),h\right> = df(x)h$ for all $h$; its existence and uniqueness are given by the Riesz representation theorem.
In an orthonormal system of coordinates, the gradient is given by the formula you wrote and the differential is simply $$ df(x)h = \sum_{i = 1}^n \frac{\partial f}{\partial x_i}(x)h_i. $$ For some reason, mathematicians tend to prefer differentials and physicists partial derivatives.
The chain rule with differentials is $d(f \circ g)(x) = df(g(x)) \circ dg(x)$ when $dg(x)$ and $df(g(x))$ exist. And if $P$ is a linear map, it is easy to see that $dP(x) = P$ for all $x$.
Now, any linear change of coordinates is given by $\overline{x} = Px$, where $P$ is the change-of-basis matrix (hence it is invertible). In the case where $P$ preserves the norm and the orientation, it is a special orthogonal matrix, i.e. $P^{-1} = P^\top$ and $\det(P) = 1$. In the two-dimensional case, it is a rotation matrix given by the formula you wrote.
Let $\tilde{f}$ be the function $f$ expressed in the new coordinates, i.e. $\tilde{f}(\overline{x}) = f(x)$, so that $\tilde{f} \circ P = f$ and thus $\tilde{f} = f \circ P^{-1}$. The chain rule gives $$ d\tilde{f}(\overline{x}) = df(P^{-1}\overline{x}) \circ d(P^{-1})(\overline{x}) = df(x) \circ P^{-1}. $$ Therefore, for all $h$, $$ \left<\nabla\tilde{f}(\overline{x}),h\right> = d\tilde{f}(\overline{x})h = df(x)P^{-1}h = \left<\nabla f(x),P^{-1}h\right> = \left<P^{-1\top}\nabla f(x),h\right>. $$ We deduce that $\nabla\tilde{f}(\overline{x}) = P^{-1\top}\nabla f(x)$. And when $P$ is a rotation, $P^{-1\top} = P$, so $\nabla\tilde{f}(\overline{x}) = P\nabla f(x)$. This is the desired formula. I hope this is clearer. Note that all of this generalizes to any dimension (even infinite dimension, in the case of Hilbert spaces).
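The general formula $\nabla\tilde{f}(\overline{x}) = P^{-1\top}\nabla f(x)$ can also be checked numerically. The sketch below uses NumPy with central finite differences; the test function `f`, the helper names `grad_fd` and `transformed_gradient_error`, and the shear matrix are all hypothetical choices made for illustration. The shear is included to show a case where $P^{-1\top} \neq P$, so that transforming the gradient with $P$ itself no longer works.

```python
import numpy as np

def f(x):
    # Hypothetical smooth test function of x = (y, z).
    return x[0] ** 2 * x[1] + np.sin(x[1])

def grad_fd(func, x, eps=1e-6):
    # Central finite-difference approximation of the gradient of func at x.
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

def transformed_gradient_error(P, x):
    # Compare grad f_tilde at xbar = P x with P^{-T} grad f at x,
    # where f_tilde(xbar) = f(P^{-1} xbar).
    P_inv = np.linalg.inv(P)
    f_tilde = lambda xbar: f(P_inv @ xbar)
    lhs = grad_fd(f_tilde, P @ x)
    rhs = P_inv.T @ grad_fd(f, x)
    return np.max(np.abs(lhs - rhs))

phi = 0.7
rotation = np.array([[np.cos(phi), np.sin(phi)],
                     [-np.sin(phi), np.cos(phi)]])
shear = np.array([[1.0, 2.0],
                  [0.0, 1.0]])  # invertible but not orthogonal
x0 = np.array([0.3, -1.2])

err_rotation = transformed_gradient_error(rotation, x0)
err_shear = transformed_gradient_error(shear, x0)

# Naive rule "multiply the gradient by P" applied to the shear.
shear_inv = np.linalg.inv(shear)
naive_shear = np.max(np.abs(
    grad_fd(lambda xbar: f(shear_inv @ xbar), shear @ x0) - shear @ grad_fd(f, x0)
))

print(err_rotation, err_shear, naive_shear)
```

Both `err_rotation` and `err_shear` should be small (limited by the finite-difference accuracy), while `naive_shear` should not be. That gives one concrete sense in which the result for rotations is not automatic: the gradient transforms with $P^{-1\top}$, which coincides with $P$ only when $P$ is orthogonal.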