This question occurred to me when thinking about differential geometry change of coordinates. Consider the system of equations:
\begin{align*} p(x, y) = x + y &\qquad x(p, q) = p - q \\ q(x, y) = y &\qquad y(p, q) = q \end{align*}
Now, if I wish to evaluate $\frac{dp}{dq}$, there are two evaluation strategies available for me:
\begin{align*} \frac{dp}{dq} = \frac{d(x+y)}{dy} = \frac{dx}{dy} + \frac{dy}{dy} = 0 + 1 = 1 \qquad (1) \end{align*}
One the other hand, consider this evaluation:
\begin{align*} \frac{dp}{dq} = \frac{dp}{dx}\frac{dx}{dq} + \frac{dp}{dy}\frac{dy}{dq} = 1 \cdot -1 + 1 \cdot 1 = 0 \qquad (2) \end{align*}
Indeed, one can prove something much stronger: Let $p_i = f_i(x_1, x_2, \dots x_n)$. Now, the evaluation \begin{align*} \frac{dp_i}{dp_j} = \sum_{k=0}^n \frac{dp_i}{dx_k} \frac{dx_k}{dp_j} = \left[\frac{dp_i}{dx_0} \dots \frac{dp_i}{dx_k} \dots \frac{dp_i}{dx_n} \right] \cdot \left[ \frac{dx_0}{dp_j} \dots \frac{dx_k}{dp_j} \dots \frac{dx_n}{dp_j} \right]^T = (J \cdot J^{-1})_{ij} = \delta_{ij} \end{align*}
where $J$ is the jacobian of the function $\vec p = f(\vec x)$, and $\delta_{ij}$ is the kronecker delta.
I'm puzzled as to which interpretation I should choose. Interpretation (2) is nice if I want to think of the sets of "co-ordinate systems" $\{ p_i \}$ as being linearly independent, just like the original $\{ x_i \}$ are, but I have no idea if this is legal.
I'd love an answer that explains to me when (1) is legal, when (2) is legal, and maybe go into more detail about the relationship with the Jacboian, and related geometric insights!
$(1)$ and $(2)$ do not make sense as written but I think we can make sense of the third calculation in the following, more general way: for convenience, work in $\mathbb R^2;\ $ the extension to $\mathbb R^n$ is obvious.
Fix $(x,y)\in \mathbb R^2,$ le $r^1,r^2$ be the standard coordinates on $\mathbb R^2$, so that $r^1:\mathbb R^2\to \mathbb R$ is projection on the first coordinate and $r^2:\mathbb R^2\to \mathbb R$ is projection on the second coordinate. (This is why, by the way, $\frac{\partial r^i}{\partial r^j}=\delta^i_j.)$
Let the diffeomorphism $\phi:\mathbb R^2\to \mathbb R^2$ be defined by $(p^1(x,y),p^2(x,y))$ Suppose $f:\mathbb R^2\to \mathbb R$ is smooth. We want to make sense of $\frac{\partial f}{\partial p^i}\Big|_p$ for $i=1,2.$ By itself, it has no intrinsic meaning. But $\frac{\partial (f\circ \phi^{-1})}{\partial r^i}\Big|_{\phi(p)}\ \textit{does}$ make sense: it is just the usual partial derivative of $f\circ \phi^{-1}$ at $\phi(p).$ So we $\textit{define}\ \frac{\partial f}{\partial p^i}\Big|_p$ to be $\frac{\partial (f\circ \phi^{-1})}{\partial r^i}\Big|_{\phi(p)}.$ This may seem silly, but it's only because we are working in a Euclidean space $\mathbb R^2$. If we were considering a more abstract space, this definition would in some sense be forced.
Notice now that $p^i=r^i\circ \phi,$ so taking $f=p^i:\mathbb R^2\to \mathbb R$ we have
$\frac{\partial p^i}{\partial p^j}\Big|_p=\frac{\partial (p^i\circ \phi^{-1})}{\partial r^j}\Big|_{\phi(p)}=\frac{\partial (r^i\circ \phi \circ \phi^{-1})}{\partial r^j}\Big|_{\phi(p)}=\frac{\partial r^i}{\partial r^j}\Big|_{\phi(p)}=\delta^i_j.$
Now, your $\phi(x,y)=(x+y,y)$ is a diffeomorphism, so the above applies to it.
Final remark: it is precisely because of our definition of $\frac{\partial f}{\partial p^i}$ that we can manipulate these symbols "in the usual way.". Of course, $\phi^{-1}$ is also a diffeomorphism because $J(\phi)\neq 0$, so your chain rule calculation amounts to showing that the the Jacobian of the identity map is the identity matrix!