Define $C_V=T (\partial{S}/\partial{T})_V$ and $C_P=T (\partial{S}/\partial{T})_P$.
Prove that $C_P-C_V=T(\partial{V}/\partial{T})_P (\partial{P}/\partial{T})_V.$
By Maxwell's relation $(\partial{S}/\partial{P})_T=-(\partial{V}/\partial{T})_P$, this is equivalent to proving $(\partial{S}/\partial{T})_P-(\partial{S}/\partial{T})_V=-(\partial{S}/\partial{P})_T (\partial{P}/\partial{T})_V$. This equation is intuitively true: imagine changing $T \mapsto T+\delta T$ while holding $V$ constant, so that $P \mapsto P+(\partial{P}/\partial{T})_V \delta T$. Then imagine returning $P$ to its original value while holding $T$ constant. This changes $S$ by $-(\partial{S}/\partial{P})_T (\partial{P}/\partial{T})_V \delta T$, which is exactly what the formula says.
Can the above idea be more formally proven, and also generalized to any variables $X,Y,Z$?
The proof I was taught goes as follows.
We know that $S = -(\partial{G}/\partial{T})_P = -(\partial{H}/\partial{T})_P-P(\partial{V}/\partial{T})_P$. Differentiating with respect to $T$ at constant $V$ gives
$(\partial{S}/\partial{T})_V = -(\partial^2{H}/\partial{T^2})_{P,V}-(\partial{P}/\partial{T})_V(\partial{V}/\partial{T})_P-P(\partial^2{V}/\partial{T^2})_{P,V}.$
We argue that the last term is zero, since we may swap the order of the two differentiations (one with $V$ held constant, one with $P$ held constant). We then also claim that we can swap the order of differentiation in the first term on the RHS, giving $(\partial{S}/\partial{T})_P$.
Why can we swap derivatives in this way? I know that if we have $F(x,y)$, then $F_{x,y}=F_{y,x}$, but this seems like a different situation.
Edit: Consider $z(x,y)=x^2y$. Then $(\partial{z}/\partial{x})_y=2xy=g(x,y)$, and $(\partial g/\partial x)_z = 2y+2x(\partial y/\partial x)_z=2y+2x(-2y/x)=-2y\neq 0 = (\partial [(\partial z/\partial x)_z] / \partial x)_y$. So the two orders of differentiation disagree in this example.
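The counterexample can also be checked numerically. The following is only a rough sketch using central finite differences; the helper names (`central_diff`, `y_at_fixed_z`, `order_y_then_z`, `order_z_then_y`) and the sample point are illustrative choices, not part of the original argument.

```python
def central_diff(f, x, h=1e-4):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def z(x, y):
    return x**2 * y

def y_at_fixed_z(x, z0):
    # Invert z = x^2 * y for y along the curve z = z0 (needs x != 0).
    return z0 / x**2

def dz_dx_const_y(x, y):
    # (dz/dx)_y: vary x, keep y fixed. Analytically 2*x*y.
    return central_diff(lambda t: z(t, y), x)

def dz_dx_const_z(x, y):
    # (dz/dx)_z: vary x along the curve of constant z. Analytically 0.
    z0 = z(x, y)
    return central_diff(lambda t: z(t, y_at_fixed_z(t, z0)), x)

def order_y_then_z(x, y):
    # Inner derivative at constant y, outer derivative at constant z.
    z0 = z(x, y)
    return central_diff(lambda t: dz_dx_const_y(t, y_at_fixed_z(t, z0)), x)

def order_z_then_y(x, y):
    # Inner derivative at constant z, outer derivative at constant y.
    return central_diff(lambda t: dz_dx_const_z(t, y), x)

x0, y0 = 1.5, 2.0                 # arbitrary sample point with x0 != 0
a = order_y_then_z(x0, y0)        # analytically -2*y0
b = order_z_then_y(x0, y0)        # analytically 0
print(a, b)
```

Up to finite-difference error, the first value is close to $-2y$ and the second is close to $0$, matching the calculation above.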
I don't know if this answers your second question, but in general: $$\forall f \in C^2(I \times I, \mathbb{R}),\ x, y \in I: \quad \frac{\partial^2 f(x, y)}{\partial x\, \partial y} = \frac{\partial^2 f(x, y)}{\partial y\, \partial x} $$ and, for any $g$ that depends on $x$ but not on $y$: $$\frac{\partial}{\partial y} \bigl[g(x) f(x, y)\bigr] = g(x) \frac{\partial}{\partial y}f(x, y)$$
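As for the first question, one possible way to formalize the intuitive argument is the multivariable chain rule. This is a sketch under the assumption that $S$ can be written as a differentiable function of $(T,P)$ and that $P$ is in turn a differentiable function of $(T,V)$; the fourth symbol $W$ in the general form is introduced here for illustration only.

$$S(T,V) = S\bigl(T, P(T,V)\bigr) \quad\Longrightarrow\quad \left(\frac{\partial S}{\partial T}\right)_V = \left(\frac{\partial S}{\partial T}\right)_P + \left(\frac{\partial S}{\partial P}\right)_T \left(\frac{\partial P}{\partial T}\right)_V$$

Rearranging and using Maxwell's relation $\left(\frac{\partial S}{\partial P}\right)_T = -\left(\frac{\partial V}{\partial T}\right)_P$:

$$C_P - C_V = T\left[\left(\frac{\partial S}{\partial T}\right)_P - \left(\frac{\partial S}{\partial T}\right)_V\right] = -T\left(\frac{\partial S}{\partial P}\right)_T \left(\frac{\partial P}{\partial T}\right)_V = T\left(\frac{\partial V}{\partial T}\right)_P \left(\frac{\partial P}{\partial T}\right)_V$$

The same step should work for any quantity $X$ expressed as a function of $(Y,W)$, with $W$ a function of $(Y,Z)$:

$$\left(\frac{\partial X}{\partial Y}\right)_Z = \left(\frac{\partial X}{\partial Y}\right)_W + \left(\frac{\partial X}{\partial W}\right)_Y \left(\frac{\partial W}{\partial Y}\right)_Z$$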