Is the following legitimate? It amounts to exchanging the integration with respect to $x$ with the derivative with respect to $y$ acting on the delta function. $$\int \int dx\, dy \ g(y) f(x)\frac{\partial }{\partial y}\delta (y-x) = \int dy \ g(y)\frac{\partial}{\partial y} \int dx \ f(x) \delta (y-x) = \int dy \ g(y)\frac{\partial }{\partial y} f(y) $$
Or are there caveats to exchanging the integration and the derivative when the delta function depends on two different variables?
First, one does not usually use the integral sign for the derivative of the delta function since, unlike $\delta$ itself, it is not a measure.
As usual, to understand what is happening, one needs to go back to the definitions. One can for example define $T := "\delta(x-y)"$ as a distribution acting on smooth functions of two variables, by the formula $$ \langle T,\Phi\rangle = \int \Phi(z,z)\,\mathrm d z. $$ Equivalently, one can view this as a family of distributions in one variable indexed by a parameter (that is, $\delta(x-y) = \delta_y(x)$ is for example a distribution in the $x$ variable and $y$ is a parameter), which gives $$ \langle\delta_y,\varphi\rangle = \varphi(y), $$ and so again $$ \langle T,\Phi\rangle = \int \langle\delta_y,\Phi(\cdot,y)\rangle\, \mathrm d y = \int \Phi(y,y)\,\mathrm d y. $$ All this works as well when $\Phi$ is just continuous and compactly supported, as $\delta$ is a measure.
When taking derivatives, however, one needs more regular test functions. By definition of the distributional derivative, $\partial_yT$ is defined as $$ \langle \partial_yT,\Phi\rangle = -\langle T,\partial_y\Phi\rangle. $$ By the previous formulas, this gives $$ \langle \partial_yT,\Phi\rangle = -\int(\partial_y\Phi)(z,z)\,\mathrm d z. $$ Now $\partial_z(\Phi(z,z)) = (\partial_y\Phi)(z,z) + (\partial_x\Phi)(z,z)$, and if $\Phi$ is $C^1$ and compactly supported, the integral of the left-hand side over $z$ vanishes. Hence $$ \langle \partial_yT,\Phi\rangle = \int(\partial_x\Phi)(z,z)\,\mathrm d z, $$ which is the general form of your identity. In particular, if $\Phi = f\otimes g$, that is $\Phi(x,y) = f(x)\,g(y)$, then $$ \langle \partial_yT,f\otimes g\rangle = \int f'(z) \,g(z)\,\mathrm d z, $$ which is exactly the right-hand side of your formula.
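If you want a sanity check, the last identity can be tested numerically. The sketch below is only an illustration under assumptions that are not part of the argument above: $\delta$ is replaced by a narrow Gaussian $\delta_\varepsilon$ with $\varepsilon = 0.05$, the test functions are the hypothetical choices $f(x) = e^{-x^2}$ and $g(y) = y\,e^{-y^2}$ (rapidly decaying rather than compactly supported), and the integrals are computed with a plain midpoint rule. The helper names (`d_delta_eps`, `midpoint`, `lhs`, `rhs`) are made up for this example.

```python
import math

# Hypothetical test functions, chosen only for illustration.
def f(x):
    return math.exp(-x * x)

def f_prime(x):
    return -2.0 * x * math.exp(-x * x)

def g(y):
    return y * math.exp(-y * y)

def d_delta_eps(u, eps):
    """Derivative in u of the Gaussian approximation of delta with width eps."""
    gauss = math.exp(-u * u / (2.0 * eps * eps)) / (eps * math.sqrt(2.0 * math.pi))
    return -u / (eps * eps) * gauss

def midpoint(func, a, b, n):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def lhs(eps=0.05):
    """Approximates the double integral of g(y) f(x) d/dy delta_eps(y - x)."""
    def inner(y):
        # The mollifier is negligible outside |x - y| < 8 * eps.
        return midpoint(lambda x: f(x) * d_delta_eps(y - x, eps),
                        y - 8.0 * eps, y + 8.0 * eps, 160)
    return midpoint(lambda y: g(y) * inner(y), -4.0, 4.0, 800)

def rhs():
    """Approximates the integral of f'(z) g(z)."""
    return midpoint(lambda z: f_prime(z) * g(z), -4.0, 4.0, 800)

if __name__ == "__main__":
    print("mollified left-hand side:", lhs())
    print("right-hand side:         ", rhs())
```

For these particular $f$ and $g$ the right-hand side can also be computed by hand: $\int f'(z)\,g(z)\,\mathrm d z = -2\int z^2 e^{-2z^2}\,\mathrm d z = -\sqrt{\pi/8}$. The two numbers printed should agree up to an error of order $\varepsilon^2$, which shrinks as the Gaussian is made narrower.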