An example of this is the variation of the Lagrangian density $\mathcal{L}(\phi(x^{\mu}),\partial_{\mu}\phi)$:
$$ \delta\mathcal{L}=\frac{\partial{\mathcal L}}{\partial\phi}\delta\phi+\frac{\partial\mathcal{L}}{\partial(\partial_{\mu}\phi)}\delta(\partial_{\mu}\phi). $$
My question is: when and why is it appropriate to say that $\delta(\partial_{\mu}\phi)=\partial_{\mu}(\delta\phi)$, and is there a proof of this?
In somewhat sloppy notation, the variation of a functional $F$ is defined by
$$ \delta F[\phi]=\frac{d}{d\epsilon}\bigg|_{\epsilon=0}F[\phi+\epsilon\,\delta\phi], $$
where $\delta\phi$ is an arbitrary variation of the field that does not depend on $\epsilon$.

In particular, if $F[\phi]=\partial_\mu\phi$, then, because $\partial_\mu$ is linear,
$$ \delta (\partial_\mu\phi)=\frac{d}{d\epsilon}\bigg|_{\epsilon=0}\partial_\mu(\phi+\epsilon\,\delta\phi)=\frac{d}{d\epsilon}\bigg|_{\epsilon=0}\big(\partial_\mu\phi+\epsilon\,\partial_\mu(\delta\phi)\big)=\partial_\mu(\delta\phi). $$

More generally, if $F$ is linear, then $\delta F[\phi]=F[\delta\phi]$.
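
As a quick illustration of why linearity matters, here is a sketch with two examples that are my own choices and do not come from the question. For the nonlinear functional $F[\phi]=\phi^2$, the same definition gives
$$ \delta(\phi^2)=\frac{d}{d\epsilon}\bigg|_{\epsilon=0}(\phi+\epsilon\,\delta\phi)^2=2\phi\,\delta\phi\neq(\delta\phi)^2=F[\delta\phi], $$
so $\delta$ cannot simply be moved inside a nonlinear $F$. Assuming, purely for illustration, a free scalar Lagrangian density $\mathcal{L}=\tfrac{1}{2}\partial_\mu\phi\,\partial^\mu\phi-\tfrac{1}{2}m^2\phi^2$ (with $m$ a constant), the general formula for $\delta\mathcal{L}$ combined with $\delta(\partial_\mu\phi)=\partial_\mu(\delta\phi)$ gives
$$ \delta\mathcal{L}=\frac{\partial\mathcal{L}}{\partial\phi}\delta\phi+\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\delta(\partial_\mu\phi)=-m^2\phi\,\delta\phi+\partial^\mu\phi\,\partial_\mu(\delta\phi), $$
which agrees with computing $\frac{d}{d\epsilon}\big|_{\epsilon=0}\mathcal{L}(\phi+\epsilon\,\delta\phi,\partial_\mu\phi+\epsilon\,\partial_\mu(\delta\phi))$ directly.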