Consider the characteristic function of $[0, \infty)^2$ (i.e. the function $u: \mathbb{R}^2 \to \mathbb{R}$ given by $u(x, y) = 1$ if $x \geq 0$, $y \geq 0$ and $u(x, y) = 0$ otherwise). I want to compute $\partial_x \partial_y u$, where the derivatives are taken in the distributional sense, since classical derivatives blow up at the jump discontinuities. Since for each fixed $x$ the slice $y \mapsto u(x, y)$ is the Heaviside function when $x \geq 0$ and identically $0$ otherwise, $\partial_y u(x, y) = \delta_0(y)$ if $x \geq 0$ and $\partial_y u(x, y) = 0$ otherwise.
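(To spell out that last claim: pairing against a test function $\varphi \in C_0^\infty(\mathbb{R}^2)$, and writing $H$ for the Heaviside function,

$$\langle \partial_y u, \varphi \rangle = -\langle u, \partial_y \varphi \rangle = -\int_0^\infty \int_0^\infty \partial_y \varphi(x, y) \, dy \, dx = \int_0^\infty \varphi(x, 0) \, dx = \langle H(x) \otimes \delta(y), \varphi \rangle,$$

since $\varphi$ vanishes for large $y$.)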
I visualize this $\partial_y u$ as a map on the plane which is $0$ everywhere except on the ray $\{(x, 0) \in \mathbb{R}^2: x \geq 0\}$, where points are "mapped" to $+\infty$. The trouble is, this gives me no intuition for how to approach the computation of the second derivative $\partial_x$. My best guess is that $\partial_x \partial_y u$ should be identically $0$ away from the origin, where it is $+\infty$; i.e. that $\partial_x \partial_y u = \delta_0$. But this also feels like "cheating" somehow: I would expect $\delta_0$ to only be the derivative of a finite multiple of the Heaviside function, not of a distribution which is already infinite!
Is this method of approaching differentiation of distributions appropriate? If I bashed out convolutions of $C_0^\infty$ functions I could probably (dis?)prove that $\partial_x \partial_y u = \delta_0$, but I'd still feel unsure why it's true/false, because I don't know how to imagine the partial derivatives of a distribution.
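(For what it's worth, here is a rough numerical sketch of that smoothing idea. Instead of a true $C_0^\infty$ mollifier I use a `tanh` smooth step of width `eps` — an arbitrary choice of mine — and take mixed finite differences; the resulting "derivative" should have total mass $\approx 1$, concentrated near the origin.)

```python
import numpy as np

eps = 0.05                     # smoothing width (arbitrary choice)
h = 0.01                       # grid spacing
x = np.arange(-2.0, 2.0, h)
X, Y = np.meshgrid(x, x, indexing="ij")

# Smooth step approximating the Heaviside function H.
S = lambda t: 0.5 * (1.0 + np.tanh(t / eps))
u_eps = S(X) * S(Y)            # smooth approximation of u

# Mixed partial d^2/dxdy via central finite differences.
du = np.gradient(np.gradient(u_eps, h, axis=0), h, axis=1)

# Total mass should be ~ 1, and essentially all of it should sit
# in a small box around the origin.
total_mass = du.sum() * h * h
near = (np.abs(X) < 0.5) & (np.abs(Y) < 0.5)
near_mass = du[near].sum() * h * h
print(total_mass, near_mass)
```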
You have $u = H \otimes H$, i.e. $u(x,y) = H(x)\,H(y)$ where $H$ is the Heaviside function, so $$\partial_x \partial_y u = H'(x) \, H'(y) = \delta(x) \, \delta(y) = \delta(x,y) = \delta_0.$$ (Here $x$ and $y$ should be viewed as placeholders rather than values.)
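One can also verify this directly from the definition, with no tensor-product machinery: for any test function $\varphi \in C_0^\infty(\mathbb{R}^2)$,

$$\langle \partial_x \partial_y u, \varphi \rangle = \langle u, \partial_x \partial_y \varphi \rangle = \int_0^\infty \int_0^\infty \partial_x \partial_y \varphi(x,y) \, dx \, dy = -\int_0^\infty \partial_y \varphi(0,y) \, dy = \varphi(0,0) = \langle \delta_0, \varphi \rangle,$$

where the sign in the first equality is $+$ because the derivative has order $2$, and the last two integrals use that $\varphi$ vanishes at infinity. No "infinite" intermediate object ever appears: each step is a pairing of a distribution with a smooth compactly supported function.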