To formally represent the following step of an algorithm:
if $r \in [0,p (\mathbf{v})]$, set $y = A(\mathbf{v})$; else set $y = B(\mathbf{v})$,
where $\mathbf{v}$ is a vector of parameters and $r$ is a uniform random variable, $r \sim \mathrm{U}(0,1)$, one can use a formalism based on Heaviside step functions $\Theta$:
$$ 1 = \int \mathrm{d}y \; \int_0^1 \mathrm{d}r \; \Big[ \Theta\left[ p(\mathbf{v}) - r \right] \delta(y - A(\mathbf{v})) + \Theta\left[ r - p(\mathbf{v}) \right] \delta(y - B(\mathbf{v})) \Big] $$
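For concreteness, here is a minimal Python sketch of the branching step that this identity encodes. The specific forms of `p`, `A` and `B` are hypothetical placeholders chosen only for illustration (nothing above fixes them). The check at the end relies on $\int_0^1 \mathrm{d}r \, \Theta[p(\mathbf{v}) - r] = p(\mathbf{v})$, so the $A$ branch should be taken with probability $p(\mathbf{v})$.

```python
import numpy as np

# Hypothetical placeholder choices for p, A and B (assumptions for illustration only).
def p(v):
    return 1.0 / (1.0 + np.exp(-v.sum()))  # any map into [0, 1] would do

def A(v):
    return v[0] + v[1]

def B(v):
    return v[0] * v[1]

def branch_step(v, r):
    """Return y = A(v) if r lies in [0, p(v)], otherwise y = B(v)."""
    return A(v) if r <= p(v) else B(v)

rng = np.random.default_rng(0)
v = np.array([0.3, -1.2])
r = rng.uniform(0.0, 1.0, size=100_000)

# Vectorised form of the same branch:
# Theta[p - r] selects A(v), Theta[r - p] selects B(v).
y = np.where(r <= p(v), A(v), B(v))

# Empirical weight of the A branch; should be close to p(v).
frac_A = np.mean(y == A(v))
print(frac_A, p(v))
```

Each sample of $y$ takes exactly one of the two values $A(\mathbf{v})$ or $B(\mathbf{v})$, which is what the two delta functions express.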
I am trying to take the partial derivative of this expression with respect to $v_i$.
The derivatives of the theta functions are clear enough, but I'm having trouble with the delta functions. On general grounds I think they should generate terms proportional to $\partial A / \partial v_i$ and $\partial B / \partial v_i$, but if I apply the usual result for the derivative of a Dirac delta, I don't get any such terms. In addition, the usual derivation relies on integration by parts, but here there is no integral over $v_i$, which makes me suspect I'm missing something.
I understand that distributions can be tricky, so I expect there's a subtlety I've overlooked. If anyone can help, I'd appreciate it if you could clarify (if you know!):
- Is the idea here conceptually problematic?
- If not, is there a way to make this work?
- Has anything similar to this been written up anywhere?