Consider the following Markov Chain: $$ \theta' = \begin{cases} \theta'' \sim q(\theta'' | \theta) & \text{with probability} \quad f(\theta, \theta'') \\ \theta & \text{with probability} \quad 1 - f(\theta, \theta'') \end{cases} $$
where $\theta$ is the previously generated value.
So, to generate the next element $\theta'$ given the previous value $\theta$, the chain does the following:
- Generates $\theta''$ from the known distribution $q(\theta'' | \theta)$
- Takes $\theta' := \theta''$ with probability $f(\theta, \theta'')$
- Otherwise keeps the old value, $\theta' := \theta$, with probability $1 - f(\theta, \theta'')$
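The steps above can be sketched in a few lines of Python. The question does not specify $q$ or $f$, so this sketch assumes a Gaussian random-walk proposal for $q(\theta''|\theta)$ and a Metropolis acceptance ratio for a standard normal target as $f(\theta, \theta'')$; the names `propose`, `accept_prob`, `step` and `run_chain` are hypothetical and chosen only for illustration.

```python
import math
import random


def propose(theta, rng, scale=1.0):
    # ASSUMPTION: q(theta'' | theta) is a Gaussian random walk around theta.
    return rng.gauss(theta, scale)


def accept_prob(theta, theta_prop):
    # ASSUMPTION: f(theta, theta'') is the Metropolis ratio for a standard
    # normal target; any function with values in [0, 1] would fit the scheme.
    log_ratio = 0.5 * (theta ** 2 - theta_prop ** 2)
    if log_ratio >= 0:
        return 1.0
    return math.exp(log_ratio)


def step(theta, rng):
    # One transition: draw theta'' from q, then either move to it
    # (probability f) or stay at the old value (probability 1 - f).
    theta_prop = propose(theta, rng)
    if rng.random() < accept_prob(theta, theta_prop):
        return theta_prop
    return theta


def run_chain(theta0, n_steps, seed=0):
    # Returns the list [theta_0, theta_1, ..., theta_n].
    rng = random.Random(seed)
    chain = [theta0]
    for _ in range(n_steps):
        chain.append(step(chain[-1], rng))
    return chain
```

Calling `run_chain(0.0, 1000)` returns a list of 1001 values in which a rejected proposal shows up as an exact repeat of the previous value, which is the behaviour the $\delta(\theta - \theta')$ term describes.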
The question is to check that some distribution $\varphi(\theta)$ is an invariant distribution of this Markov chain.
In the solution, they first found the joint distribution:
$$ q(\theta', \theta''|\theta) = \delta(\theta - \theta')[1 - f(\theta, \theta'')]q(\theta'' | \theta) + \delta(\theta'' - \theta') f(\theta, \theta'')q(\theta'' | \theta) $$
and then marginalized it with respect to $\theta''$ to get $q(\theta' | \theta)$.
I don't really get how the $\delta$-s appeared there. I know $\delta(x)$ is a generalized function, so it only makes sense inside an integral.
Any ideas?
First note that $\theta'$ is the random quantity and $\theta,\theta''$ are "given" in the expression $\delta(\theta-\theta')[1-f(\theta, \theta'')]+\delta(\theta''-\theta')f(\theta, \theta'')$. This expression specifies the distribution of the next state given the previous value and the proposed value. That is, that quantity equals $q(\theta'|\theta,\theta'')$. Then, by the rule of conditional distributions, we get $q(\theta',\theta''|\theta)=q(\theta'|\theta,\theta'')q(\theta''|\theta)$, which is exactly the expression from the solution.
According to the Wikipedia page, the Dirac delta is a function whose "value is zero everywhere except at zero, and whose integral over the entire real line is equal to one." Therefore, this places a point mass of weight $1-f(\theta,\theta'')$ at $\theta$, the old value, and another of weight $f(\theta,\theta'')$ at $\theta''$, the proposed value, in the discrete distribution of $\theta'$.
I apologize that I cannot answer in a more rigorous manner.
Edit: see these two places:
The second link says that that is how to represent the discrete distribution and the first link explains that it is because the integral evaluates to 1.
According to the “sifting property”, $\int_{-\infty}^\infty g(x)\delta(x-x_0)dx=g(x_0)$ for a generic function $g$, which suggests that the right way to think about the delta is through the value of the integral rather than through its infinite height.
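As a sketch of the marginalization step mentioned in the question, the sifting property can be applied to the joint distribution term by term. Here $r(\theta)$ is a symbol introduced only for this illustration to denote the overall rejection probability; it does not appear in the original solution.

$$ q(\theta'|\theta) = \int q(\theta',\theta''|\theta)\,d\theta'' = \delta(\theta-\theta')\int [1-f(\theta,\theta'')]\,q(\theta''|\theta)\,d\theta'' + \int \delta(\theta''-\theta')\,f(\theta,\theta'')\,q(\theta''|\theta)\,d\theta'' $$

$$ = \delta(\theta-\theta')\,r(\theta) + f(\theta,\theta')\,q(\theta'|\theta), \qquad r(\theta) = 1 - \int f(\theta,\theta'')\,q(\theta''|\theta)\,d\theta'' $$

In the first term the delta does not depend on $\theta''$, so it comes out of the integral; in the second term the delta picks out $\theta''=\theta'$. Integrating the result over $\theta'$ gives $r(\theta) + \int f(\theta,\theta')\,q(\theta'|\theta)\,d\theta' = 1$, so the transition kernel is properly normalized even though it mixes a point mass with a continuous part.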