How to use Metropolis-Hastings algorithm for discrete case


I came across this question and found it quite confusing. So far I've only dealt with (or seen examples of) MH and Gibbs sampling in the continuous case, and haven't really thought about the discrete case.


Q : Estimate the marginal distribution of X with a Gibbs sampler.

$X|n,y\sim Bin(n,y),\quad y\sim Beta(2,4),\quad n\sim Poi(16)$

(y and n are independent)


I derived the joint posterior distribution and the full conditional distribution for each parameter. (I'm not sure about them; there might be some mistakes.)

(0) $P(X,n,y)\propto P(X|n,y)P(n)P(y)\propto \binom{n}{x} y^x(1-y)^{n-x}\cdot y(1-y)^3\cdot \frac{16^n}{n!}$

(1) $X|n,y\sim Bin(n,y)$

(2) $p(y|X,n)\propto y^{x+2-1}(1-y)^{n-x+4-1}\ \Rightarrow\ y|X,n\sim Beta(x+2,\,n-x+4)$

(3) $p(n|X,y)\propto \binom{n}{x}(1-y)^{n-x}\,\frac{16^n}{n!}\ \sim\ ?$

I don't think I can tell which distribution (3) is. Since I can't identify it, I'm going to use the Metropolis-Hastings algorithm. Then I need a proposal distribution with at least the same support as $n$. (Since $n$ comes from a Poisson distribution, it can take values $0,1,2,\dots$; note that conditional on $X=x$ we also need $n\ge x$.)
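To make question (2) concrete, here is a sketch of the kind of update I have in mind (the function names and the choice of a symmetric $\pm 1$ random-walk proposal are my own, not from the problem statement). With a symmetric proposal the Hastings correction cancels, so only the target ratio remains, exactly as in the continuous case:

```r
# log of the unnormalized target p(n | x, y), from conditional (3):
# choose(n, x) * (1-y)^(n-x) * 16^n / n!
log_target <- function(n, x, y) {
  lchoose(n, x) + (n - x) * log(1 - y) + n * log(16) - lfactorial(n)
}

# One MH update for n, holding x and y fixed.
# Proposal: n' = n +/- 1 with equal probability (symmetric),
# so the acceptance ratio is just the target ratio.
mh_update_n <- function(n, x, y) {
  n_prop <- n + sample(c(-1, 1), 1)   # propose a neighbouring integer
  if (n_prop < x) return(n)           # outside the support: reject
  log_ratio <- log_target(n_prop, x, y) - log_target(n, x, y)
  if (log(runif(1)) < log_ratio) n_prop else n
}
```

The accept/reject logic is identical to the continuous case; the only discrete-specific detail is rejecting proposals that leave the support $\{x, x+1, \dots\}$.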

So my question is...

(1) What kind of proposal distribution should I use? (Binomial? Poisson? Some other discrete distribution?)

(2) Does the discrete case follow the same MH steps as the continuous case? I mean computing the acceptance ratio, comparing it with 1, and deciding whether to accept.

Any help would be really appreciated:)



You can sample $y$ and $n$ directly, since you know their distributions, and then sample $X$ given those values of $y$ and $n$. That is enough to visualise the marginal distribution of $X$. For example, in R:

set.seed(2022)
cases <- 10^5
y <- rbeta(cases, 2, 4)      # y ~ Beta(2, 4)
n <- rpois(cases, 16)        # n ~ Poi(16)
X <- rbinom(cases, n, y)     # X | n, y ~ Bin(n, y)

plot(table(X)/cases)         # empirical marginal pmf of X

(plot of the empirical marginal pmf of $X$)

This is plausible: the simulation gives a mean(X) of 5.34868, while, by independence of $n$ and $y$, $\mathbb E[X] = \mathbb E[n]\,\mathbb E[y] = 16 \times \frac{2}{2+4} \approx 5.333$.


Since the prompt says to use a Gibbs sampler, you can just use Gibbs sampling.

As far as I can tell, conditional (3) is correct. Notice that, conditional on $X=x$, $n$ can take the values $x, x+1, x+2, \dots$, and the probabilities become negligible for large values of $n$, unless $y$ is also extreme. You can therefore truncate the support to, say, $125$ values of $n$ (or $1250$ if $125$ doesn't give the usual "grassy" oscillation in the trace plots). That is, pretend that $p(n|X,y)$ has support $x, x+1, \dots, x+124$. To find the normalizing constant (the denominator), divide each unnormalized $p(n|X,y)$ by the sum $\sum_{i=x}^{x+124}p(n=i|X,y)$. This defines a discrete distribution over $125$ values, and the Gibbs step for $n$ can sample from it directly.
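A minimal R sketch of the full Gibbs sampler along these lines (the starting values, the truncation length $K = 125$, and the function names are my own arbitrary choices, not part of the problem):

```r
# log of the unnormalized conditional p(n | x, y), vectorized over n
log_pn <- function(n, x, y) {
  lchoose(n, x) + (n - x) * log(1 - y) + n * log(16) - lfactorial(n)
}

gibbs <- function(iters = 10^4, K = 125) {
  X <- numeric(iters)
  n <- 16; y <- 1/3                               # arbitrary starting values
  for (t in 1:iters) {
    x <- rbinom(1, n, y)                          # (1) X | n, y ~ Bin(n, y)
    y <- rbeta(1, x + 2, n - x + 4)               # (2) y | X, n ~ Beta(x+2, n-x+4)
    support <- x:(x + K - 1)                      # (3) truncated support for n
    lp <- log_pn(support, x, y)
    w  <- exp(lp - max(lp))                       # stabilize before normalizing
    n  <- sample(support, 1, prob = w / sum(w))   # discrete draw for n
    X[t] <- x
  }
  X
}

X <- gibbs()
plot(table(X) / length(X))                        # estimated marginal pmf of X
```

Subtracting `max(lp)` before exponentiating avoids underflow; `sample(..., prob = ...)` renormalizes the weights, which is exactly the division by the truncated sum described above.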