I have some questions about the Metropolis Hastings algorithm:
Wikipedia says:
...choose an arbitrary probability density g(x|y) which suggests a candidate for the next sample value x, given the previous sample value y....A usual choice is to let g(x|y) be a Gaussian distribution centered at y,
I tried using this (in R):
xdash=x+rnorm(1,mean=0,sd=0.1)
where x, xdash are the latest & proposed point respectively.
Suppose my function f(x) which I'm trying to sample is defined only on 0<= x <= 1 what do I do at the edges when the proposed point is say outside the domain? Do I just reject & draw another point from the gaussian?
https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
Reason I ask is I'm getting strange sampling at the edges.
f(x) was defined so:
fn<-function(x){
if (x<=0.4 && x>=0){
return ( 5 )
} else if (x>0.4 && x<=0.6) {
return (10)
} else if (x>0.6 && x<=1) {
return(5)
}
else {
return (0)
}
}
Here's my code:
x= runif(1,min=0,max=1)
max_iter=10000
sd=0.1;
accepts<-c()
for(i in 1:max_iter){
xdash=x+rnorm(1,mean=0,sd=sd)
if(fn(x)==0) {
alpha<-0
} else {
alpha <- fn(xdash)/fn(x)
}
if (runif(n=1,min=0,max=1)<alpha) {
#accept
accepts<-c(accepts,xdash)
x<-xdash
}
}
Edited Code:
x= runif(1,min=0,max=1)
max_iter=10000; sd=0.1;
samples<-c()
for(i in 1:max_iter){
xdash=x+rnorm(1,mean=0,sd=sd)
alpha <- fn(xdash)/fn(x)
if (runif(n=1,min=0,max=1)<alpha) {
x<-xdash
}
samples<-c(samples,x)
}


It depends on what you mean by "reject & draw another point from the Gaussian".
Just extend $f$ to all of $\mathbb R$, with $f(x)=0$ outside $[0,1]$, and sample from that density. In the code you posted, you already did that. Then apply the standard algorithm. If by "reject & draw another point from the Gaussian" you mean that you immediately try again without counting the point at which you stayed because of the rejection as another sample, that would violate detailed balance and would lead to underrepresentation of the points near the boundary. That's not what the standard algorithm prescribes for the function defined in your code.
Your second question is unrelated and hard to answer independent of circumstances; I'd suggest to post it as a separate question, perhaps with some background on what sort of scenario you need this for (assuming that the simple example you provided here isn't the distribution you're actually intending to simulate in practice).