Why does this MCMC algorithm to estimate parameters of a linear equation not converge to the posterior distribution?


As a proof of principle, I'm trying to estimate the parameters of a linear equation (before moving on to ODEs) using Markov chain Monte Carlo sampling. The post that I am following can be found at this link: https://sciencehouse.wordpress.com/2010/06/23/mcmc-and-fitting-models-to-data/

Essentially, what I am doing is not working, and I'm hoping somebody can spot the reason why.

The model is the simple linear equation $$y=mx+c$$ evaluated at $m=2$ and $c=4$.

The Algorithm

  1. Initialize $m_0$ and $c_0$ to random values (I arbitrarily chose 25 for both) and evaluate a function that measures the discrepancy between the experimental and simulated data (where of course, in this case, the experimental data are also simulated, with the correct parameters). I use the sum of squares $SSR$:

$$SSR=\sum\frac{(SimulatedData-ExperimentalData)^2}{2\sigma^2}$$

Because there is no experimental error in this model, $\sigma$ was arbitrarily chosen to be 0.0001 in all cases.

  2. I then begin a loop and sample from the priors $\pi_m$ and $\pi_c$, producing $m_{prop}$ and $c_{prop}$ respectively; both priors are uniform between 0 and 100.
  3. Calculate the $SSR$ with the sampled parameters.
  4. Calculate the likelihood ratio $LR$: $$LR=\exp\left[\frac{\ln(SSR_{current})-\ln(SSR_{proposed})}{2\sigma^2}\right]$$
  5. Randomly accept the new parameters by sampling $r$ from the uniform distribution between 0 and 1. If $LR>r$, the new parameters are accepted: $m_{prop}$ becomes $m_0$ and $c_{prop}$ becomes $c_0$.

As a side question: why do some people say the acceptance probability is $\min(1,LR)$, whereas the post that I'm following uses random acceptance?

  6. Iterate until convergence.
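For concreteness, here is roughly what steps 1–6 describe, sketched in Python rather than Matlab (the function and variable names are my own, and I use the likelihood ratio without the $2\sigma^2$ factor, as in the edit further down):

```python
import numpy as np

rng = np.random.default_rng(0)

def mx_plus_c(m, x, c):
    return m * x + c

def ssr(sim, exp_data, sigma):
    # step 1: discrepancy measure, scaled by 2*sigma^2 as in the question
    return np.sum((sim - exp_data) ** 2) / (2 * sigma**2)

x = np.arange(1, 101)
y_exp = mx_plus_c(2.0, x, 4.0)           # "experimental" data with m=2, c=4
sigma = 0.0001

m0, c0 = 25.0, 25.0                      # step 1: arbitrary starting values
ssr_cur = ssr(mx_plus_c(m0, x, c0), y_exp, sigma)
ssr_init = ssr_cur
best = (ssr_cur, m0, c0)                 # track the best point seen so far

for _ in range(10000):
    # step 2: fresh draws from the uniform prior on (0, 100) every iteration
    m_prop = rng.uniform(0, 100)
    c_prop = rng.uniform(0, 100)
    # step 3: discrepancy for the proposal
    ssr_prop = ssr(mx_plus_c(m_prop, x, c_prop), y_exp, sigma)
    # step 4: likelihood ratio (post-edit form, without the 2*sigma^2 factor)
    lr = np.exp(np.log(ssr_cur) - np.log(ssr_prop))
    # step 5: random acceptance
    if lr > rng.uniform():
        m0, c0, ssr_cur = m_prop, c_prop, ssr_prop
    best = min(best, (ssr_cur, m0, c0))
```

Note that the proposals are drawn from the same prior on every iteration, independently of the current state, which is one of the points addressed in the answer below.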

Results

[Histograms of the sampled $m$ and $c$ values]

Edit

Courtesy of Joriki's answer, I have removed the $2\sigma^2$ term from the likelihood ratio, so that step 4 now reads:

  4. Calculate the likelihood ratio $LR$: $$LR=\exp\left[\ln(SSR_{current})-\ln(SSR_{proposed})\right]$$
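For what it's worth, since exponentiating a difference of logarithms just gives a ratio, this expression simplifies to

$$LR=\frac{SSR_{current}}{SSR_{proposed}},$$

so any proposal with a smaller discrepancy than the current state has $LR>1$ and is always accepted.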

This has considerably improved the results:

[Updated histograms of the sampled $m$ and $c$ values]

Matlab Code

For completeness and to give the details of the algorithm, here is the code I'm using to generate the plots:

interval=1:100;
y=mx_plus_c((2),interval,(4));
n=1e5; %iterations
results=zeros(n,8);
m0=25;
c0=25;
y_sim=mx_plus_c(m0,interval,c0);
SSR_cur=SSR(y,y_sim,0.0001);
mlb=log(1);   % lb, ub: lower and upper bounds respectively
mub=log(100);
clb=log(1);
cub=log(100);
for i=1:n,
    m_prop = mlb + (mub-mlb)*rand(1,1);
    c_prop = clb + (cub-clb)*rand(1,1);
    SSR_prop = SSR(y, mx_plus_c(m_prop, interval, c_prop), 0.0001);
    likelihood_ratio = exp(log(SSR_cur) - log(SSR_prop));
    r = rand(1,1);
    if likelihood_ratio>r,
        m0=m_prop;
        c0=c_prop;
        SSR_cur=SSR_prop;
    end
    results(i,1)=m0;
    results(i,2)=c0;
    results(i,3)=SSR_cur;
    results(i,4)=m_prop;
    results(i,5)=c_prop;
    results(i,6)=SSR_prop;
    results(i,7)=likelihood_ratio;
    results(i,8)=r;

end
% results
figure(2)
hist(results(:,1),1000)
title('Distribution of m parameters','fontsize',16) 
xlabel('Parameter Values','fontsize',16)
ylabel('Frequency','fontsize',16)

figure(3)
hist(results(:,2),1000)
title('Distribution of c parameters','fontsize',16) 
xlabel('Parameter Values','fontsize',16)
ylabel('Frequency','fontsize',16)

Here mx_plus_c is a Matlab function:

function y = mx_plus_c(m, x, c)
y = m*x + c;
end

And SSR is also a Matlab function:

function chi2 = SSR(sim, exp, sigma)
chi2 = sum( ((sim - exp).^2) ./ (2*sigma^2) );
end

Best Answer

There are several problems – without seeing the details of the simulation, it's hard to tell which, if any, of them are causing the failure.

  • You're choosing the parameters anew every time – you're not making use of the previous progress in improving the parameters. The idea is to propose new parameters in the vicinity of the present parameters, not sample them from the same distribution every time. (What exactly the "vicinity" is depends on the specific application, and you can change it over time to take big steps at first and then increasingly small steps.)
  • I don't think you should be taking the logarithm and dividing by $2\sigma^2$ again in computing the likelihood ratio – you should just be exponentiating the difference between the squared errors. Since your $\sigma$ is tiny, this may be causing problems.
  • You don't say how you determine convergence. With that tiny value of $\sigma$ and that broad prior that you never change, if you terminate after a certain number of failed attempts to change the parameters, this may just be leaving you in some random place because you're unlikely to make progress in each step.
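As an illustration of the first two bullets, a random-walk version might look like the following Python sketch. The step size, the moderate $\sigma=1$ in place of 0.0001, and the burn-in length are my own arbitrary choices, not part of the original post:

```python
import numpy as np

rng = np.random.default_rng(42)

x = np.arange(1, 11)
y_exp = 2.0 * x + 4.0                  # noise-free data with m=2, c=4

def sum_sq(m, c):
    # plain sum of squared residuals, without the 2*sigma^2 scaling
    return np.sum((m * x + c - y_exp) ** 2)

m, c = 25.0, 25.0
cur = sum_sq(m, c)
step = 0.2                             # the "vicinity": random-walk step size
sigma = 1.0                            # moderate sigma instead of 0.0001
samples = []

for _ in range(20000):
    # propose near the current state instead of redrawing from the prior
    m_prop = m + step * rng.standard_normal()
    c_prop = c + step * rng.standard_normal()
    prop = sum_sq(m_prop, c_prop)
    # Metropolis rule in log form: accept with probability
    # min(1, exp(-(prop - cur) / (2*sigma^2)))
    if np.log(rng.uniform()) < (cur - prop) / (2 * sigma**2):
        m, c, cur = m_prop, c_prop, prop
    samples.append((m, c))

burned = np.array(samples[10000:])     # discard the first half as burn-in
m_mean, c_mean = burned.mean(axis=0)
```

With local proposals and a non-degenerate $\sigma$, the chain both reaches the true parameters and produces a posterior with visible width, rather than getting stuck at a random point.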

Generally, to debug this sort of thing, you should look at the values as they evolve and check at each stage whether what happens corresponds to what you expected to happen, instead of just making one big plot at the end that doesn't show what went wrong.

As to your side question, "why do some people say the acceptance probability is $\min(1,LR)$ whereas the post that I'm following uses random acceptance?", I don't understand the question – using an acceptance probability of $\min(1,LR)$ is random acceptance (namely with a certain probability), and it's exactly what you're doing: If $LR\gt1$, you always accept, and otherwise you accept with probability $LR$.
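This equivalence can even be checked numerically: since $r$ is always below 1, comparing $r$ against $LR$ and against $\min(1,LR)$ gives the same decision for every draw.

```python
import numpy as np

rng = np.random.default_rng(0)

# likelihood ratios both above and below 1, and uniform r in [0, 1)
lrs = rng.uniform(0.0, 2.0, 100000)
rs = rng.uniform(0.0, 1.0, 100000)

# "accept if r < LR" vs. "accept with probability min(1, LR)"
agree = bool(np.all((rs < lrs) == (rs < np.minimum(1.0, lrs))))
```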