I know that a Markov Chain is a discrete random process where the current state decides the next and in a random walk, the probability that we move from node u to v is 1/N(u). An MCMC sample will converge to a stationary distribution assuming aperiodicity and irreducibility - and this means that the probability of moving from node u to v is now derived from a stationary distribution and independent of the degree of node u - am I correct?
Also how is this distribution derived? And what exactly is a stationary distribution?
TIA, Craig
You should read Understanding the Metopolis-Hastings Algorithm which is one version of an MCMC algorithm. In essence, the idea behind MCMC sampling is as follows:
You wish to draw samples from a pdf, $f(\theta_1,\theta_2)$. The problem though is that $f(.)$ is either a non-standard density or perhaps numerical integration is computationally expensive because of the complexity of $f(.)$. The goal, of course, is to compute the moments of $\theta_1$ and $\theta_2$ (e.g., the mean and the variance etc).
In such a scenario, an MCMC sampler (e.g., the Gibbs sampler, the Metroplis-Hastings etc) constructs a markov chain with the following two properties:
The markov chain is constructed such that it is a reversible.
The markov chain's stationary distribution is $f(\theta_1,\theta_2)$ by construction.
Therefore, MCMC algorithms converge to the target density by construction (at least in theory).
From an implementation perspective- we start from an initial set of values for $\theta_1, \theta_2$ and repeatedly sample $\theta_1, \theta_2$ using the MCMC sampler (See the details about the Gibbs algorithm's implementation at Wikipedia for one way to implement an MCMC sampler). In theory, the markov chain will eventually converge to the target distribution $f(.)$ by construction.
However, there are practical difficulties as we often do not know how long to iterate the markov chain to ensure convergence. Thus, checking for the convergence of MCMC chains is very difficult and we often use heuristics (e.g., start the sampler from different starting points and monitor whether the trace plots of the draws eventually coincide etc).