Is having a burn-in time relevant when only trying to sample from a distribution?

I'm trying to use the Metropolis-Hastings algorithm to simulate a sample $X$ of size 10000 from a density $f$, using a proposal distribution $g$.
The Markov chain $X$ produced by this algorithm has stationary distribution $f$, i.e. for every starting point $x \in M$ and every state $y \in M$ we have:
$$P_x(X_n = y) \to f(y) \text{ as } n \to \infty.$$
A classical step after generating my sample $X$ is to discard the first thousand values or so, so that I only keep $X_n$ with $n$ large enough that $X_n$ approximately follows $f$.
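A minimal sketch of this procedure, under assumptions that are not part of the actual problem: the target is taken to be a standard normal known up to a constant, the proposal $g$ is a symmetric Gaussian random walk, and the function name `metropolis_hastings`, the step size, and the deliberately poor starting point `x0 = 10.0` are purely illustrative.

```python
import numpy as np

def metropolis_hastings(log_f, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler targeting the density exp(log_f)."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_samples)
    x, log_fx = x0, log_f(x0)
    for i in range(n_samples):
        # Symmetric Gaussian proposal (assumption), so the Hastings
        # ratio reduces to f(y) / f(x).
        y = x + step * rng.normal()
        log_fy = log_f(y)
        # Accept with probability min(1, f(y) / f(x)), computed in log space.
        if np.log(rng.uniform()) < log_fy - log_fx:
            x, log_fx = y, log_fy
        chain[i] = x  # a rejected proposal repeats the current state
    return chain

# Hypothetical target: standard normal log-density, up to an additive constant.
def log_f(x):
    return -0.5 * x**2

chain = metropolis_hastings(log_f, x0=10.0, n_samples=10_000)

# Burn-in: drop the first thousand values and keep the rest.
burn_in = 1_000
kept = chain[burn_in:]
```

Comparing `chain[:50]` with `kept` makes the transient visible: with a start far in the tail, the early values still reflect `x0`, whereas a start near the mode leaves little to discard.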
However, after some reading (here and here), I am under the impression that this is unnecessary if we start from a state $x_0 \in M$ that has high probability under $f$.
While I think I get the point these texts are trying to make, discarding the early values still seems absolutely necessary to me, so that the retained $X_n$ approximately follow $f$.
So, should I discard the first thousand values and only consider my chain from then on, or should I inspect the output values and start from the mode of $f$?