I am using Metropolis-Hastings to generate samples from what I assume to be the posterior. At some point, the samples should converge: I cannot see how they would not, since they all share the same proposal distribution, which must have some sort of mean.
Now I hear a lot of people saying that you want to minimize autocorrelation between the samples. This does not make sense to me once you take convergence into consideration: you usually discard the initial samples up until every parameter has converged, right? Say I am running multiple chains and the samples all converge to the same point (again, I cannot see how this could fail to happen, since distributions have means). This value would then be the ideal sample, because with it you could perfectly approximate things like integrals. Am I right?
So why not just aim for convergence and try to move towards a single value for your parameters? (At that point, after the burn-in phase, the autocorrelation would of course be high, since the samples are all the same, but why is that bad?)
The trajectories themselves generally aren't convergent; only the empirical distribution of a trajectory is convergent (under ergodicity assumptions, which in the setting of M-H are extremely mild).
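To see this distinction concretely, here is a minimal sketch of a Metropolis-Hastings sampler targeting a standard normal (the target, proposal scale, and function names are my own choices for illustration). The trajectory never settles down to a point, but the empirical moments of the samples approach those of $N(0,1)$:

```python
# Sketch: random-walk Metropolis-Hastings on a standard normal target.
# The chain keeps moving forever (no pointwise convergence), but the
# empirical distribution of its samples converges to N(0, 1).
import math
import random

def metropolis_hastings(n_samples, proposal_sd=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian random-walk proposal.
        x_new = x + rng.gauss(0.0, proposal_sd)
        # Log acceptance ratio for the target density exp(-x^2 / 2).
        log_alpha = 0.5 * x**2 - 0.5 * x_new**2
        if math.log(rng.random()) < log_alpha:
            x = x_new
        samples.append(x)
    return samples

samples = metropolis_hastings(50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

# The last few states still wander around ...
print(samples[-5:])
# ... yet the empirical mean and variance are close to 0 and 1.
print(mean, var)
```

The point is that the usable output of the chain is the whole collection of samples, not some limiting value of the state.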
That being said, it is true that even arbitrarily strong autocorrelation will not destroy convergence, but it will reduce the convergence rate. For some intuition, consider a two-state chain with transition matrix
$$\begin{bmatrix} 1-10^{-6} & 10^{-6} \\ 10^{-6} & 1-10^{-6} \end{bmatrix}.$$
One can see by inspection that $\begin{bmatrix} 1/2 & 1/2 \end{bmatrix}$ is the invariant distribution. But what is the convergence rate when starting from state 1? You will spend somewhere around a million time steps at state 1, then jump to state 2, then stay there for around a million time steps, and so on. It typically takes a very long time for the empirical distribution to even stop being $\begin{bmatrix} 1 & 0 \end{bmatrix}$, and when a jump to state 2 finally happens, change will still be very slow (since you have accumulated so many time steps in state 1 already). The problem is that the probability that the next state in the trajectory equals the current one is very high; in other words, the problem is the autocorrelation.
Contrast this with an extreme case like $\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}$, where you are actually drawing from the target distribution right from the start.
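The contrast is easy to simulate. The sketch below uses a flip probability of $10^{-3}$ rather than $10^{-6}$ so the effect is visible in a short run, and averages over many chains (the helper names and run lengths are arbitrary choices):

```python
# Sketch: both two-state chains below have invariant distribution
# [1/2, 1/2], but the sticky (highly autocorrelated) one converges
# to it far more slowly than the iid one.
import random

def freq_state0(eps, n_steps, seed):
    """Fraction of time spent in state 0 for a two-state chain
    that flips to the other state with probability eps each step."""
    rng = random.Random(seed)
    state, count0 = 0, 0
    for _ in range(n_steps):
        count0 += (state == 0)
        if rng.random() < eps:
            state = 1 - state
    return count0 / n_steps

def mean_error(eps, n_steps=2_000, n_chains=100):
    """Average distance of the empirical frequency from 1/2."""
    errs = [abs(freq_state0(eps, n_steps, seed) - 0.5)
            for seed in range(n_chains)]
    return sum(errs) / n_chains

# Sticky chain: long sojourns dominate the empirical distribution.
print(mean_error(1e-3))
# Flip probability 1/2: the next state is uniform regardless of the
# current one, i.e. iid draws from the target.
print(mean_error(0.5))
```

The first error stays large because a 2,000-step run typically contains only a couple of jumps, while the second shrinks at the usual $1/\sqrt{n}$ rate of iid sampling.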