In Durrett's textbook, Exercise 2.2.2 states that: suppose $EX_n = 0$ and $EX_nX_m \leq r(n-m)$ for $m \leq n$, where $r(k) \to 0$ as $k \to \infty$; then $(X_1 + \cdots + X_n)/n \to 0$ in probability.
My two questions are
Q1: Do we need to assume $EX_n = 0$?
Q2: Is there any non-independent example to illustrate this generalization?
Now I sketch my proof:
I show the convergence holds in $L^2$ by expanding $E\left(\frac{X_1 + \cdots + X_n}{n}\right)^2$ and then observing that it is smaller than
$$\frac{1}{n^2} \sum_{|i-j| \leq K} r(|i-j|) + \frac{1}{n^2} \sum_{|i-j| > K} r(|i-j|).$$
Fix $\varepsilon > 0$ and choose $K$ so that $r(k) < \varepsilon$ for $k > K$. The first sum has at most $(2K+1)n$ terms, each bounded by $\max_{0 \leq k \leq K} r(k)$, so it is $O(1/n)$; the second sum has at most $n^2$ terms, each smaller than $\varepsilon$, so it is at most $\varepsilon$. Letting $n \to \infty$ and then $\varepsilon \to 0$ finishes the proof.
Since every random variable in the above computation appears only through the products $X_i X_j$, I don't think we need to assume $EX_n = 0$. Does it instead play a role in the existence of the function $r(k)$?
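To sanity-check the $L^2$ bound numerically, here is a small simulation sketch. The moving-average process $X_i = Z_i + Z_{i+1}$ (with $Z_i$ i.i.d. standard normal) is my own illustrative choice, not from the exercise: it is non-independent and mean-zero, and satisfies $EX_iX_j \leq r(|i-j|)$ with $r(0) = r(1) = 2$ and $r(k) = 0$ for $k \geq 2$, so it also partially answers Q2. For this process $\operatorname{Var}(S_n) = 4n - 2$, so $E(S_n/n)^2 \approx 4/n$.

```python
import numpy as np

def ma1_sample(n, trials, rng):
    # X_i = Z_i + Z_{i+1}: a non-independent, mean-zero sequence with
    # E[X_i X_j] <= r(|i-j|), where r(0) = r(1) = 2 and r(k) = 0 for k >= 2.
    z = rng.standard_normal((trials, n + 1))
    return z[:, :-1] + z[:, 1:]

def second_moment_of_average(n, trials, rng):
    # Monte Carlo estimate of E[((X_1 + ... + X_n)/n)^2].
    x = ma1_sample(n, trials, rng)
    avg = x.sum(axis=1) / n
    return np.mean(avg ** 2)

rng = np.random.default_rng(0)
for n in (100, 400, 1600):
    est = second_moment_of_average(n, 5000, rng)
    exact = (4 * n - 2) / n ** 2  # Var(S_n) = 4n - 2 for this process
    print(n, est, exact)
```

The estimates track the exact value $(4n-2)/n^2$ and shrink roughly like $4/n$, consistent with the $O(1/n)$ behavior of the first sum above.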

I now have a satisfactory explanation for Question 1, but nothing for Question 2 yet:
Going back to an independent sequence of random variables $\{ X_n \}_n$, we have $EX_n X_m = EX_n EX_m$ for $n \neq m$, which need not tend to $0$ unless the means vanish. I think this explains why the exercise requires $EX_n = 0$: it is what lets the function $r(k)$ exist naturally.
Actually, if $\{ X_n \}$ is non-independent with non-zero means, one can just consider $Y_n = X_n - EX_n$ and impose the condition $EY_nY_m \leq r(n-m)$.
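The centering trick can be sketched numerically. The shifted moving-average process below is a hypothetical example of my own: $X_n$ has common mean $\mu$, and $Y_n = X_n - \mu$ is mean-zero with correlations vanishing at lag $\geq 2$, so by the exercise the sample average of $X_n$ should converge to $\mu$ in probability.

```python
import numpy as np

# Illustration of the centering trick: X_n = mu + Z_n + Z_{n+1} has mean mu,
# and Y_n = X_n - mu is mean-zero with E[Y_n Y_m] = 0 for |n - m| >= 2,
# so (X_1 + ... + X_n)/n should converge to mu.
rng = np.random.default_rng(42)
mu = 3.0
n = 100_000
z = rng.standard_normal(n + 1)
x = mu + z[:-1] + z[1:]  # non-independent sequence with mean mu

print(np.mean(x))  # a value close to mu = 3.0
```

Here $\operatorname{Var}(S_n/n) \approx 4/n$, so the sample mean sits within a few hundredths of $\mu$ for $n = 10^5$.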