The primary difference between an AR and an MA model lies in the correlation between the series at different time points. For an MA model, the correlation between $x(t)$ and $x(t-n)$ is always zero for $n$ greater than the order of the MA process. This follows directly from the fact that the covariance between $x(t)$ and $x(t-n)$ is zero for MA models (as seen in the example from the previous section). In an AR model, by contrast, the correlation between $x(t)$ and $x(t-n)$ declines gradually as $n$ grows. This difference is exploited to tell whether a series comes from an AR or an MA process.
Source: https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
I just cannot see how to derive the covariance between $x(t)$ and $x(t-n)$ for MA models.
Moreover, I cannot see why the correlation of $x(t)$ and $x(t-n)$ would decline as $n$ increases in the AR model.
For simplicity, take an $\text{AR}(1)$ and a $\text{MA}(1)$ model, written below (in order):
$$X_t = \frac{1}{2}X_{t - 1} + \epsilon_t$$ $$W_t = \epsilon_t + 0.5\epsilon_{t - 1}$$
For the $\text{AR}(1)$ process we have the representation $X_t = \left(\frac{1}{2}\right)^h X_{t - h} + \sum_{j = 0}^{h - 1} \left(\frac{1}{2}\right)^j \epsilon_{t - j}$ (if you are uncomfortable with this, recursively substitute the definition of the $\text{AR}(1)$ process for $X_{t - 1}$, then $X_{t - 2}$, and so on, until $X_{t - h}$ appears in the expression). From this let's compute $E[X_{t + n}X_t]$ and $E[W_{t + n}W_t]$.
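Concretely, the first two substitutions look like this:
$$X_t = \frac{1}{2}X_{t - 1} + \epsilon_t = \frac{1}{2}\left(\frac{1}{2}X_{t - 2} + \epsilon_{t - 1}\right) + \epsilon_t = \left(\frac{1}{2}\right)^2 X_{t - 2} + \frac{1}{2}\epsilon_{t - 1} + \epsilon_t,$$
and repeating the substitution $h$ times produces the representation above.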
Remember that the only randomness in these representations comes from the process $\{\epsilon_t\}_{t \in \mathbb{Z}}$, whose terms are iid with $E[\epsilon_t] = 0$ and $E[\epsilon_t^2] = 1$, so $E[\epsilon_s \epsilon_t] = 0$ whenever $s \neq t$. From this we get:
$$E[W_{t + n} W_t] = E[(\epsilon_{t + n} + 0.5 \epsilon_{t + n - 1})(\epsilon_t + 0.5\epsilon_{t - 1})] \\ = E[\epsilon_{t + n}\epsilon_t] + 0.5E[\epsilon_{t + n}\epsilon_{t - 1}] + 0.5E[\epsilon_{t + n - 1}\epsilon_t] + 0.25E[\epsilon_{t + n - 1}\epsilon_{t - 1}] \\ = 1.25\,\delta_{0}(n) + 0.5 \left(\delta_{-1}(n) + \delta_{1}(n)\right)$$
($\delta_{j}(n)$ equals 1 if $n = j$ and 0 otherwise.) From this you see that for $|n| > 1$, that is, for any lag beyond the order of the MA process, the covariance, and hence the correlation, between $W_{t + n}$ and $W_t$ is exactly zero.
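If you want to check this numerically, here is a minimal simulation sketch of my own (not from the tutorial), assuming iid standard normal innovations so that $E[\epsilon_t^2] = 1$; the sample autocovariances should land near $1.25$ at lag 0, $0.5$ at lag 1, and essentially 0 beyond:

```python
import numpy as np

# Minimal check of the MA(1) covariances: W_t = eps_t + 0.5 * eps_{t-1},
# assuming iid standard normal innovations (so E[eps_t^2] = 1).
rng = np.random.default_rng(0)
n_obs = 200_000
eps = rng.standard_normal(n_obs + 1)
w = eps[1:] + 0.5 * eps[:-1]  # W_t for t = 1, ..., n_obs

# Sample autocovariance at lags 0..4; theory: 1.25, 0.5, 0, 0, 0.
for lag in range(5):
    cov = np.mean(w[lag:] * w[: n_obs - lag])
    print(f"lag {lag}: sample cov = {cov:.3f}")
```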
Now let's find the covariance function for $X_t$.
$$E[X_{t + n}X_t] = E\left[\left(\left(\frac{1}{2}\right)^n X_{t} + \sum_{j = 0}^{n - 1} \left(\frac{1}{2}\right)^j \epsilon_{t + n - j}\right)X_t\right] \\ = \left(\frac{1}{2}\right)^nE[X_t^2] + \sum_{j = 0}^{n - 1}\left(\frac{1}{2}\right)^j E\left[\epsilon_{t + n - j} X_t\right]$$
If you are willing to believe that $E[X_t^2] = \frac{4}{3}$ and that $E\left[\epsilon_{t + n - j} X_t\right] = 0$ (since $j \le n - 1$ in the sum, each $\epsilon_{t + n - j}$ arrives after time $t$ and is independent of $X_t$), then the covariance is $\left(\frac{1}{2}\right)^n \frac{4}{3}$, a quantity that decays geometrically to zero as $n$ grows but is never exactly zero at any finite lag. Both claims are justified below.
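To justify the variance claim, square the defining equation and take expectations; the cross term vanishes because $\epsilon_t$ is independent of $X_{t - 1}$, and stationarity gives $E[X_t^2] = E[X_{t - 1}^2]$:
$$E[X_t^2] = \frac{1}{4}E[X_{t - 1}^2] + E[X_{t - 1}\epsilon_t] + E[\epsilon_t^2] = \frac{1}{4}E[X_t^2] + 1 \implies E[X_t^2] = \frac{4}{3}.$$
The independence claim holds because $X_t$ is a function of $\epsilon_t, \epsilon_{t - 1}, \dots$ only, while each $\epsilon_{t + n - j}$ with $j \le n - 1$ is a later, independent innovation.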
What I described works for more general AR and MA models, but those models require more work; this gives you the gist of what is going on.
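And here is the same kind of numerical sanity check for the $\text{AR}(1)$ process, again a sketch of mine assuming standard normal innovations, comparing the sample autocovariance at each lag with the theoretical $\left(\frac{1}{2}\right)^n \frac{4}{3}$:

```python
import numpy as np

# Minimal check of the AR(1) covariances: X_t = 0.5 * X_{t-1} + eps_t,
# assuming iid standard normal innovations. Theory: cov at lag n is
# (1/2)^n * 4/3, which decays geometrically but never hits zero.
rng = np.random.default_rng(1)
n_obs = 200_000
burn_in = 1_000  # let the process forget the arbitrary start X_0 = 0
eps = rng.standard_normal(n_obs + burn_in)
x = np.zeros(n_obs + burn_in)
for t in range(1, n_obs + burn_in):
    x[t] = 0.5 * x[t - 1] + eps[t]
x = x[burn_in:]

for lag in range(6):
    sample = np.mean(x[lag:] * x[: n_obs - lag])
    theory = 0.5 ** lag * 4 / 3
    print(f"lag {lag}: sample = {sample:.3f}, theory = {theory:.3f}")
```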