I'm getting confused about how to correctly state the definition of a Markov chain verbally.
The formal definition of a Markov chain is that $P(X_n|X_{n-1}, ..., X_1) = P(X_n|X_{n-1})$.
According to Wikipedia, the Markov property is stated as "the probability of moving to the next state depends only on the present state and not on the previous states". I think this wording is not completely right, since the phrase "not on the previous states" can be understood as $P(X_n|X_i) = P(X_n)$ $\forall i < n-1$ (i.e. $X_n$ doesn't depend on states earlier than $X_{n-1}$), which isn't given.
I have several verbal statements about this formal definition:
$X_{n-1}$ contains all information needed for the process to move on to $X_n$. This means $X_{n-2}, ..., X_1$ do affect $X_n$ AND their effects on $X_n$ are summarized by $X_{n-1}$.
$X_n$ is independent of $X_1, ..., X_{n-2}$ given $X_{n-1}$. This means that once $X_{n-1}$ is given, $X_{n-2}, ..., X_1$ have no effect on $X_n$.
Which one better represents the definition of a Markov chain?
Your first statement is not entirely correct. If $X_{n-1}$ is unknown, $X_{n-2},\ldots,X_1$ do affect our knowledge of $X_n$, because knowledge of $X_{n-2}$ influences the possible values for $X_{n-1}$ and therefore the possible values for $X_n$. The fact that $X_{n-1}$ is known is really important.
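To make the point about an unknown $X_{n-1}$ concrete, here is a minimal sketch with a hypothetical two-state chain. The transition matrix `P`, the initial distribution `init`, and the helper names are illustrative assumptions, not anything taken from the question. It compares $P(X_3 = 0 \mid X_1 = i)$ for the two values of $i$ with the unconditional $P(X_3 = 0)$, summing over the unobserved $X_2$.

```python
# Hypothetical two-state chain; all numbers are illustrative assumptions.
P = [[0.9, 0.1],
     [0.2, 0.8]]      # P[i][j] = P(X_{k+1} = j | X_k = i)
init = [0.5, 0.5]     # assumed distribution of X_1


def two_step(i, k):
    """P(X_3 = k | X_1 = i), summing over the unobserved X_2."""
    return sum(P[i][j] * P[j][k] for j in range(2))


def marginal_x3(k):
    """P(X_3 = k), averaging over X_1 as well."""
    return sum(init[i] * two_step(i, k) for i in range(2))


for i in range(2):
    print(f"P(X_3=0 | X_1={i}) = {two_step(i, 0):.3f}")
print(f"P(X_3=0)         = {marginal_x3(0):.3f}")
```

With these assumed numbers the two conditional probabilities differ from each other and from the marginal, so when $X_2$ is not observed, $X_1$ does carry information about $X_3$.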
Also, what the Markov property says is not that $X_{n-1}$ contains all the information to predict $X_n$. It says that once $X_{n-1}$ is known, the knowledge of $X_{n-2},\ldots,X_1$ is superfluous when it comes to determining $X_n$ from the previous values of the chain. In a sense, we can "forget" the steps prior to $n-1$ if all that we want is to determine the value of $X_n$ from the previous values of the chain. The goal is really important here: we want to determine $X_n$ not in absolute terms but given the previous values of the chain.
Your second proposition is more correct, but it is important to see that the term "have no effect" is in a sense wrong. It is not that they have no effect in absolute terms, because as I already said, $X_{n-1}$ depends on $X_{n-2}$ and itself helps determine $X_n$. The correct intuition to have is that when trying to determine $X_n$ from the previous values of the chain, the knowledge of $X_{n-1}, X_{n-2},\ldots,X_1$ carries no more information than the knowledge of $X_{n-1}$ alone.
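The redundancy can be checked numerically on a small example. The sketch below assumes a hypothetical two-state chain (the matrix `P`, the distribution `init`, and the function names are made up for illustration), builds the joint distribution of $(X_1, X_2, X_3)$, and compares $P(X_3 \mid X_2, X_1)$ with $P(X_3 \mid X_2)$. Since the joint table is itself constructed from the Markov factorization, this is a consistency check on the definition rather than a proof.

```python
from itertools import product

# Hypothetical two-state chain; all numbers are illustrative assumptions.
P = [[0.9, 0.1],
     [0.2, 0.8]]      # P[i][j] = P(X_{k+1} = j | X_k = i)
init = [0.5, 0.5]     # assumed distribution of X_1

# Joint distribution of (X_1, X_2, X_3) under the Markov factorization.
joint = {(a, b, c): init[a] * P[a][b] * P[b][c]
         for a, b, c in product(range(2), repeat=3)}


def cond_full(c, b, a):
    """P(X_3 = c | X_2 = b, X_1 = a), read off the joint table."""
    return joint[(a, b, c)] / sum(joint[(a, b, k)] for k in range(2))


def cond_last(c, b):
    """P(X_3 = c | X_2 = b), with X_1 marginalised out."""
    num = sum(joint[(a, b, c)] for a in range(2))
    den = sum(joint[(a, b, k)] for a in range(2) for k in range(2))
    return num / den


for a, b in product(range(2), repeat=2):
    print(f"X_1={a}, X_2={b}: "
          f"P(X_3=0 | X_2, X_1) = {cond_full(0, b, a):.3f}, "
          f"P(X_3=0 | X_2) = {cond_last(0, b):.3f}")
```

For every value of $X_1$ the two columns agree: once $X_2$ is fixed, also conditioning on $X_1$ does not change the distribution of $X_3$.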
The description of @Did in the comment section of this question is really what happens, but if I were to rephrase it more simply I would go with