In the N-grams chapter of "Speech and Language Processing" by Daniel Jurafsky & James H. Martin (available online), a sequence of $N$ words is written either as $w_{1}...w_{n}$ or $w_{1}^{n}$. The probability of an entire sequence, $P(X_{1},X_{2},...,X_{n})$, is then decomposed via the chain rule:
$\begin{align}P(X_{1}...X_{n}) &= P(X_{1})P(X_{2}|X_{1})P(X_{3}|X_{1}^{2})...P(X_{n}|X_{1}^{n-1})\\&=\prod_{k=1}^{n} P(X_{k}|X_{1}^{k-1})\end{align}$
If I want to calculate $P(X_{1},X_{2})$, I know the result should be $P(X_{2}|X_{1})P(X_{1})$,
but according to the formula I get $P(X_{1}|X_{1}^{0})P(X_{2}|X_{1}^{1})$.
I would like to understand how $X_{1}^{0}$ and $X_{1}^{1}$ are evaluated, i.e., why $P(X_{1}|X_{1}^{0})=P(X_{1})$ and why $X_{1}^{1}=X_{1}$.
thanks.
It's just an issue of notation.
Writing $X_a^b$ simply means "the sequence of $X$'s whose indices run from $a$ up to $b$", i.e. $X_a, X_{a+1}, \ldots, X_b$.
So in your question, $X_1^0$ would be read as "the (increasing) sequence of $X$'s that starts at index $1$ and ends at index $0$". In other words, this is the empty sequence, since counting upward from $1$ can never reach $0$.
Similarly, $X_1^1$ would be read as "the (increasing) sequence of $X$'s that starts from index $1$ and ends with index $1$", which just consists of the random variable $X_1$.
Therefore, you have:
$$ P(X_1, X_2) = P(X_1|X_1^0)\, P(X_2|X_1^1) = P(X_1|\text{nothing})\, P(X_2|X_1)$$
And since conditioning on "nothing" is just the unconditional (marginal) distribution, you have:
$$P(X_1|\text{nothing})\,P(X_2|X_1) = P(X_1)\,P(X_2|X_1)$$
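To make this concrete, here is a minimal Python sketch (the corpus and all word choices are made up for illustration) that estimates $P(X_1)$ and $P(X_2|X_1)$ from counts on a tiny set of two-word sequences, then checks that the chain-rule product $P(X_1)\,P(X_2|X_1)$ matches the joint relative frequency $P(X_1, X_2)$:

```python
from collections import Counter

# Toy corpus of two-word sequences (hypothetical data for illustration).
corpus = [("i", "am"), ("i", "am"), ("i", "think"), ("you", "are")]

n = len(corpus)
unigram = Counter(pair[0] for pair in corpus)  # counts of the first word
bigram = Counter(corpus)                       # counts of (w1, w2) pairs

def p_x1(w1):
    """P(X1): marginal probability of the first word (empty context)."""
    return unigram[w1] / n

def p_x2_given_x1(w2, w1):
    """P(X2 | X1): conditional probability of the second word."""
    return bigram[(w1, w2)] / unigram[w1]

# Chain rule: P(X1, X2) = P(X1) * P(X2 | X1)
chain = p_x1("i") * p_x2_given_x1("am", "i")

# Direct joint estimate: relative frequency of the pair ("i", "am")
joint = bigram[("i", "am")] / n

print(chain, joint)  # → 0.5 0.5
```

Note that `p_x1` conditions on nothing at all, which is exactly the role of the empty sequence $X_1^0$ in the formula.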