About the formula for probabilities of entire sequences


In the N-grams chapter of *Speech and Language Processing* by Daniel Jurafsky & James H. Martin, a sequence of $n$ words is written either as $w_{1}\ldots w_{n}$ or as $w_{1}^{n}$. The probability of an entire sequence, $P(X_{1},X_{2},\ldots,X_{n})$, is then computed with the formula:

$\begin{align}P(X_{1}...X_{n}) &= P(X_{1})P(X_{2}|X_{1})P(X_{3}|X_{1}^{2})...P(X_{n}|X_{1}^{n-1})\\&=\prod_{k=1}^{n} P(X_{k}|X_{1}^{k-1})\end{align}$
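To make the product concrete, here is a small sketch that computes $P(X_1^n)$ as a product of conditionals over a *hypothetical* joint distribution of two binary variables (the numbers are made up for illustration; they are not from the book):

```python
# Toy illustration of the chain rule P(X_1..X_n) = prod_k P(X_k | X_1^{k-1}).
# The joint distribution below is hypothetical, chosen only to make the
# arithmetic concrete.

# Joint probabilities for two binary variables (X1, X2); values sum to 1.
joint = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

def prob(prefix):
    """Marginal probability of an initial sequence X_1^k (empty prefix -> 1)."""
    k = len(prefix)
    return sum(p for seq, p in joint.items() if seq[:k] == tuple(prefix))

def cond(x_k, history):
    """P(X_k = x_k | X_1^{k-1} = history); an empty history gives the marginal."""
    return prob(list(history) + [x_k]) / prob(history)

def chain_rule(seq):
    """Compute P(seq) as a product of conditionals, per the formula above."""
    p = 1.0
    for k in range(len(seq)):
        p *= cond(seq[k], seq[:k])
    return p

# The product of conditionals recovers the joint probability exactly.
print(chain_rule([1, 1]))  # same value as joint[(1, 1)]
```

Note that `prob([])` sums over the whole table and returns $1$, which is why the first factor $P(X_1 \mid X_1^0)$ reduces to the plain marginal $P(X_1)$.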

If I want to calculate $P(X_{1},X_{2})$, I know the result will be $P(X_{2}|X_{1})P(X_{1})$,

but according to the formula, I get $P(X_{1}|X_{1}^{0})P(X_{2}|X_{1}^{1})$.

I want to understand how to read $X_{1}^{0}$ and $X_{1}^{1}$ — in particular, why conditioning on $X_{1}^{0}$ amounts to conditioning on nothing, and why $X_{1}^{1}=X_{1}$.

Thanks.

1 Answer
It's just a matter of notation.

Writing $X_a^b$ simply means "the sequence of $X$'s beginning from integer $a$ and ending with integer $b$".

So in your question, $X_1^0$ would be read as "the (increasing) sequence of $X$'s that starts from index $1$ and ends with index $0$". In other words, this sequence is an empty sequence since you cannot increase the integer 1 to obtain 0.

Similarly, $X_1^1$ would be read as "the (increasing) sequence of $X$'s that starts from index $1$ and ends with index $1$", which just consists of the random variable $X_1$.

Therefore, you have:

$$ P(X_1, X_2) = P(X_2|X_1^1) P(X_1|X_1^0) = P(X_2|X_1)P(X_1|\text{nothing})$$

And since conditioning on "nothing" is equivalent to the marginal distribution you have:

$$P(X_2|X_1)P(X_1|\text{nothing}) = P(X_2|X_1)P(X_1)$$
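You can verify this numerically. The sketch below uses a *hypothetical* joint distribution over two binary variables and checks that $P(X_2|X_1)\,P(X_1)$ reproduces the joint probability for every pair, which is exactly what the formula with the empty-history convention asserts:

```python
# Numeric check (hypothetical two-variable distribution) that conditioning
# on an empty history reduces to the marginal: P(X1 | nothing) = P(X1),
# so the chain rule gives P(X1, X2) = P(X2 | X1) * P(X1).

joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def p_x1(x1):
    """Marginal P(X1 = x1): sum the joint over X2."""
    return sum(p for (a, _), p in joint.items() if a == x1)

def p_x2_given_x1(x2, x1):
    """P(X2 = x2 | X1 = x1), computed from the joint."""
    return joint[(x1, x2)] / p_x1(x1)

# Check the chain-rule identity for every (x1, x2) pair in the table.
for x1, x2 in joint:
    assert abs(p_x2_given_x1(x2, x1) * p_x1(x1) - joint[(x1, x2)]) < 1e-12

print("P(X2|X1) * P(X1) matches the joint for every pair")
```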