Let's say we have a text document with $N$ unique words making up a vocabulary $V$, $|V| = N$. For a bigram language model with add-one smoothing, we define the conditional probability of any word $w_{i}$ given the preceding word $w_{i-1}$ as: $$P(w_{i}|w_{i-1}) = \frac{count(w_{i-1}w_{i}) + 1}{count(w_{i-1}) + |V|}$$ As far as I understand conditional probability, and based on the third point of this Wikipedia article, $w_{i-1}$ can be treated as fixed here, so summing this expression over all possible $w_{i}$ should give 1 — and indeed it does.
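To make this concrete, here is a minimal numerical check of the bigram case on a small made-up corpus (the corpus and function names are just for illustration):

```python
from collections import Counter

# Toy corpus (hypothetical) to verify the bigram normalization numerically.
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))                 # V = {cat, mat, on, sat, the}
bigrams = Counter(zip(tokens, tokens[1:]))  # bigram counts
unigrams = Counter(tokens)                  # unigram (history) counts

def p_add_one(w, prev):
    """Add-one smoothed bigram probability P(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + len(vocab))

# Summing over every possible next word w gives 1 (up to floating point).
total = sum(p_add_one(w, "the") for w in vocab)
print(total)
```

The denominator $count(w_{i-1}) + |V|$ is exactly $\sum_w (count(w_{i-1}w) + 1)$, which is why the sum comes out to 1.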
However, I do not understand the answers given for this question, which say that for an n-gram model the vocabulary size should be the number of unique (n-1)-grams occurring in the document. For example, for a 3-gram model (let $V_{2}$ be the set of unique bigrams): $$P(w_{i}|w_{i-2}w_{i-1}) = \frac{count(w_{i-2}w_{i-1}w_{i}) + 1}{count(w_{i-2}w_{i-1}) + |V_{2}|}$$ This simply does not sum to 1 when we sum it over every possible $w_{i}$. So, for an n-gram language model, should $|V|$ really be the number of unique (n-1)-grams, or should it be the number of unique unigrams?
Writing out the normalization explicitly shows that the denominator must use the unigram vocabulary size $|V|$, since we sum over possible next words $w$, not over histories: $$P_{\text{Laplace}}^*(w_{i}|w_{i-2}w_{i-1}) = \frac{count(w_{i-2}w_{i-1}w_{i}) + 1}{\sum_w (count(w_{i-2}w_{i-1}w)+1)}=\frac{count(w_{i-2}w_{i-1}w_{i}) + 1}{count(w_{i-2}w_{i-1})+|V|}$$
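The same numerical check can be run for the trigram case, comparing the two candidate denominators. This is a sketch on a hypothetical toy corpus: normalizing with the unigram vocabulary size $|V|$ sums to 1, while normalizing with the bigram count $|V_2|$ does not (unless $|V_2|$ happens to equal $|V|$):

```python
from collections import Counter

# Toy corpus (hypothetical) to compare the two denominators.
tokens = "the cat sat on the mat the cat ate the rat".split()
vocab = sorted(set(tokens))
V = len(vocab)  # unigram vocabulary size |V|

trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
bigrams = Counter(zip(tokens, tokens[1:]))
V2 = len(bigrams)  # number of unique bigrams, |V_2|

def p_laplace(w, h, denom_size):
    """Add-one trigram probability of w given history h = (w_{i-2}, w_{i-1})."""
    return (trigrams[(h[0], h[1], w)] + 1) / (bigrams[h] + denom_size)

h = ("the", "cat")
sum_with_V = sum(p_laplace(w, h, V) for w in vocab)    # normalizes to 1
sum_with_V2 = sum(p_laplace(w, h, V2) for w in vocab)  # does not, since V2 != V here
print(sum_with_V, sum_with_V2)
```

In this corpus $|V| = 7$ but $|V_2| = 9$, so the second sum falls short of 1, which matches the algebra above.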