Let's say we have a text document with $N$ unique words making up a vocabulary $V$, $|V| = N$. For a bigram language model with add-one smoothing, we define the conditional probability of any word $w_{i}$ given the preceding word $w_{i-1}$ as: $$P(w_{i}|w_{i-1}) = \frac{count(w_{i-1}w_{i}) + 1}{count(w_{i-1}) + |V|}$$ As far as I understand conditional probability, and based on the third point of this Wikipedia article, $w_{i-1}$ can be treated as fixed here, so summing this expression over all possible $w_{i}$ should give 1 — and indeed it does.
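To make this concrete, here is a minimal numerical check of the bigram case on a small made-up corpus (the corpus and function names are just for illustration):

```python
from collections import Counter

# Toy corpus (hypothetical) to verify the bigram normalization numerically.
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))                 # V = {cat, mat, on, sat, the}
bigrams = Counter(zip(tokens, tokens[1:]))  # bigram counts
unigrams = Counter(tokens)                  # unigram (history) counts

def p_add_one(w, prev):
    """Add-one smoothed bigram probability P(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + len(vocab))

# Summing over every possible next word w gives 1 (up to floating point).
total = sum(p_add_one(w, "the") for w in vocab)
print(total)
```

The denominator $count(w_{i-1}) + |V|$ is exactly $\sum_w (count(w_{i-1}w) + 1)$, which is why the sum comes out to 1.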
However, I do not understand the answers given for this question, which say that for an n-gram model the vocabulary size should be the number of unique (n-1)-grams occurring in the document. For example, for a 3-gram model (let $V_{2}$ be the set of unique bigrams): $$P(w_{i}|w_{i-2}w_{i-1}) = \frac{count(w_{i-2}w_{i-1}w_{i}) + 1}{count(w_{i-2}w_{i-1}) + |V_{2}|}$$ This simply does not sum to 1 when we sum it over every possible $w_{i}$. So, for an n-gram language model, should $|V|$ really be the number of unique (n-1)-grams, or should it be the number of unique unigrams?
Writing out the normalization explicitly shows that the denominator must use the unigram vocabulary size $|V|$, since we sum over possible next words $w$, not over histories: $$P_{\text{Laplace}}^*(w_{i}|w_{i-2}w_{i-1}) = \frac{count(w_{i-2}w_{i-1}w_{i}) + 1}{\sum_w (count(w_{i-2}w_{i-1}w)+1)}=\frac{count(w_{i-2}w_{i-1}w_{i}) + 1}{count(w_{i-2}w_{i-1})+|V|}$$
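The same numerical check can be run for the trigram case, comparing the two candidate denominators. This is a sketch on a hypothetical toy corpus: normalizing with the unigram vocabulary size $|V|$ sums to 1, while normalizing with the bigram count $|V_2|$ does not (unless $|V_2|$ happens to equal $|V|$):

```python
from collections import Counter

# Toy corpus (hypothetical) to compare the two denominators.
tokens = "the cat sat on the mat the cat ate the rat".split()
vocab = sorted(set(tokens))
V = len(vocab)  # unigram vocabulary size |V|

trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
bigrams = Counter(zip(tokens, tokens[1:]))
V2 = len(bigrams)  # number of unique bigrams, |V_2|

def p_laplace(w, h, denom_size):
    """Add-one trigram probability of w given history h = (w_{i-2}, w_{i-1})."""
    return (trigrams[(h[0], h[1], w)] + 1) / (bigrams[h] + denom_size)

h = ("the", "cat")
sum_with_V = sum(p_laplace(w, h, V) for w in vocab)    # normalizes to 1
sum_with_V2 = sum(p_laplace(w, h, V2) for w in vocab)  # does not, since V2 != V here
print(sum_with_V, sum_with_V2)
```

In this corpus $|V| = 7$ but $|V_2| = 9$, so the second sum falls short of 1, which matches the algebra above.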