Given a sequence of words $w_{1:n}$ (short for $(w_1,w_2,\ldots,w_n)$), we use the chain rule to compute its probability as follows:
$$ P(w_{1:n}) = P(w_1) \prod_{i=2}^n P(w_i|w_{1:i-1}) $$
Suppose the words are confined to a vocabulary $V$, i.e., $w_i \in V$, and that we don't have any information about the distribution of the words. What can we say about the probability of an arbitrary sequence of length $n$?
I came up with two answers, but they don't agree. I suspect (Attempt A) is wrong, but I don't know what step is mistaken. I'd be grateful for any help with this.
Attempt A
We compute the "unigram" term $P(w)$ as follows:
$$ P(w) = \frac{1}{|V|} $$
Because we don't know anything about particular words, we assume they're uniformly distributed. In a similar fashion we compute the "bigram" term $P(w_i|w_{i-1})$ as follows:
$$ P(w_i|w_{i-1}) = \frac{1}{|V|^2} $$
Since there are $|V|^2$ possible bigrams, the probability is again 1 over the size of the entire space.
Carrying this out for all indices and plugging into the chain rule formula we get:
$$ \begin{align} P(w_{1:n}) &= \prod_{i=1}^n \frac{1}{|V|^i}\\ &= \frac{1}{|V|^1} \cdot \frac{1}{|V|^2} \cdots \frac{1}{|V|^n}\\ &= \frac{1}{|V|^{1 + 2 + \cdots + n}}\\ &= \frac{1}{|V|^{\frac{n(n+1)}{2}}} \end{align} $$
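One way to see that something is off: under this formula the probabilities of all $|V|^n$ possible sequences of length $n$ don't sum to 1. A quick sanity check in Python (toy 3-word vocabulary of my own, not part of the question):

```python
from fractions import Fraction
from itertools import product

V = ["a", "b", "c"]  # toy 3-word vocabulary (illustrative, not from the question)
n = 2                # sequence length

# Attempt A assigns every length-n sequence probability 1/|V|^(n(n+1)/2).
p_seq = Fraction(1, len(V) ** (n * (n + 1) // 2))

# Sum that mass over all |V|^n possible sequences of length n.
total = sum(p_seq for _ in product(V, repeat=n))
print(total)  # 9 * (1/27) = 1/3, so the masses do not sum to 1
```

Since the total mass is $1/3$ rather than 1, Attempt A cannot describe a valid probability distribution.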
Attempt B
This second attempt is the one usually found in the literature (the context is that of language modeling). Again we have no information about the words and assume a uniform distribution. The key here is that when computing the probability of a word $w_i$ given its history $w_{1:i-1}$, we make the following conditional independence assumption:
$$ P(w_i|w_{1:i-1}) = P(w_i) $$
Using the chain rule we then describe the joint probability of the entire sequence by:
$$ P(w_{1:n}) = \prod_{i=1}^n P(w_i) $$
Using the uniformity assumption we get:
$$ \begin{align} P(w_{1:n}) &= \prod_{i=1}^n \frac{1}{|V|}\\ &= \frac{1}{|V|^n} \end{align} $$
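The same enumeration check (again with a toy vocabulary of my own) confirms that these probabilities do sum to 1:

```python
from fractions import Fraction
from itertools import product

V = ["a", "b", "c"]  # toy 3-word vocabulary (illustrative)
n = 2                # sequence length

# Attempt B assigns every length-n sequence probability 1/|V|^n.
p_seq = Fraction(1, len(V) ** n)

# Sum over all |V|^n possible sequences of length n.
total = sum(p_seq for _ in product(V, repeat=n))
print(total)  # 9 * (1/9) = 1, a valid probability distribution
```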
In theory the two descriptions you give should agree, and I think it is your handling of conditional probability that is causing the confusion.
In your Attempt A, whilst you consider the pairs $(w_{i-1}, w_i)$ as "bigrams", once you condition on the specific choice of $w_{i-1}$ the only 'randomness' left in the system is the value of $w_i$, which (assuming independence) can take one of $|V|$ values, each with probability $|V|^{-1}$.
That is, assuming independence of $w_i,\,w_{i-1}$ and uniform distributions then
$$\mathbf P(w_i\,|\,w_{i-1}) = \frac{1}{|V|}.$$
The important heuristic here is: once you have conditioned on the value of $w_{i-1}$, it is no longer random, so the extra factor of $|V|^{-1}$ does not come into play.
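To make this concrete, here is a small sketch (toy vocabulary of my own) that builds the uniform independent joint distribution over bigrams and conditions on a specific $w_{i-1}$; the conditional distribution that falls out is uniform over $|V|$ values, not $|V|^2$:

```python
from fractions import Fraction
from itertools import product

V = ["a", "b", "c"]  # toy 3-word vocabulary (illustrative)

# Under independence + uniformity, each of the |V|^2 bigrams
# (w_{i-1}, w_i) has joint probability 1/|V|^2.
joint = {pair: Fraction(1, len(V) ** 2) for pair in product(V, repeat=2)}

# Condition on w_{i-1} = "a": P(w_i | w_{i-1}) = joint / marginal.
marginal_a = sum(p for (prev, _), p in joint.items() if prev == "a")
cond = {w: joint[("a", w)] / marginal_a for w in V}
print(cond)  # every word gets probability 1/|V| = 1/3
```

Conditioning divides out the $1/|V|$ mass spent on $w_{i-1}$, leaving exactly $P(w_i \mid w_{i-1}) = 1/|V|$, which is why Attempt B gives the right answer.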