Let's say I've been given a sentence $\mathcal{S}$ of $n$ words, and I have a vocabulary $\mathcal{M}$ of $m$ words. If I sample $n$ words by picking uniformly at random from $\mathcal{M}$, one after another, what is the expected number of positions where the generated words agree with those of $\mathcal{S}$?
- If I sample with replacement
- If I sample without replacement
I've come up with the following expressions and would like to check whether they are correct.
\begin{gather} \sum_{\ell=1}^n \ell \binom n\ell \left(\frac 1m\right)^\ell \left( \frac{m-1}m \right)^{n-\ell} \tag{with replacement} \\ \sum_{\ell=1}^n \ell \binom n\ell \prod_{j=0}^{\ell-1} \frac{1}{m-j} \prod_{i=0}^{n-\ell-1} \frac{-1+m-\ell-i}{m-\ell-i} \tag{without replacement} \end{gather}
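For a numerical sanity check, here is a minimal Python sketch that evaluates both sums and compares them with a Monte Carlo estimate. It is only an illustration under assumptions not stated above: every word of $\mathcal{S}$ is in $\mathcal{M}$, draws are uniform, and $n \le m$ (needed for the case without replacement). The function names and the toy vocabulary are hypothetical.

```python
import random
from math import comb


def with_replacement_formula(n, m):
    """Evaluate the first sum (with replacement) as written in the question."""
    return sum(
        l * comb(n, l) * (1 / m) ** l * ((m - 1) / m) ** (n - l)
        for l in range(1, n + 1)
    )


def without_replacement_formula(n, m):
    """Evaluate the second sum (without replacement) as written in the question.

    Assumes n <= m so that no denominator is zero.
    """
    total = 0.0
    for l in range(1, n + 1):
        p = 1.0
        for j in range(l):          # product over j = 0 .. l-1
            p *= 1 / (m - j)
        for i in range(n - l):      # product over i = 0 .. n-l-1
            p *= (m - l - i - 1) / (m - l - i)
        total += l * comb(n, l) * p
    return total


def simulate(sentence, vocab, replace, trials=100_000, seed=0):
    """Monte Carlo estimate of the expected number of matching positions.

    Assumes uniform draws from `vocab` (a list) and, if `replace` is False,
    len(sentence) <= len(vocab).
    """
    rng = random.Random(seed)
    n = len(sentence)
    matches = 0
    for _ in range(trials):
        if replace:
            draw = [rng.choice(vocab) for _ in range(n)]
        else:
            draw = rng.sample(vocab, n)
        matches += sum(d == s for d, s in zip(draw, sentence))
    return matches / trials


if __name__ == "__main__":
    # Hypothetical toy setup: m = 10 vocabulary words, sentence of n = 3 words.
    vocab = [f"w{k}" for k in range(10)]
    sentence = vocab[:3]
    n, m = len(sentence), len(vocab)
    print("with replacement:    formula", with_replacement_formula(n, m),
          "simulation", simulate(sentence, vocab, replace=True))
    print("without replacement: formula", without_replacement_formula(n, m),
          "simulation", simulate(sentence, vocab, replace=False))
```

Trying several values of $n$ and $m$, including $n = m$, should show whether each sum tracks the simulated average.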