I have a situation where a game has drawn a word from a dictionary 3000 times (with replacement), and there are 1100 distinct words from those 3000 samples. I have a frequency list for the 1100 distinct words.
My problem is that I'm trying to find the expected size of the dictionary these words came from, given that frequency list. I have read up on "Maximum Likelihood Estimation", but from what I understand, the MLE would always just equal the number of distinct words, and that would be the "most likely" size rather than the "expected" size.
I've been considering an equivalent formulation of this problem, as follows. Take the set $S_n=\{1,2,3,\dots,n\}$. Sampling from $S_n$ exactly $k$ times with replacement (each value drawn with equal probability $1/n$) yields a list of drawn values $D = [a_1, a_2, \dots, a_k]$, which may contain duplicates.
Find the expected value of $n$ given this list $D$.
Of course, by the definition of expected value, $\mathbb{E}[n \mid D] = \sum_{i=1}^\infty i\cdot\mathbb{P}(n = i\mid D)$. But the probabilities I need to calculate seem to point in the "wrong" direction: it is much simpler to calculate $\mathbb{P}(D\mid n)$, while the individual probabilities $\mathbb{P}(D)$ and $\mathbb{P}(n = i)$ seem nonsensical on their own.
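For concreteness, the likelihood in the "easy" direction can be written out explicitly in the $S_n$ formulation:

$$\mathbb{P}(D \mid n) = \begin{cases} n^{-k} & \text{if } \max(D) \le n,\\[2pt] 0 & \text{otherwise,} \end{cases}$$

which is strictly decreasing in $n$ wherever it is nonzero, so maximizing it gives $\hat n = \max(D)$, the smallest feasible value — which is why the MLE comes out as just the number of distinct words in the original problem.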
The expected value of $n$ only makes sense if you adopt a Bayesian approach, in which $n$ is treated as a random variable. If you do so, you can obtain the required posterior probabilities from Bayes' theorem: $\mathbb{P}(n \mid D) \propto \mathbb{P}(D \mid n)\,\mathbb{P}(n)$, so the troublesome marginal $\mathbb{P}(D)$ is just a normalizing constant. Note, however, that the answer you get will depend on your chosen prior probability. That is, you need to specify a distribution for $n$ in advance.
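As a minimal sketch of that computation (assuming the question's numbers, $k = 3000$ draws and $d = 1100$ distinct words, and an arbitrarily chosen uniform prior on $\{1100, \dots, 3000\}$ — the prior choice is exactly what affects the result): since the word labels themselves carry no information about $n$, the data enter the likelihood only through $d$, with $\mathbb{P}(d \mid n) = S(k,d)\,\frac{n!}{(n-d)!\,n^k}$, where the Stirling-number factor $S(k,d)$ does not depend on $n$ and cancels when the posterior is normalized.

```python
import math

def log_likelihood(n, k, d):
    # P(d distinct values in k uniform draws | dictionary size n)
    # = S(k, d) * n! / ((n - d)! * n^k); the Stirling factor S(k, d)
    # is constant in n, so it is dropped here (it cancels on normalization).
    if n < d:
        return float("-inf")  # impossible to see d distinct values
    return math.lgamma(n + 1) - math.lgamma(n - d + 1) - k * math.log(n)

def posterior_mean(k, d, prior):
    # prior: dict mapping each candidate n -> its prior probability
    # (any finite-support prior works; it need not be normalized).
    logs = {n: math.log(p) + log_likelihood(n, k, d)
            for n, p in prior.items() if p > 0}
    m = max(logs.values())  # subtract the max before exp() for stability
    weights = {n: math.exp(v - m) for n, v in logs.items()}
    z = sum(weights.values())
    return sum(n * w for n, w in weights.items()) / z

# k = 3000 draws, d = 1100 distinct; uniform prior on 1100..3000 (assumption)
print(posterior_mean(3000, 1100, {n: 1.0 for n in range(1100, 3001)}))
```

With these numbers the posterior mean lands noticeably above $1100$, illustrating that the Bayesian "expected size" is not simply the number of distinct words.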