Suppose you have a large bag of solid-colored marbles. You don't have any information about the number of colors, or their relative proportions in the bag; you do know that the bag is well mixed, i.e., the order in which you pull them out is random.
As you pull them out, you record the colors and get the sequence:
Red, Blue, Blue, Blue, Blue, Blue, Blue, Blue, Blue, Blue, Blue, Blue, ...
I'd like to know what predictions can be made about the next marble that will be chosen. Specifically:
- What is the probability that the next marble chosen is blue, or red (or rather, what statistical statements can be made about it)?
- How can you assess the likelihood of choosing a color that has not yet appeared?
I'd also be interested in references that generally discuss statistical treatments of systems with missing information like the above.
As you may know, Bayes' theorem says $$P(T\vert D) =\frac{P(D\vert T)}{P(D)}P(T),$$
where you could interpret $T$ as "theory" and $D$ as "data". This shows you how to update your theory (that the proportions of colors are such and such) in light of new data (drawing marbles). $P(T)$ is called the prior and represents what we would guess that the correct theory was before having any data (essentially, we use our experience and guesstimate as best we can). $P(D)$ can just be viewed as a normalization constant. $P(D\vert T)$ is called the likelihood, and gives the probability of the data, given that the theory is correct. This all gives the posterior probability, $P(T\vert D)$, which is the probability of the theory given the data.
So, in your scenario, we might guess that there are three different colors (red, blue and green), and that they are in equal proportions (we don't know any better, so why not?). Then you draw the first one (a red). "All right," you think, "the likelihood for this is fairly high (one in three), so no need to adjust much yet." Then you draw the next ten marbles, and they are all blue! This should surprise you, because it was not what your theory predicted at all. This makes the posterior probability plummet, as the likelihood for this is vanishingly small (roughly one in sixty thousand, since $(1/3)^{10} \approx 1/59049$), which tells you that your theory is not very good. Which theory gives you the best fit with the data, then? One that tells you that red is in a one-to-ten proportion to blue, and that there are even fewer greens.
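You can check the likelihood comparison directly. A short sketch, where the specific proportions in the second theory (`1/12` red, `10/12` blue, `1/12` green) are an illustrative assumption consistent with "red one-to-ten to blue, even fewer greens":

```python
# Compare P(D|T) for the observed draws (one red, then ten blues)
# under two candidate theories. The exact proportions in the second
# theory are an illustrative assumption.

def likelihood(draws, proportions):
    """P(D|T): probability of the draw sequence, given color proportions."""
    p = 1.0
    for color in draws:
        p *= proportions.get(color, 0.0)
    return p

draws = ["red"] + ["blue"] * 10

equal_mix = {"red": 1 / 3, "blue": 1 / 3, "green": 1 / 3}
mostly_blue = {"red": 1 / 12, "blue": 10 / 12, "green": 1 / 12}

print(likelihood(draws, equal_mix))    # (1/3)^11 -- vanishingly small
print(likelihood(draws, mostly_blue))  # orders of magnitude larger
```

With equal priors, the posterior ratio between the two theories equals this likelihood ratio, so the data overwhelmingly favor the mostly-blue theory.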
But note that the prior is important. It is what makes physicists at CERN not lose their minds in excitement when someone finds that neutrinos go faster than light: the prior for Einstein's theory of relativity is very close to $1$, since it has survived time and so many experiments, so the new data (that something with mass can go faster than the speed of light) barely changes the posterior probability that the theory of relativity is correct... in other words, a large prior can make a theory sturdy.
All this "guesstimation", using one's experience, etc., may make you think that this is unmathematical or not objective, and in some sense it isn't. But there is no objective way of assigning a probability to something like how many marbles of color $X$ there are in the bag. The strength of Bayes' theorem is that it allows us to handle these kinds of fuzzy problems (which real-world problems always are), by using our experience.
I hope that helps and that you will want to find out more about Bayesian statistics by picking up a good book on probability theory. Cheers.