Drawing marbles from a box


Say we're drawing marbles from a box. The marbles can be labeled X, Y, or Z and can be either black, brown, or white. The probability of drawing a marble with each letter label is unknown but fixed and marbles are replaced after each draw.

A marble labeled X is half as likely to be brown as the other colors. A marble labeled Y is half as likely to be black as the other colors. A marble labeled Z is half as likely to be white as the other colors.

Say on our twentieth draw from the box you see a black Y for the first time.

Question I came up with: What is the best estimate for the probability of drawing a marble labeled Y from the box?


There are 3 answers below.

Best answer

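Let $p=0.2\,P(Y)$ be the probability of drawing a black $Y$. This uses $P(\text{black}\mid Y)=0.2$: a $Y$ marble is black with some probability $x$ and brown or white with probability $2x$ each, so $x+2x+2x=1$ gives $x=0.2$.

Given this, one approach is to model the number of draws $D$ needed to see the first black $Y$ as a geometric random variable with unknown success probability $p$ (the draws are independent because marbles are replaced).

The probability that we first see a black $Y$ on draw $k$ is given by:

$$P(D=k) = (1-p)^{k-1}p$$

In your case $k=20$, so we get $p(1-p)^{19}$.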
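To sanity-check the geometric model, one can simulate the box directly. The sketch below is illustrative only: the label probabilities in `LABEL_PROBS` are hypothetical values (the question leaves them unknown), while the color probabilities follow from the "half as likely" statements. It compares the simulated chance that the first black $Y$ shows up on draw 20 with $p(1-p)^{19}$.

```python
import random

# Hypothetical label probabilities, for illustration only (the problem leaves them unknown).
LABEL_PROBS = {"X": 0.35, "Y": 0.25, "Z": 0.40}

# Color distributions implied by the "half as likely" statements:
# the disfavored color gets 1/5, the other two colors get 2/5 each.
COLOR_PROBS = {
    "X": {"black": 0.4, "brown": 0.2, "white": 0.4},
    "Y": {"black": 0.2, "brown": 0.4, "white": 0.4},
    "Z": {"black": 0.4, "brown": 0.4, "white": 0.2},
}

# Joint distribution over (label, color) outcomes for a single draw.
OUTCOMES = [(lab, col) for lab in LABEL_PROBS for col in COLOR_PROBS[lab]]
WEIGHTS = [LABEL_PROBS[lab] * COLOR_PROBS[lab][col] for lab, col in OUTCOMES]


def first_black_y_at(k, trials=100_000, seed=0):
    """Estimate P(first black Y appears on draw k) by simulating draws with replacement."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        draws = rng.choices(OUTCOMES, weights=WEIGHTS, k=k)
        # First black Y on draw k: draw k is a black Y and draws 1..k-1 contain none.
        if draws[-1] == ("Y", "black") and ("Y", "black") not in draws[:-1]:
            hits += 1
    return hits / trials


p = 0.2 * LABEL_PROBS["Y"]      # P(black Y) on a single draw
exact = p * (1 - p) ** 19       # geometric pmf at k = 20
estimate = first_black_y_at(20)
print(f"exact={exact:.4f} simulated={estimate:.4f}")
```

Changing `LABEL_PROBS["X"]` and `LABEL_PROBS["Z"]` while keeping `LABEL_PROBS["Y"]` fixed leaves `exact` unchanged, which is why only $P(Y)$ matters here.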

For inference, we treat this as a function of $p$, the likelihood function $\mathcal{L}(p;k) = p(1-p)^{k-1}$. It's often easier to work with the log-likelihood $\ell(p;k)$; with $k=20$:

$$\ell(p;k)=\ln \mathcal{L}(p;k) = \ln(p) + 19\ln(1-p)$$

Now we look for the $p$ that maximizes the log-likelihood to get the maximum likelihood estimate:

$$\frac{d}{dp}\ell(p;k) = \frac{1}{p}-\frac{19}{1-p} = 0 \implies p = \frac{1-p}{19} \implies 19p+p = 1 \implies p=\frac{1}{20}$$
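A quick numerical check of this derivation, as a minimal Python sketch that uses a grid search instead of calculus (`log_likelihood` is an illustrative name, not from the answer):

```python
import math


def log_likelihood(p, k=20):
    """Geometric log-likelihood for a first success on draw k: ln p + (k-1) ln(1-p)."""
    return math.log(p) + (k - 1) * math.log(1 - p)


# Crude grid search over (0, 1) as a numerical check of the calculus result.
grid = [i / 100_000 for i in range(1, 100_000)]
p_hat = max(grid, key=log_likelihood)

print(p_hat)                            # 0.05
print(round(log_likelihood(p_hat), 4))  # -3.9703
print(p_hat / 0.2)                      # estimate of P(Y): 0.25
```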

This is the obvious result (as most maximum likelihood estimates are :)

Below is a plot of the log-likelihood around $p=0.05$, with $p=0.05$ and $\ell = -3.9703$ shown as vertical and horizontal lines, respectively.

You can see that the maximum log-likelihood is $\ell(0.05;20)= \ln(0.05) + 19\ln(0.95) \approx -3.9703$.

[Figure: log-likelihood around $p=0.05$]

Since $\hat p = 0.05$, the corresponding estimate of $P(Y)$ is $\hat p/0.2 = 0.25$.

An approximate 95% CI for $p$ can be derived using Wilks' theorem for the likelihood ratio statistic, which states that, approximately,

$$\Lambda := 2\left[\ell(\hat p;20)-\ell(p;20)\right] \sim \chi_1^2, \qquad \hat p = 0.05$$

The 95th percentile of $\chi_1^2$ is $3.841$, so $P(\Lambda\geq 3.841) \approx 0.05$, and the cutoff on the log-likelihood is

$$\ell(0.05;20) - \frac{3.841}{2} = -3.9703 - 1.9205 \approx -5.891$$

So our 95% confidence interval will include values of $p$ such that

$$\ell(p;20) \geq -5.891$$

which is approximately $(0.003, 0.20)$.

Converting this to a CI for $P(Y) = p/0.2$ (capping the upper end at $1$, since $P(Y) \le 1$) gives roughly $(0.015, 1)$, so there is still a lot of uncertainty.
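The interval endpoints can also be found numerically. This Python sketch solves $\ell(p;20) = \ell(\hat p;20) - 3.841/2$ by bisection on each side of $\hat p$ (bisection is just one simple choice of root finder; the function names are illustrative), then converts to $P(Y)$ by dividing by $0.2$ and capping at $1$.

```python
import math

# 95th percentile of the chi-square distribution with 1 degree of freedom (1.959964**2).
CHI2_1_95 = 3.841458820694124


def log_likelihood(p, k=20):
    """Geometric log-likelihood for a first black Y on draw k."""
    return math.log(p) + (k - 1) * math.log(1 - p)


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f changes sign exactly once there."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the right half
        else:
            hi = mid               # root lies in the left half
    return (lo + hi) / 2


p_hat = 1 / 20
cutoff = log_likelihood(p_hat) - CHI2_1_95 / 2  # about -5.891


def above_cutoff(p):
    """Positive inside the likelihood interval, negative outside."""
    return log_likelihood(p) - cutoff


lower = bisect(above_cutoff, 1e-9, p_hat)
upper = bisect(above_cutoff, p_hat, 1 - 1e-9)
print(f"p in ({lower:.4f}, {upper:.4f})")
# P(Y) = p / 0.2 cannot exceed 1, so the upper end is capped there.
print(f"P(Y) in ({lower / 0.2:.3f}, {min(upper / 0.2, 1.0):.3f})")
```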

We can compare this to an interval obtained by inverting a hypothesis test:

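Let's assume $P(Y)=1$ (so only $Y$'s in the box), i.e. $p=0.2$. Then there is still a $0.8^{19} \approx 1.4\%$ chance that we'd need at least 20 draws to see our first black $Y$. Keeping only the values of $p$ for which the observed $D=20$ falls in neither tail gives, with 2.5% in each tail (a 95% interval), roughly $P(Y) \in (0.006, 0.88)$; with 5% in each tail (a 90% interval), roughly $P(Y) \in (0.013, 0.73)$.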

This much uncertainty makes sense because we are given very little information: we don't know the labels and colors of the previous 19 draws, which would constrain the probability much more.

Another answer

Let's start with the colors:

A marble labeled Y is half as likely to be black as the other colors.

(The other labels aren't relevant to your question.)

This means that if the probability of a Y marble being black is $P_{black}$, then we are given that $P_{black} = \frac{P_{white}}{2} = \frac{P_{brown}}{2}$. Since these values sum to 1, $P_{black} + 2P_{black} + 2P_{black} = 5P_{black} = 1$, so $P_{black} = 0.2$.
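The same normalization applies to every label, not just Y. A small Python sketch, using only the three "half as likely" statements from the question (the names `DISFAVORED` and `color_distribution` are illustrative), computes each label's color distribution with exact fractions:

```python
from fractions import Fraction

COLORS = ("black", "brown", "white")
# The color each label is "half as likely" to have, per the problem statement.
DISFAVORED = {"X": "brown", "Y": "black", "Z": "white"}


def color_distribution(disfavored):
    """Relative weights 1 (disfavored color) and 2 (other colors), normalized to sum to 1."""
    weights = {c: (1 if c == disfavored else 2) for c in COLORS}
    total = sum(weights.values())  # 1 + 2 + 2 = 5
    return {c: Fraction(w, total) for c, w in weights.items()}


for label, color in DISFAVORED.items():
    print(label, color_distribution(color))
```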

With this, most of the other information is unnecessary. All we need to know is that an event with probability $0.2\,\mathbb P(Y)$ (drawing a black Y) took 20 tries to happen. We can use a beta distribution to model our uncertainty about that probability, since the beta distribution is the conjugate prior for the geometric distribution.

In our case, starting from a uniform Beta$(1,1)$ prior, $\alpha = 2$ and $\beta = 20$, since there was 1 event and 19 failures. The graph of the resulting density looks like this:

[Figure: density of Beta(2, 20)]

The height of the curve at $x$ is the posterior density (not a probability) of the event's probability being $x$. So, assuming the probability is uniformly distributed apart from the information given, the point where the density is maximized on $[0,1]$ is the posterior mode (the maximum a posteriori estimate) of $\mathbb P(0.2Y)$: $\frac{\alpha-1}{\alpha+\beta-2} = \frac{1}{20}$. Note that this is not the expected value; the posterior mean is $\frac{\alpha}{\alpha+\beta} = \frac{1}{11}$. (Intuitively, if it took 20 tries for something to happen, 1 in 20 is a natural estimate of its probability.) Finally, this gives $\mathbb P(Y) = 0.25$, or $\frac{1}{4}$.
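The mode and mean of a Beta posterior have simple closed forms, so the difference between them is easy to see. A minimal sketch with exact fractions, assuming (as the answer does) a uniform Beta(1, 1) prior:

```python
from fractions import Fraction

# Uniform Beta(1, 1) prior updated with 1 success and 19 failures gives Beta(2, 20).
alpha, beta = 1 + 1, 1 + 19

mode = Fraction(alpha - 1, alpha + beta - 2)  # peak of the density (MAP estimate)
mean = Fraction(alpha, alpha + beta)          # posterior expected value

print("mode:", mode, "-> P(Y) estimate", mode / Fraction(1, 5))  # mode: 1/20 -> P(Y) estimate 1/4
print("mean:", mean, "-> P(Y) estimate", mean / Fraction(1, 5))  # mean: 1/11 -> P(Y) estimate 5/11
```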

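The advantage of using the beta distribution here is that we get a whole distribution over the unknown probability rather than a single value. For example, the posterior probability that $\mathbb P(0.2Y)$ is at least $0.2$ is about $0.0576$. Since $\mathbb P(0.2Y) = 0.2\,\mathbb P(Y) \le 0.2$, this is also the mass this uniform-prior model places on impossible values.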
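For a Beta(2, 20) variable $X$, the upper tail has a closed form through the beta-binomial identity $P(X > x) = P(\text{Binomial}(21, x) \le 1)$. The sketch below computes $P(X > 0.2)$ that way and cross-checks it with a midpoint-rule integral of the density; the result agrees with the $0.0576$ figure in the answer. Function names are illustrative.

```python
def beta_2_20_pdf(x):
    """Density of Beta(2, 20): x (1-x)^19 / B(2, 20), with 1/B(2, 20) = 21!/(1! 19!) = 420."""
    return 420 * x * (1 - x) ** 19


def tail_closed_form(x):
    """P(X > x) for X ~ Beta(2, 20), via P(X > x) = P(Binomial(21, x) <= 1)."""
    return (1 - x) ** 21 + 21 * x * (1 - x) ** 20


def tail_numeric(x, steps=100_000):
    """Midpoint-rule integral of the density from x to 1, as an independent check."""
    h = (1 - x) / steps
    return h * sum(beta_2_20_pdf(x + (i + 0.5) * h) for i in range(steps))


print(round(tail_closed_form(0.2), 4))  # 0.0576
print(round(tail_numeric(0.2), 4))      # 0.0576
```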

Another answer

A marble labeled Y is half as likely to be black as the other colors. The marbles are replaced after each draw. You observed a black Y on the twentieth draw. Let's denote the following probabilities:

P(Y) = Probability of drawing a marble labeled Y.

P(B) = Probability of drawing a black marble.

P(B|Y) = Probability of drawing a black marble given it's labeled Y.

P(B|X) = Probability of drawing a black marble given it's labeled X.

P(B|Z) = Probability of drawing a black marble given it's labeled Z.

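From the information given, for each label the disfavored color has probability x and the other two colors 2x each, so x + 2x + 2x = 1 and x = 1/5. Hence:

P(B|Y) = 1/5 (Y marble is half as likely to be black)

P(B|X) = 2/5 (X marble is half as likely to be brown, not black)

P(B|Z) = 2/5 (Z marble is half as likely to be white, not black)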

Now, let's consider the probability of drawing a black Y on any single draw. Only a marble labeled Y can be a black Y, so non-Y marbles (X or Z) do not contribute:

P(B and Y) = P(Y) * P(B|Y) = P(Y) / 5

Since marbles are replaced after each draw, the probabilities of drawing each type of marble (X, Y, Z) are not affected by the previous draws.

Now, let's consider the probability of observing a black Y on the twentieth draw, given that you've never seen a black Y in the previous nineteen draws. Because the draws are independent, the earlier draws don't change it:

P(black Y on 20th | no black Y in 1-19) = P(B and Y) = P(Y) / 5

The probability that the first black Y appears exactly on the twentieth draw is therefore:

P(first black Y on 20th) = (1 - P(Y)/5)^19 * P(Y)/5
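Assuming independent draws and P(B|Y) = 1/5 from the problem statement, the probability that the first black Y appears on draw 20 is (1 - P(Y)/5)^19 * P(Y)/5. This Python sketch maximizes it over a grid of P(Y) values; `prob_first_black_y_on` is a hypothetical helper name.

```python
def prob_first_black_y_on(k, p_y, p_black_given_y=0.2):
    """P(first black Y on draw k) when draws are independent (with replacement)."""
    q = p_y * p_black_given_y  # P(black and Y) on a single draw
    return (1 - q) ** (k - 1) * q


grid = [i / 1000 for i in range(1001)]  # candidate values of P(Y) in [0, 1]
best = max(grid, key=lambda p_y: prob_first_black_y_on(20, p_y))
print(best)  # 0.25
```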

Given that you observed a black Y for the first time on the twentieth draw, the last expression is the likelihood of P(Y). A point estimate of P(Y) can be obtained by maximizing it (maximum likelihood), or, after choosing a prior for P(Y), by applying Bayes' theorem to get the posterior distribution of P(Y) given this observation.
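One concrete way to carry out the Bayesian route is a grid approximation. This sketch assumes a hypothetical uniform prior on P(Y) over [0, 1] (an assumption the question does not make) and the likelihood (1 - P(Y)/5)^19 * P(Y)/5; it reports the posterior mode and mean.

```python
def likelihood(p_y, k=20, p_black_given_y=0.2):
    """P(first black Y on draw k) as a function of P(Y)."""
    q = p_black_given_y * p_y
    return (1 - q) ** (k - 1) * q


# Hypothetical prior: P(Y) uniform on [0, 1]. With a flat prior the posterior is
# proportional to the likelihood, approximated here on a midpoint grid.
N = 10_000
grid = [(i + 0.5) / N for i in range(N)]
weights = [likelihood(x) for x in grid]
total = sum(weights)
posterior = [w / total for w in weights]

post_mode = max(zip(grid, posterior), key=lambda pair: pair[1])[0]
post_mean = sum(x * w for x, w in zip(grid, posterior))
print(f"posterior mode ~ {post_mode:.3f}, posterior mean ~ {post_mean:.3f}")
```

The posterior mean sits well above the mode because the likelihood decays slowly toward P(Y) = 1, which reflects the same wide uncertainty as the intervals in the other answers.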

It's important to note that P(X), P(Y), and P(Z) are not provided explicitly, so any specific numerical estimate for P(Y) depends on such a modeling choice (maximum likelihood, or a prior for the Bayesian route), and with a single observation it will be very uncertain.