Can A Probability Ever Be Outside of $0$ and $1$?


Recently, I have been studying the Multinomial Probability Distribution.

Suppose you go to a casino where there is a game that involves rolling a six-sided die. However, you are not told the probability that this die lands on any one of its sides - this raises your suspicions and leads you to believe that the die might not be fair, and therefore that the game might not be worth playing. While you are still deciding whether to play, you notice that the casino has a large television screen displaying the last $100$ numbers rolled with this die. Since the face counts of a die follow a Multinomial Distribution, you can use this fact to estimate the probability of the die landing on each face, as well as the "spread" (i.e. variance) of each of these estimates.

  • Using Maximum Likelihood Estimation, I have been trying to derive formulae for the parameters of the Multinomial Probability Distribution. In short, given an event $i$ (e.g. the number $2$ on a die), the (very obvious) estimate for the probability $p_i$ of this event is $$\hat{p_{i}}_{\text{MLE}} = \frac{n_{i}}{N}$$ where $n_{i}$ is the number of times that event $i$ appears and $N$ is the total number of events that were recorded. As always, probabilities are only defined between $0$ and $1$ - therefore these individual estimates for $p_{i}$ can never be greater than $1$ or less than $0$.
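The MLE step above can be sketched in a few lines of Python. The roll counts below are made up purely for illustration (the post does not give the actual $100$ numbers from the screen):

```python
from collections import Counter

# Hypothetical record of the last 100 rolls shown on the casino screen
# (these counts are invented for illustration only).
rolls = [1]*12 + [2]*31 + [3]*15 + [4]*14 + [5]*13 + [6]*15
N = len(rolls)                      # total number of recorded events
counts = Counter(rolls)             # n_i for each face i

# MLE: p_hat_i = n_i / N for each face
p_hat = {face: counts[face] / N for face in range(1, 7)}
print(p_hat[2])                     # 0.31
```

By construction the estimates are ratios of non-negative counts to the total, so each lies in $[0,1]$ and they sum to $1$.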

  • Next, using the equivalence between the inverse Fisher Information and the asymptotic variance of the MLE, I was able to work out a formula for the "variance of these probability estimates". In short, the variance of $\hat{p_{i}}$ is given by $$\text{var}(\hat{p_{i}}_{\text{MLE}}) = \frac{p_{i}^{2}}{n_{i}}$$

  • Finally, using the theory of Asymptotic Normality of the MLE, we can derive Confidence Intervals for each of these parameter estimates (i.e. each individual estimate of $p_{i}$). That is, you might have estimated that the probability of rolling a $2$ on this die is $0.31$, with a $95\%$ Confidence Interval of $(0.28, 0.33)$ for that probability. We can construct a $95\%$ Confidence Interval for any of these probabilities as: $$p_{i} \pm 1.96 \cdot \left( \sqrt{\frac{p_{i}^{2}}{n_{i}}} \right)$$
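The interval construction above can be written as a small helper. Note this uses the post's own variance formula $p_i^2 / n_i$, plugging in the estimate for $p_i$; the exact numbers $(0.28, 0.33)$ in the post were illustrative and will not match this made-up data:

```python
import math

def wald_style_ci(p_hat_i, n_i, z=1.96):
    """95% interval p_hat_i +/- z * sqrt(p_hat_i^2 / n_i),
    per the variance formula derived in the post."""
    half_width = z * math.sqrt(p_hat_i**2 / n_i)
    return (p_hat_i - half_width, p_hat_i + half_width)

# e.g. face 2 observed 31 times out of 100 rolls -> p_hat = 0.31
low, high = wald_style_ci(0.31, 31)
print(low, high)   # roughly (0.20, 0.42)
```

Nothing in this arithmetic knows that the quantity being estimated is a probability, which is exactly the issue raised in the question below.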

Question: I am worried that for certain values of $p_{i}$ and $n_{i}$, this expression $$p_{i} \pm 1.96 \cdot \left( \sqrt{\frac{p_{i}^{2}}{n_{i}}} \right)$$ might be greater than $1$ or less than $0$.

As an example, if $p_{i} = 0.9$ and $n_{i} = 16$, this results in a range estimate for the probability exceeding $1$, i.e. $$0.9 + 1.96 \cdot \sqrt{\frac{0.9^2}{16}}$$
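Plugging the example's numbers into the formula confirms the concern numerically:

```python
import math

# The example from the post: p_i = 0.9, n_i = 16
p_i, n_i = 0.9, 16
half_width = 1.96 * math.sqrt(p_i**2 / n_i)   # 1.96 * (0.9 / 4) = 0.441
upper = p_i + half_width
print(upper)   # about 1.341 -- a nominal upper bound greater than 1
```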

Have I done this correctly? Is it really possible for a probability value to be outside a range of $(0,1)$?

Thanks!

Note: I obviously think I have done something wrong, because I don't know much math - but one of the few things I do know is that probabilities can never be outside the range $[0,1]$.


There are 2 best solutions below


A probability space is the triple $(\Omega, \mathscr{A}, \mathbb{P})$ where $\Omega$ is a sample space with elements $\omega \in \Omega$; $\mathscr{A}$ is a collection of events of interest, called a $\sigma$-algebra, with $A \in \mathscr{A}$; and $\mathbb{P}$ is a probability measure such that, for any pair of events $A, B \in \mathscr{A}$:

  • $\mathbb{P}(\Omega) = 1$
  • $\mathbb{P}(\emptyset) = 0$
  • $\mathbb{P}(A^C) = 1- \mathbb{P}(A)$
  • $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A\cap B)$
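These properties can be checked mechanically on a small finite example. The sketch below models a fair six-sided die (my choice, to match the question); exact `Fraction` arithmetic avoids floating-point noise in the comparisons:

```python
from fractions import Fraction

# A fair six-sided die as a finite probability space.
omega = frozenset(range(1, 7))
weight = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability measure: sum of the weights of the outcomes in the event."""
    return sum(weight[w] for w in event)

A = frozenset({1, 2, 3})
B = frozenset({2, 4})

print(P(omega))                              # 1
print(P(frozenset()))                        # 0
print(P(omega - A) == 1 - P(A))              # True
print(P(A | B) == P(A) + P(B) - P(A & B))    # True
```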

Importantly, the probability measure assigns to any event $A$ a number between zero and one, that is, $\mathbb{P}(A)\in [0,1] \quad \forall \ A \in \mathscr{A}$.


Direct Answer

There are two notions to consider here: measures and probability measures. A probability measure is a specific type of measure. I will define both below and then give some commentary in relation to your question.

A measure is a function $\mu: \mathscr{F} \rightarrow [0, \infty]$ on a measurable space $(\Omega, \mathscr{F})$ which satisfies:

  1. $\mu (\emptyset) = 0$

  2. $\mu (\cup _{n=1}^{\infty} A_n) = \sum _{n=1}^{\infty} \mu (A_n)$ for every disjoint sequence of sets $(A_n)_{n \in \mathbb{N}} \subseteq \mathscr{F}$

Measures take sets in a $\sigma$-algebra and assign each one a value, giving the set some notion of "size" or "measure". These values can be anywhere between $0$ and $\infty$; they are not restricted to the interval $[0,1]$.

One example of a measure is the counting measure. This measure assigns "measure" to sets based on their cardinality. For example, the measure of the set $\{A,B,C\}$ would be $3$ since there are $3$ elements.
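A minimal sketch of the counting measure on finite sets, showing that it happily returns values greater than $1$:

```python
def counting_measure(A):
    """Counting measure: assigns each finite set its number of elements."""
    return len(A)

print(counting_measure({"A", "B", "C"}))   # 3 -- a perfectly valid "size" above 1
```

Dividing by the size of the whole space would turn this into a (uniform) probability measure, which is exactly the normalisation step described next.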

A probability measure is a function $\mathbb{P}$ on the measurable space $(\Omega , \mathscr{F})$ which satisfies the above properties of a measure and also satisfies the additional requirement that $\mathbb{P}(\Omega) = 1$.

Therefore, the values assigned by any probability measure always lie between $0$ and $1$: measures are non-negative by definition, and by monotonicity $\mathbb{P}(A) \le \mathbb{P}(\Omega) = 1$ for every event $A$.

Therefore, whilst it is true that a measure can take values outside of the interval $[0,1]$, this is not the case when we are dealing specifically with probability measures.

Clarifying the confusion

Now that this is clarified, we can look specifically at the source of confusion in your post.

Here you have made a (normal) approximation, so you cannot expect its endpoints to necessarily lie in the $[0,1]$ range. The fact that this approximation falls outside that range is, however, a sign that it is not a very good approximation in this regime.

Unlike probability measures, which are constrained by their definition (above), this approximation is not subject to the same constraints, and therefore there is no reason to expect that the values it produces will lie in the desired interval.