Probability of m samples outside interval notation


I am unsure how to write the probability of a specific event. Say $D$ is a distribution on $\mathbb{R}$ and $(b,c)$ is an interval. How would you write mathematically the probability of drawing $m$ samples that all fall outside $(b,c)$?

Something like this?

$$ \mathbb{P}_D(x_1, \dots,x_m|\forall_i, x_i \in \mathbb{R} \setminus (b,c)) $$

Best answer

Once you start talking about multiple samples, there are a few things you need to be clear about. A distribution on $\mathbb R$ will have something to say about drawing a single sample, but for two or more samples you'll need a joint distribution. You can specify that $X_1, \ldots, X_m$ are independent, identically distributed (abbreviated "i.i.d.") random variables such that $X_i \sim D.$ Then if you say $X$ is the vector $(X_1, \ldots, X_m),$ the distribution of $X$ will be the required probability distribution over all $m$ "samples." The sample space of this distribution is $\mathbb R^m.$ Note that this distribution is not the same as $D$ unless $m = 1.$

The set $\{(x_1,\ldots,x_m) \mid (\forall i)(x_i \in \mathbb R\setminus(b,c))\}$ would be the set of all possible ways to draw $m$ samples in which none of the samples is in $(b,c).$ Another way to write the same set would be $(\mathbb R\setminus(b,c))^m,$ where the superscript $m$ signifies that we take an $m$-way Cartesian product $(\mathbb R\setminus(b,c))\times \cdots \times (\mathbb R\setminus(b,c)).$ This set is an event within the sample space $\mathbb R^m.$

You can represent the probability of this event by writing the event in complete set notation inside the parentheses of the joint probability function, for example, $$ P_X(\{(x_1,\ldots,x_m) \mid (\forall i)(x_i \in \mathbb R\setminus(b,c))\}).$$ You might prefer to write $P_{X_1,\ldots,X_m}$ rather than $P_X$ in order to remind the reader that this is really a probability over $m$ i.i.d. "samples."

An alternative to set notation is to write a logical assertion using the names of the random variables you have defined, for example, $$ P_X((\forall i)(X_i \in \mathbb R\setminus(b,c))).$$
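If the $X_i$ really are i.i.d. with distribution $D,$ the probability of this event factors as $\bigl(1 - P_D((b,c))\bigr)^m.$ Here is a minimal Python sketch that checks this numerically. It assumes, purely for illustration, that $D$ is the standard normal distribution; the function names and the choice of $b,$ $c,$ and $m$ are mine and do not come from the question.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution (an assumed choice of D)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_all_outside_exact(b: float, c: float, m: int) -> float:
    """Closed form under i.i.d. sampling: (1 - P_D((b, c)))**m."""
    p_inside = normal_cdf(c) - normal_cdf(b)
    return (1.0 - p_inside) ** m


def prob_all_outside_mc(b: float, c: float, m: int,
                        trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of trials in which none of the
    m independent draws lands in the open interval (b, c)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # all(...) stops at the first draw that falls inside (b, c)
        if all(not (b < rng.gauss(0.0, 1.0) < c) for _ in range(m)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    b, c, m = -1.0, 1.0, 3
    print("closed form:", prob_all_outside_exact(b, c, m))
    print("simulation: ", prob_all_outside_mc(b, c, m))
```

The two numbers should agree up to simulation noise. The factorization relies on independence, so it would not apply to dependent samples.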

I do not* know any convenient, unambiguous way to tell your reader the nature of the $m$ samples without using some text to name and describe random variables. Anyway, math that is all formulas with no text can quickly become hard to read.

Note that you should explicitly say i.i.d. when that is what you want, because it is possible to have a distribution of "samples" that is not composed of i.i.d. variables. For example, if you have a jar of jellybeans of different colors with a known number of each color, and you randomly draw $m$ jellybeans from it one at a time without replacement, the prior probability distribution for each jellybean is the same but the random colors drawn are not independent. An example of non-independent samples from a distribution on $\mathbb R$ may not seem as natural, but it is possible.
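The jellybean example can be checked exactly for a small jar. The sketch below assumes a hypothetical jar of 3 red and 2 green jellybeans (these numbers are made up for illustration) and enumerates every equally likely ordered pair of draws without replacement.

```python
from fractions import Fraction
from itertools import permutations


def two_draw_probabilities(jar, color):
    """Exact probabilities for the first two draws without replacement.

    Returns (P(first is color), P(second is color), P(both are color)).
    """
    # Each ordered pair of distinct jellybeans (by position in the jar)
    # is equally likely when drawing without replacement.
    pairs = list(permutations(range(len(jar)), 2))
    total = len(pairs)
    first = sum(jar[i] == color for i, _ in pairs)
    second = sum(jar[j] == color for _, j in pairs)
    both = sum(jar[i] == color and jar[j] == color for i, j in pairs)
    return Fraction(first, total), Fraction(second, total), Fraction(both, total)


if __name__ == "__main__":
    jar = ["red"] * 3 + ["green"] * 2  # hypothetical jar contents
    p_first, p_second, p_both = two_draw_probabilities(jar, "red")
    print("P(first red)  =", p_first)
    print("P(second red) =", p_second)
    print("P(both red)   =", p_both)
    print("product of marginals =", p_first * p_second)
```

The first and second draws have the same marginal probability of being red, yet the joint probability differs from the product of the marginals, so the draws are identically distributed but not independent.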

*Update: I just now coincidentally stumbled across this notation: $$ X_1,X_2,\ldots,X_m \stackrel{i.i.d.}{\sim} D $$ for $m$ i.i.d. variables each with distribution $D.$ See the question "What does $X_1,X_2,...\stackrel{ind}{\sim}N(\mu, \sigma^2)$ mean?" I'm not sure I would use such a thing without explanation (and then only if I needed to define lists of i.i.d. variables many times in the same document), but such a thing is sometimes used.