Expressing a conditional probability formula in words


Given a random variable $X$ and probability mass function $pmf_1$, $P(x \vert pmf_1)$ denotes the probability that the RV $X$ takes value $x$, given that the RV $X$ belongs to $pmf_1$.

How can $P(pmf_1 \vert x)$ be written using similar wording? It would be the probability that a random variable $?$ takes the value $?$, given that we observe the point $x$.

I'm not sure what the question marks should be substituted with. I know another wording of $P(pmf_1 \vert x)$, namely the probability that $x$ came from $pmf_1$, but what explicitly is the random variable here?

Context: The context I was thinking of is the following. Suppose we have two pmfs and we want to find out which pmf the observation $x$ came from. In this context, is it possible to explicitly state what the random variable for $P(pmf_1 \vert x)$ is?


Usually, when you say "random variable," you are already assuming the existence of its fixed probability mass/density function. If this is the case, then $P(X=x \mid \text{pmf})$ is redundant.

$P(\text{pmf} \mid X=x)$ is kind of weird, and starts getting into Bayesian stuff. As you noted, it is unclear what the random variable is. Perhaps I can describe an example where it does make sense.

Suppose we have a coin that might be biased, so we know $X \sim \operatorname{Bernoulli}(p)$ for some parameter $p$, but we don't know $p$. Does $p$ even have a distribution? In the Bayesian setting, one "base" assumption is that $p$ is uniformly distributed on $[0,1]$; we will assume that here.

Suppose we flip the coin $10$ times and get $7$ heads. Then the distribution of the parameter $p$, given this knowledge, can be written as $$P(p \mid \mathcal{D}=\{1,1,1,1,1,1,1,0,0,0\}).$$ This is known as the posterior distribution. By Bayes's rule, $$P(p \mid \mathcal{D}) \propto P(\mathcal{D} \mid p) P(p),$$ and maximizing the right-hand side gives the posterior mode (the maximum a posteriori estimate); with a uniform prior this coincides with the maximum likelihood estimate, which turns out to be exactly the sample proportion $7/10$.

[There are some issues with using maximum likelihood; for example, if you flip two heads in a row, the maximum likelihood estimate is $p=1$, i.e., the coin always comes up heads, which seems absurd. This can be fixed by choosing a non-uniform prior $P(p)$.]
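As a sanity check, the posterior mode can be computed numerically on a grid. This is a minimal sketch, assuming the uniform prior and the 7-heads-in-10-flips data from the example above:

```python
import numpy as np

p = np.linspace(0.0, 1.0, 10001)      # grid of candidate parameter values in [0, 1]
likelihood = p**7 * (1.0 - p)**3      # P(D | p) for a fixed sequence of 7 heads, 3 tails
prior = np.ones_like(p)               # uniform prior P(p) on [0, 1]
posterior = likelihood * prior        # unnormalized posterior, by Bayes's rule
posterior /= posterior.sum() * (p[1] - p[0])   # normalize on the grid

p_map = p[np.argmax(posterior)]       # posterior mode (MAP estimate)
print(round(p_map, 4))                # 0.7, the sample proportion
```

With the uniform prior the posterior is proportional to the likelihood, so the grid maximum lands on the sample proportion $7/10$.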


If $\mathsf P(x\mid\mathrm{pmf_1})$ is read as "the probability that a random variable, $X$, takes on the value $x$ given that it belongs to a distribution with probability mass function $\mathrm{pmf_1}$", then $\mathsf P(\mathrm{pmf_1}\mid x)$ would be read as "the probability that a random variable, $X$, belongs to a distribution with probability mass function $\mathrm{pmf_1}$ given that it takes on the value $x$."

Here the identity of the random variable is implicit from the context, not from the symbols themselves. It's basically shorthand.

To be explicit you would write something like: $\mathsf P(X\!=\!x\mid F_X\!=\!\mathrm{pmf_1}), \mathsf P(F_X\!=\!\mathrm{pmf_1}\mid X\!=\!x)$, where $F_X$ is the probability mass function of $X$.

Or just passably: $\mathsf P_{X\mid F_X}(x\mid \mathrm{pmf_1}), \mathsf P_{F_X\mid X}(\mathrm{pmf_1}\mid x)$

It's also rather unusual notation.  What is the context?


Context: The context I was thinking of is the following. Suppose we have two pmfs and we want to find out which pmf the observation $x$ came from. In this context, is it possible to explicitly state what the random variable for $\mathsf P(\mathrm{pmf_1}\mid x)$ is?

Well, it is possible to do this using Bayes' Theorem.

$$\mathsf P_{F_X\mid X}(\mathrm{pmf_1}\mid x) = \dfrac{\mathsf P_{X\mid F_X}(x\mid \mathrm{pmf_1})\mathsf P_{F_X}(\mathrm{pmf_1})}{\mathsf P_{X\mid F_X}(x\mid \mathrm{pmf_1})\mathsf P_{F_X}(\mathrm{pmf_1})+\mathsf P_{X\mid F_X}(x\mid \mathrm{pmf_2})\mathsf P_{F_X}(\mathrm{pmf_2})}$$

You know $\mathsf P_{X\mid F_X}(x\mid \mathrm{pmf_1})$. It's just $\mathrm{pmf_1}(x)$.

Likewise, $\mathsf P_{X\mid F_X}(x\mid \mathrm{pmf_2})=\mathrm{pmf_2}(x)$.

That leaves determining the prior probabilities $\mathsf P_{F_X}(\mathrm{pmf_1})$ and $\mathsf P_{F_X}(\mathrm{pmf_2})$. These would be your measures of certainty that $F_X$ is one function or the other.
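The computation above can be sketched in a few lines. The two pmfs and the equal priors below are hypothetical, chosen only for illustration:

```python
# Hypothetical pmfs over the support {0, 1, 2}; priors are assumed equal.
pmf1 = {0: 0.5, 1: 0.3, 2: 0.2}
pmf2 = {0: 0.1, 1: 0.3, 2: 0.6}
prior1, prior2 = 0.5, 0.5

def posterior_pmf1(x):
    """P(pmf_1 | X = x) via Bayes' Theorem, mirroring the formula above."""
    numerator = pmf1[x] * prior1
    evidence = pmf1[x] * prior1 + pmf2[x] * prior2
    return numerator / evidence

print(posterior_pmf1(0))  # 0.25 / 0.30 ≈ 0.833 -- x = 0 most likely came from pmf_1
print(posterior_pmf1(2))  # 0.10 / 0.40 = 0.25  -- x = 2 more likely came from pmf_2
```

Note that with equal priors the posterior reduces to comparing $\mathrm{pmf_1}(x)$ against $\mathrm{pmf_2}(x)$ directly; unequal priors would tilt the comparison toward the pmf you were more certain of beforehand.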