Probability notation


There is a random variable $B$ (a box) and a particular value it can take, $r$ (the color red). In Bishop's Pattern Recognition and Machine Learning textbook, he writes on p. 14:

Instead, we may simply write $p(B)$ to denote a distribution over the random variable $B$, or $p(r)$ to denote the distribution evaluated for the particular value $r$...

He then goes on to say, for random variables $X$ and $Y$:

...the quantity $p(Y |X)$ is a conditional probability and is verbalized as "the probability of $Y$ given $X$"

Here are my questions:

  1. What exactly does "a distribution over the random variable $B$" mean? Does this distribution mean some mapping from the values the r.v. can take to probability values? I usually think about distributions in terms of PDFs or CDFs. I think $p(r)$ is the density (or the PMF in the discrete r.v. case).
  2. How can we condition on a random variable $X$? My understanding is that we can only condition on events. Does $p(Y|X)$ mean the conditional distribution of $Y$ given $X$ taking some value which is not specified?

There are 2 answers below.

Best answer:
  1. What exactly does "a distribution over the random variable $B$" mean? Does this distribution mean some mapping from the values the r.v. can take to probability values? I usually think about distributions in terms of PDFs or CDFs. I think $p(r)$ is the density (or the PMF in the discrete r.v. case).

$p(B)$ represents the probability mass function of the random variable $B$ at an arbitrary value.   It is a lazy (but unfortunately common) shorthand, used when authors are more interested in showing the dependencies between random variables than in any particular evaluation.

$p(r)$ demonstrates why this is a horribly confusing idea: the "probability mass function of the value $r$" is meaningless unless it is implicitly clear which random variable is being discussed.

More properly, we should write something like $p_{\small B}(r)$ to indicate the probability mass function of the random variable $B$ evaluated at $r$.   In this case that is $\mathsf P(B=r)$: the probability of the event that the box is red.
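To make the distinction concrete, a PMF can be written out as an explicit mapping. A minimal Python sketch, where the box probabilities are illustrative values in the spirit of Bishop's example (not taken from the book):

```python
# A "distribution over B" is just a mapping from each value B can take
# to a probability; writing the subscript B explicitly removes the
# ambiguity of the bare shorthand p(r).
p_B = {"r": 0.4, "b": 0.6}  # illustrative values for P(B=r), P(B=b)

def pmf_B(value):
    """p_B evaluated at a particular value, i.e. P(B = value)."""
    return p_B[value]

print(pmf_B("r"))           # the distribution evaluated at the value r
print(sum(p_B.values()))    # a valid PMF sums to 1
```

Here the subscript in `pmf_B` plays exactly the role of $p_{\small B}$ above: it names the random variable, so that evaluating at `"r"` is unambiguous.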

  2. How can we condition on a random variable $X$? My understanding is that we can only condition on events.   Does $p(Y|X)$ mean the conditional distribution of $Y$ given $X$ taking some value which is not specified?

Again, this appears to be an abbreviation for $p_{\small Y\mid X}(y\mid x)$, or $\mathsf P(Y=y\mid X=x)$, where $x,y$ are arbitrary arguments.

The occurrence of $X$ realising a particular value is an event.
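This can be checked numerically: conditioning on the random variable $X$ unpacks into conditioning on the events $\{X=x\}$, one per realization. A sketch with an assumed joint PMF (the table is illustrative; `fractions` keeps the arithmetic exact):

```python
from fractions import Fraction

# Assumed joint PMF P(X=x, Y=y) -- illustrative values only.
joint = {
    ("a", 0): Fraction(1, 10), ("a", 1): Fraction(3, 10),
    ("b", 0): Fraction(4, 10), ("b", 1): Fraction(2, 10),
}

def p_X(x):
    """Marginal P(X=x), summing the joint over y."""
    return sum(v for (xx, _), v in joint.items() if xx == x)

def p_Y_given_X(y, x):
    """P(Y=y | X=x) = P(X=x, Y=y) / P(X=x): conditioning on the event {X=x}."""
    return joint[(x, y)] / p_X(x)

print(p_Y_given_X(1, "a"))  # (3/10) / (4/10) = 3/4
```

So $p(Y\mid X)$ is shorthand for this whole family of event-conditional probabilities, indexed by the arbitrary arguments $x$ and $y$.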

Second answer:

I have the following distinctions:

  • $X$ is a random variable: a set of possible values together with a probability law (formally, a measurable function on a sample space equipped with a $\sigma$-algebra). Capitalization matters less and less as expressions become more involved, such as when we include matrices, for example in Kalman filtering and DSP topics. We can always say $x$, $y$ and $\omega$ are random variables (would you capitalize $\omega$?). I would suggest you do not rely on capitalization alone, though it does no great harm.
  • $x$ is a realization of a random variable, that is, a particular value taken. In stochastic models, where a variable can take several realizations at different instants, this convention renders itself obsolete: one can always say $x_0$, $x_k$ or even $x_t$ are realizations of the random variable $x$.
  • $P(X)$ is the probability of $X$ taking a value $x$ in a set $\mathscr A$. Because $X$ is a random variable, the probability operator $P$ can be applied to every random variable and every subset of its domain, even to single values, as in $P(X=x_0)$.
  • $P$ is an operator that always applies to sets; what you call an event is a realization landing in a set of possible states. Because we talk about sets, we can solve many of the simplest problems and Bayes propositions with Venn diagrams. Thinking in terms of events makes the situation a little more intuitive, and sometimes esoteric.
  • $y=p(x)$ is the probability density function of the random variable. $p$ is a function, so there is no confusion in evaluating it at, e.g., $y_0=p(0)$.
  • This function allows the calculation $p_0=P(X \in \mathscr A )=\int_{\mathscr A}p(x)\,dx$.
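That identity can be sanity-checked numerically. A sketch using the standard normal density as an illustrative choice of $p$, with a midpoint-rule approximation of the integral:

```python
import math

# Illustrative density p: the standard normal.
def p(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_in_interval(a, b, n=100_000):
    """Midpoint-rule approximation of P(X in [a, b]) = integral of p over [a, b]."""
    h = (b - a) / n
    return sum(p(a + (i + 0.5) * h) for i in range(n)) * h

# For the standard normal, P(-1 <= X <= 1) is about 0.6827.
print(round(prob_in_interval(-1.0, 1.0), 4))
```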
  • $p(x_0)$ is the distribution at the realization value $x_0$. This is bad: a realization, once performed, is no longer stochastic, and the distribution of a deterministic value is $y=\delta(x-x_0)$, so the original $p$ no longer applies.
  • $P(X|Y)$ is the conditional probability. If we want to be consistent, this too is bad; we should write $P(X \in \mathscr A \mid Y \in \mathscr B)$, making the calculation explicit over sets. Here we have two variables: independent (uncorrelated) at best, dependent with unknown correlation at worst. You have to think of a joint density $z=p(x,y)$ as the expression of the most general case; in the independent case $z=p(x)q(y)$, and you can deal with the variables individually.
  • The calculation in the previous case uses the identity $P(X|Y)={P(X,Y) \over P(Y)}$. $P(Y)$ is calculated as usual, while $P(X \in \mathscr A ,Y \in \mathscr B)=\int_{\mathscr A} \int_{\mathscr B}p(x,y)\,dy\,dx$.
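A numerical sketch of that identity, with an assumed joint density chosen so the answer is known in advance (independent uniforms on the unit square, so $P(X\in\mathscr A\mid Y\in\mathscr B)=P(X\in\mathscr A)$):

```python
# Assumed joint density: independent uniforms on [0,1] x [0,1].
def p_joint(x, y):
    return 1.0 if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def prob(ax, bx, ay, by, n=400):
    """Midpoint Riemann sum for P(X in [ax,bx], Y in [ay,by])."""
    hx, hy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += p_joint(ax + (i + 0.5) * hx, ay + (j + 0.5) * hy)
    return total * hx * hy

prob_B = prob(0.0, 1.0, 0.0, 0.5)     # P(Y in [0, 0.5]), X unrestricted
prob_AB = prob(0.0, 0.25, 0.0, 0.5)   # P(X in [0, 0.25], Y in [0, 0.5])
print(prob_AB / prob_B)  # by independence, this is close to P(X in [0, 0.25]) = 0.25
```

The ratio reproduces the marginal probability of $\mathscr A$, as independence demands; with a dependent joint density the same two integrals would give the genuinely conditional answer.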