p(A|B) where A is not an event, but a real number in [0,1]


I ran into some notation in an article I'm reading, and I can't figure out what the author is trying to say.

Specifically, say I'm flipping a coin with unknown probability $\omega_H$ of getting heads. Say $T$ is the event of me flipping the coin and getting tails. What on earth does it mean to say $p(\omega_H|T)$? This is the exact notation used in the article, and in the surrounding text the author is talking about Bayesian updating.

As far as I know, to use Bayes's Theorem with expressions like $p(A|B)$, $A$ and $B$ have to be events that either happen or don't happen, not arbitrary values.

Does anyone know what this author is trying to say?


3 Answers

BEST ANSWER

(The article at your link doesn't display properly in my browser. Here is a more-readable version.)

The main source of confusion seems to be your thinking that $p(\omega_H\mid T)$ denotes a probability, whereas it is, rather, a (posterior) probability density function of a continuous random variable $\widetilde{\omega}_H$. Another source of confusion may be that the author uses the same symbol for a random variable as for its values. To distinguish them, here I'll put a tilde on the random variable (i.e., $\widetilde{\omega}_H$) and use ${\omega}_H$ (without the tilde) to denote a value of the random variable.

Here's the relevant part of the article:

For a binary question, one may interpret the model as follows. Each respondent privately and independently conducts one toss of a biased coin, with unknown probability $\omega_H$ of heads. The result of the toss represents his opinion. Using this datum, he forms a posterior distribution, $p(\omega_H\mid t^r)$, whose expectation is the predicted frequency of heads. For example, if the prior is uniform, then the posterior distribution following the toss will be triangular on $[0,1]$, skewed toward heads or tails depending on the result of the toss, with an expected value of one-third or two-thirds.

The author uses $t^r$ to denote the $r$th respondent's truthful answer to a single $m$-choice question, represented by a vector whose components are all $0$ except for a $1$ in the position of the choice. So, for a binary question, $t^r$ is either $(0,1)$ or $(1,0)$ (corresponding to, say, Tail or Head, respectively).

Thus, the unknown parameter $\widetilde\omega_H$ has an assumed prior distribution described by a probability density function $p(\omega_H)$, and the additional information provided by the value of $t^r$ will "update" this to a posterior density function $p(\omega_H\mid t^r)$ in the usual Bayesian manner.
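The triangular posterior described in the quoted passage can be checked numerically. Here is a small sketch of my own (not code from the article): a uniform prior on $\omega_H$, updated after one observed Tail using a plain grid approximation; the posterior density comes out as $2(1-\omega_H)$, with mean $1/3$.

```python
# Grid approximation of the posterior p(omega_H | T) for a uniform prior,
# after observing a single Tail. Illustrative sketch only, not the
# article's code.

def posterior_after_tail(n=10000):
    """Return grid midpoints and the normalized posterior density."""
    step = 1.0 / n
    grid = [(i + 0.5) * step for i in range(n)]    # midpoints of each cell
    prior = [1.0] * n                              # uniform prior density on [0, 1]
    likelihood = [1.0 - w for w in grid]           # P(Tail | omega_H) = 1 - omega_H
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnorm) * step                  # P(T), the normalizing constant
    return grid, [u / evidence for u in unnorm]

grid, post = posterior_after_tail()
mean = sum(w * d for w, d in zip(grid, post)) / len(grid)
# The density approximates the triangle 2 * (1 - omega_H), and the
# posterior mean is approximately 1/3, matching the quoted passage.
```

The same result follows analytically: a uniform prior is $\mathrm{Beta}(1,1)$, and one Tail updates it to $\mathrm{Beta}(1,2)$, whose density is $2(1-\omega_H)$ and whose mean is $1/3$.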

ANSWER

Essentially, $\omega_H$ is a parameter of the coin flip, which we can model as a random variable $X$. As with any coin, $X$ takes two values, $1$ (heads) and $0$ (tails). Each has probability $\frac 12$ only if the coin is fair, which is represented by $P(\text{Heads})=\omega_H=\frac 12$ and $P(\text{Tails})=\omega_T=1-\omega_H=\frac 12$.

In our case we do not know whether the coin is fair. Hence we must use the information, or data, $T$, to find the probability distribution of $\omega_H$. Essentially, $p(\omega_H|T)$ is our posterior distribution: what we believe about $\omega_H$ after combining the likelihood of our observed data, $p(T|\omega_H)$, with our prior beliefs, $p(\omega_H)$. For example, we might assume the coin is unfair and will land on heads, on average, 75% of the time, with some uncertainty around that value. Then we can say $p(\omega_H)$ is a Beta distribution with parameters $a=3$, $b=1$, so that $E(\omega_H)=\frac 34$ while $\omega_H$ itself remains random.

This is how Bayesian statistics works. We define the distribution of the data, conditional on the parameter, and we also assume a prior distribution on the parameter. We then use Bayes' rule to find the posterior distribution, which combines our prior assumptions with what we learn from the data. In mathematical terms: $$p(\omega_H|T)= \frac {p(T|\omega_H)\,p(\omega_H)}{p(T)}$$ where, since $\omega_H$ is a continuous parameter on $[0,1]$, the normalizing constant is $$p(T)=\int_0^1 p(T|\omega_H)\,p(\omega_H)\,d\omega_H.$$
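The $\mathrm{Beta}(3,1)$ example above can be made concrete using conjugacy: a Beta prior on $\omega_H$ combined with a Bernoulli likelihood gives a Beta posterior, so the update is simple counting. A hedged sketch of my own (the answer itself gives no code):

```python
# Conjugate Beta-Bernoulli updating: observing heads/tails just adds to
# the Beta parameters. Illustrative sketch under the Beta(3, 1) prior
# assumed in the text above.

def beta_update(a, b, heads, tails):
    """Posterior Beta parameters after observing the given counts."""
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

a0, b0 = 3, 1                                   # prior: E(omega_H) = 3/4
a1, b1 = beta_update(a0, b0, heads=0, tails=1)  # observe one Tail
# Posterior is Beta(3, 2): the single Tail pulls the expected
# probability of heads down from 0.75 to beta_mean(3, 2) = 0.6.
```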

Please let me know if I can clarify.

ANSWER

The key is that this is about statistics rather than probability.

The probability of getting heads, $\omega_H$, is unknown, and you are trying to find out how likely it is that $\omega_H$ equals one value or another, given a sample of trials. In other words, you build a distribution for $\omega_H$, conditional on your sample.
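To illustrate building a distribution for $\omega_H$ conditional on a sample, here is a small sketch of my own (the flip sequence is made up): starting from a uniform prior and updating flip by flip, the posterior concentrates around the empirical frequency.

```python
# Sequential Bayesian updating over a hypothetical sample of flips.
# With a Beta(1, 1) (uniform) prior, each Head increments a and each
# Tail increments b; the posterior variance shrinks as data accumulate.

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

a, b = 1, 1                        # uniform prior on omega_H
variances = []
for flip in "HTHHHTHHHH":          # hypothetical sample: 8 heads, 2 tails
    if flip == "H":
        a += 1
    else:
        b += 1
    variances.append(beta_variance(a, b))

posterior_mean = a / (a + b)       # (1 + 8) / (2 + 10) = 0.75
# The posterior mean tracks the sample frequency of heads, and the
# final variance is well below the variance after the first flip.
```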