Meaning of symbols after the bar in probability notation


I am taking Andrew Ng's machine learning course on Coursera where, in the discussion of logistic regression, we say that a given hypothesis function returns:

$P(y=1|x;Θ)$

This is described as "the probability that y=1, given x and parameterized by Θ."

As background for those who don't know, the hypothesis function will look something like:

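$h(x) = g(Θ^\top x)$, where $Θ^\top x = Θ_1 x_1 + Θ_2 x_2 + Θ_3 x_3$ and $g(z) = \frac{1}{1 + e^{-z}}$ is the logistic (sigmoid) function, which maps the linear combination into a probability between 0 and 1.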

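For concreteness, here is a minimal Python sketch of such a hypothesis. It assumes the usual logistic-regression form, in which the linear combination $Θ^\top x$ is passed through the sigmoid $g(z) = 1/(1+e^{-z})$ so that the output can be read as $P(y=1 \mid x; Θ)$. The function names and the sample numbers are illustrative only.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)), mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(x: list[float], theta: list[float]) -> float:
    """Return h(x) = g(theta^T x), interpreted as P(y = 1 | x; theta).

    x holds the features of one example; theta holds the parameters.
    """
    if len(x) != len(theta):
        raise ValueError("x and theta must have the same length")
    # theta_1*x_1 + theta_2*x_2 + theta_3*x_3
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Hypothetical parameter and feature values, for illustration only.
theta = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
print(hypothesis(x, theta))
```

With these made-up numbers, $Θ^\top x = 0.5 - 2 + 1 = -0.5$, and the sketch prints about 0.378, i.e. a predicted probability of roughly 38% that $y=1$ for this $x$.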

With all that said, my questions are:

  1. Why is that additional notation after the bar helpful? Why not just say $P(y=1)$? Is it because we might have different equations for assessing $P(y=1)$ and would need to distinguish between them, like $P(y=1|y;z)$?

  2. Why do we say "given x" but "parameterized by theta"? Is there a difference? If I had to guess it would be that "parameterization" is how we choose coefficients for an equation before we are ready to use it (such as in our gradient descent / other fitting step) and we then say "given x" because it is a free variable and we're choosing a particular value. But I'm not sure if that's right.


When we write $~\mathsf P(y=1\mid x; \Theta)~$ we mean "the conditional probability of $y$ realising value $1$ when given $x$, using parameter $\Theta$."

The $x$ is a random variable on which $y$ depends. As such, the conditional probability function of $y$ given $x$ will in general differ from the marginal probability function of $y$.
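A small numerical illustration, under made-up assumptions: suppose $x$ takes only the values 0 and 1, each with probability $1/2$, and $P(y=1\mid x)$ is $1/5$ when $x=0$ and $9/10$ when $x=1$. The marginal $P(y=1)$ then follows from the law of total probability, $P(y=1) = \sum_x P(y=1\mid x)\,P(x)$, and it matches neither conditional value.

```python
from fractions import Fraction

# Hypothetical distribution of x (made up for illustration).
p_x = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Hypothetical conditional probabilities P(y = 1 | x).
p_y1_given_x = {0: Fraction(1, 5), 1: Fraction(9, 10)}

# Law of total probability: P(y = 1) = sum over x of P(y = 1 | x) * P(x).
p_y1 = sum(p_y1_given_x[x] * p_x[x] for x in p_x)

print("P(y=1)        =", p_y1)
for x in p_x:
    print(f"P(y=1 | x={x}) =", p_y1_given_x[x])
```

The marginal comes out to $11/20 = 0.55$, while the conditionals are $1/5$ and $9/10$: knowing $x$ changes the probability that $y=1$, and the bar records which of these quantities is meant.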

The $\Theta$ is a parameter of the distribution. It is not random: it has a fixed value, though we may not know what it is. So we treat the parameter as an argument of the probability function (written after the semicolon) rather than as a global constant.
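To see the two roles side by side, here is a hedged Python sketch. A hypothetical "true" $\Theta$ generates some synthetic data; batch gradient descent on the logistic log-loss (the kind of fitting step mentioned in the question) produces an estimate of $\Theta$; only then is that estimate passed, unchanged, as an argument while $x$ varies. The data, learning rate, step count, and function names are arbitrary assumptions, not anything prescribed by the course.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def prob_y1(x: list[float], theta: list[float]) -> float:
    """P(y = 1 | x; theta): x is what we condition on, theta is a fixed parameter."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def fit(xs: list[list[float]], ys: list[int],
        lr: float = 0.5, steps: int = 2000) -> list[float]:
    """Estimate theta by batch gradient descent on the average log-loss."""
    theta = [0.0] * len(xs[0])
    m = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = prob_y1(x, theta) - y  # gradient of log-loss w.r.t. theta^T x
            for j, xj in enumerate(x):
                grad[j] += err * xj / m
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


rng = random.Random(0)
true_theta = [-1.0, 2.0]  # hypothetical; in practice this is unknown
# Each x is [1, x1]: the leading 1 acts as an intercept term.
xs = [[1.0, rng.uniform(-3.0, 3.0)] for _ in range(200)]
ys = [1 if rng.random() < prob_y1(x, true_theta) else 0 for x in xs]

theta_hat = fit(xs, ys)  # fitting: theta changes, the data stay put
for x1 in (-2.0, 0.0, 2.0):  # prediction: theta_hat stays put, x changes
    print(x1, prob_y1([1.0, x1], theta_hat))
```

Inside `fit`, the parameter is what changes; once fitting ends, `theta_hat` is held fixed and each call conditions on a different $x$. Note that `theta_hat` is only an estimate of the fixed but unknown $\Theta$.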