I am taking Andrew Ng's machine learning course on Coursera where, in the discussion of logistic regression, we say that a given hypothesis function returns:
$P(y=1|x;Θ)$
This is described as "the probability that y=1, given x and parameterized by Θ."
As background for those who don't know, the hypothesis function will look something like:
$h(x) = \Theta^\top x = \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3$
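(Strictly, logistic regression passes this linear combination through the sigmoid so the output lies in $(0,1)$ and can be read as a probability. A minimal sketch, with illustrative names and values:)

```python
import math

def h(theta, x):
    """Logistic-regression hypothesis: sigmoid of the linear combination theta^T x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Once theta is fixed by training, h(theta, x) is read as P(y=1 | x; theta).
print(h([0.0, 0.0, 0.0], [1.0, 2.0, 3.0]))  # z = 0, so the probability is 0.5
```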
With all that said, my questions are:
Why is the additional notation after the bar helpful? Why not just write $P(y=1)$? Is it because we might have different equations for assessing $P(y=1)$ and would need to distinguish between them, like $P(y=1\mid x;z)$?
Why do we say "given $x$" but "parameterized by $\Theta$"? Is there a difference? My guess is that "parameterization" is how we choose the coefficients of the equation before we are ready to use it (such as in our gradient descent / other fitting step), and we say "given $x$" because $x$ is a free variable and we are conditioning on a particular value of it. But I'm not sure that's right.
When we write $~\mathsf P(y=1\mid x; \Theta)~$ we mean "the conditional probability of $y$ realising value $1$ when given $x$, using parameter $\Theta$."
The $x$ is a random variable, on which $y$ depends. Consequently, the conditional probability function of $y$ given $x$ differs from the marginal probability function of $y$.
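You can see the difference in a toy simulation (the joint distribution below is invented purely for illustration):

```python
import random

random.seed(0)

# Toy joint distribution: x ~ Bernoulli(0.5); given x, y ~ Bernoulli(0.8 if x == 1 else 0.2).
draws = []
for _ in range(100_000):
    x = 1 if random.random() < 0.5 else 0
    y = 1 if random.random() < (0.8 if x == 1 else 0.2) else 0
    draws.append((x, y))

marginal = sum(y for _, y in draws) / len(draws)      # estimates P(y=1), about 0.5
conditional = (sum(y for x, y in draws if x == 1)
               / sum(1 for x, _ in draws if x == 1))  # estimates P(y=1 | x=1), about 0.8
print(round(marginal, 2), round(conditional, 2))
```

Writing $P(y=1)$ alone would refer to the marginal; the bar makes clear we mean the conditional.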
The $\Theta$ is a parameter of the distribution. It is not random: it has a fixed value, though we may not know what it is. We therefore treat the parameter as an argument of the probability function rather than as a global constant.
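In code terms, that means $\Theta$ appears as an explicit argument: each choice of parameter defines a different conditional probability function over the same $x$. A sketch, assuming the logistic form above (values illustrative):

```python
import math

def p_y1(x, theta):
    """P(y=1 | x; theta), with theta as an explicit argument of the function."""
    return 1.0 / (1.0 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))

x = [1.0, 2.0]
# Two parameter values define two different probability functions over the same x:
print(p_y1(x, [0.0, 0.0]))   # sigmoid(0) = 0.5
print(p_y1(x, [1.0, -1.0]))  # sigmoid(-1), roughly 0.27
```

Fitting picks one value of `theta` once; conditioning on $x$ then just evaluates that one function at a particular input.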