I've been reading a text containing an introduction to probability theory, and I ran into the following formula for conditional probability distributions. Note that $\operatorname{Pr}(x)$ here is not the probability of an event, but the PDF of a random variable $x$.
$$\operatorname{Pr}(x\mid y=y^*)=\frac{\operatorname{Pr}(x,y=y^*)}{\int\operatorname{Pr}(x,y=y^*)\,dx}=\frac{\operatorname{Pr}(x,y=y^*)}{\operatorname{Pr}(y=y^*)}.\tag{2.3}$$
The equation itself makes simple sense to me. We're just normalizing the joint PDF of $x$ and $y$ where $y = y^*$. The notation does seem a tad odd to me, though - I'm interpreting the numerator of the second expression as $\operatorname{Pr}(x, y) |_{y=y^*}$. Would that be accurate?
If so, then the second part is simple as well. The integral marginalizes the joint PDF over $x$, which gives a simpler expression for the denominator: $\operatorname{Pr}(y)|_{y=y^*}$, the marginal density of $y$ evaluated at $y^*$ (a density value rather than a probability, since $y$ is continuous). My confusion comes in when the notation is simplified, though:
$$\operatorname{Pr}(x\mid y)=\frac{\operatorname{Pr}(x,y)}{\operatorname{Pr}(y)}.\tag{2.4}$$
The simplified notation looks nice, but $\operatorname{Pr}(x, y)$ and $\operatorname{Pr}(y)$ have already been defined as whole distributions! Read that way, we're dividing a joint PDF by a single-variable PDF, an operation that hasn't been defined, as far as I can tell. I believe I'm supposed to implicitly assume $y=y^*$ here, but that seems awfully arbitrary to me. Having to guess which variables are held fixed seems especially unintuitive once more variables are involved, such as in Bayes' Theorem:
$$\begin{align} \operatorname{Pr}(y\mid x) & = \frac{\operatorname{Pr}(x\mid y)\operatorname{Pr}(y)}{\operatorname{Pr}(x)}\\ & = \frac{\operatorname{Pr}(x\mid y)\operatorname{Pr}(y)}{\int\operatorname{Pr}(x,y)\,dy}\\ & = \frac{\operatorname{Pr}(x\mid y)\operatorname{Pr}(y)}{\int\operatorname{Pr}(x\mid y)\operatorname{Pr}(y)\,dy}.\tag{2.9} \end{align}$$
In such a case, am I supposed to see the formula, associate $\operatorname{Pr}(x)$ with the $\operatorname{Pr}(y\mid x)$ on the other side, and then assume that $\operatorname{Pr}(x)$ is not the distribution as a whole but its value at some $x=x^*$? Such notation appears rather imprecise and seems to involve a lot of guesswork - am I interpreting it correctly?
Isn't it funny what a good night's sleep can do to some faulty intuition?
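On the issue of dividing a joint PDF by a single-variable PDF: at any fixed point $(x, y)$, both $\operatorname{Pr}(x, y)$ and $\operatorname{Pr}(y)$ are simply scalars, so the quotient is defined pointwise, like the quotient of any two functions (wherever $\operatorname{Pr}(y) > 0$). Equation 2.3 holds for every value $y=y^*$, so by unbinding $y^*$ and letting it vary as $y$, we get equation 2.4 as an identity between functions of $x$ and $y$, not just a statement about one slice.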
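To convince myself, I wrote a small numerical check of this pointwise reading. It is only a sketch under my own assumptions: the joint PDF is taken to be a standard bivariate normal with correlation 0.6, and the names `joint_pdf`, `marginal_y`, `conditional`, and `normal_pdf` are mine, not the text's. The point is that `conditional(x, y)` is an ordinary function of two arguments, and plugging in a particular $y^*$ gives back equation 2.3.

```python
import math

RHO = 0.6  # assumed correlation, chosen only for illustration


def joint_pdf(x, y, rho=RHO):
    """Pr(x, y): standard bivariate normal density with correlation rho."""
    norm = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho ** 2))
    quad = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho ** 2)
    return norm * math.exp(-0.5 * quad)


def marginal_y(y, lo=-10.0, hi=10.0, n=2000):
    """Pr(y): midpoint-rule integral of Pr(x, y) over x."""
    h = (hi - lo) / n
    return h * sum(joint_pdf(lo + (i + 0.5) * h, y) for i in range(n))


def conditional(x, y):
    """Pr(x | y) from equation 2.4, evaluated pointwise at (x, y)."""
    return joint_pdf(x, y) / marginal_y(y)


def normal_pdf(v, mean, var):
    """Univariate normal density, used only as a reference value."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


if __name__ == "__main__":
    # Fixing y = y* reproduces equation 2.3; nothing else changes.
    for y_star in (-1.0, 0.0, 0.5, 2.0):
        numeric = conditional(0.3, y_star)
        # Known closed form for this particular joint: N(rho * y*, 1 - rho^2).
        closed_form = normal_pdf(0.3, RHO * y_star, 1.0 - RHO ** 2)
        print(y_star, numeric, closed_form)
```

For this choice of joint PDF the conditional is known in closed form, so the two printed columns should agree up to the integration error.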
As such, it's not a notational issue. Both sides of equation 2.4 are exactly equal.
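The same reading seems to resolve my worry about Bayes' theorem (2.9): the denominator is a function of $x$ alone, and the whole right-hand side is a function of both $x$ and $y$, so no variable has to be secretly pinned. Below is a minimal sketch, again under assumptions of my own: a standard normal prior on $y$, a likelihood $x \mid y \sim \mathcal{N}(0.6\,y,\ 0.64)$, and the hypothetical names `prior`, `likelihood`, `evidence`, and `posterior`.

```python
import math

RHO = 0.6  # assumed coupling between x and y, for illustration only


def normal_pdf(v, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def prior(y):
    """Pr(y): assumed standard normal prior."""
    return normal_pdf(y, 0.0, 1.0)


def likelihood(x, y):
    """Pr(x | y): assumed normal with mean RHO * y and variance 1 - RHO^2."""
    return normal_pdf(x, RHO * y, 1.0 - RHO ** 2)


def evidence(x, lo=-10.0, hi=10.0, n=2000):
    """Denominator of 2.9: midpoint-rule integral over y of Pr(x | y) Pr(y).

    The integration variable y is consumed; the result depends on x only.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += likelihood(x, y) * prior(y)
    return h * total


def posterior(y, x):
    """Pr(y | x) from equation 2.9, evaluated pointwise at (y, x)."""
    return likelihood(x, y) * prior(y) / evidence(x)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.5):
        # For these assumed densities the posterior is N(RHO * x, 1 - RHO^2).
        print(x, posterior(0.4, x), normal_pdf(0.4, RHO * x, 1.0 - RHO ** 2))
```

Note that the `y` inside `evidence` is a dummy integration variable, unrelated to the `y` passed to `posterior`, which is probably the one place where the compact notation really does invite confusion.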