My lecture notes state Bayes' theorem, for data $\mathcal D$ whose distribution is parametrised by unknown parameters $\mathbf w$, as follows:
$$p(\mathbf w | \mathcal D)=\frac{p(\mathcal D | \mathbf w)p(\mathbf w)}{p(\mathcal D)}$$
After this, they refer to $p(\mathcal D | \mathbf w)$ as the "likelihood" and to $p(\mathbf w)$ as the "prior distribution".
My confusion stems from the use of the notation $p(\bullet)$. Does this refer to a probability, to a probability density function, or to a distribution in the abstract sense (like we would write $N(\mu,\sigma^2)$)? It seems to be used in various places to mean any of these. I attach below a screenshot from a Wikipedia article highlighting a dual use: first as a distribution, then as a probability.
The Bayes formula above is another example of this potential mixing of notations. In the continuous case, the formula seems to me to use the notation for a probability density function:
In the continuous case the likelihood is a probability density function: if $\mathcal D$ is continuously distributed, we define the likelihood $L(\mathbf w | \mathcal D):=f(\mathcal D | \mathbf w)$, where $f$ is the probability density function of $\mathcal D$ given parameters $\mathbf w$. We thus have $p(\mathcal D | \mathbf w)$ referring to a probability density function $f(\mathcal D | \mathbf w)$, and then clearly $p(\mathbf w)$ and $p(\mathcal D)$ refer to PDFs too.
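Written out entirely in terms of densities (using $f$ everywhere to make the reading explicit), I take the fully continuous theorem to be

$$f(\mathbf w \mid \mathcal D)=\frac{f(\mathcal D \mid \mathbf w)\,f(\mathbf w)}{\int f(\mathcal D \mid \mathbf v)\,f(\mathbf v)\,d\mathbf v},$$

where the denominator is the marginal density $f(\mathcal D)=\int f(\mathcal D \mid \mathbf v)\,f(\mathbf v)\,d\mathbf v$, so every symbol is a PDF.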
We thus have consistency in this case - though both uses in the Wikipedia article contradict this: if $p(x|\theta)$ in the article refers to a PDF, it wouldn't make sense to write $x_i \sim p(x|\theta)$, right? That would be like writing $x \sim \frac{e^{-x^2/2}}{\sqrt{2\pi}}$ instead of $x\sim N(0,1)$ - the former is the PDF of the standard normal, whilst the latter refers to the standard normal distribution itself. These are not the same thing, are they?
I find Bayes' theorem even more confusing in the mixed discrete/continuous case. If $\mathcal D$ is discrete, then the likelihood $p(\mathcal D | \mathbf w)$ is a probability (in the discrete case, $L(\theta | X)=\mathbb P(X|\theta)$). If $\mathbf w$ is continuous, then $p(\mathbf w)$ is, I believe, still a PDF, and we are mixing probabilities and PDFs in the same formula - with the same notation used for both. This could arise, for example, if we count the number of heads and tails (discrete), parametrised by an unknown head probability with a continuous prior on $[0,1]$.
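To make the coin example concrete: with $k$ heads in $n$ tosses and an unknown head probability $w$ with prior density $p(w)$ on $[0,1]$, I would write the theorem as

$$p(w \mid k)=\frac{\binom{n}{k}w^k(1-w)^{n-k}\,p(w)}{\int_0^1 \binom{n}{k}v^k(1-v)^{n-k}\,p(v)\,dv},$$

where the factor $\binom{n}{k}w^k(1-w)^{n-k}$ is a genuine probability (a binomial pmf), while $p(w)$ and the posterior $p(w\mid k)$ are densities - so one $p$ is a pmf and the others are PDFs, all under the same symbol.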
Any clarification on all of this would be greatly appreciated.
