I'm having a hard time intuitively understanding what this means in a machine learning context. When the variables are $A$ or $B$ in some trivial example, it all makes sense, but when looking at machine learning formulas with real variables it's harder to see exactly what is meant. For example, if $t$ is what I am trying to predict and $x$ is the training example or input...
$$ p(t|x) = \frac{p(x|t)p(t)}{p(x)} $$
What is meant by $p(x)$? If $x$ is a training example, does it mean the probability of seeing $x$ out of all possible training examples (kind of like the probability of drawing $x$ from a hat)? The probability of seeing $x$ under the previously known distribution of examples? Or something else?
Sometimes I see this with model parameters such as $\theta$ as well, which raises the same sort of questions.
Let's take your dice example to try to illustrate the issue. Here $T$ is your uncertain parameter and $t$ a value it can take, while $X$ is your observation and $x$ a particular value it can take.
Suppose you have a $t$-sided fair die, but you do not know what value $t$ has. You do have a prior distribution for $t$ of $P(T=t) = \frac{t}{2^{t+1}}$ for $t \in \{1,2,\ldots\}$.
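As a quick sanity check, you can verify numerically that this prior sums to $1$ (a Python sketch; truncating the infinite sum at $t = 200$ is an arbitrary choice, and the tail beyond it is negligible):

```python
# Prior P(T = t) = t / 2^(t+1) for t = 1, 2, ...
prior = lambda t: t / 2 ** (t + 1)

# Truncate the infinite sum; terms beyond t = 200 are vanishingly small.
total = sum(prior(t) for t in range(1, 201))
print(total)  # ≈ 1.0, so this is a valid probability mass function
```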
You roll the die and observe a value $X=x$. Since this is a fair die, you know $P(X=x \mid T=t) = \frac{1}{t}$ for $x \in \{1,2,\ldots,t\}$.
You can at this stage ask: what is the unconditional $P(X=x)$? In other words, at the start, what do you think the probability is of rolling a particular value $x$, even though you do not know how many sides the die has? As a simple application of the law of total probability, $$P(X=x) = \sum\limits_{t=x}^\infty P(X=x \mid T=t)\, P(T=t) = \sum\limits_{t=x}^\infty \frac{1}{t}\cdot\frac{t}{2^{t+1}} = \sum\limits_{t=x}^\infty \frac{1}{2^{t+1}} = \frac{1}{2^{x}}$$
As examples, from the prior, $P(T=6)=\frac{6}{128}$, $P(T=7)=\frac{7}{256}$, etc. So the unconditional or marginal probability of rolling $X=6$ is $$P(X=6) = \frac{1}{6} \times \frac{6}{128} + \frac{1}{7} \times \frac{7}{256}+ \cdots = \frac{1}{64}= \frac{1}{2^6}$$
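The marginalisation can also be checked numerically. This is a sketch that truncates the infinite sum at an arbitrarily chosen number of terms:

```python
prior = lambda t: t / 2 ** (t + 1)                       # P(T = t)
likelihood = lambda x, t: 1 / t if 1 <= x <= t else 0.0  # P(X = x | T = t), fair t-sided die

def marginal(x, n_terms=200):
    """P(X = x) = sum over t of P(X = x | T = t) P(T = t), truncated after n_terms terms."""
    return sum(likelihood(x, t) * prior(t) for t in range(x, x + n_terms))

print(marginal(6))  # ≈ 1/64 = 0.015625
```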
If you do roll a $6$, then you know the number of sides satisfies $T \ge 6$, and Bayes' theorem gives the posterior probability mass function $$P(T=t \mid X=6) = \frac{P(X=6 \mid T=t)\,P(T=t)}{P(X=6)} = \frac{\frac{1}{t}\cdot\frac{t}{2^{t+1}}}{\frac{1}{2^{6}}} = \frac{1}{2^{t-5}}$$ for $t \ge 6$, so $P(T=6 \mid X=6)= \frac12$, $P(T=7 \mid X=6)= \frac14$, etc.
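Putting the three pieces together, the posterior is just Bayes' theorem applied term by term. A numerical sketch (again truncating the infinite sum in the marginal at an arbitrary cutoff):

```python
prior = lambda t: t / 2 ** (t + 1)                       # P(T = t)
likelihood = lambda x, t: 1 / t if 1 <= x <= t else 0.0  # P(X = x | T = t)

def marginal(x, n_terms=200):
    # P(X = x), the denominator in Bayes' theorem, truncated after n_terms terms
    return sum(likelihood(x, t) * prior(t) for t in range(x, x + n_terms))

def posterior(t, x):
    # Bayes: P(T = t | X = x) = P(X = x | T = t) P(T = t) / P(X = x)
    return likelihood(x, t) * prior(t) / marginal(x)

print(posterior(6, 6))  # ≈ 1/2
print(posterior(7, 6))  # ≈ 1/4
```

Note that the denominator $P(X=6)$ is the same for every value of $t$: it only rescales the numerator so the posterior sums to $1$, which is exactly the role $p(x)$ plays in the question's formula.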