I'm learning about probability from the book Pattern Recognition and Machine Learning by Christopher Bishop. It includes a justification for the definition of entropy that can be summarized as:
Let $x$ and $y$ be independent events, that is, $$p(x,y) = p(x) \cdot p(y),$$ and require $$h(x,y) = h(x) + h(y),$$ because $h$ is designed to measure surprise, and independent events should make separate, additive contributions of surprise.
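To make the additivity concrete, here is a quick numerical check (a sketch, with made-up probabilities, assuming the base-2 surprisal $h = -\log_2 p$):

```python
import math

def surprise(p):
    """Surprisal (information content) of an event with probability p, in bits."""
    return -math.log2(p)

# Hypothetical probabilities of two independent events.
p_x = 0.25
p_y = 0.5

# Independence: the joint probability factorizes.
p_xy = p_x * p_y

# Additivity: h(x, y) = h(x) + h(y).
print(surprise(p_xy))                 # 3.0 bits
print(surprise(p_x) + surprise(p_y))  # 3.0 bits
```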
The definition $$h(x,y) = -\log_2 p(x,y)$$ is one of a family of definitions that satisfy the properties we want (other logarithm bases obviously work too, but maybe there are other functions with these properties).
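On the "maybe there are other functions" point, here is a sketch of why, assuming $h$ depends only on the probability and is continuous, the family is exactly the logarithms:

```latex
Write $h(x) = f(p(x))$. Additivity under independence requires
\[
  f(p\,q) = f(p) + f(q) \qquad \text{for all } p, q \in (0, 1].
\]
Substituting $p = e^{-u}$, $q = e^{-v}$ and setting $g(u) = f(e^{-u})$ turns this into
Cauchy's functional equation
\[
  g(u + v) = g(u) + g(v),
\]
whose continuous solutions are $g(u) = k\,u$. Hence
\[
  f(p) = g(-\ln p) = -k \ln p,
\]
so every continuous solution is a logarithm, and the choice of base only fixes the scale $k$.
```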
However, Bishop never writes that equation for $h(x,y)$; he jumps straight to $$h(x) = -\log_2 p(x).$$ So far the line of thought seems tied to joint distributions. Does probability define some sort of identity event $a$ such that $p(x,a) = p(x)$? Maybe $p(x,x) = p(x)$ is such a thing? Or is that not a sensible thing to think of, am I missing some notational point, and does entropy have some special connection to situations involving multiple events?
Define a random vector $Z$ that is the result of concatenating $X$ and $Y$, i.e. whose outcomes are the pairs $(x, y)$, and set $H(X,Y) := H(Z)$. No identity event is needed: $h(x)$ and $h(x,y)$ are the same notion of surprise, applied once to the outcome $x$ and once to the outcome $(x,y)$ of the single variable $Z$.