I am currently self-studying information theory from "Quantum Information Theory" by Mark M. Wilde. He uses a kind of notation that I don't understand at all. I will explain the problem using quotations from the book:
Let $p_X(x)$ be the probability mass function associated with random variable $X$, so that the probability of realization $x$ is $p_X(x)$...
So far, so good. The random variable $X$ can produce different numbers, and if it produces (say) $0.5$ with probability $0.25$, then $p_X(0.5)=0.25$.
I start to fall to pieces a little later, when the author begins to write things like $p_X(X)$. I'm not sure how to read this notation. It seems to be saying that the random variable has itself as an output? The author goes on to write:
There is nothing wrong mathematically here with having a random variable $X$ as the argument to the density function $p_X$, though this expression may seem self-referential at first.
And later, when introducing the same notation again:
It may seem strange at first glance that $X$, the argument of the probability mass function $p_X$ is itself a random variable, but this type of expression is perfectly well-defined mathematically.
But how is it defined? That's my question. I'm not looking for a rigorous answer, but a wordy explanation of how I can read/interpret such an expression, and maybe a simple example, would be truly appreciated.
The important thing to note is that $p_X$ depends on $X$ only through its distribution. So $p_X$ is just a function, and as with any* function, like $f(x) = x^2 +3$, say, we can make sense of a function of a random variable, so that $f(X) = X^2 + 3$ is also a random variable, and so is $p_X(X)$.
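To make this concrete, here is a small worked example with made-up numbers (the three values and their probabilities are my own assumption, extending the $p_X(0.5)=0.25$ case from the question). Suppose $X$ takes the values $0.5$, $1$ and $2$ with

$$p_X(0.5) = 0.25, \qquad p_X(1) = 0.25, \qquad p_X(2) = 0.5.$$

Read $p_X(X)$ as "draw a value $x$ of $X$, then report the number $p_X(x)$". The outcomes $X = 0.5$ and $X = 1$ both report $0.25$, while $X = 2$ reports $0.5$, so

$$\Pr\big[p_X(X) = 0.25\big] = 0.25 + 0.25 = 0.5, \qquad \Pr\big[p_X(X) = 0.5\big] = 0.5.$$

So $p_X(X)$ is an ordinary random variable taking the values $0.25$ and $0.5$, in the same way that $f(X) = X^2 + 3$ is a random variable taking the values $3.25$, $4$ and $7$.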
Your confusion might arise because the symbol $X$ appears in the notation twice. It may help to introduce another random variable $\widetilde{X}$ with the same distribution as $X$. Its density, $p_{\widetilde{X}}(x)$, must be the same function as $p_X(x)$, since $X$ and $\widetilde{X}$ have the same distribution. Since $p_{\widetilde{X}}$ is just a function, we can make sense of $p_{\widetilde{X}}(X)$, and this is exactly the same thing as $p_X(X)$.
*modulo measure-theoretic requirements - if you haven't come across measure theory, don't worry about this.
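If it helps to see this computationally, here is a minimal Python sketch. The pmf, the name `p_X_tilde` and the helper `pmf_of_function` are all hypothetical, chosen only for illustration. It treats $p_X$ as a plain lookup table and computes the exact distribution of a function of $X$, first for $f(X) = X^2 + 3$ and then for $p_X(X)$.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical pmf of X, chosen only for illustration: value -> probability.
p_X = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}


def pmf_of_function(pmf, f):
    """Return the pmf of f(X), given the pmf of X as a dict."""
    out = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, prob in pmf.items():
        # Every outcome x of X contributes its probability to the outcome f(x).
        out[f(x)] += prob
    return dict(out)


# An ordinary function of X: f(X) = X^2 + 3.
pmf_f = pmf_of_function(p_X, lambda x: x**2 + 3)

# The "self-referential" case: the function applied to X is p_X itself.
pmf_pX = pmf_of_function(p_X, lambda x: p_X[x])

# A copy of the table, standing in for the pmf of an identically distributed
# X-tilde. It is the same function, so it yields the same random variable.
p_X_tilde = dict(p_X)
pmf_pX_tilde = pmf_of_function(p_X, lambda x: p_X_tilde[x])

for value, prob in pmf_pX.items():
    print(f"p_X(X) = {value} with probability {prob}")
```

Nothing special happens in the second call: `p_X` is used once as the table describing how $X$ is distributed and once as the function being applied to the outcome, which is exactly the double appearance of $X$ in the notation $p_X(X)$.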