I have a question similar in spirit to this one.
In essence, what does a generating function (moment, probability, characteristic, other?) "do" to a random variable $X$, and how are the generating functions related? Does it:
- allow us to break up the function $X: \Omega \to \mathbb{R}$ into component functions with interpretable "coefficients" (like a Fourier transform), or
- can it be seen as a particular way of approximating the distribution function $F_X(x)$, or
- is there an interpretation for how the moments of $X$ relate to the generating functions (to motivate "differentiate and evaluate at $t=0$"), or
- something else? It feels like I'm trying to take a shortcut to interpreting these things without having taken functional analysis / more math, so I'm curious if these intuitions are totally off.
For (1), from the Wikipedia entry, "generating functions" are used to encode an infinite sequence as the coefficients of a formal power series. I'm wondering if there's a way of explaining the Fourier transform of the density of $X$ in terms of something that relates to viewing $X$ as a function and approximating it with basis functions, etc. If there is such an interpretation, is there a motivation for the "coefficients" and for working in an "algebraic"/frequency domain?
Similarly, for (2), the Wikipedia entry for "characteristic functions" draws a distinction between the distribution function $$ F_X(x) = E \left[ \mathbf{1}\{X\leq x\} \right] $$ and the characteristic function $$ \psi_X(t) = E \left[ e^{itX} \right]. $$ It hadn't occurred to me to think of it this way, but is there any way of thinking about $\psi_X$ as a smooth approximation to the indicator function for approximating $F_X$, with some relation between the arguments $x$ and $t$ (disregarding $i$, so I guess considering the MGF)? I've seen the idea of a generating function used in convex optimization to bound probabilities, which is different, but I'm wondering if there's some connection.
Finally for (3), I don't have any far-fetched hypotheses like in (1-2), but I'm just curious if there's some motivation for this beyond "it falls out of the properties of generating functions for sequences".
For a continuous random variable $X$ with a probability density function $f(x)$, the moment generating function is exactly the Laplace transform of $f(x)$ (up to a sign), and similarly the characteristic function is exactly the Fourier transform of $f(x)$ (again up to a sign, and possibly some other normalizing factors depending on conventions). These are both somewhat sophisticated mathematical operations and ultimately I think there's no substitute for understanding them beyond working through some theorems and examples and seeing how they behave.
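To make the Laplace-transform identity concrete, here is a small numerical sanity check (my own sketch, not from any reference): for the standard exponential density $f(x) = e^{-x}$, the MGF is $M(t) = 1/(1-t)$ for $t < 1$, and we can recover it by directly integrating $e^{tx} f(x)$.

```python
import math

# Density of the standard exponential distribution: f(x) = e^{-x}, x >= 0.
def f(x):
    return math.exp(-x)

# Numerically approximate M(t) = E[e^{tX}] = integral of e^{tx} f(x) dx
# with a simple Riemann sum (the integral converges only for t < 1).
def mgf(t, upper=50.0, n=200_000):
    dx = upper / n
    return sum(math.exp(t * x) * f(x) * dx for x in (i * dx for i in range(n)))

# Closed form for the exponential MGF: 1 / (1 - t), valid for t < 1.
print(mgf(0.5))       # close to 2.0
print(1 / (1 - 0.5))  # exactly 2.0
```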
A basic property of both of these transforms is that they intertwine convolution and multiplication, and for random variables convolution corresponds to adding independent copies. So we get a basic and important property of both the MGF and the characteristic function (I'll only state it for the characteristic function): if $X$ and $Y$ are independent then
$$\phi_{X+Y}(t) = \mathbb{E}(\exp(it(X+Y))) = \mathbb{E}(\exp(itX) \exp(itY)) = \mathbb{E}(\exp(itX)) \mathbb{E}(\exp(itY)) = \phi_X(t) \phi_Y(t).$$
This makes the characteristic function very well-suited to understanding sums of independent random variables, which is responsible e.g. for its role in one of the standard proofs of the central limit theorem.
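The convolution/multiplication duality is easy to check directly for discrete distributions. A quick sketch of my own (the fair-die example and helper names are just for illustration): computing $\phi_{X+Y}$ from the convolved pmf agrees with the product $\phi_X \phi_Y$.

```python
import cmath

# Characteristic function of a discrete distribution given as {value: prob}.
def cf(dist, t):
    return sum(p * cmath.exp(1j * t * v) for v, p in dist.items())

# Convolution of two pmfs: the distribution of X + Y for independent X, Y.
def convolve(d1, d2):
    out = {}
    for v1, p1 in d1.items():
        for v2, p2 in d2.items():
            out[v1 + v2] = out.get(v1 + v2, 0.0) + p1 * p2
    return out

die = {k: 1 / 6 for k in range(1, 7)}  # a fair six-sided die
t = 0.7
lhs = cf(convolve(die, die), t)  # phi_{X+Y}(t), computed from the convolved pmf
rhs = cf(die, t) * cf(die, t)    # phi_X(t) * phi_Y(t)
print(abs(lhs - rhs))            # ~0, up to floating-point error
```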
One way to build intuition for these more sophisticated transforms is to start with the simpler generating functions that arise in combinatorics (which can sometimes be turned into probability generating functions). You can check out, for example, Wilf's generatingfunctionology. As a simple example, the function $(1 + x)^n$ is both the generating function of the binomial coefficients (by the binomial theorem) and also produces, upon division by $2^n$, the probability generating function $\left( \frac{1}{2} + \frac{x}{2} \right)^n$ of a sum of $n$ independent Bernoulli random variables (the binomial distribution). Substituting $x = e^t$ readily gives the MGF $\left( \frac{1}{2} + \frac{e^t}{2} \right)^n$, and $x = e^{it}$ gives the characteristic function $\left( \frac{1}{2} + \frac{e^{it}}{2} \right)^n$, although interpreting these is more mysterious.
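As a sketch (helper names are my own), we can expand $\left(\frac{1}{2} + \frac{x}{2}\right)^n$ by repeated polynomial multiplication and confirm that the coefficient of $x^k$ is exactly the binomial probability $\binom{n}{k}/2^n$:

```python
from math import comb

# Expand the PGF (1/2 + x/2)^n of Binomial(n, 1/2) into a coefficient list;
# the coefficient of x^k should be P(X = k) = C(n, k) / 2^n.
def pgf_coeffs(n):
    coeffs = [1.0]
    for _ in range(n):  # multiply by (1/2 + x/2) once per Bernoulli trial
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += 0.5 * c      # contribution of the 1/2 term
            new[k + 1] += 0.5 * c  # contribution of the x/2 term
        coeffs = new
    return coeffs

n = 4
print(pgf_coeffs(n))                          # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print([comb(n, k) / 2**n for k in range(n + 1)])  # the binomial pmf, identical
```

The polynomial multiplication here is itself a convolution, which is the discrete analogue of the convolution/multiplication property above.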
I also found these notes by Terence Tao on concentration of measure helpful.