What is the meaning of the cumulant generating function itself?


If we define the characteristic function for a random variable X as

$\Phi(t)=\langle e^{itX}\rangle$

then it seems like we can think of it as essentially a spectral decomposition that measures the contributions of different frequencies to the probability distribution for X. I know how the moments are related to the derivatives at $t=0$, but I think that I might be missing some of the deeper connection between the moments and the spectral decomposition. If anybody had some thoughts on this then I would love to hear them, but I'm particularly interested in the same sort of question applied to the cumulants.
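As a concrete illustration of the moment relation $E[X^n]=(-i)^n\Phi^{(n)}(0)$ (a minimal Python/SymPy sketch, not part of the original question; the fair $\pm 1$ coin flip with $\Phi(t)=\cos t$ is my choice of example):

```python
import sympy as sp

t = sp.symbols('t', real=True)
# Characteristic function of a fair +-1 coin flip: Phi(t) = E[e^{itX}] = cos(t)
Phi = sp.cos(t)

# n-th moment: E[X^n] = (-i)^n * Phi^{(n)}(0)
moments = [sp.simplify((-sp.I)**n * sp.diff(Phi, t, n).subs(t, 0)) for n in range(5)]
print(moments)  # [1, 0, 1, 0, 1]: every even moment of a +-1 variable is 1
```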

We can then define the cumulant generating function in terms of $\Phi$ such that

$\Psi(t)=\ln\Phi(t)$

and

$\Psi^{\prime}(t)=\frac{\Phi^{\prime}(t)}{\Phi(t)}$

Now, what I'm really trying to ask is what these equations are telling us about the meaning of the cumulant generating function. Again, I understand how the cumulants are determined, how they relate to the moments, why the generating function was defined this way, etc. What I don't understand is whether there's a simple interpretation of either $\Psi(t)$ or $\Psi^{\prime}(t)$ at any given value of $t$. Is it valid to think of $\Psi(t)$ as a spectral decomposition of a second hypothetical probability distribution whose moments equal the cumulants of the original distribution? Thanks for any answers!
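For comparison with the moments, the cumulants $\kappa_n=(-i)^n\Psi^{(n)}(0)$ can be read off the same way (again a Python/SymPy sketch with the $\pm 1$ coin flip as a running example of my own, where $\Psi(t)=\ln\cos t$):

```python
import sympy as sp

t = sp.symbols('t', real=True)
# Cumulant generating function of a fair +-1 coin flip: Psi(t) = ln Phi(t) = ln cos(t)
Psi = sp.log(sp.cos(t))

# n-th cumulant: kappa_n = (-i)^n * Psi^{(n)}(0)
kappas = [sp.simplify((-sp.I)**n * sp.diff(Psi, t, n).subs(t, 0)) for n in range(1, 5)]
print(kappas)  # [0, 1, 0, -2]: mean 0, variance 1, third cumulant 0, fourth -2
```

The fourth value matches the usual moment-cumulant relation $\kappa_4 = E[X^4]-3E[X^2]^2 = 1-3 = -2$.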


For simplicity let us assume that $X$ has mean zero, so I don't accidentally say something obviously wrong by mixing up cumulant and moment.

A few basic comments:

You can look at $\Psi(z)=E[e^{zX}]$ for a complex parameter $z$. This unifies the characteristic function (the restriction of $\Psi$ to the imaginary axis) and the moment generating function (the restriction of $\Psi$ to the real axis), whose logarithm $\ln\Psi$ is the cumulant generating function.

This unified object $\Psi$ is really the "Fourier transform" of the (formal) density of $X$. So the characteristic function and the cumulant generating function are really the same object; the only issue is that the domain of $\Psi$ often doesn't contain the real axis, whereas it is always guaranteed to contain the imaginary axis.
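To see the two restrictions of the same object side by side, here is a Monte Carlo sketch in Python/NumPy (the standard normal example, sample size, and seed are my own arbitrary choices, not from the answer); for $X \sim N(0,1)$ the exact values are $e^{t^2/2}$ on the real axis and $e^{-t^2/2}$ on the imaginary axis:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)  # samples of X ~ N(0, 1)

def Psi(z):
    """Monte Carlo estimate of E[e^{zX}] for a complex parameter z."""
    return np.mean(np.exp(z * x))

# Real axis (moment generating function): exact value is e^{t^2/2}
print(Psi(1.0), np.exp(0.5))    # both roughly 1.65
# Imaginary axis (characteristic function): exact value is e^{-t^2/2}
print(Psi(1.0j), np.exp(-0.5))  # both roughly 0.61 (imaginary part near 0)
```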

The term "generating function" should already suggest that the cumulant generating function is a tool rather than an object of interest in its own right. In general, generating functions are used as methods for studying the coefficients of their (perhaps formal) power series, and are not of much interest in and of themselves.

With that said, the most direct interpretation of the cumulant generating function per se that I can think of comes from Cramér's theorem. Loosely, this says that if the $X_i$ are i.i.d. random variables with a finite cumulant generating function, and $n$ is large, then the probability that $|\sum_{i=1}^n X_i|>nx$ is approximately $e^{-nI(x)}$. Here $I(x)$ is called the rate function, and it is given explicitly by the Legendre transform of the cumulant generating function $\ln\Psi$:

$$I(x)=\sup_{t \in \mathbb{R}} tx-\ln \Psi(t).$$

Notice that this supremum, if it is finite, is attained where $(\ln \Psi)'(t)=\Psi'(t)/\Psi(t)=x$. Thus in effect we can look at $\psi=(\Psi'/\Psi)^{-1}$, and then $I(x)=x\psi(x)-\ln \Psi(\psi(x))$ (on the domain of $\psi$, anyway). Here $\Psi'/\Psi$ is guaranteed to be injective (though not necessarily surjective) because $\Psi$ is log-convex, so $(\ln\Psi)'$ is increasing.
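The maximizer condition can be checked numerically (a Python/NumPy sketch of my own, using the $\pm 1$ coin-flip example discussed below, where $\Psi=\cosh$ and $\Psi'/\Psi=\tanh$): the grid argmax of $tx-\ln\Psi(t)$ should land at $t=\psi(x)=\tanh^{-1}(x)$.

```python
import numpy as np

# For Psi = cosh, the supremum in I(x) = sup_t [t*x - ln Psi(t)] should be
# attained where Psi'(t)/Psi(t) = tanh(t) = x, i.e. at t = arctanh(x).
ts = np.linspace(-5.0, 5.0, 1_000_001)

def argmax_t(x):
    return ts[np.argmax(ts * x - np.log(np.cosh(ts)))]

for x in [0.2, 0.5, 0.8]:
    print(x, argmax_t(x), np.arctanh(x))  # grid argmax matches arctanh(x)
```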

But $I$ has a relatively concrete interpretation as measuring the decay rate of large deviations, so this gives us a way of thinking about $\Psi$ and $\psi$.

An instructive example comes when you consider $X_i$ equally likely to be $-1$ or $1$; in this case $\Psi=\cosh$ and $\psi=\tanh^{-1}$, so that $I(x)=x\tanh^{-1}(x)+\frac{1}{2}\ln(1-x^2)$ on $(-1,1)$ (extended by continuity to $\pm 1$, where the value is easily seen by simple counting considerations to be $\ln 2$). This gives us the exponential decay of the tail behavior of the sum.
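This closed form can be checked against a direct numerical evaluation of the Legendre-transform supremum (a Python/NumPy sketch; the grid bounds and test points are arbitrary choices of mine):

```python
import numpy as np

def I_numeric(x, ts=np.linspace(-20.0, 20.0, 400_001)):
    """Rate function via the Legendre transform: I(x) = sup_t [t*x - ln cosh(t)]."""
    return np.max(ts * x - np.log(np.cosh(ts)))

for x in [0.0, 0.5, 0.9, 0.99]:
    closed = x * np.arctanh(x) + 0.5 * np.log(1 - x**2)
    print(x, I_numeric(x), closed)  # the two agree; both tend to ln 2 as |x| -> 1
```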

But neither term really expresses it properly in isolation. For instance, notice that the two terms cancel each other's singularities at $\pm 1$, so there is no hope of understanding the behavior there without both terms. To put it another way, quantitatively understanding the tail requires knowing not how $\psi$ and $\Psi$ behave by themselves but how much $x\psi(x)$ differs from $\ln \Psi(\psi(x))$. That can't possibly be encapsulated in a single value of $\Psi$; at the very least you need to know $\Psi$ on some interval to get this information.