I'm currently studying discrete random variables, and I am learning to use generating functions to calculate expected values.
Given that a generating function is a polynomial of the form $g_x(s)=\sum_{k=0}^n P(X=k)s^k$, where the coefficient of the $k$th power of $s$ is the probability that the random variable $X$ equals $k$, we can calculate the expected value of $X$ by evaluating the derivative of the generating function at $1$:
$ EX = g_x'(1) $
I understand the proof for this, but I can't seem to get the intuition behind it.
My questions are:
- What is the intuitive understanding of this? How was this result achieved?
- Is there a visual way to interpret the statement? The generating function is a polynomial, so why is its first derivative at $1$ equal to $EX$, and why is the variance given by $g''_x(1)+g'_x(1)-g'_x(1)^2$?
- If getting intuition for this statement requires more work, what prerequisite theory would you advise me to work through in order to understand it?
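For concreteness, here is a quick numeric check of both identities (my own example with a fair six-sided die, sketched using numpy's polynomial class):

```python
from numpy.polynomial import Polynomial

# PGF of a fair six-sided die: the coefficient of s^k is P(X = k),
# so g(s) = (1/6)(s + s^2 + ... + s^6).
g = Polynomial([0] + [1/6] * 6)

mean = g.deriv()(1.0)                   # E[X] = g'(1)
var = g.deriv(2)(1.0) + mean - mean**2  # g''(1) + g'(1) - g'(1)^2

print(mean)  # ≈ 3.5
print(var)   # ≈ 2.9167 (= 35/12)
```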
Thanks in advance!
MGFs were developed as problem-solving tools for handling linear combinations of independent random variables, and that is where they truly shine. For a single random variable, the moment generating function (MGF) looks non-intuitive and raises the question 'how did we get here?'. To understand the MGF, it helps to consider the multi-variable case.
Consider a sequence of independent random variables $X_1,X_2,\dots,X_n$ with PDFs $g_1(x), g_2(x),\dots, g_n(x)$. Suppose you want the PDF of the sum of this sequence.
$Y=\sum_i X_i$
$f_Y(x)=(g_1 * g_2 * \cdots * g_n)(x)$, a chain of convolutions of the $g_i$
This is a fairly typical problem in probability theory. From linear systems theory, we know that convolution in the time domain corresponds to multiplication in the frequency domain, so a chain of convolutions reduces to a product in the frequency domain. We obtain the frequency-domain representation using the Fourier transform (or the Laplace transform).
In probability theory, the MGF plays the role of this transform; i.e., by taking MGFs, we can replace the chain of convolutions with a product.
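To see the convolution-versus-product correspondence concretely, here is a sketch in Python (my own example with two fair dice): convolving the two PMFs with `numpy.convolve` gives the same coefficients as multiplying the two generating-function polynomials.

```python
import numpy as np
from numpy.polynomial import Polynomial

# PMF of a fair die on {1, ..., 6}, stored as the coefficients of its PGF.
die = np.array([0] + [1/6] * 6)

# Distribution of the sum of two independent dice: convolve the PMFs...
pmf_sum = np.convolve(die, die)

# ...or multiply the generating functions and read off the coefficients.
g = Polynomial(die)
pgf_product = (g * g).coef

print(np.allclose(pmf_sum, pgf_product))  # True
```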
The MGF is rather unintuitively defined as the following operation on a single variable, but you can think of it as a 'Fourier transform' operator: $\phi_X(r) = E[e^{rX}]$. (To match the generating-function definition that you have, set $s=e^r$.) Try to see the PDF as a time-domain signal; the MGF is then the corresponding frequency-domain representation. To see the connection to Fourier transforms, compare the MGF to the standard Fourier transform in its continuous- and discrete-time versions below. You will agree that they look similar. This was the motivation according to [1].
Continuous time:
MGF: $\phi_X(r) = \int_{-\infty}^{+\infty} f_X(x)e^{rx} dx$
Fourier transform: $F_X(f) = \int_{-\infty}^{+\infty} f_X(t)e^{-j2\pi ft} dt$
In discrete time, we have the below:
MGF: $\phi_X(r) = \sum_k P[X=k] e^{rk}$
Fourier transform: $F_X(\omega)= \sum_k f_X(k)e^{-j\omega k}$
Setting $s=e^r$, we have the MGF in the form you gave: $\phi_{X,\text{alt}}(s) = \sum_k P[X=k]\, s^{k}$
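As a quick sanity check of this substitution (my own numeric example), the two parameterizations agree at $s=e^r$:

```python
import numpy as np

# PMF of a fair die on {1, ..., 6}
k = np.arange(1, 7)
p = np.full(6, 1/6)

r = 0.3            # arbitrary test point, chosen for illustration
s = np.exp(r)

mgf = np.sum(p * np.exp(r * k))  # phi_X(r)      = E[e^{rX}]
pgf = np.sum(p * s ** k)         # phi_X,alt(s)  = E[s^X]

print(np.isclose(mgf, pgf))  # True
```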
Reference: [1] Roy Yates and David Goodman, *Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers*