Moment Generating Function (MGF) of Hypergeometric Distribution is No Greater Than MGF of Binomial Distribution with the Same Mean


The Setup

Consider a hypergeometric distribution $X$ with parameters $N, n, m,$ i.e. $$\mathbb{P}[X = k] = \frac{{m \choose k} {N-m \choose n-k}}{{N \choose n}},$$ for $k$ running from $0$ to $\min\{n,m\}.$ The mean $\mathbb{E}[X]$ is $\frac{mn}{N}.$ Let $Y$ be a $\mathsf{Bi}(m,n/N)$ distribution, i.e. $$\mathbb{P}[Y = k] = {m \choose k}\left(\frac{n}{N}\right)^k \left(1-\frac{n}{N} \right)^{m-k}.$$ $Y$ also has mean $\frac{mn}{N}.$
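As a quick sanity check of the setup, here is a small Python sketch (the parameter values $N=20$, $n=8$, $m=5$ are arbitrary) that builds both pmfs directly from the formulas above and confirms the two means agree with $mn/N$:

```python
from math import comb

# Arbitrary small example parameters (not from the text)
N, n, m = 20, 8, 5

# Hypergeometric pmf: P[X = k] = C(m,k) C(N-m, n-k) / C(N, n)
def hyper_pmf(k):
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Binomial Bi(m, n/N) pmf: P[Y = k] = C(m,k) (n/N)^k (1 - n/N)^(m-k)
def binom_pmf(k):
    p = n / N
    return comb(m, k) * p**k * (1 - p) ** (m - k)

mean_X = sum(k * hyper_pmf(k) for k in range(min(n, m) + 1))
mean_Y = sum(k * binom_pmf(k) for k in range(m + 1))

print(mean_X, mean_Y, m * n / N)  # all three should agree
```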

(In $X,$ the roles of $m$ and $n$ are interchangeable, so the following should hold for $\mathsf{Bi}(n,m/N)$ also.) In Janson, Łuczak, and Ruciński's Random Graphs (JLR), the proof of Theorem 2.10 (page 29 in my edition) is largely left as an exercise. It depends on the following fact:

The Question

$$(1) \ \ \forall u \in \mathbb{R}, \mathbb{E}[e^{uX}] \leq \mathbb{E}[e^{uY}].$$ That is, the MGF of $X$ is less than or equal to the MGF of $Y.$ Intuitively, this makes sense, since $Y$ seems more likely than $X$ to be far from its mean. How can we prove this?

What I've Tried

JLR references Hoeffding 1963 for this result. I went through the parts of that paper that seemed most relevant to this problem ("Sampling from a Finite Population" and some of "Sums of Dependent Random Variables"), but I was unable to find anything that proves $$\forall u \in \mathbb{R}, \mathbb{E}[e^{uX}] \leq \mathbb{E}[e^{uY}].$$ (I skimmed the rest of the paper as well.) One more remark about Hoeffding: when that paper discusses $\mathbb{E}[e^{cZ}]$ it seems to usually require $c>0$, whereas our desired result holds for all $u \in \mathbb{R}.$

I have tried writing out explicit expressions for these MGFs: $$\mathbb{E}[e^{uX}] = \sum_{k} \frac{{m \choose k} {N-m \choose n-k}}{{N \choose n}} e^{uk}$$ and $$\mathbb{E}[e^{uY}] = \sum_{k} {m \choose k} \left(\frac{n}{N} \right)^k \left(1-\frac{n}{N}\right)^{m-k} e^{uk}.$$ These summands share factors of ${m \choose k}$, and $e^{uk} > 0,$ so one might hope to prove $$(2) \ \ \forall\, 0 \leq k \leq \min\{m,n\},\ m,n \leq N, \ \ \frac{{N - m \choose n-k}}{{N \choose n}} \leq \left(\frac{n}{N} \right)^k \left(1 - \frac{n}{N} \right)^{m-k}.$$ If true, $(2)$ would prove $(1), \mathbb{E}[e^{uX}] \leq \mathbb{E}[e^{uY}].$ I currently do not believe this inequality is true, based on some numbers I put into Maple (though I may have made an error there). In any case, although $(2)$ would be sufficient, it is not necessary for proving $(1)$.
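Inequality $(2)$ can be probed by brute force. The sketch below (the search ranges are arbitrary) redoes the Maple experiment in Python and collects parameter tuples $(N,n,m,k)$ where $(2)$ fails; it does find counterexamples, e.g. $N=4$, $n=m=2$, $k=1$, where the left side is $1/3$ and the right side is $1/4$:

```python
from math import comb

# Brute-force search for counterexamples to inequality (2):
#   C(N-m, n-k) / C(N, n)  <=  (n/N)^k (1 - n/N)^(m-k)
# over small parameter values (range limits are arbitrary).
counterexamples = []
for N in range(2, 12):
    for n in range(1, N + 1):
        for m in range(1, N + 1):
            for k in range(0, min(n, m) + 1):
                if n - k > N - m:
                    continue  # C(N-m, n-k) = 0, so (2) holds trivially
                lhs = comb(N - m, n - k) / comb(N, n)
                rhs = (n / N) ** k * (1 - n / N) ** (m - k)
                if lhs > rhs + 1e-12:
                    counterexamples.append((N, n, m, k))

print(len(counterexamples), counterexamples[:3])
```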

I also tried expanding $e^{uX} = 1 + uX + \frac{u^2 X^2}{2} + \cdots,$ and similarly for $e^{uY}.$ For $u \geq 0,$ (probably using Fubini-Tonelli somewhere), it would suffice to prove $$(3) \ \ \forall K \in \mathbb{N}, \ \ \mathbb{E}[X^K] \leq \mathbb{E}[Y^K].$$ I do not know how to do this (although it seems true), and in any case it wouldn't directly prove (1) when $u < 0.$
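The moment comparison $(3)$ can likewise be checked numerically for small $K$. A minimal sketch, using one arbitrary parameter choice and $K \le 10$:

```python
from math import comb

# Numerically check (3): E[X^K] <= E[Y^K] for the first several K,
# for one (arbitrary) parameter choice N, n, m.
N, n, m = 15, 6, 7
p = n / N

hyper = [comb(m, k) * comb(N - m, n - k) / comb(N, n) for k in range(min(n, m) + 1)]
binom = [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]

ok = all(
    sum(pk * k**K for k, pk in enumerate(hyper))
    <= sum(pk * k**K for k, pk in enumerate(binom)) + 1e-12
    for K in range(1, 11)
)
print(ok)
```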


Answer 1 (Best Answer)

Let $r^{\underline k}$ denote the falling power $r(r-1)(r-2)\dotsm (r-k+1)$.

The hypergeometric $X$ denotes the number of successes among $m$ draws without replacement from a pool with $N$ options, $n$ of which are successful. So $X^{\underline k}$ denotes the number of $k$-tuples of distinct successes among the $m$ draws. By linearity of expectation, $$ \mathbb E[X^{\underline k}] = m^{\underline k} \cdot \frac{n^{\underline k}}{N^{\underline k}} $$ as there are $m^{\underline k}$ $k$-tuples of distinct draws, and a probability of $\frac{n^{\underline k}}{N^{\underline k}}$ that all $k$ draws in a $k$-tuple are successful.

By similar reasoning, we have $$ \mathbb E[Y^{\underline k}] = m^{\underline k} \frac{n^k}{N^k} $$ since here, the probability that all $k$ draws in a $k$-tuple are successful is just $(\frac nN)^k$.

Comparing these: since $n \le N$, each factor satisfies $\frac{n-j}{N-j} \le \frac nN$ for $0 \le j \le k-1$, so $\frac{n^{\underline k}}{N^{\underline k}} \le \frac{n^k}{N^k}$, and therefore $\mathbb E[X^{\underline k}] \le \mathbb E[Y^{\underline k}]$.
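The factorial-moment formula above can be verified directly against the hypergeometric pmf; a minimal Python check (the parameter values are arbitrary):

```python
from math import comb

def falling(r, k):
    """Falling power r(r-1)...(r-k+1); equals 0 when 0 <= r < k (integer r)."""
    out = 1
    for j in range(k):
        out *= r - j
    return out

# Check E[X^(k falling)] = m^(k falling) n^(k falling) / N^(k falling)
# against the hypergeometric pmf, for arbitrary small parameters.
N, n, m = 12, 5, 7
pmf = [comb(m, k) * comb(N - m, n - k) / comb(N, n) for k in range(min(n, m) + 1)]

errs = []
for k in range(1, 6):
    lhs = sum(p * falling(j, k) for j, p in enumerate(pmf))
    rhs = falling(m, k) * falling(n, k) / falling(N, k)
    errs.append(abs(lhs - rhs))
print(max(errs))
```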

The $k^{\text{th}}$ derivative of $\mathbb E[z^X]$ with respect to $z$ is $\mathbb E[X^{\underline k} z^{X-k}]$, so in particular the $k^{\text{th}}$ derivative at $z=1$ is just $\mathbb E[X^{\underline k}]$. Therefore, comparing the Taylor expansions of $\mathbb E[z^X]$ and $\mathbb E[z^Y]$ around $z=1$, each coefficient of the former is at most the corresponding coefficient of the latter.

We conclude that $\mathbb E[z^X] \le \mathbb E[z^Y]$ for $z \ge 1$; in other words, $\mathbb E[e^{uX}] \le \mathbb E[e^{uY}]$ for $u \ge 0$.

For $u\le 0$, let $v=-u$, and note that $m-X$ and $m-Y$ are hypergeometric and binomial, respectively, with the same relationship as $X$ and $Y$. So we have $\mathbb E[e^{v(m-X)}] \le \mathbb E[e^{v(m-Y)}]$; dividing by the constant $e^{vm}$, we get $\mathbb E[e^{-vX}] \le \mathbb E[e^{-vY}]$.
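Putting both halves together, the claimed inequality $(1)$ can be checked on a grid of $u$ values of both signs; the sketch below uses one arbitrary parameter choice:

```python
from math import comb, exp

# Check (1): E[e^{uX}] <= E[e^{uY}] on a grid of u values, both signs,
# for one (arbitrary) parameter choice.
N, n, m = 18, 7, 9
p = n / N
hyper = [comb(m, k) * comb(N - m, n - k) / comb(N, n) for k in range(min(n, m) + 1)]
binom = [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]

def mgf(pmf, u):
    return sum(pk * exp(u * k) for k, pk in enumerate(pmf))

violations = [u / 10 for u in range(-30, 31)
              if mgf(hyper, u / 10) > mgf(binom, u / 10) * (1 + 1e-12)]
print(violations)
```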

Answer 2

Let $N \in \mathbb{Z}_{>0}.$ Notation: $[N] = \{1,2,...,N\}.$ Let $C = \{c_1, ..., c_N\}$ be a set of variables, and $\mathbb{R}^C$ the vector space of $\mathbb{R}$-linear combinations of $c_j$s.

We introduce the following notation: for $1 \leq n \leq N,$

$$C^{\underline{n}} = \{(x_1, ..., x_n) \in C^n : \forall i, j \in [n], i \neq j \implies x_i \neq x_j \}.$$

That is, $C^{\underline{n}}$ is the set of ordered $n$-tuples from $C$ with all distinct elements. (For $N \in \mathbb{R},$ we use the different definition $N^{\underline{n}} := N (N-1) \cdots (N-n+1)$; take caution not to confuse the two usages. $\# [N]^{\underline{n}} = N^{\underline{n}}$.)

Next we define $C_{\oplus n}$: $$\mathbb{R}^C \supset C_{\oplus n} := \{x_1 + \cdots + x_n : \text{ each } x_i \in C\}.$$ Let $f : C_{\oplus n} \overset{\sim}{\to} \Gamma$ be a bijection from $C_{\oplus n}$ to another set of variables, $\Gamma.$

For $\gamma \in \Gamma$ and $v \in \mathbb{R}^\Gamma,$ $\langle \gamma, v \rangle$ is the coefficient of $\gamma$ in the standard-basis expansion of $v.$ Consider the following vectors in $\mathbb{R}^\Gamma$:

$$w := N^{-n} \sum_{x \in C^n} f(x_1 + \cdots + x_n)$$

and

$$u := \frac{1}{N^{\underline{n}}} \sum_{x \in C^{\underline{n}}} \left( \sum_{k=1}^n \sum_{\substack{r \in [n]^k, \\ r_1 + \cdots + r_k = n}} \sum_{i \in [n]^k} f(r_1 x_{i_1} + \cdots + r_k x_{i_k})\right).$$

Lemma 1. Let $k \in [n], i,i' \in [n]^{k},$ and $r \in [n]^k$ such that $r_1 + \cdots + r_k = n.$ Let $y_1 = (y_{1,1},...,y_{1,n})$ and $y_2 = (y_{2,1},...,y_{2,n}) \in C^{\underline{n}}.$ Define:

$$\gamma_1 := f(r_1 y_{1, i_1} + \cdots + r_k y_{1, i_k})$$

and

$$\gamma_2 := f(r_1 y_{2, i'_1} + \cdots + r_k y_{2, i'_k}).$$ Then $\langle \gamma_1, w \rangle = \langle \gamma_2, w \rangle > 0$ and $\langle \gamma_1, u \rangle = \langle \gamma_2, u \rangle > 0.$

Proof of Lemma 1. $\langle \gamma_1, w \rangle > 0$ and $\langle \gamma_1, u \rangle > 0$ because the term $\gamma_1 = f(r_1 y_{1,i_1} + \cdots + r_k y_{1,i_k})$ appears in both sums (and all coefficients are positive). Both $w$ and $u$ are invariant under permutations of $C$. That is, for any bijection $\sigma : C \to C,$ with $$w_{\sigma} := N^{-n} \sum_{x \in C^n} f(\sigma(x_1) + \cdots + \sigma(x_n))$$ and $$u_{\sigma} := \frac{1}{N^{\underline{n}}} \sum_{x \in C^{\underline{n}}} \left( \sum_{k=1}^n \sum_{\substack{r \in [n]^k, \\ r_1 + \cdots + r_k = n}} \sum_{i \in [n]^k} f(r_1 \sigma(x_{i_1}) + \cdots + r_k \sigma(x_{i_k}))\right),$$ we have $w_\sigma = w$ and $u_\sigma = u.$ Let $\sigma : C \to C$ be a bijection sending $y_{1,i_j}$ to $y_{2,i'_j}$ for $j = 1,2,...,k.$ Using the Iverson bracket notation,

$$\langle \gamma_1, w \rangle = N^{-n} \sum_{x \in C^n} [x_1 + \cdots + x_n =f^{-1}(\gamma_1)] \\ =N^{-n} \sum_{x \in C^n} [\sigma(x_1) + \cdots + \sigma(x_n) = f^{-1}(\gamma_2)] \\ =\langle \gamma_2, w_\sigma \rangle \\ =\langle \gamma_2, w \rangle.$$

Similarly, with $\sum_{(k,r,i)'}$ referring to the sum $\sum_{k=1}^n \sum_{\substack{r \in [n]^k, \\ r_1 + \cdots + r_k = n}} \sum_{i \in [n]^k},$

$$\langle \gamma_1, u \rangle = \frac{1}{N^{\underline{n}}} \sum_{x \in C^{\underline{n}}} \left( \sum_{(k,r,i)'} [r_1 x_{i_1} + \cdots + r_k x_{i_k} = f^{-1}(\gamma_1)] \right) \\ = \frac{1}{N^{\underline{n}}} \sum_{x \in C^{\underline{n}}} \left( \sum_{(k,r,i)'} [r_1 \sigma(x_{i_1}) + \cdots + r_k \sigma(x_{i_k}) = f^{-1}(\gamma_2)] \right) \\ = \langle \gamma_2, u_\sigma \rangle \\ = \langle \gamma_2, u \rangle.\ \square$$

Given the above lemma, for any composition $r$ of $n$ (that is, $r \in [n]^k$ and $r_1+ \cdots + r_k = n$), we can define:

$$c(r) = \langle \gamma_1, w \rangle$$

and

$$s(r) = \langle \gamma_1, u \rangle,$$

for any $\gamma_1 = f(r_1 y_{i_1} + \cdots + r_k y_{i_k})$ as in the statement of Lemma 1. Lemma 1 implies $c$ and $s$ are well-defined.

Consider the function $$f^* : C^{\underline{n}} \to \mathbb{R}^\Gamma \ \ \ \ (\dagger)$$ given by $$(x_1, ..., x_n) \overset{f^*}{\mapsto} \sum_{(k,r,i)'} \frac{c(r)}{s(r)} f(r_1 x_{i_1} + \cdots + r_k x_{i_k}).$$

Then (by comparing coefficients on either side) $$w = \frac{1}{N^{\underline{n}}} \sum_{x \in C^{\underline{n}}} f^*(x_1, ..., x_n).$$

Now consider $(X_1, X_2, ..., X_n)$ a random sample from $C$ taken (uniformly and) without replacement. That is, $\forall l \in \{0,1,...,n-1\}, X_{l+1} \in C \setminus \{X_1, ..., X_l \}$ is chosen uniformly at random. Let $\mathcal{X}:=\sum_{i=1}^n X_i.$ If we substitute in the values $c_1 = c_2 = \cdots = c_m = 1$ and $c_{m+1} = \cdots = c_N = 0,$ then $\mathcal{X}$ becomes a hypergeometric random variable with parameters $N, n, m.$ Let $(Y_1, ..., Y_n)$ be a random sample from $C$ taken uniformly and independently, with replacement. Let $\mathcal{Y}:=\sum_{i=1}^n Y_i.$ If we substitute in the values $c_1 = \cdots = c_m = 1$ and $c_{m+1} = \cdots = c_N = 0,$ then $\mathcal{Y}$ becomes a binomial $\mathsf{Bi}(m,n/N)$ distribution.

Theorem 2. [Hoeffding 1963, Theorem 4] Let $f : \mathbb{R}^C \to \mathbb{R}$ be convex. Then $$\mathbb{E}[f(\mathcal{X})] \leq \mathbb{E}[f(\mathcal{Y})].$$

Corollary 3. The MGF $\mathbb{E}[e^{uX}]$ of a hypergeometric distribution $X$ with parameters $N,n,m$ is less than or equal to the MGF $\mathbb{E}[e^{uY}]$ of a binomial distribution $Y$ with the same mean.

Proof of Corollary 3. Apply Theorem 2 to the real-valued, convex function $f:\mathbb{R}^C \to \mathbb{R}$ given by first substituting $c_1 = \cdots = c_m = 1$ and $c_{m+1} = \cdots = c_N = 0$ (a linear map $\mathbb{R}^C \to \mathbb{R}$), then exponentiating $z \mapsto e^{uz}$ (a convex function). $\square$

Proof of Theorem 2. We have: $$\mathbb{E}[f(\mathcal{X})] = \frac{1}{N^{\underline{n}}} \sum_{x \in C^{\underline{n}}} f(x_1 + \cdots + x_n)$$ and $$\mathbb{E}[f(\mathcal{Y})] = N^{-n} \sum_{y \in C^n} f(y_1 + \cdots + y_n).$$ With $c(r), s(r),$ and $f^*$ defined as above (see ($\dagger$)),

$$\mathbb{E}[f(\mathcal{Y})] = \frac{1}{N^{\underline{n}}} \sum_{x \in C^{\underline{n}}} f^*(x_1,..., x_n).$$

Setting $f$ to be a nonzero constant function, we get $$\sum_{(k,r,i)'} \frac{c(r)}{s(r)} = 1.$$

On the other hand, setting $f(x_1 + \cdots + x_n) = x_1 + \cdots + x_n,$ we find that $$f^*(x_1, ..., x_n) = \sum_{(k,r,i)'} \frac{c(r)}{s(r)}(r_1 x_{i_1} + \cdots + r_k x_{i_k})$$ extends to a linear map that is symmetric in the variables $x_1, ..., x_n$; all such functions take the form $K(x_1+ \cdots + x_n).$ Still using $f(x_1 + \cdots + x_n) = x_1 + \cdots + x_n$, since $$\mathbb{E}[f(\mathcal{Y})] = \mathbb{E}_{z \in C^{\underline{n}}}[f^*(z)],$$ we find that $K = 1.$

We now move back to general $f$ (real-valued, convex). Summarizing what we have found so far, $$\sum_{(k,r,i)'} \frac{c(r)}{s(r)} = 1$$ and $$\sum_{(k,r,i)'} \frac{c(r)}{s(r)} (r_1 x_{i_1} + \cdots + r_k x_{i_k}) = x_1 + \cdots + x_n.$$ Therefore, applying Jensen's inequality,

$$f(x_1 + \cdots + x_n) \leq \sum_{(k,r,i)'} \frac{c(r)}{s(r)} f(r_1 x_{i_1} + \cdots + r_k x_{i_k}) \\ = f^*(x_1, ..., x_n).$$

Taking expectations on both sides,

$$\mathbb{E}_{x \in C^{\underline{n}}}[f(x_1 + \cdots + x_n)] \leq \mathbb{E}_{x \in C^{\underline{n}}}[f^*(x_1, ..., x_n)],$$ i.e. $$\mathbb{E}[f(\mathcal{X})] \leq \mathbb{E}[f(\mathcal{Y})].\ \square$$
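Theorem 2 can be sanity-checked by exhaustive enumeration for a small pool: assign arbitrary real values to the variables in $C$, pick a convex $f$ (here $f(t)=t^2$), and average $f$ over all ordered samples without and with replacement:

```python
from itertools import permutations, product

# Exhaustive check of Theorem 2 (Hoeffding's Theorem 4) for a small,
# arbitrary pool of real values and the convex function f(t) = t^2.
C = [0.3, 1.0, 2.5, 4.0, 0.0]
n = 3

def f(t):
    return t * t

# Ordered n-tuples: distinct entries (without replacement) vs all (with replacement)
without = [sum(x) for x in permutations(C, n)]
with_repl = [sum(x) for x in product(C, repeat=n)]

EX = sum(f(s) for s in without) / len(without)       # E[f(sum, no replacement)]
EY = sum(f(s) for s in with_repl) / len(with_repl)   # E[f(sum, with replacement)]
print(EX <= EY + 1e-12, EX, EY)
```

The means of the two sums are equal, so with $f(t)=t^2$ the comparison reduces to the familiar fact that sampling without replacement has a smaller variance.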

Remark 4. Hoeffding's Theorem 4 also requires $f$ to be continuous, but this doesn't come up in the proof, so I omit it. (I've heard that every convex function from $\mathbb{R}^n \to \mathbb{R}$ is continuous, but I haven't seen the proof.)

Remark 5. Thanks to Larry Frolov, Mihail Tarigradschi, Robert Dougherty-Bliss, and anyone else who worked on this problem with me. Special thanks to Jinyoung Park, who pointed out that Hoeffding's Theorem 4 was what I needed.