Let $ X_1, \ldots, X_k $ be independent random variables such that $ X_1 \sim B(N_1, p), \ldots, X_k \sim B(N_k, p) $. I have exactly one realization for each, i.e. a sequence $ x_1, \ldots, x_k $ where $ x_i $ is a realization of $ X_i $. (We could generalize, but let's keep things simple, since I don't need anything more complicated.) I know $ N_1, \ldots, N_k $, but I'm trying to estimate $ p $. I'm thinking that using $ \frac{1}{k}\sum_{i=1}^{k} x_i/N_i $ would not be ideal, because the proportions $ X_i/N_i $ don't all have the same variance. My intuition is that this estimator is unbiased but has a larger variance than necessary. Thus, I'm thinking that I should use a weighted average to estimate $ p $, where $ x_i/N_i $ is given more weight than $ x_j/N_j $ if $ N_i > N_j $, because then $ \operatorname{Var}(X_i/N_i) = p(1 - p)/N_i < p(1 - p)/N_j = \operatorname{Var}(X_j/N_j) $. However, I'm not sure exactly which estimator I should use, so I would appreciate some help.
Estimating $ p $ for a sequence of random variables $ X_1 \sim B(N_1, p), \ldots, X_k \sim B(N_k, p) $ when I have one realization for each
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
There are 2 solutions below.
One way is to use the maximum likelihood formulation of the problem: $$ \hat p=\arg\max_{p\in[0,1]}P(X_1=x_1,\dots,X_k=x_k\mid p). $$ Using the independence of the $X_i$'s we have: \begin{split} P(X_1=x_1,\dots,X_k=x_k\mid p)&=\prod_{i=1}^k P(X_i=x_i\mid p)\\ &=\prod_{i=1}^k\binom{N_i}{x_i}p^{x_i}(1-p)^{N_i-x_i}. \end{split} The maximum likelihood formulation is then equivalent to: $$ \hat p=\arg\max_{p\in[0,1]}\prod_{i=1}^k\binom{N_i}{x_i}p^{x_i}(1-p)^{N_i-x_i}, $$ which, after taking logarithms and dropping the binomial coefficients (constant in $p$), is equivalent to: $$ \hat p=\arg\max_{p\in[0,1]}\sum_{i=1}^k\bigl(x_i\log p+(N_i-x_i)\log(1-p)\bigr). $$ Setting the derivative with respect to $p$ to zero gives: $$ \hat{p}=\frac{\sum_{i=1}^k x_i}{\sum_{i=1}^k N_i}. $$ As you can check, the estimator is unbiased. Note that it is exactly a weighted average of the proportions: $\hat p=\sum_{i=1}^k w_i\,(x_i/N_i)$ with weights $w_i=N_i/\sum_j N_j$, i.e. more weight on the proportions with larger $N_i$, as your intuition suggested.
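A quick simulation can confirm this. The sketch below (the values of `p_true` and the `N` sizes are arbitrary choices for illustration) compares the naive unweighted average of proportions with the pooled MLE; both come out unbiased, but the MLE has noticeably smaller variance:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3
N = np.array([2, 10, 100, 1000])  # assumed sample sizes, for illustration

trials = 20_000
# One realization of (X_1, ..., X_k) per row.
x = rng.binomial(N, p_true, size=(trials, len(N)))

# Naive estimator: unweighted average of the per-variable proportions.
p_naive = (x / N).mean(axis=1)
# MLE: pooled successes over pooled trials.
p_mle = x.sum(axis=1) / N.sum()

print(f"naive: mean {p_naive.mean():.4f}, var {p_naive.var():.6f}")
print(f"mle:   mean {p_mle.mean():.4f}, var {p_mle.var():.6f}")
```

The theoretical variances here are $\frac{1}{k^2}\sum_i p(1-p)/N_i$ for the naive average versus $p(1-p)/\sum_i N_i$ for the MLE, and the simulated values should match those closely.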
Do you know how to show that $X=X_1+\cdots+X_k\sim \operatorname{Binomial}(N_1+\cdots+N_k, p)\text{?}$
If so, you have one observation, $X\sim\operatorname{Binomial}(N,p)$ where $N=N_1+\cdots+N_k.$
The conditional distribution of $X_1,\ldots,X_k$ given the value of their sum $X$ can be shown not to depend on $p.$ Therefore learning the values of $X_1,\ldots,X_k$ after knowing $X$ does not give you more information about $p$ in addition to what you had when you only knew $X.$
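This can be checked numerically for $k=2$: computing $P(X_1=j\mid X_1+X_2=s)$ from the binomial joint pmf, the factor $p^s(1-p)^{N-s}$ cancels, so the result is the same for every $p$ (it is the hypergeometric pmf). The sizes $N_1=4$, $N_2=6$ and the value $s=5$ below are arbitrary choices for illustration:

```python
from math import comb

# Hypothetical sizes, chosen only to illustrate the cancellation.
N1, N2 = 4, 6
s = 5  # condition on X1 + X2 = s

def conditional_pmf(p):
    """P(X1 = j | X1 + X2 = s), computed directly from the binomial joint pmf."""
    def binom_pmf(n, k, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)
    support = range(max(0, s - N2), min(N1, s) + 1)
    joint = [binom_pmf(N1, j, p) * binom_pmf(N2, s - j, p) for j in support]
    total = sum(joint)  # equals P(X1 + X2 = s)
    return [q / total for q in joint]

print(conditional_pmf(0.2))
print(conditional_pmf(0.7))
# Both match C(N1, j) * C(N2, s - j) / C(N1 + N2, s), independent of p.
```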