Estimating Negative Binomial parameters from subsampled data.


Suppose I need to estimate the parameters $n$ and $p$ (or, alternatively, $\mu$ and $\sigma^2$) of count data that follow a Negative Binomial distribution. However, I do not observe the raw counts; I observe counts subsampled at a known rate $q$. That is, instead of observing a value $x$, I observe a random draw from a $\text{Binomial}(x, q)$ distribution.

I can recover the mean of the original distribution from the subsampled mean $\mu_s$ as $\mu = \frac{\mu_s}{q}$. However, I cannot work out the corresponding relationship between the variances of the original and subsampled distributions.

Please see this example in R:

```r
n <- 20
p <- 0.8
q <- 0.1

x <- rnbinom(100000, n, p)                   # Generating N.Binom. vector.
z <- sapply(x, function(k) rbinom(1, k, q))  # Generating subsampled vector.
c(n * (1 - p) / p, mean(x), mean(z) / q)     # Correspondence between means.
# 5.000   5.003   5.007
c(n * (1 - p) / p / p, var(x), var(z) / q)   # Variances do not match.
# 6.250   6.226   5.115
```

Best answer

I figured it out:

$$\mu_s\ =\ \frac{qn(1 - p)}{p}\ =\ q\mu,$$

$$\sigma^2_s\ =\ \frac{qn(1 - p)}{p^2} \cdot (p + q - pq)\ =\ q\,(p + q - pq)\,\sigma^2.$$

So the mean is indeed a simple fraction of the original mean, but the variance is scaled by $q$ and additionally by a factor of $p + q - pq$. Consequently, to recover $n$ and $p$ from the subsampled mean $m$ and variance $v$, one can use the following method-of-moments formulas:

$$n\ =\ \frac{m^2}{v - m},$$

$$p\ =\ \frac{mq}{v - m(1 - q)}.$$

My proof is a bit lengthy, but I can post it on request.
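
As a quick sanity check (a sketch, not necessarily the proof mentioned above): the variance formula agrees with the law of total variance. If $Z \mid X \sim \text{Binomial}(X, q)$, then $\operatorname{Var}(Z) = q(1 - q)\mu + q^2\sigma^2$, and since $\mu = p\sigma^2$ for the Negative Binomial, this equals $q\,(p + q - pq)\,\sigma^2$. The R snippet below compares that value with a simulation and applies the method-of-moments formulas to the subsampled counts. The helper name `estimate_nb_from_subsample`, the seed, and the sample size are assumptions made for illustration only.

```r
# Hypothetical helper (not from the original post): method-of-moments
# estimates of n and p from subsampled counts z and a known rate q.
estimate_nb_from_subsample <- function(z, q) {
  m <- mean(z)  # subsampled mean
  v <- var(z)   # subsampled variance
  if (v <= m) {
    stop("sample variance must exceed sample mean for these formulas")
  }
  n_hat <- m^2 / (v - m)
  p_hat <- m * q / (v - m * (1 - q))
  c(n = n_hat, p = p_hat)
}

set.seed(1)  # arbitrary seed, chosen only for reproducibility
n <- 20
p <- 0.8
q <- 0.1

x <- rnbinom(1e6, n, p)       # raw Negative Binomial counts
z <- rbinom(length(x), x, q)  # binomial subsampling, vectorized over x

# Theoretical vs. empirical variance of the subsampled counts.
sigma2 <- n * (1 - p) / p^2
c(theory = q * (p + q - p * q) * sigma2, empirical = var(z))

# Estimates of n and p recovered from the subsampled data alone.
est <- estimate_nb_from_subsample(z, q)
est
```

With a small $q$ the subsampled counts are close to Poisson, so $v - m$ is small and the estimate of $n$ can be noisy (or undefined when $v \le m$) unless the sample is large.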