How to infer the actual counts from the observed?


People eat apples, and each such event is recorded, so we can count how many apples a specific individual ate, and how many people ate exactly $n$ apples.

If our observations are imperfect (say, each event is observed independently with probability $p=0.01$), how do we estimate the number of people who ate a specific number of apples?

Let

  • $A_i$ be the number of apples that person $i$ ate,
  • $N_k=\#\{i\,|\, A_i=k\}$ be the number of people who ate exactly $k$ apples,
  • $a_i$ be the number of apples that person $i$ was observed eating,
  • $n_k=\#\{i\,|\, a_i=k\}$ be the number of people who were observed eating exactly $k$ apples.

Then

$$ \mathop{\mathbb{E}} n_k = p^kN_k+\binom{k+1}1p^k(1-p)N_{k+1}+\binom{k+2}2p^k(1-p)^2N_{k+2}+\cdots $$

How would I estimate $N_k$ given $n_k$? (The estimates of $N_k$ need not be integers.)

Note that the $n_k$ can look like $100, 20, 8, 11, 6, 3, 0, 4, 0, 0, 2, 0, 0$ (i.e., noisy rather than smoothly decaying).
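As a sanity check on the expectation formula above, here is a small Monte Carlo sketch (the histogram `true_counts` and the larger $p$ are made-up values chosen to keep the simulation short): each person's observed count is $\mathrm{Binomial}(A_i, p)$, and the empirical mean of $n_k$ should match the sum $\sum_{j\ge k}\binom{j}{k}p^k(1-p)^{j-k}N_j$.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
p = 0.3                        # illustrative; larger than 0.01 to keep the demo small
true_counts = [50, 30, 20]     # made-up N_0, N_1, N_2

# Each person's observed count is Binomial(A_i, p).
A = np.repeat(np.arange(len(true_counts)), true_counts)
trials = 20000
obs_hist = np.zeros(len(true_counts))
for _ in range(trials):
    a = rng.binomial(A, p)
    obs_hist += np.bincount(a, minlength=len(true_counts))[:len(true_counts)]
obs_hist /= trials             # Monte Carlo estimate of E[n_k]

# E[n_k] = sum_{j >= k} C(j, k) p^k (1 - p)^(j - k) N_j
expected = [sum(comb(j, k) * p**k * (1 - p)**(j - k) * Nj
                for j, Nj in enumerate(true_counts) if j >= k)
            for k in range(len(true_counts))]
print(obs_hist, expected)
```

The two printed vectors agree to within Monte Carlo noise.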

An obvious approach would be to compute the inverse of the infinite upper-triangular left-stochastic matrix

$$ \begin{pmatrix}
1 & (1-p) & (1-p)^2 & (1-p)^3 & (1-p)^4 & (1-p)^5 & \ldots \\
0 & p & 2p(1-p) & 3p(1-p)^2 & 4p(1-p)^3 & 5p(1-p)^4 & \ldots \\
0 & 0 & p^2 & 3p^2(1-p) & 6p^2(1-p)^2 & 10p^2(1-p)^3 & \ldots \\
0 & 0 & 0 & p^3 & 4p^3(1-p) & 10p^3(1-p)^2 & \ldots \\
0 & 0 & 0 & 0 & p^4 & 5p^4(1-p) & \ldots \\
0 & 0 & 0 & 0 & 0 & p^5 & \ldots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} $$

and multiply that inverse by the sequence of $n_k$, padded with zeros to be infinite (probably after dividing each $n_k$ by $p^k$, which rescales the matrix to have ones on the diagonal, and smoothing the result).

How do I do that? (if that's actually the way to go!)
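One way, sketched below with illustrative numbers: truncate the matrix, build it explicitly, and solve the linear system. A caveat worth flagging up front: the diagonal entries $p^k$ make the condition number blow up like $p^{-k}$ with the truncation size, so at $p=0.01$ the raw inverse is numerically hopeless; the round trip below uses $p=0.5$, where it works.

```python
import numpy as np
from math import comb

def binom_matrix(p, size):
    """Truncated observation matrix: entry (i, j) = C(j, i) * p**i * (1-p)**(j-i),
    i.e. P(observe i apples | person ate j apples)."""
    B = np.zeros((size, size))
    for j in range(size):
        for i in range(j + 1):
            B[i, j] = comb(j, i) * p**i * (1 - p)**(j - i)
    return B

p, size = 0.5, 8                 # p = 0.5 keeps the demo well conditioned
B = binom_matrix(p, size)

# Round trip: thin a made-up histogram, then invert to recover it.
N_true = np.array([100.0, 50, 25, 12, 6, 3, 1, 1])
n_expected = B @ N_true          # E[n_k] under the model
N_hat = np.linalg.solve(B, n_expected)
print(N_hat)                     # recovers N_true up to rounding
# Caveat: cond(B) grows like p**(-size); at p = 0.01 this inversion is hopeless
# in float64, which is exactly why smoothing/regularization comes up below.
```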

PS. My actual question is *Estimating conditional probability when events are sampled*; I thought it was trivial, but got no suggestions.

---

**Answer**

Let $(M_r)_{ij}=\binom{j}{i}r^{j-i}$ be the infinite upper-triangular matrix (with the convention $\binom{j}{i}=0$ when $i>j$). Then $M_rM_s=M_{r+s}$, because $M_r=\exp(Ar)$ where $A_{ij}=j\,\delta_{j,\,i+1}$, i.e., $A$ has $1,2,3,\ldots$ on its first superdiagonal.
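Both claims are easy to check numerically on finite truncations (the sizes and values of $r,s$ below are arbitrary). Upper triangularity means the truncated matrices multiply exactly like the infinite ones, and since the truncated $A$ is nilpotent, its exponential series is a finite sum.

```python
import numpy as np
from math import comb

def M(r, size):
    """Truncation of (M_r)_{ij} = C(j, i) * r**(j - i) (zero when i > j)."""
    out = np.zeros((size, size))
    for j in range(size):
        for i in range(j + 1):
            out[i, j] = comb(j, i) * r ** (j - i)
    return out

size, r, s = 10, 0.3, -0.7

# Semigroup property: truncations multiply exactly, since no truncated
# entries reach the retained top-left block.
assert np.allclose(M(r, size) @ M(s, size), M(r + s, size))

# M_r = exp(A r): A has 1, 2, 3, ... on the first superdiagonal and is
# nilpotent, so the exponential series terminates after `size` terms.
A = np.diag(np.arange(1.0, size), k=1)
E, term = np.zeros((size, size)), np.eye(size)
for k in range(size):
    E += term
    term = term @ (A * r) / (k + 1)
assert np.allclose(E, M(r, size))
```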

Let $m_k=n_kp^{-k}$, $\vec{m}=(m_0,m_1,\ldots)^T$, $\vec{N}=(N_0,N_1,\ldots)^T$.

Then the infinite system of linear equations is $\vec{m}=M_{1-p}\vec{N}$ and the solution is $\vec{N}=M_{p-1}\vec{m}$ (since $M_{1-p}M_{p-1}=M_0=I$).
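A quick numerical round trip of this closed-form solution on a truncation (the histogram and $p$ are illustrative; the question's $p=0.01$ would be far less stable):

```python
import numpy as np
from math import comb

def M(r, size):
    """(M_r)_{ij} = C(j, i) * r**(j - i), truncated to size x size."""
    out = np.zeros((size, size))
    for j in range(size):
        for i in range(j + 1):
            out[i, j] = comb(j, i) * r ** (j - i)
    return out

p = 0.5                          # illustrative
N_true = np.array([80.0, 40, 20, 10, 5, 2])
size = len(N_true)

m = M(1 - p, size) @ N_true      # forward model in rescaled variables: m = M_{1-p} N
N_hat = M(p - 1, size) @ m       # closed-form inverse: N = M_{p-1} m
print(N_hat)                     # matches N_true up to rounding
```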

An interesting question remains: how to smooth out $\vec{m}$ so that the solution is numerically stable...
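One pragmatic alternative to explicit smoothing is to solve the truncated system as a nonnegative least-squares problem, which suppresses the wildly oscillating negative "estimates" the exact inverse produces. A sketch using `scipy.optimize.nnls` (the value of $p$ is illustrative; at the question's $p=0.01$ any estimate remains very uncertain):

```python
import numpy as np
from math import comb
from scipy.optimize import nnls

def binom_matrix(p, size):
    """B[i, j] = C(j, i) * p**i * (1 - p)**(j - i): P(observe i | ate j)."""
    B = np.zeros((size, size))
    for j in range(size):
        for i in range(j + 1):
            B[i, j] = comb(j, i) * p ** i * (1 - p) ** (j - i)
    return B

n = np.array([100, 20, 8, 11, 6, 3, 0, 4, 0, 0, 2, 0, 0], dtype=float)
p = 0.3                      # illustrative
B = binom_matrix(p, len(n))

# Nonnegative least squares: minimize ||B N - n|| subject to N >= 0.
N_hat, resid = nnls(B, n)
print(np.round(N_hat, 2))
```

This trades bias for stability; it does not answer how best to smooth $\vec{m}$ itself.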