People eat apples, and such events are recorded, so that we can count how many apples were eaten by a specific individual, and how many persons ate exactly $n$ apples.
If our observations are imperfect, say, each event is observed with probability $p=0.01$, how do we estimate the number of people who ate a specific number of apples?
Let
- $A_i$ be the number of apples that person $i$ ate,
- $N_k=\#\{i\,|\, A_i=k\}$ be the number of people who ate exactly $k$ apples,
- $a_i$ be the number of apples that person $i$ was observed eating,
- $n_k=\#\{i\,|\, a_i=k\}$ be the number of people who were observed eating exactly $k$ apples,
Then
$$ \mathop{\mathbb{E}} n_k = p^kN_k+\binom{k+1}1p^k(1-p)N_{k+1}+\binom{k+2}2p^k(1-p)^2N_{k+2}+\cdots $$
How would I estimate $N_k$ given $n_k$? (estimates of $N_k$ do not have to be integer)
Note that $n_k$ can be something like 100,20,8,11,6,3,0,4,0,0,2,0,0 (i.e., they wouldn't look nice).
An obvious approach would be to compute the inverse of the infinite upper-triangular left-stochastic matrix
$$ \begin{pmatrix} 1 & (1-p) & (1-p)^2 & (1-p)^3 & (1-p)^4 & (1-p)^5 & \ldots\cr 0 & p & 2p(1-p) & 3p(1-p)^2 & 4p(1-p)^3 & 5p(1-p)^4 & \ldots\cr 0 & 0 & p^2 & 3p^2(1-p) & 6p^2(1-p)^2 & 10p^2(1-p)^3 & \ldots\cr 0 & 0 & 0 & p^3 & 4p^3(1-p) & 10p^3(1-p)^2 & \ldots\cr 0 & 0 & 0 & 0 & p^4 & 5p^4(1-p) & \ldots\cr 0 & 0 & 0 & 0 & 0 & p^5 & \ldots\cr \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots\cr \end{pmatrix} $$
and multiply it by the sequence of $n_k$, padded by 0's to be infinite (and then probably divided by $p^k$ and smoothed out to make the matrix have units on the diagonal).
How do I do that? (if that's actually the way to go!)
PS. My actual question is Estimating conditional probability when events are sampled, I thought it was trivial, but got no suggestions.
Let $(M_r)_{ij}=\Big(\binom{i}{j}r^{j-i}\Big)$ be the infinite upper-triangular matrix ($\binom{i}{j}=0$ when $i>j$). Then $M_rM_s=M_{r+s}$ because $M_r=\exp(Ar)$ where $A_{ij}=\big(j\delta_{j-i-1}\big)$, i.e., $A$ has $1,2,3,\ldots$ on its 1st superdiagonal.
Let $m_k=n_kp^{-k}$, $\vec{m}=(m_0,m_1,\ldots)^T$, $\vec{N}=(N_0,N_1,\ldots)^T$.
Then the infinite system of linear equations is $\vec{m}=M_{1-p}\vec{N}$ and the solution is $\vec{N}=M_{p-1}\vec{m}$.
An interesting question remains: how to smooth out $\vec{m}$ so that the solution is numerically stable...