Estimate Poisson parameter


1. Background:

Given a parameter $k\in\mathbb{R}^+$.

I have a bunch of positive data pairs $\{(a_i,b_i)|a_i\in\mathbb{R}^+, b_i\in\mathbb{R}^+\}_{i=1}^N$.

For each pair $(a_i,b_i)$, an observation value $y_i$ is generated by:

$$ y_i=\frac{1}{k}(A_i-B_i),~~~~A_i\sim \text{Pois}(ka_i),~~~~B_i\sim \text{Pois}(kb_i), $$

where $A_i$ and $B_i$ are independent.

In other words, each $y_i$ is generated by the following three steps:

$(1.1)$ Get $a_i$ and randomly pick a value for $A_i$ from $\text{Pois}(ka_i)$.

$(1.2)$ Get $b_i$ and randomly pick a value for $B_i$ from $\text{Pois}(kb_i)$.

$(1.3)$ Perform $y_i=\frac{1}{k}(A_i-B_i)$.
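The three steps above can be sketched directly with NumPy (a minimal simulation; the specific values of `k`, `a`, and `b` below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

k = 50.0                         # latent parameter (example value)
a = rng.uniform(0.5, 2.0, 10)    # example data pairs (a_i, b_i)
b = rng.uniform(0.5, 2.0, 10)

A = rng.poisson(k * a)           # step (1.1): A_i ~ Pois(k a_i)
B = rng.poisson(k * b)           # step (1.2): B_i ~ Pois(k b_i)
y = (A - B) / k                  # step (1.3): y_i = (A_i - B_i) / k
```

Note that each $y_i$ lives on the lattice $\frac{1}{k}\mathbb{Z}$, since $A_i-B_i$ is an integer.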


2. My Question:

Now I have a bunch of data points and observation values $\{(a_i,b_i,y_i)\}_{i=1}^N$.

How can I estimate the unknown latent parameter $k\in\mathbb{R}^+$ (a single global value shared by all observations)?


3. My Efforts:

$(3.1)$ I know that the expected value and variance of each $y_i$ can be calculated as:

\begin{align} E(y_i)&=\frac{1}{k}[E(A_i)-E(B_i)]=\frac{1}{k}(ka_i-kb_i)=a_i-b_i,\\ D(y_i)&=\frac{1}{k^2}[D(A_i)+D(B_i)]=\frac{1}{k^2}(ka_i+kb_i)=\frac{1}{k}(a_i+b_i). \end{align}
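These two moments can be checked empirically (a quick Monte Carlo sanity check; the values of $k$, $a_i$, $b_i$ are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(1)

k, a_i, b_i = 20.0, 3.0, 1.5     # example values
n = 200_000                      # Monte Carlo sample size

y = (rng.poisson(k * a_i, n) - rng.poisson(k * b_i, n)) / k

print(y.mean(), a_i - b_i)       # E(y_i) = a_i - b_i
print(y.var(), (a_i + b_i) / k)  # D(y_i) = (a_i + b_i) / k
```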

$(3.2)$ There are actually $N$ independent distributions, one per observation. For the $i$-th observation there is only a single sample $y_i$, so I cannot estimate its variance empirically and then recover $k$ from $D(y_i)=\frac{1}{k}(a_i+b_i)$ via maximum likelihood estimation (MLE) or moment estimation (ME).

$(3.3)$ I am unsure how to find the best $k$ that fits the data points and observations well, starting from a principled idea such as MLE or ME.

$(3.4)$ Since there is not a single distribution but $N$ compound distributions, each corresponding to exactly one triple $(a_i,b_i,y_i)$, I have no idea how to find the best $k$.


4. Update (1):

I start from the idea of maximum likelihood estimation (MLE).

First, let $Y_i$ be a random variable corresponding to $y_i$, then I have:

\begin{align} P(Y_i=s)&=\sum_{n=0}^\infty P(A_i=n+ks)P(B_i=n)\\ &=\sum_{n=0}^\infty \frac{(ka_i)^{n+ks}e^{-ka_i}}{(n+ks)!}\times \frac{(kb_i)^{n}e^{-kb_i}}{n!}\\ &=(ka_i)^{ks}e^{-k(a_i+b_i)}\sum_{n=0}^\infty \frac{(k^2a_ib_i)^n}{n!(n+ks)!}. \end{align}

From Wolfram, I have:

\begin{align} P(Y_i=s)&=(ka_i)^{ks}e^{-k(a_i+b_i)}\sum_{n=0}^\infty \frac{(k^2a_ib_i)^n}{n!(n+ks)!}\\ &=(ka_i)^{ks}e^{-k(a_i+b_i)} (k^2a_ib_i)^{-\frac{ks}{2}}I_{ks}(2k\sqrt{a_ib_i})\\ &=(\frac{a_i}{b_i})^{\frac{ks}{2}}e^{-k(a_i+b_i)}I_{ks}(2k\sqrt{a_ib_i}), \end{align}

where $I_{ks}(\cdot)$ is the modified Bessel function of the first kind.
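For integer $ks$, this closed form can be cross-checked against `scipy.stats.skellam`, since $A_i-B_i$ is Skellam-distributed with parameters $ka_i$ and $kb_i$ (`scipy.special.iv` is the modified Bessel function $I_\nu$; the particular values of $k$, $a_i$, $b_i$ below are arbitrary examples chosen so that $ks$ is an integer):

```python
import numpy as np
from scipy.special import iv
from scipy.stats import skellam

k, a_i, b_i = 4.0, 1.2, 0.7                 # example values

for s in [-0.75, -0.25, 0.0, 0.5, 1.25]:    # chosen so k*s is an integer
    closed_form = (a_i / b_i) ** (k * s / 2) \
        * np.exp(-k * (a_i + b_i)) \
        * iv(k * s, 2 * k * np.sqrt(a_i * b_i))
    # should agree with the Skellam pmf of A_i - B_i evaluated at k*s
    print(closed_form, skellam.pmf(k * s, k * a_i, k * b_i))
```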

Using the idea of MLE, I want to solve the following problem:

\begin{align} \hat{k}&=\underset{k}{\arg\max}~\prod_{i=1}^N{P(Y_i=y_i\mid k)}\\ &=\underset{k}{\arg\max}~\sum_{i=1}^N{\ln P(Y_i=y_i\mid k)}\\ &=\underset{k}{\arg\max}~\sum_{i=1}^N{\ln\left[\left(\frac{a_i}{b_i}\right)^{\frac{ky_i}{2}}e^{-k(a_i+b_i)}I_{ky_i}(2k\sqrt{a_ib_i})\right]}\\ &=\underset{k}{\arg\max}~\sum_{i=1}^N\left\{\frac{ky_i}{2}\ln\left(\frac{a_i}{b_i}\right)-k(a_i+b_i)+\ln\left[I_{ky_i}(2k\sqrt{a_ib_i})\right]\right\}\\ &=\underset{k}{\arg\max}~\sum_{i=1}^N\left\{k\left[\frac{y_i}{2}\ln\left(\frac{a_i}{b_i}\right)-(a_i+b_i)\right]+\ln\left[I_{ky_i}(2k\sqrt{a_ib_i})\right]\right\}. \end{align}

Due to the $\ln\left[I_{ky_i}(2k\sqrt{a_ib_i})\right]$ term, I am stuck: it seems too hard for me to obtain the derivative and solve for the optimal $k$.

The derivative of the above infinite series is also quite complicated: Link.

After a long struggle, I have still failed to get the answer.

I am not sure if the above steps are all correct.

Could you please figure it out for me?


5. Update (2):

Following the suggestion from @BinxuWang王彬旭, I wrote the following Python program for simulation:

import numpy as np
from scipy.stats import skellam

data_len = 80
max_value = 1.0
A = np.random.rand(data_len,) * max_value 
B = np.random.rand(data_len,) * max_value 
k_gt = 100.0

Y_observed = (np.random.poisson(k_gt * A) - np.random.poisson(k_gt * B)) / k_gt

for exponent in range(-4, 5):
    k = 10 ** exponent  # k = 1e-4, 1e-3, ..., 1e4
    Y_k = Y_observed * k
    mu1 = A * k
    mu2 = B * k
    L = np.sum(np.log(skellam._pmf(Y_k, mu1, mu2)))  # log-likelihood at this k
    print(k, L)

The output is:

0.0001 -0.031569807095909126
0.001 -0.2549938771522239
0.01 -1.9421743846032573
0.1 -13.274040574073975
1 -66.80138745123028
10 -162.06597061912214
100 -278.81300414250535
1000 nan
10000 nan

The true value of $k$ is $100$.

However, the optimal $k$ given by this program is always at the lower boundary: the log-likelihood appears to decrease monotonically on $(0,+\infty)$, and even at $k=100$ there is no visible peak.

Similar phenomena are observed for various combinations of data_len, max_value and k_gt.

Is my implementation wrong, or did I miss something?
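One possible explanation (my own conjecture, not confirmed in the original discussion): the support of $Y_i$ is the lattice $\frac{1}{k}\mathbb{Z}$, whose spacing $1/k$ shrinks as $k$ grows, so the probability mass is spread over more points for larger $k$ and raw pmf values at different $k$ are not directly comparable. Comparing (approximate) densities instead, i.e. $k\cdot P_{\text{skellam}}(ky_i;ka_i,kb_i)$, adds an $N\ln k$ term to the log-likelihood. For instance, adding $80\ln k$ to each value in the table above moves the maximum from the lower boundary to $k=100$. A sketch of this corrected comparison:

```python
import numpy as np
from scipy.stats import skellam

rng = np.random.default_rng(0)
data_len, max_value, k_gt = 80, 1.0, 100.0
A = rng.random(data_len) * max_value
B = rng.random(data_len) * max_value
Y_observed = (rng.poisson(k_gt * A) - rng.poisson(k_gt * B)) / k_gt

for exponent in range(-4, 3):  # k = 1e-4 ... 1e2 (larger k overflows to nan)
    k = 10.0 ** exponent
    # the lattice spacing of Y is 1/k, so the approximate density at y is
    # k * pmf(k*y); this adds N*log(k) to the log-likelihood
    L = np.sum(np.log(k * skellam._pmf(Y_observed * k, A * k, B * k)))
    print(k, L)
```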


6. Update (3):

It seems that the following approach inspired by a comment from Zhihu works.


Let $Z_i=\frac{Y_i-(a_i-b_i)}{\sqrt{a_i+b_i}}$, then we have $E(Z_i)=0$ and $D(Z_i)=\frac{1}{k}$.

Although the different $Z_i$s might not follow the same latent population distribution, I wrote the following program to test it:

import numpy as np

data_len = 1000000
max_value = 1e3

for i in range(10):
    k_gt = np.random.rand(1,) * 1e4

    A = np.random.rand(data_len,) * max_value 
    B = np.random.rand(data_len,) * max_value 

    Y_observed = (np.random.poisson(k_gt * A) - np.random.poisson(k_gt * B)) / k_gt

    z = (Y_observed - (A - B)) / ((A + B) ** 0.5)
    k_hat = 1 / z.var()
    #print('E(z) =', z.mean())
    #print('D(z) =', z.var())
    print('True k =', k_gt)
    print('Estimated k = ', k_hat)
    print('===')

The output is:

True k = [2298.1854775]
Estimated k =  2300.5718461385372
===
True k = [7922.65659395]
Estimated k =  7917.124293172064
===
True k = [263.85265814]
Estimated k =  263.6420801776157
===
True k = [6645.19567678]
Estimated k =  6648.825624909179
===
True k = [9224.97551002]
Estimated k =  9221.90494158757
===
True k = [5343.45415427]
Estimated k =  5343.7859362020945
===
True k = [3711.94735707]
Estimated k =  3722.572332730343
===
True k = [5075.17847329]
Estimated k =  5071.739140612038
===
True k = [4805.75795587]
Estimated k =  4806.967854078318
===
True k = [1378.56419652]
Estimated k =  1379.657342920305
===

I think this method is sensible, since it gives estimates $\hat{k}$ quite close to the latent true $k$, and the estimate becomes more accurate as the sample size data_len increases.
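For reuse, the moment estimator above can be wrapped in a small function (a direct restatement of Update (3); the name `estimate_k` is my own):

```python
import numpy as np

def estimate_k(a, b, y):
    """Moment estimator: z_i = (y_i - (a_i - b_i)) / sqrt(a_i + b_i)
    has mean 0 and variance 1/k, so k is estimated by 1 / Var(z)."""
    z = (y - (a - b)) / np.sqrt(a + b)
    return 1.0 / z.var()

# usage on simulated data with a known ground-truth k
rng = np.random.default_rng(2)
k_gt = 500.0
a = rng.uniform(0.5, 2.0, 200_000)
b = rng.uniform(0.5, 2.0, 200_000)
y = (rng.poisson(k_gt * a) - rng.poisson(k_gt * b)) / k_gt
print(estimate_k(a, b, y))  # close to 500
```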


Best Answer

If we'd like to use MLE, we need to evaluate the probability of $y_i$ given $a_i$, $b_i$, and $k$, i.e. we need a probabilistic generative model of $y_i$. Here is a brute-force numerical way to do so.

The difference of two Poisson variables follows the Skellam distribution. The two parameters of the Skellam distribution are $\mu_1=ka_i$ and $\mu_2=kb_i$, and the observed value is $ky_i$. We can then insert these three numbers into a package that evaluates the pmf of the Skellam distribution: $$ \mathcal L(k\mid\{a_i,b_i,y_i\})=\prod_i P_{skellam}(ky_i;ka_i,kb_i) $$ Then we can try optimizing $\mathcal L(k)$ using numerical optimization routines.

The hard part of this optimization is that $ky_i$ needs to be an integer to carry non-zero probability mass; otherwise the likelihood vanishes. So this is actually a discrete optimization problem, not amenable to gradient descent.

One way to tackle this is to propose a set of feasible solutions $K=\{k\mid ky_i\in\mathbb Z,\ \forall i\}$. This is an infinite set, since $k\in K\Rightarrow nk\in K$ for all $n\in\mathbb N$.

We may then perform gradient-like optimization (local search / hill climbing) within this set...


An update: the analytical form of the likelihood (below) is actually continuous in $ky_i$, i.e. it extends to non-integer values.

I checked the implementation of the pmf in scipy: we can actually evaluate the likelihood formula anywhere, without $ky_i$ being an integer.

The vanishing pmf at non-integer values is enforced by the code, not by the math. We just need to use skellam._pmf instead of skellam.pmf to circumvent scipy's integer check.

In this way, we can do numerical optimization on this smoother landscape, though the distribution only makes strict sense when the $ky_i$ are integers.

>>> from scipy.stats import skellam
>>> skellam.pmf(0.0, 2, 2)
0.20700192122398667
>>> skellam._pmf(0.0, 2, 2)
array(0.20700192)
>>> skellam.pmf(0.1, 2, 2)   # public pmf: zero at non-integer points
0.0
>>> skellam._pmf(0.1, 2, 2)  # private _pmf: continuous extension
array(0.20667417)

Analytically, the pmf of the Skellam distribution is $$ p(n;\mu_1,\mu_2)=\exp(-(\mu_1+\mu_2))\left(\frac{\mu_1}{\mu_2}\right)^{n/2} I_{|n|}(2\sqrt{\mu_1\mu_2}), $$ with $I_{|n|}$ the modified Bessel function of the first kind. Substituting $\mu_1=ka_i$, $\mu_2=kb_i$, $n=ky_i$: $$ P(ky_i;ka_i,kb_i)=\exp(-k(a_i+b_i))\left(\frac{a_i}{b_i}\right)^{\frac {ky_i} {2}}I_{|ky_i|}(2k\sqrt{a_ib_i}). $$