1. Background:
A parameter $k\in\mathbb{R}^+$ is given.
I have a set of positive data pairs $\{(a_i,b_i)\mid a_i\in\mathbb{R}^+, b_i\in\mathbb{R}^+\}_{i=1}^N$.
For each pair $(a_i,b_i)$, an observation value $y_i$ is generated by:
$$ y_i=\frac{1}{k}(A_i-B_i),~~~~A_i\sim \text{Pois}(ka_i),~~~~B_i\sim \text{Pois}(kb_i), $$
where $A_i$ and $B_i$ are independent.
In other words, each $y_i$ is generated by the following three steps:
$(1.1)$ Get $a_i$ and randomly pick a value for $A_i$ from $\text{Pois}(ka_i)$.
$(1.2)$ Get $b_i$ and randomly pick a value for $B_i$ from $\text{Pois}(kb_i)$.
$(1.3)$ Perform $y_i=\frac{1}{k}(A_i-B_i)$.
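The three steps above can be sketched in Python (the NumPy generator, the sample size, and the value $k=100$ below are mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_observations(a, b, k):
    """Steps (1.1)-(1.3): A_i ~ Pois(k*a_i), B_i ~ Pois(k*b_i), y_i = (A_i - B_i)/k."""
    A = rng.poisson(k * a)   # step (1.1)
    B = rng.poisson(k * b)   # step (1.2)
    return (A - B) / k       # step (1.3)

a = rng.uniform(0.1, 1.0, 5)
b = rng.uniform(0.1, 1.0, 5)
y = generate_observations(a, b, k=100.0)
print(y)  # each y_i lies on the lattice (1/k) * Z
```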
2. My Question:
Now I have a bunch of data points and observation values $\{(a_i,b_i,y_i)\}_{i=1}^N$.
How can I estimate the unknown latent parameter $k\in\mathbb{R}^+$ (a single value shared across all $i$)?
3. My Efforts:
$(3.1)$ I know that the expected value and variance of each $y_i$ can be calculated as:
\begin{align} E(y_i)&=\frac{1}{k}[E(A_i)-E(B_i)]=\frac{1}{k}(ka_i-kb_i)=a_i-b_i,\\ D(y_i)&=\frac{1}{k^2}[D(A_i)+D(B_i)]=\frac{1}{k^2}(ka_i+kb_i)=\frac{1}{k}(a_i+b_i). \end{align}
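These two identities are easy to confirm by simulation (the parameter values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, k = 2.0, 0.5, 50.0
n = 200_000

# y_i = (A_i - B_i) / k with A_i ~ Pois(k*a), B_i ~ Pois(k*b)
y = (rng.poisson(k * a, n) - rng.poisson(k * b, n)) / k

print(y.mean(), a - b)        # sample mean vs. E(y) = a - b
print(y.var(), (a + b) / k)   # sample variance vs. D(y) = (a + b)/k
```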
$(3.2)$ There are effectively $N$ independent distributions, one per data pair. For the $i$-th distribution I have only the single observation $y_i$, so I cannot compute an empirical variance and plug it into $D(y_i)=\frac{1}{k}(a_i+b_i)$ for a moment-based estimate.
$(3.3)$ I am therefore unsure how to find a $k$ that fits all the data points and observations well, starting from a principled idea such as maximum likelihood estimation (MLE) or moment estimation (ME).
$(3.4)$ Since there is not a single distribution but $N$ compound distributions, each corresponding to exactly one triple $(a_i,b_i,y_i)$, I have no idea how to find the best $k$.
4. Update (1):
I start from the idea of maximum likelihood estimation (MLE).
First, let $Y_i$ be a random variable corresponding to $y_i$, then I have:
\begin{align} P(Y_i=s)&=\sum_{n=0}^\infty P(A_i=n+ks)P(B_i=n)\\ &=\sum_{n=0}^\infty \frac{(ka_i)^{n+ks}e^{-ka_i}}{(n+ks)!}\times \frac{(kb_i)^{n}e^{-kb_i}}{n!}\\ &=(ka_i)^{ks}e^{-k(a_i+b_i)}\sum_{n=0}^\infty \frac{(k^2a_ib_i)^n}{n!(n+ks)!}. \end{align}
From Wolfram, I have:
\begin{align} P(Y_i=s)&=(ka_i)^{ks}e^{-k(a_i+b_i)}\sum_{n=0}^\infty \frac{(k^2a_ib_i)^n}{n!(n+ks)!}\\ &=(ka_i)^{ks}e^{-k(a_i+b_i)} (k^2a_ib_i)^{-\frac{ks}{2}}I_{ks}(2k\sqrt{a_ib_i})\\ &=(\frac{a_i}{b_i})^{\frac{ks}{2}}e^{-k(a_i+b_i)}I_{ks}(2k\sqrt{a_ib_i}), \end{align}
where $I_{ks}(\cdot)$ is the modified Bessel function of the first kind (the derivation assumes $s\ge 0$; for $s<0$, symmetry of the distribution gives the same form with order $|ks|$).
Using the idea of MLE, I want to solve the following problem:
\begin{align} \hat{k}&=\underset{k}{\arg\max}~\prod_{i=1}^N{P(Y_i=y_i\mid k)}\\ &=\underset{k}{\arg\max}~\sum_{i=1}^N{\ln P(Y_i=y_i\mid k)}\\ &=\underset{k}{\arg\max}~\sum_{i=1}^N{\ln\left[\left(\frac{a_i}{b_i}\right)^{\frac{ky_i}{2}}e^{-k(a_i+b_i)}I_{ky_i}(2k\sqrt{a_ib_i})\right]}\\ &=\underset{k}{\arg\max}~\sum_{i=1}^N\left\{\frac{ky_i}{2}\ln\left(\frac{a_i}{b_i}\right)-k(a_i+b_i)+\ln\left[I_{ky_i}(2k\sqrt{a_ib_i})\right]\right\}\\ &=\underset{k}{\arg\max}~\sum_{i=1}^N\left\{k\left[\frac{y_i}{2}\ln\left(\frac{a_i}{b_i}\right)-(a_i+b_i)\right]+\ln\left[I_{ky_i}(2k\sqrt{a_ib_i})\right]\right\}. \end{align}
Due to the term $\ln\left[I_{ky_i}(2k\sqrt{a_ib_i})\right]$, I am stuck: differentiating it and solving for the optimal $k$ seems too hard for me.
The derivative of the above infinite series is also quite complicated: Link.
After a long struggle, I still cannot get the answer, and I am not sure whether the steps above are all correct.
Could you please figure it out for me?
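For what it's worth, even without a closed-form derivative, the objective can at least be evaluated stably: $I_\nu(x)$ overflows quickly for large arguments, but scipy.special.ive returns the exponentially scaled $I_\nu(x)e^{-x}$, so $\ln I_\nu(x)=\ln\mathrm{ive}(\nu,x)+x$. A sketch on synthetic data (for $y_i<0$ I use the order $|ky_i|$, matching the symmetric form of the pmf):

```python
import numpy as np
from scipy.special import ive

rng = np.random.default_rng(0)
a = rng.uniform(0.1, 1.0, 100)
b = rng.uniform(0.1, 1.0, 100)
k_gt = 100.0
y = (rng.poisson(k_gt * a) - rng.poisson(k_gt * b)) / k_gt

def log_lik(k):
    """Sum of ln P(Y_i = y_i | k) in the Bessel form derived above."""
    x = 2.0 * k * np.sqrt(a * b)
    # ln I_nu(x) = ln(ive(nu, x)) + x, avoiding overflow of I_nu itself
    log_bessel = np.log(ive(np.abs(k * y), x)) + x
    return np.sum(0.5 * k * y * np.log(a / b) - k * (a + b) + log_bessel)

print(log_lik(100.0))  # finite even where evaluating I_nu directly would overflow
```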
5. Update (2):
Following the suggestion from @BinxuWang王彬旭, I wrote the following Python program for simulation:

```python
import numpy as np
from scipy.stats import skellam

data_len = 80
max_value = 1.0
A = np.random.rand(data_len,) * max_value
B = np.random.rand(data_len,) * max_value
k_gt = 100.0
Y_observed = (np.random.poisson(k_gt * A) - np.random.poisson(k_gt * B)) / k_gt
for i in range(9):
    k = 10 ** (i - 4)
    Y_k = Y_observed * k
    mu1 = A * k
    mu2 = B * k
    L = np.sum(np.log(skellam._pmf(Y_k, mu1, mu2)))  # log-likelihood at this candidate k
    print(k, L)
```
The output is:
0.0001 -0.031569807095909126
0.001 -0.2549938771522239
0.01 -1.9421743846032573
0.1 -13.274040574073975
1 -66.80138745123028
10 -162.06597061912214
100 -278.81300414250535
1000 nan
10000 nan
The true value of $k$ is $100$.
However, I see that the optimal $k$ given by this program would always be zero: the log-likelihood appears to decrease monotonically on $(0,+\infty)$, and there is no peak at the true value $k=100$.
I observe the same phenomenon for various combinations of data_len, max_value and k_gt.
Is my implementation wrong, or am I missing something?
6. Update (3):
It seems that the following approach, inspired by a comment on Zhihu, works.
Let $Z_i=\frac{Y_i-(a_i-b_i)}{\sqrt{a_i+b_i}}$; then $E(Z_i)=0$ and $D(Z_i)=\frac{1}{k}$.
Although the different $Z_i$ may not follow one common population distribution, I wrote the following program to test this:
```python
import numpy as np

data_len = 1000000
max_value = 1e3
for i in range(10):
    k_gt = np.random.rand(1,) * 1e4
    A = np.random.rand(data_len,) * max_value
    B = np.random.rand(data_len,) * max_value
    Y_observed = (np.random.poisson(k_gt * A) - np.random.poisson(k_gt * B)) / k_gt
    z = (Y_observed - (A - B)) / ((A + B) ** 0.5)
    k_hat = 1 / z.var()  # D(z) = 1/k, so k is estimated by 1/Var(z)
    #print('E(z) =', z.mean())
    #print('D(z) =', z.var())
    print('True k =', k_gt)
    print('Estimated k = ', k_hat)
    print('===')
```
The output is:
True k = [2298.1854775]
Estimated k = 2300.5718461385372
===
True k = [7922.65659395]
Estimated k = 7917.124293172064
===
True k = [263.85265814]
Estimated k = 263.6420801776157
===
True k = [6645.19567678]
Estimated k = 6648.825624909179
===
True k = [9224.97551002]
Estimated k = 9221.90494158757
===
True k = [5343.45415427]
Estimated k = 5343.7859362020945
===
True k = [3711.94735707]
Estimated k = 3722.572332730343
===
True k = [5075.17847329]
Estimated k = 5071.739140612038
===
True k = [4805.75795587]
Estimated k = 4806.967854078318
===
True k = [1378.56419652]
Estimated k = 1379.657342920305
===
I think this method is sensible, since the estimate $\hat{k}$ is quite close to the latent true $k$, and the estimate becomes more accurate as the sample size data_len increases.
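Trying to understand why this simple estimator works so well: for moderately large $ka_i$ and $kb_i$, each $Y_i$ is approximately normal, and maximizing the normal-approximation likelihood over $k$ yields exactly this estimator (a heuristic argument, not a proof):
\begin{align} Y_i &\overset{\text{approx}}{\sim} \mathcal{N}\!\left(a_i-b_i,\ \frac{a_i+b_i}{k}\right),\\ \ln\mathcal{L}(k) &\approx \frac{N}{2}\ln k-\frac{k}{2}\sum_{i=1}^N z_i^2+\text{const},\qquad z_i=\frac{y_i-(a_i-b_i)}{\sqrt{a_i+b_i}},\\ \frac{\mathrm{d}\ln\mathcal{L}}{\mathrm{d}k}=0 &\ \Longrightarrow\ \hat{k}=\frac{N}{\sum_{i=1}^N z_i^2}\approx\frac{1}{\operatorname{Var}(z)}, \end{align}
where the last step uses $\bar z\approx 0$.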

If we'd like to use MLE, we need to evaluate the probability of $y_i$ given $a_i$, $b_i$ and $k$, i.e. we need a probabilistic generative model of $y_i$. Here is a brute-force numerical way to do so.
The difference of two independent Poisson variables follows the Skellam distribution. Its two parameters are $\mu_1=ka_i$ and $\mu_2=kb_i$, and the observed value is $ky_i$. We can plug these three numbers into a package that evaluates the pmf of the Skellam distribution: $$ \mathcal L(k\mid\{a_i,b_i,y_i\})=\prod_i P_{\text{skellam}}(ky_i;ka_i,kb_i), $$ and then maximize $\mathcal L(k)$ with a numerical optimization routine.
The hard part of this optimization is that $ky_i$ needs to be an integer to carry non-zero probability mass; otherwise the likelihood vanishes. So this is actually a discrete optimization problem, not amenable to gradient descent.
One way to tackle this is to propose a set of feasible solutions $K=\{k\mid ky_i\in\mathbb Z,\ \forall i\}$. This set is infinite, since $k\in K\Rightarrow nk\in K$ for all $n\in\mathbb Z^+$.
We could then perform gradient-like optimization (local search / hill climbing) within this set...
An update: the analytical form of the likelihood (below) is actually continuous in $ky_i$, non-integer values included.
I checked the implementation of the pmf in scipy: we can evaluate the likelihood formula anywhere, without $ky_i$ being an integer. The vanishing pmf at non-integer values is enforced by the code, not by the math. We just need to use skellam._pmf instead of skellam.pmf to circumvent scipy's integer check. In this way we can run numerical optimization on this smoother landscape, though the distribution only makes sense when the $ky_i$ are integers.
Analytically, the pmf of the Skellam distribution is $$ p(n;\mu_1,\mu_2)=\exp(-(\mu_1+\mu_2))\left(\frac{\mu_1}{\mu_2}\right)^{n/2} I_{|n|}(2\sqrt{\mu_1\mu_2}), $$ where $I_{|n|}$ is the modified Bessel function of the first kind. Substituting our parameters, $$ P(ky_i;ka_i,kb_i)=\exp(-k(a_i+b_i))\left(\frac{a_i}{b_i}\right)^{\frac{ky_i}{2}}I_{|ky_i|}(2k\sqrt{a_ib_i}). $$
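For instance (the $\mu$ values here are arbitrary; this just illustrates the skellam._pmf vs skellam.pmf point above):

```python
from scipy.stats import skellam

mu1, mu2 = 5.0, 3.0
print(skellam.pmf(1.5, mu1, mu2))    # 0.0: scipy zeroes out non-integer points
print(skellam._pmf(1.5, mu1, mu2))   # positive: the analytic Bessel continuation
```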