I want to estimate a parameter of the time-varying attachment probability in an urn model, given the observed draw sequence. I hope this is the right place to ask this kind of question; I apologize if not. Here are the model and my approach:
I have an urn model with $N$ different balls, from which I draw with replacement. The probability that ball $i$ is drawn at draw step $\tau$ is
$p_i = \frac{f(k_{i, \tau}, t_{i, \tau}, \alpha_i)}{\sum_{j=1}^N f(k_{j, \tau}, t_{j, \tau}, \alpha_j)}$
where the weight (or multiplicity) of ball $i$ is $f_{i, \tau} = f(k_{i, \tau}, t_{i, \tau}, \alpha_i) = (k_{i, \tau}+1)(t_{i, \tau} + 1)^{-\alpha_i}$.
Here $k_{i, \tau}$ is the number of times ball $i$ has been drawn up to draw step $\tau$, $t_{i, \tau}$ is an integer between 1 and 120 that increases by 1 roughly every 100,000 draw steps, and $\alpha_i$ is a constant.
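For concreteness, here is a minimal Python sketch of the weights and draw probabilities defined above. The function names `weights` and `draw_probabilities` are hypothetical, and the three-ball input is made-up toy data, not the simulation data; for ~7e5 balls one would vectorize this, but plain Python keeps the sketch self-contained.

```python
import math

def weights(k, t, alpha):
    """Weight f_i = (k_i + 1) * (t_i + 1) ** (-alpha_i) for every ball i."""
    return [(ki + 1) * (ti + 1) ** (-ai) for ki, ti, ai in zip(k, t, alpha)]

def draw_probabilities(k, t, alpha):
    """p_i = f_i / sum_j f_j at the current draw step."""
    f = weights(k, t, alpha)
    total = math.fsum(f)          # accurate sum, matters when N is large
    return [fi / total for fi in f]

# Toy example with three balls (made-up values)
p = draw_probabilities(k=[0, 3, 10], t=[1, 1, 2], alpha=[0.5, 0.5, 1.0])
print(p)
```

Balls 0 and 1 share $t$ and $\alpha$, so their probabilities differ only by the factor $(k+1)$, i.e. by a factor of 4 here.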
I am only interested in one special ball, $i = m$; specifically, I want to estimate its exponent $\alpha_m$. I have simulation data with ~7e5 balls and 9e6 draw steps. All $k_{i, \tau}$, $t_{i, \tau}$ and $\alpha_i$ are known to me except $\alpha_m$, and I know the exact draw sequence. The special ball $m$ is drawn quite often in this sequence, about 5e3 times.
My approach is as follows. If $p_m$ were constant, the number of draw steps between two consecutive draws of $m$ would be geometrically distributed with mean $1/p_m$, so $1/p_m$ should be a good estimate of that number. Defining $C$ as the number of draws between two consecutive draws of $m$, we have:
\begin{equation} C \approx \frac{1}{p_m} = \frac{\sum_i f_{i, \tau}}{(k_{m, \tau}+1)(t_{m, \tau} + 1)^{-\alpha_m}} \end{equation}
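A quick sanity check of the $C \approx 1/p_m$ argument, assuming $p_m$ is held constant (which the real model does not do): the gaps are then geometric on $\{1, 2, \dots\}$ and their sample mean should approach $1/p_m$. The helper `geometric_gaps` is hypothetical; it samples gaps by inverse transform instead of simulating every single draw, and $p_m = 1/1800$ is only chosen to roughly match 9e6 draws / 5e3 hits.

```python
import math
import random

def geometric_gaps(p, n_hits, seed=0):
    """Sample n_hits gaps C between consecutive draws of ball m when its
    draw probability p is constant. C is geometric on {1, 2, ...} with mean 1/p."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_hits):
        u = 1.0 - rng.random()               # uniform on (0, 1], avoids log(0)
        # Inverse transform: P(C > n) = (1 - p) ** n
        gaps.append(math.floor(math.log(u) / math.log1p(-p)) + 1)
    return gaps

p_m = 1 / 1800                               # ~9e6 draws / 5e3 hits (assumed)
gaps = geometric_gaps(p_m, n_hits=5000)
mean_gap = sum(gaps) / len(gaps)
print(mean_gap, 1 / p_m)                     # sample mean vs. 1/p_m
```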
And thus:
\begin{equation} \label{eq_alpha} \alpha_m \approx \frac{1}{\ln(t_{m, \tau}+1)} \ln\left(\frac{C (k_{m, \tau}+1)}{\sum_i f_{i, \tau}}\right) \end{equation}
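The estimator in the equation above, written as a hypothetical helper `alpha_hat`. Feeding it the expected gap $C = \sum_i f_{i,\tau} / \big((k_{m,\tau}+1)(t_{m,\tau}+1)^{-\alpha_m}\big)$ should return $\alpha_m$ up to floating-point rounding; the numbers below are illustrative assumptions. Note that $\ln(t_{m,\tau}+1) > 0$ because $t \geq 1$, so the division is always defined.

```python
import math

def alpha_hat(C, k_m, t_m, S):
    """Single-interval estimate of alpha_m from the gap C.
    S is sum_i f_{i,tau}, evaluated somewhere inside the interval."""
    return math.log(C * (k_m + 1) / S) / math.log(t_m + 1)

# Round trip with made-up values: plug in the expected gap
alpha_true, k_m, t_m, S = 0.8, 40, 7, 5.0e5
C_expected = S / ((k_m + 1) * (t_m + 1) ** (-alpha_true))
print(alpha_hat(C_expected, k_m, t_m, S))   # recovers 0.8 up to rounding
```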
Strictly speaking, this is not exact, because $f_{i, \tau}$ changes at every draw step: $k_i$ increases by 1 each time ball $i$ is drawn. However, since ball $m$ is drawn often (on average about once every 1,800 draws, i.e. 9e6 / 5e3), the sum changes only slightly between two consecutive draws of $m$. I checked this by evaluating the sum at several points between the two events. The individual measurements of $\alpha_m$ are, of course, very noisy, but since ball $m$ is drawn 5e3 times, I get about 5e3 measurements of $\alpha_m$ that I can average.
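A sketch of the averaging procedure described above, assuming the needed quantities can be looked up per draw step through hypothetical callables `k_at`, `t_at` and `total_weight` (for example from stored simulation output). In this sketch the sum is evaluated once, at the step right after the previous draw of $m$; the toy input uses constant made-up values.

```python
import math

def alpha_estimates(hit_steps, k_at, t_at, total_weight):
    """One estimate of alpha_m per pair of consecutive draws of ball m.

    hit_steps         -- sorted draw steps at which m was drawn
    k_at(tau)         -- k_{m,tau}
    t_at(tau)         -- t_{m,tau}
    total_weight(tau) -- sum_i f_{i,tau}
    All three callables are hypothetical lookups into the simulation data.
    """
    estimates = []
    for start, end in zip(hit_steps, hit_steps[1:]):
        C = end - start                      # draws between the two hits
        tau = start + 1                      # first step after the previous hit
        k, t, S = k_at(tau), t_at(tau), total_weight(tau)
        estimates.append(math.log(C * (k + 1) / S) / math.log(t + 1))
    return estimates

# Toy usage with constant k, t and S (made-up values)
est = alpha_estimates([0, 10, 30], k_at=lambda tau: 5, t_at=lambda tau: 3,
                      total_weight=lambda tau: 60.0)
mean_alpha = sum(est) / len(est)
print(est, mean_alpha)
```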
When I simulate the system with full knowledge of every parameter, including $\alpha_m$, and then compute the mean of the $\alpha_m$ measurements on the simulated data, I get a value that is systematically lower than the true $\alpha_m$. This makes me wonder whether my approach is valid, and whether the mean over all measurements of $\alpha_m$ is the best estimate of the true $\alpha_m$. Do you have any ideas or suggestions?
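One way to check whether a downward shift can already come from the estimator itself is a minimal simulation under a strong simplifying assumption: $k_{m,\tau}$, $t_{m,\tau}$ and $\sum_i f_{i,\tau}$ are held fixed, so $p_m$ is constant and every gap $C$ is geometric with mean $1/p_m$. The function name `simulate_estimates` and all parameter values are hypothetical. Besides the mean of the per-interval estimates, the sketch also computes $\alpha_m$ from the mean gap, an alternative included only for comparison. Since $\ln$ is concave, Jensen's inequality gives $\mathbb{E}[\ln C] \le \ln \mathbb{E}[C]$; for small $p_m$, $C p_m$ is approximately Exp(1)-distributed, and the log of an Exp(1) variable has mean $-\gamma \approx -0.577$, which would put the mean of the per-interval estimates near $\alpha_m - 0.577/\ln(t_{m,\tau}+1)$.

```python
import math
import random

def simulate_estimates(alpha_true, k_m, t_m, S, n_hits, seed=0):
    """Simplified check with k_m, t_m and S held fixed (constant p_m).
    Returns (mean of per-interval alpha estimates, alpha from the mean gap)."""
    rng = random.Random(seed)
    p_m = (k_m + 1) * (t_m + 1) ** (-alpha_true) / S
    log_t = math.log(t_m + 1)
    gaps = []
    for _ in range(n_hits):
        u = 1.0 - rng.random()                           # uniform on (0, 1]
        gaps.append(math.floor(math.log(u) / math.log1p(-p_m)) + 1)  # geometric
    # Per-interval estimates, averaged afterwards (log taken before averaging)
    per_interval = [math.log(c * (k_m + 1) / S) / log_t for c in gaps]
    mean_of_estimates = sum(per_interval) / n_hits
    # Alternative for comparison: average the gaps first, then take the log
    mean_gap = sum(gaps) / n_hits
    from_mean_gap = math.log(mean_gap * (k_m + 1) / S) / log_t
    return mean_of_estimates, from_mean_gap

mean_est, pooled = simulate_estimates(alpha_true=0.8, k_m=40, t_m=7,
                                      S=1.4e4, n_hits=5000)
print(mean_est, pooled)
```

With these made-up values the mean of the per-interval estimates should come out around 0.52 instead of 0.8, while the estimate from the mean gap should stay close to 0.8. Whether the same mechanism accounts for the bias in the full model, where $p_m$ varies over time, is not established by this sketch.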