How to scale a negative binomial distribution?


I posted a variation of this question on Cross Validated but did not get an answer, so I hope someone can help me over here.

A bit of background first. I have implemented a neural network for time-series forecasting. The network outputs the parameters (mean $\mu$ and dispersion $\theta$) of a negative binomial distribution:

$$\Pr(X = x) = \binom{x+\theta-1}{x} \left(\frac{\theta}{\theta + \mu}\right)^\theta \left(\frac{\mu}{\theta + \mu}\right)^x$$

(Note the exponents: the $\theta$ power goes with $\frac{\theta}{\theta+\mu}$ and the $x$ power with $\frac{\mu}{\theta+\mu}$, so that $\mathbb{E}[X] = \mu$ and $\operatorname{Var}(X) = \mu + \mu^2/\theta$.)

To ease model training, I want to scale the input data (i.e., divide the past time steps fed to the network by $k$) and then remove the scaling effect from the predicted distribution parameters. If my network were outputting the mean and variance of a Gaussian distribution, I would multiply the predicted mean and variance by $k$ and $k^2$, respectively. However, I am not sure how to do this for a negative binomial.
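For concreteness, here is a quick numerical check of the Gaussian case (a sketch I added for illustration; the parameter values are arbitrary). It confirms that if $X \sim \mathcal{N}(\mu, \sigma^2)$, then $kX \sim \mathcal{N}(k\mu, k^2\sigma^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10.0
mu, sigma2 = 3.0, 4.0

# Sample X ~ N(mu, sigma2), then scale by k.
x = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)
scaled = k * x

print(scaled.mean())  # close to k * mu = 30
print(scaled.var())   # close to k^2 * sigma2 = 400
```

Note that the same trick cannot work exactly for a negative binomial: $kX$ takes values in $\{0, k, 2k, \dots\}$, which is no longer the support of any negative binomial, so at best the rescaled parameters can only approximate the distribution of $kX$.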

After running some experiments, the approach I came up with (see the Cross Validated question) does not seem right. In the DeepAR paper (p. 5), the authors multiply the mean and dispersion by $k$ and $\sqrt{k}$, respectively, but I do not know where the $\sqrt{k}$ factor on the dispersion comes from.
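One way I tried to probe the $\sqrt{k}$ rule is simple moment matching (my own sketch, not taken from the DeepAR paper): since $\operatorname{Var}(X) = \mu + \mu^2/\theta$, the scaled variable $kX$ has mean $k\mu$ and variance $k^2(\mu + \mu^2/\theta)$, and solving $k\mu + (k\mu)^2/\theta' = k^2(\mu + \mu^2/\theta)$ gives the dispersion $\theta'$ a negative binomial with mean $k\mu$ would need to match both moments:

```python
def matched_dispersion(mu: float, theta: float, k: float) -> float:
    """Dispersion theta' such that NB(k*mu, theta') has the same variance as k*X,
    obtained by solving k*mu + (k*mu)**2/theta' = k**2 * (mu + mu**2/theta)."""
    return k * mu * theta / (theta * (k - 1) + k * mu)

# Example values (arbitrary), just to compare against the sqrt(k) heuristic.
mu, theta, k = 3.0, 2.0, 10.0
theta_p = matched_dispersion(mu, theta, k)

var_target = k**2 * (mu + mu**2 / theta)       # variance of k*X
var_matched = k * mu + (k * mu)**2 / theta_p   # variance of NB(k*mu, theta')

print(theta_p)      # 1.25, while sqrt(k) * theta would be about 6.32
print(var_target, var_matched)
```

For these example values the moment-matched dispersion is mean-dependent and quite different from $\sqrt{k}\,\theta$, which may be part of why the $\sqrt{k}$ factor looks like a heuristic rather than an exact transform.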

Moreover, after reading this question I wonder if there is actually a solution to what I want to do.

I would appreciate any help.