Say I have a random variable $X$ with mgf
$M_X(t) = 1 + a_1t + a_2t^2 + a_3t^3 + \cdots $
and another random variable $Y$ with a probability distribution determined by two parameters $\theta_1$ and $\theta_2$, and with mgf
$M_Y(t) = 1 + b_1(\theta_1,\theta_2)t + b_2(\theta_1,\theta_2)t^2 + \cdots $
Suppose I want to fit the parameters of the random variable $Y$ to the random variable $X$; i.e., I want to find $\theta_1$ and $\theta_2$ so that the distribution of $Y$ most closely approximates the distribution of $X$. What is the best way to do this solely by comparing the two mgfs?
My thinking so far:
The method of moments, for example, would try to solve
$b_1(\theta_1,\theta_2) = a_1\quad$ and $\quad b_2(\theta_1,\theta_2) = a_2$
(provided these equations uniquely determine $\theta_1$ and $\theta_2$). However, this can lead to large errors in the higher-order moments. Alternatively, one could try a least-squares fit, choosing the $\theta_1$ and $\theta_2$ that minimize
$\sum_{i=1}^{K} |b_i(\theta_1,\theta_2) - a_i|^2$
for some large $K$. My question is whether it is better to closely fit a small number of low-order moments, or to choose a large $K$ in order to spread the fitting error across many moments. Is there some theory in statistics that studies such problems?
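To make the comparison concrete, here is a small sketch of both approaches for a hypothetical example I made up: $X$ lognormal (so its raw moments, and hence the $a_i$, are known in closed form) and $Y \sim \mathrm{Gamma}(k, \theta)$. The method of moments ($K=2$) has a closed-form solution here; the least-squares fit over $K=6$ coefficients is done with a crude grid search around the MoM solution (in practice one would use a proper optimizer such as `scipy.optimize.minimize`):

```python
import math

# Hypothetical target X ~ Lognormal(mu=0, sigma=0.5); raw moments E[X^n] = exp(n*mu + n^2*sigma^2/2)
mu, sigma = 0.0, 0.5

def a(i):
    # mgf coefficient a_i = E[X^i] / i!
    return math.exp(i * mu + 0.5 * (i * sigma) ** 2) / math.factorial(i)

def b(i, k, th):
    # mgf coefficient b_i(k, th) for Y ~ Gamma(shape k, scale th):
    # E[Y^i] = th^i * k*(k+1)*...*(k+i-1)
    m = 1.0
    for j in range(i):
        m *= (k + j) * th
    return m / math.factorial(i)

# Method of moments (K = 2): match mean and variance of the Gamma.
mean = a(1)
ex2 = 2 * a(2)          # E[X^2] = 2! * a_2
var = ex2 - mean ** 2
k_mom, th_mom = mean ** 2 / var, var / mean

# Least-squares objective over the first K mgf coefficients.
def loss(k, th, K=6):
    return sum((b(i, k, th) - a(i)) ** 2 for i in range(1, K + 1))

# Crude grid search around the MoM solution (+/- 50% in each parameter).
best = (k_mom, th_mom)
for dk in range(-50, 51):
    for dt in range(-50, 51):
        cand = (k_mom * (1 + dk / 100), th_mom * (1 + dt / 100))
        if loss(*cand) < loss(*best):
            best = cand

print("MoM (K=2):     k=%.3f, theta=%.3f, loss=%.4g" % (k_mom, th_mom, loss(k_mom, th_mom)))
print("LS  (K=6):     k=%.3f, theta=%.3f, loss=%.4g" % (best[0], best[1], loss(*best)))
```

By construction the MoM solution matches $b_1 = a_1$ and $b_2 = a_2$ exactly, so its residual comes entirely from $i \ge 3$; the least-squares fit trades a small error in the low-order moments for smaller errors higher up, which is exactly the trade-off I am asking about.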
The following paper seems like it could help: Link. However, it is behind a paywall, so I could only see the abstract.