Problem: Let $X_1, \ldots, X_n$ be independent random variables satisfying, for $i = 1, \ldots, n$, $E(X_i) = \mu_i(\theta)$ and $\mathrm{Var}(X_i) = \sigma_i^2(\theta)$.
Here $\theta$ is the parameter of interest, and the functional forms $\mu_i(\cdot)$ and $\sigma_i^2(\cdot)$ are known. Among all unbiased estimating equations of the form
$\hspace{15mm}\sum_{i=1}^{n}g_i(\theta)(X_i-\mu_i(\theta))=0$ $(*)$,
what is the form of the optimal $g_i(\theta)$ that minimizes the asymptotic variance of the solution $\hat{\theta}$ of $(*)$?
It may be assumed that $\hat{\theta}$ is the unique solution of $(*)$, and that it is $\sqrt{n}$-consistent for $\theta$ and asymptotically normal.
This question is quite difficult. Any insight would be much appreciated.
Here are two approaches. I am not claiming optimality. There may be better ways to treat this problem.
Suppose we know that $\theta$ lies in some interval $\mathcal{Z}$ (possibly all of $\mathbb{R}$). Assume each $\mu_i(\theta)$ is differentiable and either strictly increasing or strictly decreasing in $\theta$. Without loss of generality, assume each is strictly increasing (otherwise, replace $X_i$ by $Y_i = -X_i$). Define $\beta_i = \inf_{\theta \in \mathcal{Z}} \mu_i'(\theta)$, the minimum slope of $\mu_i$ over $\mathcal{Z}$, and assume $\beta_i > 0$ for all $i$.
Strategy 1: $g_i(\theta) = \beta_i/\sigma_i^2(\theta)$
Strategy 2: $g_i(\theta) = \mu'_i(\theta)/\sigma_i^2(\theta)$
The intuition behind strategy 2 is that it uses a more accurate value of the slope near the point $\theta$ of interest, which helps when our guess $\hat{\theta}$ is already pretty close to the true answer. With no information about the slope, we must fall back on the global minimum $\beta_i$. Below I discuss how I came up with strategy 1.
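As a quick numerical illustration (a sketch only: the model $\mu_i(\theta) = e^{a_i\theta}$ on $\mathcal{Z} = [0,1]$, the rate constants $a_i$, and the variances, taken constant in $\theta$, are all hypothetical choices of mine), one can compare the two strategies through the first-order asymptotic variance of the solution of $(*)$, which is $\sum_i g_i^2\sigma_i^2 \big/ \left(\sum_i g_i \mu_i'(\theta)\right)^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
theta = 0.7                        # true parameter, Z = [0, 1]
a = rng.uniform(0.5, 2.0, n)       # hypothetical per-observation rate constants
sigma2 = rng.uniform(0.5, 2.0, n)  # Var(X_i), taken constant in theta here

# Hypothetical model: mu_i(theta) = exp(a_i theta), so mu_i'(theta) = a_i exp(a_i theta),
# and the minimum slope over Z = [0, 1] is beta_i = a_i (attained at theta = 0).
mu_prime = a * np.exp(a * theta)
beta = a

def asym_var(g):
    """First-order asymptotic variance of the solution of (*):
    sum g_i^2 sigma_i^2 / (sum g_i mu_i'(theta))^2."""
    return (g**2 * sigma2).sum() / (g * mu_prime).sum() ** 2

g1 = beta / sigma2      # strategy 1
g2 = mu_prime / sigma2  # strategy 2

v1, v2 = asym_var(g1), asym_var(g2)
print(f"strategy 1: {v1:.6f}  strategy 2: {v2:.6f}")
assert v2 <= v1  # strategy 2 attains the Cauchy-Schwarz lower bound
```

Strategy 2 attains the Cauchy–Schwarz lower bound $1 \big/ \sum_i \mu_i'(\theta)^2/\sigma_i^2$, so the final assertion always holds; strategy 1 pays a price for using the global minimum slope instead of the local one.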
Estimation equation and assumptions
Your estimator determines $\hat{\theta} \in \mathcal{Z}$ from $X = (X_1, \ldots, X_n)$ as the solution of the following estimation equation:
$$ \sum_{i=1}^n g_i(\hat{\theta})(X_i - \mu_i(\hat{\theta}))=0 $$
For $X=(X_1, \ldots, X_n)$, and $z \in \mathcal{Z}$, define: $$ h(X,z) = \sum_{i=1}^ng_i(z)(X_i-\mu_i(z)) $$
The estimation equation is equivalent to $h(X, \hat{\theta})=0$.
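Concretely, $h(X, \hat{\theta}) = 0$ can be solved by bracketed root-finding. Here is a minimal sketch under an assumed model (the exponential means $\mu_i(z) = e^{a_i z}$ and all numerical values are hypothetical), using strategy-1 weights, which are constant in $z$ so that $h(X, z)$ is strictly decreasing and the root is unique:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n, theta = 200, 0.7
a = rng.uniform(0.5, 2.0, n)       # hypothetical model: mu_i(z) = exp(a_i z)
sigma = rng.uniform(0.5, 1.5, n)
X = np.exp(a * theta) + sigma * rng.standard_normal(n)

g = a / sigma**2                   # strategy-1 weights: constant in z,
                                   # so h(X, z) is strictly decreasing in z

def h(z):
    return np.sum(g * (X - np.exp(a * z)))

# h(z) > 0 for very negative z and h(z) < 0 for large z, so a root is bracketed:
theta_hat = brentq(h, -5.0, 5.0)
print(theta_hat)
```

With $z$-dependent weights (e.g. strategy 2) the same call works as long as the bracket endpoints give $h$ opposite signs, which holds here because all $g_i > 0$.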
Assumption 1:
Assume $g_i(z)\geq 0$ for all $i\in\{1, \ldots, n\}$ and all $z\in\mathcal{Z}$.
Assumption 2:
Assume that for all possible $X = (X_1, \ldots, X_n)$, the function $h(X,z)$ is nonincreasing in $z$, so that $z_1\leq z_2$ implies $h(X,z_1)\geq h(X,z_2)$.
Development of strategy 1:
Define $\hat{\theta}$ as the solution to the estimation equation, and define $\delta = \hat{\theta} - \theta$ as the error. Fix $\epsilon>0$ and define: $$Y = \sum_{i=1}^ng_i(\theta+\epsilon)X_i$$ If $\delta\geq\epsilon$ then: \begin{eqnarray} 0 &=& h(X,\hat{\theta})\\ &=& h(X,\theta + \delta) \\ &\leq& h(X,\theta + \epsilon)\\ &=& \sum_{i=1}^ng_i(\theta+\epsilon)(X_i-\mu_i(\theta+\epsilon))\\ &\leq&\sum_{i=1}^ng_i(\theta+\epsilon)(X_i - \mu_i(\theta))-\epsilon\sum_{i=1}^ng_i(\theta+\epsilon)\beta_i\\ &=&Y - E[Y] - \epsilon\sum_{i=1}^ng_i(\theta+\epsilon)\beta_i \end{eqnarray} where the first inequality uses the nonincreasing property (Assumption 2), and the second uses the fact that $\beta_i$ is the minimum slope of $\mu_i$, so that $\mu_i(\theta+\epsilon)\geq\mu_i(\theta)+\epsilon\beta_i$. Thus: $$ Pr[\delta \geq \epsilon] \leq Pr\left[Y \geq E[Y] + \epsilon\sum_{i=1}^ng_i(\theta+\epsilon)\beta_i\right] $$ By Chebyshev's inequality,
\begin{eqnarray} Pr\left[Y\geq E[Y] + \epsilon\sum_{i=1}^ng_i(\theta+\epsilon)\beta_i\right] &\leq& \frac{Var(Y)}{\epsilon^2(\sum_{i=1}^ng_i(\theta+\epsilon)\beta_i)^2}\\ &=& \frac{\sum_{i=1}^ng_i(\theta+\epsilon)^2\sigma_i(\theta)^2}{\epsilon^2(\sum_{i=1}^ng_i(\theta+\epsilon)\beta_i)^2} \end{eqnarray}
A similar bound can be derived for $Pr[\delta \leq -\epsilon]$. Putting these together gives: $$ Pr[|\delta|\geq \epsilon] \leq \frac{\sum_{i=1}^ng_i(\theta+\epsilon)^2\sigma_i(\theta)^2}{\epsilon^2(\sum_{i=1}^ng_i(\theta+\epsilon)\beta_i)^2} + \frac{\sum_{i=1}^ng_i(\theta-\epsilon)^2\sigma_i(\theta)^2}{\epsilon^2(\sum_{i=1}^ng_i(\theta-\epsilon)\beta_i)^2} $$
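The bound can be checked by Monte Carlo. A sketch under an assumed model (exponential means $\mu_i(z) = e^{a_i z}$ on $\mathcal{Z} = [0,1]$ with $\theta$-constant variances; all numerical values are hypothetical), using strategy-1 weights so that $g_i$ does not depend on $z$ and the two terms of the bound coincide:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)
n, theta, eps, reps = 50, 0.7, 0.3, 2000
a = rng.uniform(0.5, 2.0, n)      # hypothetical model: mu_i(z) = exp(a_i z)
sigma2 = rng.uniform(0.5, 2.0, n)
beta = a                          # min slope of exp(a z) over z in [0, 1]
g = beta / sigma2                 # strategy 1: constant in z, so h is decreasing

# Right-hand side of the bound; g does not depend on z here, so the two
# terms g_i(theta + eps) and g_i(theta - eps) coincide and the sum doubles:
bound = 2 * (g**2 * sigma2).sum() / (eps**2 * (g * beta).sum() ** 2)

exceed = 0
for _ in range(reps):
    X = np.exp(a * theta) + np.sqrt(sigma2) * rng.standard_normal(n)
    h = lambda z: np.sum(g * (X - np.exp(a * z)))
    exceed += abs(brentq(h, -5.0, 5.0) - theta) >= eps

print(f"empirical Pr[|delta| >= eps]: {exceed / reps:.4f}   bound: {bound:.4f}")
assert exceed / reps <= bound
```

As expected for a Chebyshev-type argument, the bound is valid but loose; its value is in guiding the choice of $g_i$, not in being tight.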
It now makes sense to design $g_i(\theta)$ to minimize the right-hand side of the above bound. To simplify, take $\epsilon \to 0$, so that (assuming continuity) $g_i(\theta \pm \epsilon) \to g_i(\theta)$. This motivates the precise optimization problem defined below.
Optimization problem:
For each $\theta \in \mathcal{Z}$, we are given $\sigma_i(\theta)$ and $\beta_i$, and we must find values $g_i(\theta)$ to solve:
Minimize: $\frac{\sum_{i=1}^n g_i(\theta)^2\sigma_i(\theta)^2}{\left(\sum_{i=1}^ng_i(\theta)\beta_i\right)^2}$
Such that: $g_i(\theta) > 0 \: \mbox{for all $i\in\{1, \ldots, n\}$}$.
It is clear that scaling a solution $(g_1(\theta), \ldots, g_n(\theta))$ by a positive constant $\gamma$ will not change the ratio. Thus, the problem is equivalent to:
Minimize: $\sum_{i=1}^n g_i(\theta)^2\sigma_i(\theta)^2$
Such that: $\sum_{i=1}^n g_i(\theta)\beta_i=1 \: , \: g_i(\theta)>0 \: \forall i \in \{1, \ldots, n\}$
A Lagrange multiplier argument gives the solution $g_i(\theta) = \gamma \beta_i/\sigma_i(\theta)^2$, where $\gamma$ is any positive constant (by Cauchy–Schwarz this is the global minimizer, not merely a stationary point). Of course, for consistency, we hope these functions satisfy Assumption 2.
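As a sanity check (a sketch; the dimension and the $\beta_i$, $\sigma_i^2$ values below are arbitrary test data), one can solve the constrained problem numerically and compare against the closed form:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 8
beta = rng.uniform(0.5, 2.0, n)    # arbitrary test data
sigma2 = rng.uniform(0.5, 2.0, n)

# Closed-form candidate from the Lagrange argument, scaled so sum(g * beta) = 1
g_star = beta / sigma2
g_star /= g_star @ beta

# Solve the quadratic program directly (SLSQP handles the equality constraint)
res = minimize(
    lambda g: np.sum(g**2 * sigma2),
    x0=np.full(n, 1.0 / beta.sum()),  # feasible start: x0 @ beta == 1
    constraints=[{"type": "eq", "fun": lambda g: g @ beta - 1.0}],
    bounds=[(1e-9, None)] * n,
)
print(res.x)
print(g_star)
assert np.allclose(res.x, g_star, atol=1e-3)
```

The minimized objective equals $1 \big/ \sum_i \beta_i^2/\sigma_i^2$, matching $g_i = \beta_i/\sigma_i^2$ up to the irrelevant scale $\gamma$.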
Another approach
Another approach observes $(X_1, \ldots, X_n)$ and chooses $\hat{\theta}$ to minimize a "weighted empirical variance" expression:
$$ \sum_{i=1}^nc_i(\hat{\theta})(X_i-\mu_i(\hat{\theta}))^2 $$
for some choice of weight functions $c_i(\cdot)$. Suppose $c_i(\hat{\theta})=1/2$ for all $i\in \{1, \ldots, n\}$. Taking the derivative with respect to $\hat{\theta}$ and setting it to zero yields the estimator defined by the equation:
$$ \sum_{i=1}^n \mu_i'(\hat{\theta})(X_i-\mu_i(\hat{\theta}))=0 $$
which has the same structure as the estimation equation in the special case when $g_i(\hat{\theta}) = \mu_i'(\hat{\theta})$.
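For completeness, the derivative computation behind this (with $c_i = 1/2$, so no extra term from differentiating the weights) is just the chain rule:
$$ \frac{d}{d\hat{\theta}} \sum_{i=1}^n \tfrac{1}{2}\left(X_i - \mu_i(\hat{\theta})\right)^2 = -\sum_{i=1}^n \mu_i'(\hat{\theta})\left(X_i - \mu_i(\hat{\theta})\right), $$
and setting this to zero gives the displayed equation.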
Extended result
Actually, I can show an extended result that requires almost no assumptions on the functions $g_i(z)$, $\mu_i(z)$, $\sigma_i(z)^2$ (so monotonicity is not needed). Assume:
1) The set $\mathcal{Z}$ of all possible $\theta$ is a compact set.
2) $\sum_{i=1}^n g_i(z)(X_i-\mu_i(z))=0$ has at least one solution $z \in \mathcal{Z}$ for all $n$ and all $(X_1, \ldots, X_n)$.
3) $g_i(z)$ and $\mu_i(z)$ are continuous functions over $z \in \mathcal{Z}$.
4) The variances satisfy $\sum_{i=1}^{\infty} \frac{g_i(\theta)^2\sigma_i(\theta)^2}{i^2}<\infty$ (the Kolmogorov condition for a strong law of large numbers).
5) The sequence of estimates $\hat{\theta}_n$ solves the estimation equation for each $n \in \{1, 2, 3, \ldots\}$.
Then, with probability 1, any convergent subsequence of the estimation sequence $\{\hat{\theta}_n\}_{n=1}^{\infty}$ converges to a point $z^* \in \mathcal{Z}$ that solves the following limit equation: $$ \lim_{n\rightarrow\infty} \frac{1}{n}\sum_{i=1}^ng_i(z^*)(\mu_i(\theta)-\mu_i(z^*)) = 0 $$ In particular, if $z^*=\theta$ is the only solution to the limit equation, then the estimation sequence converges to $\theta$ with probability 1.
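A minimal illustration of the extended result, under assumptions of my own choosing: take $g_i(z) = 1$ and $\mu_i(z) = z$, with growing variances $\sigma_i^2 = \log(i+1)$ so that condition 4 holds ($\sum_i \log(i+1)/i^2 < \infty$). Then $\hat{\theta}_n$ is the sample mean, the limit equation reduces to $\theta - z^* = 0$ with unique root $z^* = \theta$, and the theorem predicts almost-sure convergence:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 1.5
N = 100_000
i = np.arange(1, N + 1)
sigma = np.sqrt(np.log(i + 1))   # condition 4: sum log(i+1)/i^2 < infinity

# g_i = 1, mu_i(z) = z: the estimation equation gives theta_hat_n = mean(X_1..X_n),
# and the limit equation (1/n) sum (theta - z*) = 0 has the unique root z* = theta.
X = theta + sigma * rng.standard_normal(N)
theta_hat = np.cumsum(X) / i     # the whole estimation sequence at once

print(f"theta_hat at n = {N}: {theta_hat[-1]:.4f}  (true theta = {theta})")
assert abs(theta_hat[-1] - theta) < 0.05
```

Note the variances here are unbounded, so this is not covered by the classical i.i.d. strong law; condition 4 is exactly what keeps the estimation sequence convergent.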
I wonder if this result is known, or is this new?