This is question 4.2c from John Duchi's course Stats 300b at Stanford: http://web.stanford.edu/class/stats300b/Exercises/all-exercises.pdf
Consider the U-statistic-based "Log-likelihood" type objective:
$$L_n(\theta) = {n \choose 2}^{-1} \sum_{i<j} 1\{Y_i > Y_j\} \log P_{\theta}(Y_i > Y_j \mid x_i, x_j) $$
where we have $n$ samples of $(Y_i, x_i)$ such that:
$$Y_i = x_i^T \theta + \epsilon_i$$
$$\epsilon_i \sim N(0, 1)$$
$$E[x_i] = 0, \quad \mathrm{Cov}[x_i] = \Sigma$$
Let $\hat{\theta}_n = \arg\max_{\theta} L_n(\theta)$, and assume that $\hat{\theta}_n$ is consistent for the true $\theta$.
Find the asymptotic distribution of $\hat{\theta}_n$.
My approach (using Theorem 5.23 from *Asymptotic Statistics* by van der Vaart) has been to seek the asymptotic distribution of $\sqrt{n}(\hat\theta_n - \theta)$, which under suitable regularity conditions is asymptotically normal with mean $0$ and variance $$V_\theta^{-1}E[\nabla L_n \nabla L_n^T]V_\theta^{-1},$$ where $V_\theta = E[\nabla^2 L_n]$, the expectation is taken with respect to the data, and derivatives are taken with respect to $\theta$.
My difficulty has been in evaluating these expectations to determine the variance of this estimator.
$\nabla L_n$ looks like a U-statistic with kernel $h((Y_1, x_1), (Y_2, x_2)) = 1\{Y_1 > Y_2\}\nabla_\theta \log P_{\theta}(Y_1 > Y_2 \mid x_1, x_2)$. So $\sqrt{n}(\nabla L_n - E[\nabla L_n])$ is asymptotically $N(0, 2^2\xi_1)$, where $\xi_1$ is the covariance (with the first argument shared):
$$\xi_1 = \mathrm{Cov}(h(X, X_i), h(X, X_j)) = E[h(X, X_i)h(X, X_j)^T] - E[h(X, X_i)]\, E[h(X, X_j)]^T$$
But I'm also having trouble evaluating this covariance.
EDIT:
A friend showed me a few tricks, but I don't seem much closer to having a neat closed form (if there is one).
On closer look, $L_n$ is not a log-likelihood, so let's go after $V_\theta$ and $E[\nabla L_n \nabla L_n^T]$ to apply Theorem 5.23 directly.
First, we have:
$$\log P_{\theta}(Y_i > Y_j | x_i, x_j) = \log P_{\theta}(\epsilon_j - \epsilon_i < (x_i - x_j)^T\theta | x_i, x_j) = \log \Phi \bigg(\frac{(x_i-x_j)^T\theta}{\sqrt2}\bigg)$$
since $\epsilon_j - \epsilon_i \sim N(0, 2)$.
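A quick Monte Carlo sanity check of this probit form (a sketch; the dimension, $\theta$, and covariate values below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5])                    # assumed true parameter
xi, xj = np.array([0.3, 0.7]), np.array([-0.2, 0.1])  # fixed covariates

# Simulate Y_i, Y_j with standard normal noise and compare the
# empirical frequency of {Y_i > Y_j} with Phi((x_i - x_j)^T theta / sqrt(2)).
n = 200_000
Yi = xi @ theta + rng.standard_normal(n)
Yj = xj @ theta + rng.standard_normal(n)

mc = np.mean(Yi > Yj)
closed = norm.cdf((xi - xj) @ theta / np.sqrt(2))
print(mc, closed)  # the two should agree to ~2 decimal places
```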
Evaluating $\xi_1$ we have for the first moment:
$$E[h(X, X_i)] = E \left[1_i\,\frac{(X - X_i)\,\phi(\gamma_i)}{\sqrt{2}\,\Phi(\gamma_i)}\right]$$
where $\gamma_i = \frac{(X - X_i)^T\theta}{\sqrt{2}}$ and $1_i = 1\{Y > Y_i\}$ (note the $\frac{1}{\sqrt{2}}$ from the chain rule).
Conditioning on the covariates and applying the tower property of expectation, $E[1_i \mid X, X_i] = \Phi(\gamma_i)$ cancels the denominator:
$$ = \frac{1}{\sqrt{2}}\, E [(X - X_i)\,\phi(\gamma_i)] $$
which equals $0$ by symmetry: swapping $X$ and $X_i$ negates $X - X_i$ but leaves $\phi(\gamma_i)$ unchanged, since $\phi$ is even. So,
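The symmetry argument can be checked numerically (a sketch; the 2-D setting, $\theta$, and $N(0, I)$ covariates are assumptions for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta = np.array([1.0, -0.5])  # assumed parameter
n = 500_000

# Draw independent covariate pairs and average (X - X_i) * phi(gamma_i);
# by the swap symmetry this should be ~0 componentwise.
X = rng.standard_normal((n, 2))
Xi = rng.standard_normal((n, 2))
gamma = (X - Xi) @ theta / np.sqrt(2)
first_moment = np.mean((X - Xi) * norm.pdf(gamma)[:, None], axis=0)
print(first_moment)  # both components close to 0
```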
$$\xi_1 = \frac{1}{2}\, E \left[\left(1_i\frac{(X - X_i)\phi(\gamma_i)}{\Phi(\gamma_i)}\right)\left(1_j\frac{(X - X_j)\phi(\gamma_j)}{\Phi(\gamma_j)}\right)^T\right]$$
Here the simplification used for the first moment does not go through: $1_i$ and $1_j$ both involve the same $Y$, so they are not conditionally independent given the covariates. Conditioning on the covariates gives $E[1_i 1_j \mid X, X_i, X_j] = \Phi_2\!\left(\gamma_i, \gamma_j; \tfrac{1}{2}\right)$, the standard bivariate normal CDF with correlation $\tfrac{1}{2}$ (since $\epsilon_i - \epsilon$ and $\epsilon_j - \epsilon$ each have variance $2$ and covariance $1$), so
$$\xi_1 = \frac{1}{2}\, E \left[\frac{\Phi_2\!\left(\gamma_i, \gamma_j; \tfrac{1}{2}\right)}{\Phi(\gamma_i)\Phi(\gamma_j)}\,\phi(\gamma_i)\,\phi(\gamma_j)\,(X - X_i)(X - X_j)^T\right]$$
Which I cannot simplify any further.
Applying similar simplifications to $\nabla^2 L_n(\theta)$, we have:
$$V_\theta = E[\nabla^2 L_n(\theta)] = -\frac{1}{2}\, E\left[\frac{\phi(\gamma_{ij})^2}{\Phi(\gamma_{ij})}\,(X_i-X_j)(X_i-X_j)^T\right]$$ where $\gamma_{ij} = \frac{(X_i-X_j)^T\theta}{\sqrt2}$; the other term from differentiating $\phi/\Phi$, proportional to $\gamma_{ij}\,\phi(\gamma_{ij})\,(X_i-X_j)(X_i-X_j)^T$, vanishes in expectation by the same symmetry argument (it is odd in $X_i - X_j$).
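A numeric check (a sketch, 1-D with $x \sim N(0,1)$ and $\theta = 1$ assumed) that the odd cross term $\gamma\,\phi(\gamma)\,(X_i - X_j)^2$ indeed averages to zero, so only the $-\phi^2/\Phi$ term survives:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
theta, n = 1.0, 1_000_000  # assumed parameter, MC sample size

# gamma * phi(gamma) is odd in (x_i - x_j) while (x_i - x_j)^2 is even,
# so the product should average to ~0 over symmetric covariate draws.
xi, xj = rng.standard_normal((2, n))
g = (xi - xj) * theta / np.sqrt(2)
odd_term = np.mean(g * norm.pdf(g) * (xi - xj) ** 2)
print(odd_term)  # close to 0
```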
EDIT: Using this expression to simulate the asymptotic variance, we get what we expect intuitively: this estimator works better than the least-squares estimator when the variance of the noise term $\epsilon$ is smaller than the variance of the covariates $x_i$. The following log-log plot shows, for the 1-D case, the relationship between the asymptotic variance and $\mathrm{var}(\epsilon)$, where $x$ is sampled from $N(0,1)$. The relationship is even more dramatic when $x \sim \text{Cauchy}$.
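A sketch of such a simulation (my own code, not the original; 1-D, $\theta = 1$, $x$ and $\epsilon$ standard normal are assumptions): estimate $\xi_1$ directly from the kernel with a shared first argument, estimate $V_\theta$ from the kernel Hessian, and form the sandwich variance $V_\theta^{-2} \cdot 4\xi_1$ for comparison with the least-squares asymptotic variance $\mathrm{var}(\epsilon)/\mathrm{var}(x)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
theta, n = 1.0, 500_000  # assumed parameter, MC sample size

def grad_kernel(y1, x1, y2, x2):
    # h(Z1, Z2) = 1{Y1 > Y2} * d/dtheta log Phi((x1 - x2) * theta / sqrt(2))
    g = (x1 - x2) * theta / np.sqrt(2)
    return (y1 > y2) * norm.pdf(g) / norm.cdf(g) * (x1 - x2) / np.sqrt(2)

# Triples (Z, Z_i, Z_j) sharing the first observation, for xi_1.
x, xi_, xj = rng.standard_normal((3, n))
y  = x   * theta + rng.standard_normal(n)
yi = xi_ * theta + rng.standard_normal(n)
yj = xj  * theta + rng.standard_normal(n)

hi = grad_kernel(y, x, yi, xi_)
hj = grad_kernel(y, x, yj, xj)
xi1 = np.mean(hi * hj)  # first moment is ~0, so this estimates the covariance

# V_theta = E[nabla^2 of the kernel]: 1{Y > Y_i} * (log Phi)''(g) * (x - x_i)^2 / 2
g = (x - xi_) * theta / np.sqrt(2)
r = norm.pdf(g) / norm.cdf(g)       # Mills-type ratio phi/Phi
hess = (y > yi) * (-g * r - r ** 2) * (x - xi_) ** 2 / 2
V = np.mean(hess)

sandwich = 4 * xi1 / V ** 2         # asymptotic variance of sqrt(n)(theta_hat - theta)
ols = 1.0                           # var(eps)/var(x) = 1 in this setting
print(sandwich, ols)
```

Varying the noise scale (and the covariate distribution) in this sketch reproduces the comparison described above.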
An interesting followup question would be: can we find a variance stabilizing transform, such that the asymptotic variance no longer depends on the parameter in question?
