This is question 4.2c from John Duchi's course Stats 300b at Stanford: http://web.stanford.edu/class/stats300b/Exercises/all-exercises.pdf
Consider the U-statistic-based "Log-likelihood" type objective:
$$L_n(\theta) = {n \choose 2}^{-1} \sum_{i<j} 1\{Y_i > Y_j\} \log P_{\theta}(Y_i > Y_j \mid x_i, x_j) $$
where we have $n$ samples of $(Y_i, x_i)$ such that:
$$Y_i = x_i^T \theta + \epsilon_i$$
$$\epsilon_i \sim N(0, 1)$$
$$E[x_i] = 0, \quad \mathrm{Cov}[x_i] = \Sigma$$
Let $\hat{\theta}_n = \arg\max_{\theta} L_n(\theta)$, and assume that $\hat{\theta}_n$ is consistent for the true $\theta$.
Find the asymptotic distribution of $\hat{\theta}_n$.
My approach (using Theorem 5.23 from *Asymptotic Statistics* by van der Vaart) has been to seek the asymptotic distribution of $\sqrt{n}(\hat\theta_n - \theta)$, which under suitable regularity conditions is asymptotically normal with mean $0$ and variance $$V_\theta^{-1}E[\nabla L_n \nabla L_n^T]V_\theta^{-1},$$ where $V_\theta = E[\nabla^2 L_n]$, the expectation is taken with respect to the data, and derivatives are taken with respect to $\theta$.
My difficulty has been in evaluating these expectations to determine the variance of this estimator.
$\nabla L_n$ looks like a U-statistic with kernel $h((Y_1, x_1), (Y_2, x_2)) = 1\{Y_1 > Y_2\}\nabla_\theta \log P_{\theta}(Y_1 > Y_2 \mid x_1, x_2)$. So $\sqrt{n}(\nabla L_n - E[\nabla L_n])$ is asymptotically $N(0, 2^2\xi_1)$, where $\xi_1$ is the covariance (with the first argument shared):
$$\xi_1 = \mathrm{Cov}(h(X, X_i), h(X, X_j)) = E[h(X, X_i)h(X, X_j)^T] - E[h(X, X_i)]\, E[h(X, X_j)]^T$$
But I'm also having trouble evaluating this covariance.
EDIT:
A friend showed me a few tricks, but I don't seem much closer to having a neat closed form (if there is one).
On closer look, $L_n$ is not a log-likelihood, so let's go after $V_\theta$ and $E[\nabla L_n \nabla L_n^T]$ to apply Theorem 5.23 directly.
First, we have:
$$\log P_{\theta}(Y_i > Y_j | x_i, x_j) = \log P_{\theta}(\epsilon_j - \epsilon_i < (x_i - x_j)^T\theta | x_i, x_j) = \log \Phi \bigg(\frac{(x_i-x_j)^T\theta}{\sqrt2}\bigg)$$
since $\epsilon_j - \epsilon_i \sim N(0, 2)$.
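A quick Monte Carlo sanity check of this probit form (a sketch; the dimension, $\theta$, and covariate values below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5])                    # assumed true parameter
xi, xj = np.array([0.3, 0.7]), np.array([-0.2, 0.1])  # fixed covariates

# Simulate Y_i, Y_j with standard normal noise and compare the
# empirical frequency of {Y_i > Y_j} with Phi((x_i - x_j)^T theta / sqrt(2)).
n = 200_000
Yi = xi @ theta + rng.standard_normal(n)
Yj = xj @ theta + rng.standard_normal(n)

mc = np.mean(Yi > Yj)
closed = norm.cdf((xi - xj) @ theta / np.sqrt(2))
print(mc, closed)  # the two should agree to ~2 decimal places
```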
Evaluating $\xi_1$ we have for the first moment:
$$E[h(X, X_i)] = E \left[1_i\,\frac{(X - X_i)\,\phi(\gamma_i)}{\sqrt{2}\,\Phi(\gamma_i)}\right]$$
where $\gamma_i = \frac{(X - X_i)^T\theta}{\sqrt{2}}$ and $1_i = 1\{Y > Y_i\}$ (note the $\frac{1}{\sqrt{2}}$ from the chain rule).
Conditioning on the covariates and applying the tower property of expectation, $E[1_i \mid X, X_i] = \Phi(\gamma_i)$ cancels the denominator:
$$ = \frac{1}{\sqrt{2}}\, E [(X - X_i)\,\phi(\gamma_i)] $$
which equals $0$ by symmetry: swapping $X$ and $X_i$ negates $X - X_i$ but leaves $\phi(\gamma_i)$ unchanged, since $\phi$ is even. So,
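The symmetry argument can be checked numerically (a sketch; the 2-D setting, $\theta$, and $N(0, I)$ covariates are assumptions for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta = np.array([1.0, -0.5])  # assumed parameter
n = 500_000

# Draw independent covariate pairs and average (X - X_i) * phi(gamma_i);
# by the swap symmetry this should be ~0 componentwise.
X = rng.standard_normal((n, 2))
Xi = rng.standard_normal((n, 2))
gamma = (X - Xi) @ theta / np.sqrt(2)
first_moment = np.mean((X - Xi) * norm.pdf(gamma)[:, None], axis=0)
print(first_moment)  # both components close to 0
```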
$$\xi_1 = \frac{1}{2}\, E \left[\left(1_i\frac{(X - X_i)\phi(\gamma_i)}{\Phi(\gamma_i)}\right)\left(1_j\frac{(X - X_j)\phi(\gamma_j)}{\Phi(\gamma_j)}\right)^T\right]$$
Here the simplification used for the first moment does not go through: $1_i$ and $1_j$ both involve the same $Y$, so they are not conditionally independent given the covariates. Conditioning on the covariates gives $E[1_i 1_j \mid X, X_i, X_j] = \Phi_2\!\left(\gamma_i, \gamma_j; \tfrac{1}{2}\right)$, the standard bivariate normal CDF with correlation $\tfrac{1}{2}$ (since $\epsilon_i - \epsilon$ and $\epsilon_j - \epsilon$ each have variance $2$ and covariance $1$), so
$$\xi_1 = \frac{1}{2}\, E \left[\frac{\Phi_2\!\left(\gamma_i, \gamma_j; \tfrac{1}{2}\right)}{\Phi(\gamma_i)\Phi(\gamma_j)}\,\phi(\gamma_i)\,\phi(\gamma_j)\,(X - X_i)(X - X_j)^T\right]$$
Which I cannot simplify any further.
Applying similar simplifications to $\nabla^2 L_n(\theta)$, we have:
$$V_\theta = E[\nabla^2 L_n(\theta)] = -\frac{1}{2}\, E\left[\frac{\phi(\gamma_{ij})^2}{\Phi(\gamma_{ij})}\,(X_i-X_j)(X_i-X_j)^T\right]$$ where $\gamma_{ij} = \frac{(X_i-X_j)^T\theta}{\sqrt2}$; the other term from differentiating $\phi/\Phi$, proportional to $\gamma_{ij}\,\phi(\gamma_{ij})\,(X_i-X_j)(X_i-X_j)^T$, vanishes in expectation by the same symmetry argument (it is odd in $X_i - X_j$).
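A numeric check (a sketch, 1-D with $x \sim N(0,1)$ and $\theta = 1$ assumed) that the odd cross term $\gamma\,\phi(\gamma)\,(X_i - X_j)^2$ indeed averages to zero, so only the $-\phi^2/\Phi$ term survives:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
theta, n = 1.0, 1_000_000  # assumed parameter, MC sample size

# gamma * phi(gamma) is odd in (x_i - x_j) while (x_i - x_j)^2 is even,
# so the product should average to ~0 over symmetric covariate draws.
xi, xj = rng.standard_normal((2, n))
g = (xi - xj) * theta / np.sqrt(2)
odd_term = np.mean(g * norm.pdf(g) * (xi - xj) ** 2)
print(odd_term)  # close to 0
```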
EDIT: Using this expression to simulate the asymptotic variance, we get what we expect intuitively: this estimator works better than the least-squares estimator when the variance of the noise term $\epsilon$ is smaller than the variance of the covariates $x_i$. The following log-log plot shows, for the 1-D case, the relationship between the asymptotic variance and $\mathrm{var}(\epsilon)$, where $x$ is sampled from $N(0,1)$. The relationship is even more dramatic when $x \sim \text{Cauchy}$.
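A sketch of such a simulation (my own code, not the original; 1-D, $\theta = 1$, $x$ and $\epsilon$ standard normal are assumptions): estimate $\xi_1$ directly from the kernel with a shared first argument, estimate $V_\theta$ from the kernel Hessian, and form the sandwich variance $V_\theta^{-2} \cdot 4\xi_1$ for comparison with the least-squares asymptotic variance $\mathrm{var}(\epsilon)/\mathrm{var}(x)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
theta, n = 1.0, 500_000  # assumed parameter, MC sample size

def grad_kernel(y1, x1, y2, x2):
    # h(Z1, Z2) = 1{Y1 > Y2} * d/dtheta log Phi((x1 - x2) * theta / sqrt(2))
    g = (x1 - x2) * theta / np.sqrt(2)
    return (y1 > y2) * norm.pdf(g) / norm.cdf(g) * (x1 - x2) / np.sqrt(2)

# Triples (Z, Z_i, Z_j) sharing the first observation, for xi_1.
x, xi_, xj = rng.standard_normal((3, n))
y  = x   * theta + rng.standard_normal(n)
yi = xi_ * theta + rng.standard_normal(n)
yj = xj  * theta + rng.standard_normal(n)

hi = grad_kernel(y, x, yi, xi_)
hj = grad_kernel(y, x, yj, xj)
xi1 = np.mean(hi * hj)  # first moment is ~0, so this estimates the covariance

# V_theta = E[nabla^2 of the kernel]: 1{Y > Y_i} * (log Phi)''(g) * (x - x_i)^2 / 2
g = (x - xi_) * theta / np.sqrt(2)
r = norm.pdf(g) / norm.cdf(g)       # Mills-type ratio phi/Phi
hess = (y > yi) * (-g * r - r ** 2) * (x - xi_) ** 2 / 2
V = np.mean(hess)

sandwich = 4 * xi1 / V ** 2         # asymptotic variance of sqrt(n)(theta_hat - theta)
ols = 1.0                           # var(eps)/var(x) = 1 in this setting
print(sandwich, ols)
```

Varying the noise scale (and the covariate distribution) in this sketch reproduces the comparison described above.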
An interesting followup question would be: can we find a variance stabilizing transform, such that the asymptotic variance no longer depends on the parameter in question?
