Optimal Rule for Rank Loss


In a binary classification setting, the classification risk of a classifier $h$ is defined by $$ R(h) = \mathbb{P}(Y \neq h(X)), $$ where $(X,Y) \sim P$. It is well known that the classifier that minimizes $R(h)$ over all measurable functions is the Bayes classifier, $$ h^*(x) = \mathbb{1}\{ \mathbb{E}[Y|X=x] \ge 1/2\}. $$
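As a quick sanity check of this fact, here is a small sketch on a hypothetical discrete distribution (the probabilities `p` and posterior `eta` are made-up values, not from the question): we brute-force all deterministic classifiers on a three-point support and confirm the Bayes rule attains the minimum risk.

```python
import itertools

# Hypothetical discrete example: X takes three values with probabilities p[x],
# and eta(x) = P(Y=1 | X=x) is the posterior probability.
p = {0: 0.5, 1: 0.3, 2: 0.2}
eta = {0: 0.9, 1: 0.4, 2: 0.6}

def risk(h):
    # R(h) = P(Y != h(X)) = sum_x p(x) * [ (1-eta(x)) if h(x)=1 else eta(x) ]
    return sum(p[x] * ((1 - eta[x]) if h[x] == 1 else eta[x]) for x in p)

# Bayes classifier: threshold eta at 1/2.
bayes = {x: int(eta[x] >= 0.5) for x in p}

# Brute force over all 2^3 deterministic classifiers.
best = min(risk(dict(zip(p, labels)))
           for labels in itertools.product([0, 1], repeat=3))
assert abs(risk(bayes) - best) < 1e-12
```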

Another example is in the regression setting, where the regression function $f(x) = \mathbb{E}[Y|X=x]$ minimizes the mean squared error $\mathbb{E}[(Y- f(X))^2]$.

I am interested in the 'ranking' risk $$ \rho(h) := \mathbb{P}( (Y - Y')(h(X)-h(X')) < 0), $$ where $(X,Y), (X',Y') \sim P$ are independent draws. The $Y$ need not be binary here. Could anyone point me to a derivation of the optimal ranking function $h$ that minimizes $\rho(h)$?

Best answer:

This goes the same way as the proof for classification as well as for pairwise ranking (where $h: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$):

Note that the risk can be written as:

$$\rho(h) = \mathbb{P}\big((Y-Y')(h(X)-h(X')) < 0\big) = \mathbb{E}\big[1(Z=1)1(h(X)<h(X')) + 1(Z=-1)1(h(X)>h(X'))\big]$$

where $Z = Y-Y'$; in the binary case $Y, Y' \in \{0,1\}$, so $Z \in \{-1,0,1\}$ and $Z=1$ means $Y=1, Y'=0$.

Now, $\mathbb{E} 1(Z=1)1(h(X)<h(X')) = \mathbb{E}[\mathbb{P}(Y = 1, Y' =0|X, X')1(h(X)<h(X'))]$ by conditioning on $X,X'$ and by definition of $Z$.

Applying independence: $\mathbb{P}(Y = 1, Y' =0|X, X') = \mathbb{P}(Y = 1|X)\,\mathbb{P}(Y' = 0|X') = \eta(X)(1-\eta(X'))$, where $\eta$ is the regression function $\eta(x) = \mathbb{P}(Y=1|X=x)$ (the Bayes classifier thresholds it at $1/2$).

The same argument for the second term gives: $\mathbb E [1(Z=-1)1(h(X)>h(X'))] = \mathbb E[\eta(X')(1-\eta(X))1(h(X)>h(X'))]$
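The two terms combine into a single expression, which makes the pointwise comparison explicit:

$$\rho(h) = \mathbb E\big[\eta(X)(1-\eta(X'))1(h(X)<h(X')) + \eta(X')(1-\eta(X))1(h(X)>h(X'))\big],$$

and for $a, b \in [0,1]$ one has $a(1-b) \le b(1-a) \iff a \le b$, so on each pair the smaller coefficient is incurred exactly when $h$ orders $X, X'$ the same way $\eta$ does.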

Putting it together, for each pair $(X, X')$ the best $h$ incurs whichever of $\eta(X')(1-\eta(X))$ and $\eta(X)(1-\eta(X'))$ is smaller, by choosing which of $h(X)$ and $h(X')$ is larger. Ranking by the regression function $\eta$ itself achieves this for every pair simultaneously.
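One can verify this numerically on a hypothetical discrete binary example (same made-up `p` and `eta` as above, not from the question): compute the exact ranking risk for every strict ordering of the three $x$-values and check that scoring by $\eta$ attains the minimum.

```python
import itertools

# Hypothetical binary example: X in {0,1,2}, eta(x) = P(Y=1 | X=x).
p = {0: 0.5, 1: 0.3, 2: 0.2}
eta = {0: 0.9, 1: 0.4, 2: 0.6}

def rank_risk(score):
    # rho(h) = sum over ordered pairs (x, x') of
    #   p(x)p(x') * [ eta(x)(1-eta(x'))  if h(x) < h(x')
    #                 (1-eta(x))eta(x')  if h(x) > h(x') ]
    r = 0.0
    for x in p:
        for xp in p:
            if score[x] < score[xp]:
                r += p[x] * p[xp] * eta[x] * (1 - eta[xp])
            elif score[x] > score[xp]:
                r += p[x] * p[xp] * (1 - eta[x]) * eta[xp]
    return r

# Brute force over all strict orderings (distinct scores 1..3).
best = min(rank_risk(dict(zip(p, perm)))
           for perm in itertools.permutations([1, 2, 3]))
assert abs(rank_risk(eta) - best) < 1e-12  # scoring by eta is optimal
```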

EDIT: For general $Y$ we follow a similar approach, though $\eta(x)$ is no longer available to us since $Y$ need not be binary.

$$\rho(h) = \mathbb P\big((Y-Y')(h(X)-h(X'))<0\big) = \mathbb P (Y>Y', h(X)<h(X')) + \mathbb P (Y<Y', h(X)>h(X'))$$

Looking at just the first term:

$$\mathbb P (Y>Y', h(X)<h(X')) = \mathbb E[ \mathbb P(Y>Y'|X,X')1(h(X)<h(X'))]$$

and now the second term:

$$\mathbb P (Y<Y', h(X)>h(X')) = \mathbb E[ \mathbb P(Y<Y'|X,X')1(h(X)>h(X'))]$$

So pointwise for each pair $(X, X')$, the best $h$ would incur the minimum of $\mathbb P(Y<Y'|X,X')$ and $\mathbb P(Y>Y'|X,X')$, but it can act only through the relative order of $h(X)$ and $h(X')$. By symmetry, the best we can do with a single score is $h(x) = \mathbb E[\mathbb P(Y>Y'|X, X') \mid X=x] = \mathbb P(Y>Y'|X=x)$, where we marginalize over $(X', Y')$. This is optimal in the same sense that the conditional expectation is optimal for mean squared error.
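A Monte Carlo sketch of the general-$Y$ case, on a made-up model (the mean vector `m` and Gaussian noise are illustrative assumptions, not from the answer): here $h(x) = \mathbb P(Y>Y'|X=x)$ is increasing in $m[x]$, so scoring by `m` realizes the proposed rule, and it beats a scorer that misorders two of the $x$-values.

```python
import random

# Hypothetical model: X uniform on {0,1,2}, Y | X=x ~ Normal(m[x], 1),
# with a mean m that is NOT monotone in x. The proposed optimal score
# h(x) = P(Y > Y' | X = x) orders the x-values by m[x].
random.seed(0)
m = [0.0, 2.0, 1.0]

def rank_risk(score, n=200_000):
    # Estimate rho(h) = P((Y - Y')(h(X) - h(X')) < 0) from n independent pairs.
    bad = 0
    for _ in range(n):
        x, xp = random.randrange(3), random.randrange(3)
        y = m[x] + random.gauss(0, 1)
        yp = m[xp] + random.gauss(0, 1)
        if (y - yp) * (score[x] - score[xp]) < 0:
            bad += 1
    return bad / n

optimal = rank_risk(m)        # score consistent with h(x) = P(Y > Y' | X = x)
naive = rank_risk([0, 1, 2])  # score by x itself, which misorders x=1 and x=2
assert optimal < naive
```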