Is N-Pair loss generating a meaningful embedding space?


In the paper "Improved Deep Metric Learning with Multi-class N-pair Loss Objective" [1] the author proposes the N-Pair loss as an efficient and effective alternative to Triplet loss [2]. The author also states that this loss

pushes (N-1) negative examples all at once, based on their similarity to the input example.

Let $x$ be an input sample, $x^+$ a sample similar to $x$, and $\{x_i\}_{i=1}^{N-1}$ a set of $N-1$ samples dissimilar to $x$. The N-pair loss is then defined as:

\begin{equation}\label{n_pairs_loss} \mathcal{L}(\{x, x^+, \{x_i\}_{i=1}^{N-1}\}; f) = -\log \frac{\exp(f^\top f^+)}{\exp(f^\top f^+) + \sum_{i=1}^{N-1} \exp(f^\top f_i)} \end{equation}

where $f(\cdot;\theta)$ is an embedding kernel defined by a deep neural network, $f$ is the representation of $x$, $f_i$ is the representation of the $i$-th negative sample, and $f^+$ is the representation of the positive sample. This loss is minimized when the representation of $x$ is closer to $f^+$ than to any of the negative representations $f_i$.
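For concreteness, the loss above reduces to a softmax cross-entropy where the positive pair plays the role of the correct class. A minimal NumPy sketch for a single anchor (the function name and argument shapes are my own choices, not from the paper):

```python
import numpy as np

def n_pair_loss(f, f_pos, f_negs):
    """N-pair loss for one anchor embedding.

    f      : (d,)      anchor embedding of x
    f_pos  : (d,)      embedding of the positive sample x+
    f_negs : (N-1, d)  embeddings of the negative samples x_i
    """
    pos_sim = f @ f_pos          # scalar, f^T f+
    neg_sims = f_negs @ f        # (N-1,), f^T f_i
    logits = np.concatenate(([pos_sim], neg_sims))
    logits = logits - logits.max()  # subtract max for numerical stability
    # -log softmax at index 0 (the positive pair)
    return -(logits[0] - np.log(np.exp(logits).sum()))
```

As a sanity check, the loss is small when the anchor aligns with the positive and not with the negatives, and grows when the anchor aligns with the negatives instead.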

However, which terms in the equation guarantee that the deep neural network embeds the samples into a representation with meaningful properties? That is, if we have two dissimilar samples $x_1^-$ and $x_2^-$ with $\mathrm{similarity}(x, x_1^-) > \mathrm{similarity}(x, x_2^-)$, then $x_2^-$ should be embedded farther from $x$ than $x_1^-$.
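One place to look is the gradient of the loss with respect to each negative embedding; the following short derivation (my own, not from the paper) makes the "pushes proportionally to similarity" claim explicit. Writing $Z = \exp(f^\top f^+) + \sum_{j=1}^{N-1} \exp(f^\top f_j)$ and $p_i = \exp(f^\top f_i) / Z$:

\begin{align}
\mathcal{L} &= \log Z - f^\top f^+ \\
\frac{\partial \mathcal{L}}{\partial f_i} &= \frac{\exp(f^\top f_i)}{Z} \, f = p_i \, f
\end{align}

So a negative $x_i$ that is currently more similar to $x$ (larger $f^\top f_i$, hence larger $p_i$) receives a proportionally larger repulsive gradient. Note this only constrains similarity in the learned embedding space, not any externally defined notion of semantic similarity.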

How can one demonstrate mathematically that the N-pair loss builds a "semantic" embedding space?

[1] Sohn, Kihyuk. "Improved deep metric learning with multi-class n-pair loss objective." Advances in neural information processing systems 29 (2016): 1857-1865.

[2] Weinberger, Kilian Q., John Blitzer, and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." Advances in neural information processing systems. 2006.