Derivation of the N-pair loss formula


I am trying to understand how the contrastive learning loss function is derived in the well-known paper *Improved Deep Metric Learning with Multi-class N-pair Loss Objective*, but I fail to see how they get from the formula $$\log(1 + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-}) - f(x)^Tf(x^+)))$$

to this one $$-\log(\frac{ \exp(f(x)^Tf(x^+))}{\exp(f(x)^Tf(x^{+})) + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-}))})$$

where $\{x_i^-\}$ is a set of $N - 1$ negative samples, $x^+$ is the positive sample, $x$ is the current (anchor) sample, and $f: \mathbb{R}^d \rightarrow \mathbb{R}^p$ is the embedding function; all samples share the same domain $\mathbb{R}^d$. If you do not know what negative, positive, and anchor mean, that is fine; it is irrelevant at this stage.


2 Answers

Accepted answer:

Going from the second expression back to the first, the chain of equalities is easy to reconstruct:

$$-\log(\frac{ \exp(f(x)^Tf(x^+))}{\exp(f(x)^Tf(x^{+})) + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-}))}) = $$

$$\log(\frac{\exp(f(x)^Tf(x^{+})) + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-}))}{ \exp(f(x)^Tf(x^+))}) =$$

$$\log(1 + \sum_{i=1}^{N-1} \frac{\exp(f(x)^Tf(x_i^{-}))}{ \exp(f(x)^Tf(x^+))}) =$$

$$\log(1 + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-})-f(x)^Tf(x^+))) $$
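The chain of equalities above can be sanity-checked numerically. The following sketch (not from the original answer; embedding dimension, batch size, and variable names are illustrative) evaluates both forms of the loss for random embeddings and confirms they agree:

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 8, 5                          # embedding dim, N-1 = 4 negatives (illustrative choices)

fx = rng.normal(size=p)              # f(x), the anchor embedding
fpos = rng.normal(size=p)            # f(x^+), the positive embedding
fneg = rng.normal(size=(N - 1, p))   # f(x_i^-), the N-1 negative embeddings

# Form 1: log(1 + sum_i exp(f(x)^T f(x_i^-) - f(x)^T f(x^+)))
form1 = np.log1p(np.sum(np.exp(fneg @ fx - fpos @ fx)))

# Form 2: -log( exp(f(x)^T f(x^+)) / (exp(f(x)^T f(x^+)) + sum_i exp(f(x)^T f(x_i^-))) )
num = np.exp(fpos @ fx)
den = num + np.sum(np.exp(fneg @ fx))
form2 = -np.log(num / den)

print(np.isclose(form1, form2))      # True
```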


Just posting a slightly cleaner version of the accepted answer:

$$a = f(x)^Tf(x^+), \qquad b_i = f(x)^Tf(x_i^{-})$$

(note the index $i$ on $b_i$, since each negative sample gives its own score)

$$-\log\left(\frac{e^a}{e^a + \sum_{i=1}^{N-1} e^{b_i}}\right) = $$

$$\log\left(\frac{e^a + \sum_{i=1}^{N-1} e^{b_i}}{e^a}\right) = $$

$$\log\left(\frac{e^a}{e^a} + \sum_{i=1}^{N-1} \frac{e^{b_i}}{e^a}\right) = $$

$$\log\left(1 + \sum_{i=1}^{N-1} \frac{e^{b_i}}{e^a}\right) = $$

$$\log\left(1 + \sum_{i=1}^{N-1} e^{b_i-a}\right) = $$

$$\log\left(1 + \sum_{i=1}^{N-1} e^{f(x)^Tf(x_i^{-}) - f(x)^Tf(x^+)}\right) $$
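As an implementation aside (not part of either answer): the softmax form is exactly a log-sum-exp minus the positive logit, so in practice it is computed with the standard max-subtraction trick to avoid overflow when the scores $a, b_i$ are large. A minimal sketch, with a hypothetical helper `n_pair_loss`:

```python
import numpy as np

def n_pair_loss(fx, fpos, fneg):
    """N-pair loss, -log softmax over the positive logit, computed stably.

    fx: (p,) anchor embedding; fpos: (p,) positive; fneg: (N-1, p) negatives.
    """
    # logits = [a, b_1, ..., b_{N-1}]
    logits = np.concatenate(([fpos @ fx], fneg @ fx))
    # -log(e^a / sum_j e^{logits_j}) = logsumexp(logits) - a,
    # with the max subtracted inside exp for numerical stability
    m = logits.max()
    return m + np.log(np.sum(np.exp(logits - m))) - logits[0]

rng = np.random.default_rng(1)
fx, fpos = rng.normal(size=8), rng.normal(size=8)
fneg = rng.normal(size=(4, 8))

naive = np.log1p(np.sum(np.exp(fneg @ fx - fpos @ fx)))
print(np.isclose(n_pair_loss(fx, fpos, fneg), naive))  # True
```

The stable version also survives scores large enough to overflow `np.exp` in the naive formula, e.g. embeddings scaled by 100.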