I am trying to understand how the contrastive learning loss function is derived in the well-known paper Improved Deep Metric Learning with Multi-class N-pair Loss Objective, but I fail to see how they get from the formula $$\log(1 + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-}) - f(x)^Tf(x^+)))$$
to this one $$-\log(\frac{ \exp(f(x)^Tf(x^+))}{\exp(f(x)^Tf(x^{+})) + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-}))})$$
where $\{x_i^-\}$ is a set of $N - 1$ negative samples, $x^+$ is the positive sample, $x$ is the current sample, and $f: \mathbb{R}^d \rightarrow \mathbb{R}^p$ is the embedding function. All of these samples lie in the same domain $\mathbb{R}^d$. If you do not know what negative, positive, and current mean, that is okay, because it is irrelevant at this stage.
Going from the second formula back to the first, it is easy to reconstruct the equality:
$$-\log(\frac{ \exp(f(x)^Tf(x^+))}{\exp(f(x)^Tf(x^{+})) + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-}))}) = $$
$$\log(\frac{\exp(f(x)^Tf(x^{+})) + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-}))}{ \exp(f(x)^Tf(x^+))}) =$$
$$\log(1 + \sum_{i=1}^{N-1} \frac{\exp(f(x)^Tf(x_i^{-}))}{ \exp(f(x)^Tf(x^+))}) =$$
$$\log(1 + \sum_{i=1}^{N-1} \exp(f(x)^Tf(x_i^{-})-f(x)^Tf(x^+))) $$
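As a sanity check on the chain of equalities above, here is a small numerical experiment (my own sketch, not from the paper): since both expressions depend only on the scalar similarity scores $f(x)^Tf(x^+)$ and $f(x)^Tf(x_i^-)$, we can draw random scalars in their place and verify that the two loss forms agree.

```python
import math
import random

random.seed(0)

# Stand-ins for the similarity scores: the inner products
# f(x)^T f(x^+) and f(x)^T f(x_i^-) are just scalars, so random
# values suffice to test the algebraic identity.
pos = random.uniform(-1.0, 1.0)                    # f(x)^T f(x^+)
negs = [random.uniform(-1.0, 1.0) for _ in range(5)]  # f(x)^T f(x_i^-), N-1 = 5

# First form: log(1 + sum_i exp(s_i^- - s^+))
form1 = math.log(1 + sum(math.exp(n - pos) for n in negs))

# Second form: -log( exp(s^+) / (exp(s^+) + sum_i exp(s_i^-)) ),
# i.e. the softmax cross-entropy over the positive vs. the negatives.
denom = math.exp(pos) + sum(math.exp(n) for n in negs)
form2 = -math.log(math.exp(pos) / denom)

print(abs(form1 - form2) < 1e-9)
```

Both values coincide up to floating-point error, which matches the derivation above.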