How to prove the closed-form equivalence in the SoftTriple loss convex optimization?


Background

I was reading the SoftTriple loss paper when I stumbled upon the following. The authors claim that

$$p_j = \frac{\exp\{\lambda x_i^T(w_j - w_{y_i})\}}{\sum_j \exp\{\lambda x_i^T(w_j - w_{y_i})\}}$$

is the closed-form solution of

$$ \max_{p \in \Delta} \; \lambda \sum_j p_j x_i^T(w_j - w_{y_i}) + H(p) \quad \text{s.t. } \mathbf{1}^T p = 1, \text{ where } H(p) = -\sum_j p_j \log(p_j). $$

I came across something very similar in Chapter 2 of Brandon Amos's thesis. The author mentions that solving

$$ \underset{0 < y < 1}{\text{argmin}} \; -x^T y - H(y) \quad \text{s.t. } \mathbf{1}^T y = 1, \text{ where } H(y) = -\sum_i y_i \log(y_i) $$ gives the closed-form solution $$ y^\star_i = \frac{e^{x_i}}{\sum_j e^{x_j}} $$ via the KKT conditions.
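As a sanity check, I verified this claim numerically with my own NumPy sketch (the vector $x$ is a random placeholder): the softmax of $x$ attains the optimal value $-\log\sum_j e^{x_j}$, and no randomly sampled point on the simplex does better.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)

def softmax(v):
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()

def objective(y, x):
    # -x^T y - H(y), the quantity being minimized over the simplex
    return -x @ y + np.sum(y * np.log(y))

y_star = softmax(x)

# the optimal value should be -log(sum_j e^{x_j})
assert np.isclose(objective(y_star, x), -np.log(np.exp(x).sum()))

# no randomly sampled simplex point should beat the softmax
for _ in range(1000):
    y = np.clip(rng.dirichlet(np.ones(5)), 1e-300, 1.0)  # avoid log(0)
    assert objective(y, x) >= objective(y_star, x) - 1e-10
```

This only checks the claim empirically, of course; the algebraic argument goes through the KKT conditions as in the thesis.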

Question

However, the authors of the SoftTriple loss paper also claim that the maximum value of $$\lambda \sum_j p_jx_i^T(w_j - w_{y_i}) + H(p) \tag{1}\label{eq1}$$

is $$\log\Big(\sum_j \exp(\lambda x_i^T(w_j - w_{y_i}))\Big) \tag{2}\label{eq2}$$ $$= -\log \Big( \frac{ \exp\{\lambda w_{y_i}^Tx_i\} }{ \sum_j \exp\{\lambda w_j^Tx_i\} } \Big)$$

I am lost at this part. Starting from $p_j$,

$$ p_j = \frac{\exp\{\lambda x_i^T(w_j - w_{y_i})\}}{\sum_k \exp\{\lambda x_i^T(w_k - w_{y_i})\}} = \frac{\exp\{\lambda x_i^T w_j\}}{\sum_k \exp\{\lambda w_k^T x_i\}} $$

which is really close to what I am looking for, but not quite (since $w_j \neq w_{y_i}$ in general). So my question is: how do you prove the equivalence between $\eqref{eq1}$ and $\eqref{eq2}$?
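For what it's worth, I did check the claim numerically with my own NumPy sketch ($x$, $W$, $\lambda$, and the class index $y$ are random placeholders): plugging the closed-form $p$ into the objective does reproduce both the log-sum-exp expression and the negative log-softmax form, so it is only the algebraic step I am missing.

```python
import numpy as np

rng = np.random.default_rng(1)
d, C, lam = 4, 6, 3.0
x = rng.normal(size=d)
W = rng.normal(size=(C, d))  # row j is w_j; W[y] plays the role of w_{y_i}
y = 2

s = lam * (W @ x - W[y] @ x)         # s_j = lambda * x_i^T (w_j - w_{y_i})
p = np.exp(s) / np.sum(np.exp(s))    # the claimed closed-form maximizer

obj = s @ p - np.sum(p * np.log(p))  # objective (1) evaluated at p
lse = np.log(np.sum(np.exp(s)))      # expression (2)
nls = -np.log(np.exp(lam * W[y] @ x) / np.sum(np.exp(lam * W @ x)))

assert np.isclose(obj, lse) and np.isclose(lse, nls)
```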


Note: the definition of $p_j$ above is ambiguous because the index $j$ appears both as the free index and as the summation index, so I interpret it as

$$p_j = \frac{\exp\{\lambda x_i^T(w_j - w_{y_i})\}}{\sum_k \exp\{\lambda x_i^T(w_k - w_{y_i})\}}$$
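Under this interpretation, the common factor $\exp\{-\lambda x_i^T w_{y_i}\}$ cancels between numerator and denominator, which is just the shift invariance of softmax; here is a quick NumPy check of that cancellation (again with random placeholder $x$, $W$, $\lambda$, $y$):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 3.0
x = rng.normal(size=4)
W = rng.normal(size=(6, 4))  # row j is w_j
y = 0

logits = lam * (W @ x)  # lambda * x_i^T w_j for each j
# softmax of the shifted logits (subtracting lambda * x_i^T w_{y_i}) ...
p_shifted = np.exp(logits - logits[y]) / np.sum(np.exp(logits - logits[y]))
# ... equals softmax of the unshifted logits
p_plain = np.exp(logits) / np.sum(np.exp(logits))

assert np.allclose(p_shifted, p_plain)
```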