Background
I was reading the SoftTriple loss paper when I stumbled upon the following. The authors claim that
$$p_j = \frac{\exp\{\lambda x_i^T(w_j - w_{y_i})\}}{\sum_j \exp\{\lambda x_i^T(w_j - w_{y_i})\}}$$
is the closed-form solution of
$$ \underset{p \in \triangle}{\max} \; \lambda \sum_j p_j x_i^T(w_j - w_{y_i}) + H(p) \quad \text{s.t. } 1^Tp = 1, \text{ where } H(p) = -\sum_j p_j \log(p_j) $$
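As a quick numerical sanity check of this claim (a sketch with NumPy, using arbitrary random stand-ins for $\lambda$, $x_i$, the $w_j$, and $y_i$), the closed-form $p$ should attain a larger objective value than random points on the simplex:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, C = 2.0, 5              # arbitrary lambda and number of classes
x = rng.normal(size=3)       # stand-in for x_i
w = rng.normal(size=(C, 3))  # stand-in for the class weights w_j
y = 1                        # stand-in for the label y_i

s = lam * (w @ x - w[y] @ x)           # lambda * x_i^T (w_j - w_{y_i})
p_star = np.exp(s) / np.exp(s).sum()   # claimed closed-form maximizer

def objective(p):
    # lambda * sum_j p_j x_i^T(w_j - w_{y_i}) + H(p)
    return p @ s - p @ np.log(p)

# compare against many random points on the simplex
samples = rng.dirichlet(np.ones(C), size=10_000)
samples = np.clip(samples, 1e-12, None)  # avoid log(0)
obj_samples = samples @ s - (samples * np.log(samples)).sum(axis=1)

# p_star should attain the largest objective value
print(objective(p_star), obj_samples.max())
```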
I came across something very similar in Brandon Amos's thesis, Chapter 2. The author mentions that solving
$$ \underset{0 < y < 1}{\text{argmin}} \; -x^Ty - H(y) \quad \text{s.t. } 1^Ty = 1, \text{ where } H(y) = -\sum_i y_i \log(y_i) $$ gives the closed-form solution $$ y^\star = \frac{e^{x_i}}{\sum_j e^{x_j}} $$ via the KKT conditions.
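Spelling out that KKT computation for completeness (with $\nu$ the multiplier for the equality constraint; the positivity constraints are inactive at the optimum, so their multipliers vanish):
$$ \mathcal{L}(y, \nu) = -x^Ty + \sum_i y_i \log(y_i) + \nu(1^Ty - 1) $$
$$ \frac{\partial \mathcal{L}}{\partial y_i} = -x_i + \log(y_i) + 1 + \nu = 0 \;\Rightarrow\; y_i = e^{x_i - 1 - \nu} $$
Enforcing $1^Ty = 1$ gives $e^{1 + \nu} = \sum_j e^{x_j}$, hence $y_i^\star = e^{x_i} / \sum_j e^{x_j}$.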
Question
However, the authors of the SoftTriple loss paper also claim that $$\lambda \sum_j p_jx_i^T(w_j - w_{y_i}) + H(p) \tag{1}\label{eq1}$$
$$= \log\Big(\sum_j \exp(\lambda x_i^T(w_j - w_{y_i}))\Big) \tag{2}\label{eq2}$$ $$= -\log \Big( \frac{ \exp\{\lambda w_{y_i}^Tx_i\} }{ \sum_j \exp\{\lambda w_j^Tx_i\} } \Big)$$
I am lost at this part, because from $p_j$:
$$ p_j = \frac{\exp\{\lambda x_i^T(w_j - w_{y_i})\}}{\sum_k \exp\{\lambda x_i^T(w_k - w_{y_i})\}} = \frac{\exp\{ \lambda x_i^T w_j\} }{ \sum_k \exp\{\lambda w_k^Tx_i\} } $$
(the common factor $\exp\{-\lambda x_i^T w_{y_i}\}$ cancels between numerator and denominator)
which is really close to what I am looking for, but not quite (in general $w_j \neq w_{y_i}$). So my question is: how do you prove the equivalence between $\eqref{eq1}$ and $\eqref{eq2}$?
Note: the definition of $p_j$ above is ambiguous because the index $j$ is duplicated, so I interpret it as
$$p_j = \frac{\exp\{\lambda x_i^T(w_j - w_{y_i})\}}{\sum_k \exp\{\lambda x_i^T(w_k - w_{y_i})\}}$$
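For what it's worth, I did check numerically (again with random stand-ins for $\lambda$, $x_i$, $w$, and $y_i$) that the three expressions agree once $p$ is set to its closed-form value, so the claimed identity does seem to hold at the maximizer:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
x = rng.normal(size=3)       # stand-in for x_i
w = rng.normal(size=(5, 3))  # stand-in for the w_j
y = 0                        # stand-in for y_i

s = lam * (w @ x - w[y] @ x)       # lambda * x_i^T (w_j - w_{y_i})
p = np.exp(s) / np.exp(s).sum()    # closed-form p_j

lhs = p @ s - p @ np.log(p)        # expression (1) evaluated at p
mid = np.log(np.exp(s).sum())      # expression (2)
rhs = -np.log(np.exp(lam * w[y] @ x) / np.exp(lam * w @ x).sum())

print(lhs, mid, rhs)  # all three agree up to floating-point error
```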