I want to compute the gradient of the loss function below for one training example $(t,\mathcal{C}_t)$. $w_c$ and $w_t$ are vectors in $\mathbb{R}^d$. The $w_c$'s are not taken from the same matrix as $w_t$.
\begin{equation} L(t,\mathcal{C_t}) = \sum_{c \in \mathcal{C}_t } \log \big(1+e^{-w_c \cdot w_t}\big) + \sum_{c \in \mathcal{C}_t^-} \log \big(1+e^{w_c \cdot w_t}\big) \end{equation}
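For concreteness, the loss can be evaluated numerically. A minimal sketch assuming NumPy, with the positive contexts stacked as rows of `pos` and the negative samples as rows of `neg` (hypothetical names for this illustration); `np.logaddexp(0, x)` computes $\log(1+e^x)$ stably:

```python
import numpy as np

def loss(w_t, pos, neg):
    # sum_{c in C_t} log(1 + e^{-w_c.w_t}) + sum_{c in C_t^-} log(1 + e^{w_c.w_t})
    return (np.logaddexp(0, -pos @ w_t).sum()
            + np.logaddexp(0, neg @ w_t).sum())

rng = np.random.default_rng(0)
w_t = rng.normal(size=4)
pos = rng.normal(size=(3, 4))   # one row per c in C_t
neg = rng.normal(size=(5, 4))   # one row per c in C_t^-
print(loss(w_t, pos, neg))      # a positive scalar
```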
Here is what I have so far. The partial derivative of $\log \big(1+e^{-w_c \cdot w_t}\big)$ w.r.t. $w_c$ (i.e., holding $w_t$ constant) is:
\begin{equation} \frac{1}{1+e^{-w_c \cdot w_t}} \times \big(-w_t e^{-w_c \cdot w_t}\big) = \frac{-w_t}{e^{w_c \cdot w_t} +1} \end{equation}
And, similarly, its partial derivative w.r.t. $w_t$ is $\frac{-w_c}{e^{w_c \cdot w_t} +1}$.
We proceed similarly for the second term. Since each $w_c$ appears in exactly one term of the loss, the gradient with respect to a given $w_c$ is a single term rather than a sum:
$\nabla L_{w_c} = \frac{-w_t}{e^{w_c \cdot w_t} +1}$ for $c \in \mathcal{C}_t$, and $\nabla L_{w_c} = \frac{w_t}{e^{-w_c \cdot w_t} +1}$ for $c \in \mathcal{C}_t^-$
and:
$\nabla L_{w_t} = \sum_{c \in \mathcal{C}_t } \frac{-w_c}{e^{w_c \cdot w_t} +1} + \sum_{c \in \mathcal{C}_t^-} \frac{w_c}{e^{-w_c \cdot w_t} +1}$
Is that correct?
Yes, your derivation is fine! I end up with the same expressions.
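You can also confirm the $\nabla L_{w_t}$ expression with a finite-difference check. A sketch assuming NumPy, again with the positive contexts as rows of `pos` and the negatives as rows of `neg` (hypothetical names):

```python
import numpy as np

def loss(w_t, pos, neg):
    # log(1 + e^x) computed stably as logaddexp(0, x)
    return (np.logaddexp(0, -pos @ w_t).sum()
            + np.logaddexp(0, neg @ w_t).sum())

def grad_wt(w_t, pos, neg):
    # sum_{c in C_t} -w_c/(e^{w_c.w_t}+1) + sum_{c in C_t^-} w_c/(e^{-w_c.w_t}+1)
    return (-(pos / (np.exp(pos @ w_t) + 1)[:, None]).sum(axis=0)
            + (neg / (np.exp(-neg @ w_t) + 1)[:, None]).sum(axis=0))

rng = np.random.default_rng(0)
w_t = rng.normal(size=4)
pos = rng.normal(size=(3, 4))
neg = rng.normal(size=(5, 4))

# central differences along each coordinate of w_t
eps = 1e-6
num = np.array([(loss(w_t + eps * e, pos, neg) - loss(w_t - eps * e, pos, neg)) / (2 * eps)
                for e in np.eye(4)])
print(np.max(np.abs(num - grad_wt(w_t, pos, neg))))  # tiny: numerical noise only
```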