I am trying to calculate the derivative of the following function:
$$ \text{Tr}(SB),$$ where $$S_{ij} = \exp \left( -\frac{\|x_i - y_j\|^2}{2\sigma^2} \right)$$
We have $$\begin{align*} &A_{ij}=\|x_i -y_j\|^2 \\ \Rightarrow A &= \text{diag}(X^TX)1^T + 1(\text{diag}(Y^TY))^T - 2X^TY \\ & = (X\odot X)^T11^T + 11^T(Y \odot Y) - 2X^TY \end{align*}$$
then
$$ dA = 2(X\odot dX)^T11^T + 11^T(2(Y\odot dY)) -2(dX^TY+X^TdY)$$
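A quick numerical sanity check of the matrix form of $A$ and of $dA$ for fixed $X$ can be sketched as follows. It assumes the points $x_i$ and $y_j$ are the columns of $X$ ($d\times n$) and $Y$ ($d\times m$); the sizes, the random data, and the helper name `sqdist` are illustrative choices only, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 3, 4, 5                      # assumed sizes: X is d x n, Y is d x m
X = rng.normal(size=(d, n))
Y = rng.normal(size=(d, m))

def sqdist(X, Y):
    """Matrix form: (X*X)^T 1 1^T + 1 1^T (Y*Y) - 2 X^T Y (hypothetical helper)."""
    d, n = X.shape
    m = Y.shape[1]
    return (X * X).T @ np.ones((d, m)) + np.ones((n, d)) @ (Y * Y) - 2 * X.T @ Y

# Entry-by-entry definition A_ij = ||x_i - y_j||^2
A_loop = np.array([[np.sum((X[:, i] - Y[:, j]) ** 2) for j in range(m)]
                   for i in range(n)])
A_mat = sqdist(X, Y)

# dA with X held fixed: 2 * 1 1^T (Y o dY) - 2 X^T dY
dY = rng.normal(size=(d, m))
dA_formula = 2 * np.ones((n, d)) @ (Y * dY) - 2 * X.T @ dY

# Central difference along the direction dY (A is quadratic in Y)
eps = 1e-6
dA_fd = (sqdist(X, Y + eps * dY) - sqdist(X, Y - eps * dY)) / (2 * eps)

print(np.max(np.abs(A_loop - A_mat)), np.max(np.abs(dA_fd - dA_formula)))
```

Both printed differences should be at the level of floating-point rounding if the formulas agree.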
We have
$$\begin{align*} f & = B^T:S \\ df & = B^T:dS \\ &= B^T:(\frac{-1}{2\sigma^2})S\odot dA \\ & =(\frac{-1}{2\sigma^2})B^T:\Big(S\odot \big(2(X\odot dX)^T11^T + 11^T(2(Y\odot dY)) -2(dX^TY+X^TdY)\big)\Big) \\ \end{align*}$$
Hence, by fixing $X$ we get
$$ df= (\frac{-1}{\sigma^2})B^T:S\odot(11^T(Y\odot dY) - X^TdY) $$
Is my derivation correct?
Could anyone help me to simplify the expression of $\frac{\partial f}{\partial Y}$?
Any help is appreciated.
This is a bit too big for a comment, but you've almost got it.
Let's pick things up at the next-to-last step and concentrate on the gradient wrt $Y$. $$\eqalign{ df &= -\frac{1}{2\sigma^2}B^T:(S\odot dA) \\ &= -\frac{1}{2\sigma^2}(S\odot B^T):dA \\ &= -C:dA \\ &= -C:2\Big({\tt11}^T(Y\odot dY)-X^TdY\Big) \\ &= 2C:X^TdY - 2C:\Big({\tt11}^T(Y\odot dY)\Big) \\ &= 2XC:dY - 2Y\odot({\tt11}^TC):dY \\ &= 2\Big(XC - Y\odot({\tt11}^TC)\Big):dY \\ \frac{\partial f}{\partial Y} &= 2\Big(XC - Y\odot({\tt11}^TC)\Big) \\ }$$ where several terms were combined into the matrix $$C=\frac{S\odot B^T}{2\sigma^2}$$ Re-reading your post, the final differential expression for $\big(X=constant\big)$ was correct.
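For anyone who wants to double-check the result numerically, here is a minimal sketch comparing $2\big(XC - Y\odot({\tt11}^TC)\big)$ with central finite differences. It assumes the $x_i$, $y_j$ are the columns of $X$ and $Y$; the function names `f` and `grad_Y`, the sizes, and the value of `sigma` are illustrative assumptions.

```python
import numpy as np

def kernel(X, Y, sigma):
    # A_ij = ||x_i - y_j||^2, built by broadcasting the squared column norms
    A = (X * X).sum(axis=0)[:, None] + (Y * Y).sum(axis=0)[None, :] - 2 * X.T @ Y
    return np.exp(-A / (2 * sigma**2))

def f(X, Y, B, sigma):
    return np.trace(kernel(X, Y, sigma) @ B)

def grad_Y(X, Y, B, sigma):
    S = kernel(X, Y, sigma)
    C = S * B.T / (2 * sigma**2)
    # 1 1^T C stacks the column sums of C in every row; broadcasting
    # C.sum(axis=0) against Y has the same effect without forming it.
    return 2 * (X @ C - Y * C.sum(axis=0))

rng = np.random.default_rng(1)
d, n, m, sigma = 3, 4, 5, 1.5          # assumed sizes and bandwidth
X = rng.normal(size=(d, n))
Y = rng.normal(size=(d, m))
B = rng.normal(size=(m, n))

G = grad_Y(X, Y, B, sigma)

# Entry-wise central differences
eps = 1e-6
G_fd = np.zeros_like(Y)
for k in range(d):
    for j in range(m):
        E = np.zeros_like(Y)
        E[k, j] = eps
        G_fd[k, j] = (f(X, Y + E, B, sigma) - f(X, Y - E, B, sigma)) / (2 * eps)

print(np.max(np.abs(G - G_fd)))
```

The printed maximum deviation should be tiny compared with the size of the gradient entries if the closed form is right.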
Update
There was a question in the comments about one of the intermediate steps. Let $J={\tt11}^T$ be the all-ones matrix having the same dimensions as $X$, then $$\eqalign{ C:J^T(Y\odot dY) &= JC:(Y\odot dY) \qquad\big({\rm Cyclic\,property\,of\,Trace}\big) \\ &= (Y\odot JC):dY \qquad\big({\rm Hadamard\,commutes\,with\,Frobenius}\big) \\ }$$
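The two rearrangements can also be checked on random matrices. This is only a sketch: the sizes are arbitrary, `frob` is a hypothetical helper for the Frobenius product $P:Q=\sum_{ij}P_{ij}Q_{ij}$, and $J$ is taken to be $d\times n$ like $X$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, m = 3, 4, 5                  # assumed sizes
C = rng.normal(size=(n, m))
Y = rng.normal(size=(d, m))
dY = rng.normal(size=(d, m))
J = np.ones((d, n))                # all-ones matrix, same shape as X

def frob(P, Q):
    """Frobenius product P:Q = Tr(P^T Q) (hypothetical helper)."""
    return np.sum(P * Q)

lhs = frob(C, J.T @ (Y * dY))      # C : J^T (Y o dY)
mid = frob(J @ C, Y * dY)          # JC : (Y o dY)
rhs = frob(Y * (J @ C), dY)        # (Y o JC) : dY

print(lhs, mid, rhs)
```

All three printed numbers should coincide up to rounding.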