Can someone help me find the first derivative of the following function $\mathcal{L}$ with respect to the elements $\phi_{mn}$ of the matrix $\mathbf{\Phi}$?
\begin{equation} \mathcal{L} = -\frac{1}{2}\big[\log\left|\mathbf{C}\right| - 2\left[ \mathbf{t}^{T}\log\left(\sigma\big(\mathbf{\Phi}\mathbf{w}\big)\right) + (1-\mathbf{t})^{T}\log\left(1-\sigma\big(\mathbf{\Phi}\mathbf{w}\big)\right) \right] + \mathbf{w}^{T}\mathbf{A}\mathbf{w}\big] \end{equation}
where \begin{equation} \sigma(x) = \frac{1}{1+e^{-x}}, \\ \mathbf{C} = \mathbf{B^{-1}} + \mathbf{\Phi} \mathbf{A}^{-1} \mathbf{\Phi}^{T}, \\ \mathbf{w} = \mathbf{B^{-1}}\mathbf{S}\mathbf{\Phi}^{T}\mathbf{t}. \end{equation}
Note that bold small letters denote $N\times 1$ vectors, bold capital letters $N\times N$ matrices, $\mathbf{A}$, $\mathbf{B}$ are diagonal, and the sigmoid function $\sigma(x)$ is element-wise for vector input.
I earlier asked for a somewhat similar derivative where the $\log\left|\mathbf{C}\right|$ term appeared as well. The excellent and detailed answer to it (which might be helpful for this problem) can be found here.
Just as in the previously linked question, replace $\,{\rm log}({\rm det}({\bf C}))$ with $\,{\rm tr}({\rm log}({\bf C}))$.
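As a quick sanity check of this replacement, here is a minimal NumPy sketch. The matrix `C` below is an arbitrary symmetric positive-definite test matrix (an assumption for illustration, not the ${\bf C}$ of the question). It compares ${\rm log}({\rm det}({\bf C}))$ with ${\rm tr}({\rm log}({\bf C}))$, computed as the sum of the logs of the eigenvalues, and then checks the resulting first-order differential ${\bf C}^{-1}:d{\bf C}$ against an actual small perturbation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary symmetric positive-definite test matrix (illustrative only).
M = rng.standard_normal((4, 4))
C = M @ M.T + 4.0 * np.eye(4)

# log(det(C)) computed stably via slogdet.
_, logdet = np.linalg.slogdet(C)

# tr(log(C)): for a symmetric positive-definite matrix this is the
# sum of the logarithms of its eigenvalues.
tr_log = np.sum(np.log(np.linalg.eigvalsh(C)))

# First-order check of d log det(C) = C^{-1} : dC for a small symmetric dC.
E = rng.standard_normal((4, 4))
dC = 1e-6 * (E + E.T)
actual_change = np.linalg.slogdet(C + dC)[1] - logdet
predicted_change = np.sum(np.linalg.inv(C) * dC)  # Frobenius product C^{-1} : dC

print(abs(logdet - tr_log), abs(actual_change - predicted_change))
```

Both printed numbers should be tiny; the second one is limited by the neglected second-order term in $d{\bf C}$.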
Then you need to know the trick for finding the derivative of a scalar function applied element-wise to a matrix argument. Assume that you have a scalar function $f(x)$ whose derivative is known to be $f'(x)$. When you apply this element-wise to a matrix, the differential is $$\eqalign{ df({\bf X}) &= f'({\bf X})\circ d{\bf X} \cr }$$ where $\circ$ denotes the Hadamard product.
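The element-wise rule can be checked numerically. The sketch below uses $f = \tanh$ purely as an arbitrary example function (an assumption for illustration; any smooth scalar function would do) and compares the actual change $f({\bf X}+d{\bf X})-f({\bf X})$ with the predicted $f'({\bf X})\circ d{\bf X}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary test matrix and a small perturbation of it.
X = rng.standard_normal((3, 3))
dX = 1e-6 * rng.standard_normal((3, 3))

# Example scalar function applied element-wise: f = tanh, f' = 1 - tanh^2.
f = np.tanh
f_prime = lambda Z: 1.0 - np.tanh(Z) ** 2

actual = f(X + dX) - f(X)
predicted = f_prime(X) * dX  # '*' on arrays is the Hadamard product

gap = np.max(np.abs(actual - predicted))
print(gap)
```

The gap is second order in the size of `dX`, so it should be many orders of magnitude smaller than the entries of `predicted`.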
For the logistic function, the derivative is known to be: $\,\,\,\sigma' = \sigma - \sigma^2$.
Actually, you are taking logarithms of the logistic function, so the derivatives are $$\eqalign{ \frac{d\,{\rm log}(\sigma(x))}{dx} &= \frac{\sigma-\sigma^2}{\sigma} = 1-\sigma \cr \frac{d\,{\rm log}(1-\sigma(x))}{dx} &= \frac{-(\sigma-\sigma^2)}{1-\sigma} = -\sigma \cr }$$
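These three scalar derivatives are easy to confirm with central finite differences. The following is a minimal sketch using only the standard library; the sample points in `xs` and the helper `central_diff` are illustrative choices, not part of the derivation.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def central_diff(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

xs = [-2.0, -0.5, 0.0, 1.3, 3.0]  # arbitrary sample points
errors = []
for x in xs:
    s = sigmoid(x)
    # sigma' = sigma - sigma^2
    errors.append(abs(central_diff(sigmoid, x) - (s - s * s)))
    # d/dx log(sigma) = 1 - sigma
    errors.append(abs(central_diff(lambda u: math.log(sigmoid(u)), x) - (1.0 - s)))
    # d/dx log(1 - sigma) = -sigma
    errors.append(abs(central_diff(lambda u: math.log(1.0 - sigmoid(u)), x) - (-s)))

max_scalar_err = max(errors)
print(max_scalar_err)
```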
Now we can put all of these pieces together. In what follows, ${\bf x}={\bf\Phi w}$, $\sigma$ is shorthand for $\sigma({\bf x})$, and a colon denotes the Frobenius inner product, ${\bf A}:{\bf B}={\rm tr}({\bf A}^T{\bf B})$.

$$\eqalign{ -2\,L &= {\rm tr}({\rm log}({\bf C})) + {\bf A}:{\bf w}{\bf w}^T -2\,{\bf t}:{\rm log}(\sigma) -2\,(1-{\bf t}):{\rm log}(1-\sigma) \cr -2\,dL &= {\bf C}^{-1}:d{\bf C} + {\bf A}:2\,{\rm sym}(d{\bf w}\,{\bf w}^T)-2\,{\bf t}:(1-\sigma)\circ d{\bf x} -2\,(1-{\bf t}):(-\sigma)\circ d{\bf x} \cr &= {\bf C}^{-1}:d{\bf C} + 2\,{\bf A}:d{\bf w}\,{\bf w}^T-2\,{\bf t}\circ(1-\sigma):d{\bf x} +2\,(1-{\bf t})\circ\sigma:d{\bf x} \cr &= {\bf C}^{-1}:d{\bf C} + 2\,{\bf A}{\bf w}:d{\bf w} +2\,(\sigma-{\bf t}):d{\bf x} \cr }$$

Now we just need to find expressions for $\{d{\bf C},\,d{\bf w},\,d{\bf x}\}$ in terms of $d{\bf\Phi}$, treating ${\bf A}$, ${\bf B}$, ${\bf S}$ and ${\bf t}$ as independent of ${\bf\Phi}$.

$$\eqalign{ {\bf w} &= {\bf B^{-1}S\Phi^T\,t} \cr d{\bf w} &= {\bf B^{-1}S}\,d{\bf \Phi^T\,t} \cr\cr {\bf x} &= {\bf \Phi w} \cr d{\bf x} &= d{\bf \Phi}\,{\bf w} + {\bf \Phi}d{\bf w} \cr &= d{\bf \Phi}\,{\bf w} + {\bf \Phi B^{-1}S}\,d{\bf \Phi^T\,t} \cr\cr {\bf C} &= {\bf B^{-1}+\Phi A^{-1} \Phi^T} \cr d{\bf C} &= 2\,{\rm sym}(d{\bf \Phi A^{-1} \Phi^T}) \cr }$$

Substituting, and using the facts that ${\bf C}$ is symmetric and ${\bf A}$, ${\bf B}$ are diagonal,

$$\eqalign{ -2\,dL &= {\bf C}^{-1}:2\,{\rm sym}(d{\bf \Phi A^{-1} \Phi^T}) + 2\,{\bf Aw}:{\bf B^{-1}S}\,d{\bf \Phi^T\,t} \cr &+2\,(\sigma-{\bf t}):(d{\bf \Phi}\,{\bf w} + {\bf \Phi B^{-1}S}\,d{\bf \Phi^T\,t}) \cr\cr &= 2\,{\bf C}^{-1}{\bf \Phi A^{-1}}:d{\bf \Phi} + 2\,{\bf (S^TB^{-1}Awt^T)^T}:d{\bf \Phi} \cr &+2\,{\bf (\sigma-t)w^T}:d{\bf \Phi} +2\,{\bf (S^TB^{-1}\Phi^T(\sigma-t)t^T)^T}:d{\bf \Phi}\cr\cr dL &= \big[{\bf (t-\sigma)w^T + t(t-\sigma)^T\Phi B^{-1}S -tw^TAB^{-1}S - C^{-1}\Phi A^{-1}}\big]:d{\bf \Phi}\cr }$$

Finally, the expression for the gradient is

$$\eqalign{ \frac{\partial L}{\partial {\bf \Phi}} &= {\bf (t-\sigma)w^T + t(t-\sigma)^T\Phi B^{-1}S -tw^TAB^{-1}S - C^{-1}\Phi A^{-1}} \cr }$$
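If you want to convince yourself of the final formula, a finite-difference check is a reasonable test. The sketch below is only an illustration under stated assumptions: the question does not define ${\bf S}$, so it is taken to be an arbitrary constant $N\times N$ matrix; ${\bf A}$ and ${\bf B}^{-1}$ are random positive diagonal matrices; ${\bf t}$ is a fixed binary vector; and the function names `loss`, `analytic_grad` and `numerical_grad` are made up for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(Phi, t, A, B_inv, S):
    """L from the question, with w = B^{-1} S Phi^T t and C = B^{-1} + Phi A^{-1} Phi^T."""
    A_inv = np.diag(1.0 / np.diag(A))  # A is diagonal
    w = B_inv @ S @ Phi.T @ t
    C = B_inv + Phi @ A_inv @ Phi.T
    s = sigmoid(Phi @ w)
    _, logdet = np.linalg.slogdet(C)
    log_lik = t @ np.log(s) + (1.0 - t) @ np.log(1.0 - s)
    return -0.5 * (logdet - 2.0 * log_lik + w @ A @ w)

def analytic_grad(Phi, t, A, B_inv, S):
    """Gradient formula derived above."""
    A_inv = np.diag(1.0 / np.diag(A))
    w = B_inv @ S @ Phi.T @ t
    C = B_inv + Phi @ A_inv @ Phi.T
    r = t - sigmoid(Phi @ w)                      # t - sigma
    return (np.outer(r, w)                        # (t - sigma) w^T
            + np.outer(t, r @ Phi @ B_inv @ S)    # t (t - sigma)^T Phi B^{-1} S
            - np.outer(t, w @ A @ B_inv @ S)      # t w^T A B^{-1} S
            - np.linalg.solve(C, Phi @ A_inv))    # C^{-1} Phi A^{-1}

def numerical_grad(f, Phi, eps=1e-6):
    """Entry-by-entry central finite differences of a scalar function f."""
    G = np.zeros_like(Phi)
    for m in range(Phi.shape[0]):
        for n in range(Phi.shape[1]):
            E = np.zeros_like(Phi)
            E[m, n] = eps
            G[m, n] = (f(Phi + E) - f(Phi - E)) / (2.0 * eps)
    return G

# Small random test problem (all values are illustrative assumptions).
rng = np.random.default_rng(0)
N = 4
Phi = 0.5 * rng.standard_normal((N, N))
S = 0.3 * rng.standard_normal((N, N))        # S is not defined in the question
A = np.diag(rng.uniform(0.5, 2.0, N))
B_inv = np.diag(rng.uniform(0.5, 2.0, N))
t = np.array([1.0, 0.0, 1.0, 1.0])

G_analytic = analytic_grad(Phi, t, A, B_inv, S)
G_numeric = numerical_grad(lambda P: loss(P, t, A, B_inv, S), Phi)
max_grad_err = np.max(np.abs(G_analytic - G_numeric))
print(max_grad_err)
```

The entry `G_analytic[m, n]` corresponds to $\partial L/\partial\phi_{mn}$. If ${\bf S}$ in your problem actually depends on ${\bf\Phi}$, this check (like the derivation) no longer applies as written, and an extra term from $d{\bf S}$ would be needed.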