How to calculate the gradient of $\sum_{i=1}^N \sum_{j=N+1}^{N+M} h_i h_j (\phi_i^T \phi_j)$?

110 Views Asked by At

I'm having troubles with deriving the gradient of the following equation:

$$ F(h;\phi) = \sum_{i=1}^N \sum_{j=N+1}^{N+M} h_i h_j (\phi_i^T \phi_j)$$

with $h_t \in \{0,1\}$ for $t = 1,\dots , N+M, $

and $\phi_k \in R^K$ being a $K$-dimensional latent vector associated with the $k$-th element.

However, I'm having a hard time calculating the gradient of the above equation with respect to $\phi$. I need this gradient to apply first order gradient descent to gain the optimal parameters. Am I allowed to act $\phi$ as two separate latent vector, $\phi_i$ and $\phi_j$, in order to get the gradient similar like this? \begin{align} \frac{dF}{d\phi_i} &= \sum_{i=1}^N \sum_{j=N+1}^{N+M} h_i h_j \phi_j \\ \frac{dF}{d\phi_j} &= \sum_{i=1}^N \sum_{j=N+1}^{N+M} h_i h_j \phi_i^T \end{align} Or I'm not allowed to split them and have to act them as a whole? But then again I'm not sure how to derive this. This is my attempt:

$$ \frac{dF}{d\phi} = 2\sum_{i=1}^N \sum_{j=N+1}^{N+M} h_i h_j \phi_i^T + 2\sum_{i=1}^N \sum_{j=N+1}^{N+M} h_i h_j \phi_j$$

1

There are 1 best solutions below

0
On

You can indeed break down the question into smaller parts: the derivative of the sum is the sum of the derivatives. However! the $i$ is a dummy variable that is being summed over. So you should be more careful, since something like $\frac{\partial f}{\partial \phi_i} = \sum_{i=i}^Na_i$ makes no sense. Instead, you need to introduct a new dummy variable and differentiate that. That is, you should compute $$ \frac{\partial {f_{ij}}}{\partial \phi_k}={?},\quad f_{ij}(\phi) = \phi_i^T \phi_j$$ since the $\phi_i=(\phi_{i,1},\dots, \phi_{i,K})$s are themselves vectors, we have $f_{ij} : \mathbb R^{K(N+M)} \to \mathbb R$, given by $$ f_{ij}(\phi) = \sum_{m=1}^K \phi_{i,m}\phi_{j,m} $$and we can write this gradient wrt. $\phi_k$ as a vector of the $K$ components, each one a 1D partial derivative wrt $\phi_{k,l}$ for $k=1,\dots,N+M$ and $l=1,\dots,K$: $$ \frac{\partial f_{ij}}{\partial \phi_{k,l} } = \frac{\partial }{\partial \phi_{k,l} }\sum_{m=1}^K \phi_{i,m}\phi_{j,m} =\sum_{m=1}^K \frac{\partial \phi_{i,m}}{\partial \phi_{k,l} }\phi_{j,m} + \phi_{i,m}\frac{\partial \phi_{j,m}}{\partial \phi_{k,l} } $$ now, $\frac{\partial \phi_{j,m}}{\partial \phi_{k,l} } $ is zero unless $j=k$ and $m=l$, where it is 1. That is to say, using the Kronecker delta, $$\frac{\partial \phi_{i,m}}{\partial \phi_{k,l} }= \delta_{ik}\delta_{ml},\qquad\frac{\partial \phi_{j,m}}{\partial \phi_{k,l} }= \delta_{jk}\delta_{ml}.$$ substituting this in, we get $$ \frac{\partial f_{ij}}{\partial \phi_{k,l} } = \phi_{k,l}\phi_{j,l} + \phi_{k,l}\phi_{i,l}$$

Going back to $F(h,\phi)$, we see that we have computed the derivative of $F$ with respect to each component of $\phi_k$, for any $\phi_k$: $$ \frac{\partial }{\partial \phi_{k,l} }F(h,\phi)= \sum_{i=1}^N\sum_{j={N+1}}^{N+M} h_i h_j\phi_{k,l}(\phi_{j,l} + \phi_{i,l}). $$

PS: you were one of the last few people to recieve the Tumbleweed badge on Math.SE, which is now retired :)