Deriving derivative of an optimization problem


Can anyone help me derive the partial derivative of this objective function with respect to $U_u$?

$$ L(R,T,U,V) = \frac{1}{2}\sum_{u=1}^{N}\sum_{i=1}^M I_{u,i}^R(R_{u,i} - g(U_u^TV_i))^2 + \frac{\lambda_U}{2} \sum_{u=1}^{N}U_u^TU_u + \frac{\lambda_V}{2} \sum_{i=1}^{M}V_i^TV_i + \frac{\lambda_T}{2} \sum_{u=1}^{N}(U_u - \sum_{v \in N_u} T_{u,v}U_v)^T(U_u - \sum_{v \in N_u} T_{u,v}U_v)$$

I want to point out that this is not homework; I am reading this paper. The authors derive the partial derivatives of this objective with respect to both $U_u$ and $V_i$, but I have no idea how they did it. It would be really nice if someone could explain how to tackle such problems and how to approach such functions.

$U_u$ and $V_i$ are $k$-dimensional vectors, $I_{u,i}^R \in \{0,1\}$ is an indicator function, and $N_u$ is the set of neighbors of $u$ in a social network.


There are 2 answers below.

BEST ANSWER

From the paper, $g(x)$ denotes the logistic function. The most useful form of its derivative is $$\frac{dg}{dx} = g-g^2 \quad\implies\quad dg = (g-g^2)\,dx$$ Define a few matrix variables $$\eqalign{ X &= U^TV \implies dX = dU^TV + U^TdV \cr G &= g(X) \implies dG = (G-G\odot G)\odot dX \cr M &= I-T \cr J &= I\odot(G-R)\odot(G-G\odot G) \cr }$$ Note that $G$ comes from an element-wise evaluation of the logistic function; when dealing with element-wise functions of matrices, the elementwise/Hadamard product $(\odot)$ is necessary. Also beware of an overloaded symbol: in $M=I-T$, $I$ is the $N\times N$ identity matrix, whereas in $J$ it is the $N\times M$ indicator matrix with entries $I_{u,i}^R$.
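For completeness, the stated derivative follows directly from the definition of the logistic function:

```latex
g(x) = \frac{1}{1+e^{-x}}
\quad\Longrightarrow\quad
\frac{dg}{dx} = \frac{e^{-x}}{(1+e^{-x})^2}
            = \frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}
            = g\,(1-g) = g - g^2
```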

Next, the explicit summations can be eliminated in favor of the inner/Frobenius product, a convenient infix notation for the trace $$A:B={\rm tr}(A^TB)$$ Taking $U\in\mathbb{R}^{k\times N}$ with columns $U_u$ (and $V\in\mathbb{R}^{k\times M}$ with columns $V_i$, so that $X=U^TV$ is $N\times M$), the social term becomes $\sum_u\big\|U_u-\sum_v T_{u,v}U_v\big\|^2 = UM^T:UM^T$. Putting it all together, we can write the objective function in matrix form, then find its differential and gradient $$\eqalign{ L &= \frac{1}{2}\Big[I\odot(G-R):(G-R) + \lambda_UU:U + \lambda_VV:V + \lambda_TUM^T:UM^T \Big] \cr\cr dL &= I\odot(G-R):dG + \lambda_UU:dU + \lambda_VV:dV + \lambda_TUM^T:dU\,M^T \cr &= I\odot(G-R):(G-G\odot G)\odot d(U^TV) + \lambda_UU:dU + \lambda_VV:dV + \lambda_TUM^TM:dU \cr &= J:(dU^TV+U^TdV) + \lambda_UU:dU + \lambda_VV:dV + \lambda_TUM^TM:dU \cr &= (UJ+\lambda_VV):dV + (VJ^T+\lambda_UU+\lambda_TUM^TM):dU \cr\cr \frac{\partial L}{\partial V} &= UJ+\lambda_VV,\,\,\,\,\,\,\,\frac{\partial L}{\partial U} = VJ^T+\lambda_UU+\lambda_TUM^TM \cr\cr }$$ If you're not used to working with the Hadamard and Frobenius products, this post may be hard to follow, so here is a summary of the rules for rearranging terms, which follow from the properties of the trace $$\eqalign{ A:BC &= B^TA:C \cr &= AC^T:B \cr &= A^T:(BC)^T \cr &= BC:A \cr\cr A\odot B &= B\odot A \cr\cr A\odot B:C &= A:B\odot C \cr }$$
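Results like these are easy to sanity-check numerically. The following is a sketch using NumPy, under the shape conventions assumed here: $U\in\mathbb{R}^{k\times N}$ with columns $U_u$, $V\in\mathbb{R}^{k\times M}$, and the social term written as $\sum_u\|U_u-\sum_v T_{u,v}U_v\|^2 = \|U(I-T)^T\|_F^2$. All variable names (`Ind` for the indicator matrix, `Mmat` for $I-T$) are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
k, N, Mi = 3, 4, 5                 # Mi = number of items (M in the text)
U = rng.normal(size=(k, N))        # columns are the U_u vectors
V = rng.normal(size=(k, Mi))       # columns are the V_i vectors
R = rng.normal(size=(N, Mi))
Ind = (rng.random((N, Mi)) < 0.5).astype(float)   # indicator matrix I^R
T = rng.random((N, N))
np.fill_diagonal(T, 0.0)           # assume u is not its own neighbor
lU, lV, lT = 0.1, 0.2, 0.3
Mmat = np.eye(N) - T               # M = I - T (identity minus trust matrix)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(U, V):
    G = sigmoid(U.T @ V)
    fit = np.sum(Ind * (G - R) ** 2)
    soc = np.sum((U @ Mmat.T) ** 2)   # sum_u ||U_u - sum_v T_uv U_v||^2
    return 0.5 * (fit + lU * np.sum(U**2) + lV * np.sum(V**2) + lT * soc)

# analytic gradients in the matrix form derived above
G = sigmoid(U.T @ V)
J = Ind * (G - R) * G * (1.0 - G)          # J = I (.) (G-R) (.) (G - G(.)G)
gU = V @ J.T + lU * U + lT * U @ Mmat.T @ Mmat
gV = U @ J + lV * V

# central finite-difference check on one entry of U
eps = 1e-6
Up, Um = U.copy(), U.copy()
Up[1, 2] += eps
Um[1, 2] -= eps
num = (loss(Up, V) - loss(Um, V)) / (2 * eps)
print(abs(num - gU[1, 2]) < 1e-4)   # prints True
```

The same spot-check applied to an entry of `gV` (perturbing `V` instead of `U`) confirms the other gradient.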

SECOND ANSWER

I think that you got the definition of $L(R,T,U,V)$ wrong from the paper (where it is eq.(12)), since in the last term, the products $T_{u,v}U_u$ should be $T_{u,v}U_v$.

Then, one can use the rules of differentiating a function of several variables, including the rule $\frac{\partial}{\partial x}(x^Tx) = 2x$, and the chain rule.
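As a minimal worked example of combining the $x^Tx$ rule with the chain rule (for a constant matrix $A$ and vector $b$, not taken from the paper):

```latex
\frac{\partial}{\partial x}\,(Ax-b)^T(Ax-b)
= 2\,\Big(\frac{\partial (Ax-b)}{\partial x}\Big)^{\!T}(Ax-b)
= 2A^T(Ax-b)
```

The last term of $L$ has exactly this shape, with the affine map $U_u \mapsto U_u - \sum_{v\in N_u}T_{u,v}U_v$ playing the role of $Ax-b$.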

When differentiating $L(R,T,U,V)$ with respect to $U_u$, note that for the first term only the $u$-th term of the outer sum survives, leaving the inner sum over $i$. You also need the chain rule here, since $U_u$ appears as the argument of $g(U_u^TV_i)$, which is then squared (so the chain rule applies twice).

For the second term, use the rule $\frac{\partial}{\partial x}(x^Tx) = 2x$, and the third term vanishes, since it doesn't depend on $U_u$.

For the last term, you should again use $\frac{\partial}{\partial x}(x^Tx) = 2x$ and the chain rule. Note, however, that $U_u$ can also appear inside the sums $\sum_{v \in N_w} T_{w,v} U_v$ for other users $w$ with $u \in N_w$, so those terms of the outer sum contribute as well. This last term is the most difficult one to differentiate.
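Assembling these hints, the per-user gradient works out to the following (a sketch, writing $g'$ for the derivative of the logistic function and assuming $u \notin N_u$):

```latex
\frac{\partial L}{\partial U_u}
= \sum_{i=1}^{M} I_{u,i}^{R}\, g'(U_u^T V_i)\,\big(g(U_u^T V_i)-R_{u,i}\big)\, V_i
+ \lambda_U U_u
+ \lambda_T \Big(U_u - \sum_{v\in N_u} T_{u,v} U_v\Big)
- \lambda_T \sum_{w:\,u\in N_w} T_{w,u}\Big(U_w - \sum_{v\in N_w} T_{w,v} U_v\Big)
```

The gradient with respect to $V_i$ follows the same pattern, except that only the data-fitting term and the $\lambda_V$ term survive, since the social term does not depend on $V$.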

For details on differentiating vector valued functions, see e.g. Appendix A.4, page 640 in Convex Optimization by Boyd and Vandenberghe. This material is also in Chapter 2, page 8 in the Matrix Cookbook (http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf).