How to derive this gradient descent solution?


I read the paper "A Matrix Factorization Technique with Trust Propagation for Recommendation in Social Networks", which gives a gradient descent solution to recommender-system rating prediction via probabilistic matrix factorization. The problem is that I can't figure out how to derive the partial derivative of the objective with respect to a user's factor vector, shown below, because of the unexpected appearance of $T_{vu}$, $T_{vw}$ and $U_w$ in the given result.

objective function: $L\left( R, T, U, V \right) = \frac{1}{2} \sum\nolimits_{u = 1}^N \sum\nolimits_{i = 1}^M I_{ui}^R ( R_{ui} - g(U^{T}_{u}V_{i} ))^2 + \frac{\lambda_U}{2} \sum\nolimits_{u = 1}^N \left\| U_u \right\|^2 + \frac{\lambda_V}{2} \sum\nolimits_{i = 1}^M \left\| V_i \right\|^2 + \frac{\lambda_T}{2} \sum\nolimits_{u = 1}^N \left\| U_u - \sum\nolimits_{v \in N_u} T_{u, v} U_v \right\|^2$
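For concreteness, the objective can be sketched in NumPy (a minimal sketch, not the paper's code; the shapes, variable names, and the logistic choice of $g$ are my own assumptions). Here user factors are stored as rows, so $U_u$ is row $u$ of an $N \times k$ matrix and $\sum_{v \in N_u} T_{u,v} U_v$ is row $u$ of $TU$:

```python
import numpy as np

def g(x):
    """Logistic function, a common (assumed) choice for the mapping g."""
    return 1.0 / (1.0 + np.exp(-x))

def objective(R, P, T, U, V, lam_U, lam_V, lam_T):
    """L(R, T, U, V): masked squared error + regularizers + trust term.

    R: N x M ratings, P: N x M indicator I^R, T: N x N trust weights,
    U: N x k user factors (row u = U_u), V: M x k item factors.
    """
    err = P * (R - g(U @ V.T))      # I^R_ui * (R_ui - g(U_u^T V_i))
    social = U - T @ U              # row u: U_u - sum_{v in N_u} T_uv U_v
    return (0.5 * np.sum(err ** 2)
            + 0.5 * lam_U * np.sum(U ** 2)
            + 0.5 * lam_V * np.sum(V ** 2)
            + 0.5 * lam_T * np.sum(social ** 2))

# Tiny example with random data (illustrative sizes only).
rng = np.random.default_rng(0)
N, M, k = 4, 5, 3
R = rng.random((N, M))
P = (rng.random((N, M)) < 0.6).astype(float)   # observed-rating mask
T = rng.random((N, N))
T /= T.sum(axis=1, keepdims=True)              # row-normalized trust weights
U = rng.standard_normal((N, k))
V = rng.standard_normal((M, k))
L = objective(R, P, T, U, V, 0.1, 0.1, 0.5)
```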

given result:

$\frac{\partial L}{\partial U_u} = \sum\nolimits_{i = 1}^M I_{ui}^R V_i [g( U_u^T V_i ) - R_{ui} ]g'( U_u^T V_i ) + \lambda_U U_u + \left[ \lambda_T \left( U_u - \sum\nolimits_{v \in N_u} T_{uv} U_v \right) - \lambda_T \sum\nolimits_{\{ v \,|\, u \in N_v \}} T_{vu} \left( U_v - \sum\nolimits_{w \in N_v} T_{vw} U_w \right) \right]$

1 Answer

Expressing the norms in terms of Frobenius products $\|M\|_F^2 = M:M$, the objective function can be written as $$ \eqalign { L &= \frac {1} {2}\,\bigg(P\circ(R-g):P\circ(R-g) + \lambda_U U:U + \lambda_V V:V + \lambda_T (U-TU):(U-TU)\bigg) \cr } $$ where $A\circ B$ is the Hadamard product, and for brevity $g=g(U^TV)$ and $P=I^R$. Note that $P$ has $0/1$ entries, so $P\circ P = P$ and the first term reproduces the masked double sum of the original objective.

Taking the differential is merely the repeated application of a few simple rules $$ \eqalign { d\,(M:M) &= 2\,M:dM \cr A:B\circ C &= A\circ B:C \cr X:Y^TZ &= YX:Z \cr dg(X) &= g'\circ dX \cr } $$
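Each of these identities is easy to confirm numerically, using $A:B = \operatorname{tr}(A^TB) = \sum_{ij}A_{ij}B_{ij}$. A quick NumPy check (the shapes are arbitrary):

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = trace(A^T B) = elementwise sum of A*B."""
    return np.sum(A * B)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((3, 4))
Y = rng.standard_normal((5, 3))
Z = rng.standard_normal((5, 4))

# A : B∘C == A∘B : C  (Hadamard product commutes across the colon)
ok_hadamard = np.isclose(frob(A, B * C), frob(A * B, C))
# X : Y^T Z == YX : Z  (transposes move across the colon)
ok_transpose = np.isclose(frob(A, Y.T @ Z), frob(Y @ A, Z))
```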

This produces $$ \eqalign { dL &= P\circ(R-g):P\circ(-dg) + \lambda_UU:dU + \lambda_T(U-TU):(dU-TdU) \cr &= -P\circ(R-g):dg + \lambda_UU:dU + \lambda_T(I-T^T)(U-TU):dU \cr &= -P\circ(R-g):g'\circ(dU^TV) + [\lambda_UU + \lambda_TU- \lambda_TTU - \lambda_TT^TU + \lambda_TT^TTU]:dU \cr &= -[P\circ(R-g)\circ g']\,V^T:dU^T + [\lambda_UU + \lambda_TU- \lambda_TTU - \lambda_TT^TU + \lambda_TT^TTU]:dU \cr &= -V\,[P\circ(R-g)\circ g']^T:dU + [\lambda_UU + \lambda_TU- \lambda_TTU - \lambda_TT^TU + \lambda_TT^TTU]:dU \cr } $$ From the last line, the derivative is $$ \eqalign { \frac {\partial L} {\partial U} &= V\,(P\circ(g-R)\circ g')^T + (\lambda_U+\lambda_T)U - \lambda_T(T+T^T)U + \lambda_TT^TTU \cr } $$ which can be rearranged to the given result: the regularization terms regroup as $\lambda_T(I-T^T)(U-TU)$, whose $u$-th component is $\lambda_T\big(U_u - \sum_{v \in N_u} T_{uv}U_v\big) - \lambda_T\sum_{\{v \,|\, u \in N_v\}} T_{vu}\big(U_v - \sum_{w \in N_v} T_{vw}U_w\big)$, since $T_{vu} \ne 0$ exactly when $u \in N_v$.
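The matrix-form gradient can also be checked against central finite differences of the objective. This is a minimal sketch under assumed conventions (row-stored factors, logistic $g$, random data); with $U$ stored $N \times k$, the closed form reads $(P\circ(g-R)\circ g')\,V + \lambda_U U + \lambda_T (I-T)^T(U-TU)$, which matches the expanded $\lambda_T$ terms above:

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))  # logistic, so g' = g(1-g)

def objective(R, P, T, U, V, lU, lV, lT):
    err = P * (R - g(U @ V.T))
    soc = U - T @ U
    return (0.5 * np.sum(err ** 2) + 0.5 * lU * np.sum(U ** 2)
            + 0.5 * lV * np.sum(V ** 2) + 0.5 * lT * np.sum(soc ** 2))

def grad_U(R, P, T, U, V, lU, lT):
    pred = g(U @ V.T)
    gp = pred * (1 - pred)                      # g'(U_u^T V_i)
    data = (P * (pred - R) * gp) @ V            # rating-error term
    social = lT * (np.eye(T.shape[0]) - T).T @ (U - T @ U)
    return data + lU * U + social

rng = np.random.default_rng(1)
N, M, k = 4, 5, 3
R = rng.random((N, M))
P = (rng.random((N, M)) < 0.6).astype(float)
T = rng.random((N, N))
T /= T.sum(axis=1, keepdims=True)               # row-normalized trust
U = rng.standard_normal((N, k))
V = rng.standard_normal((M, k))
lU, lV, lT = 0.1, 0.1, 0.5

# Central finite differences over every entry of U.
num = np.zeros_like(U)
h = 1e-6
for idx in np.ndindex(U.shape):
    E = np.zeros_like(U)
    E[idx] = h
    num[idx] = (objective(R, P, T, U + E, V, lU, lV, lT)
                - objective(R, P, T, U - E, V, lU, lV, lT)) / (2 * h)

gradients_match = np.allclose(grad_U(R, P, T, U, V, lU, lT), num, atol=1e-5)
```

Agreement to within finite-difference tolerance confirms that the matrix expression, and hence its per-user rearrangement, is the correct gradient.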