Analytic solution for matrix factorization using alternating least squares


The standard form for ridge regression aims to minimize the following cost function. $$ \min\ \ \sum_i(y_i-x_i^T\beta)^2 + \lambda\sum_j\beta^2_j $$ As described here, it's possible to differentiate that function with respect to $\beta$ and derive the analytical solution as $$ \beta^{ridge}=(X^TX+\lambda I)^{-1}X^Ty $$
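The closed-form ridge solution is easy to sanity-check numerically: at $\beta^{ridge}$ the gradient of the cost should vanish. A minimal NumPy sketch (the data here is synthetic, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 4, 0.1
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Closed-form ridge solution: beta = (X^T X + lambda I)^{-1} X^T y
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Gradient of the ridge cost at beta: -2 X^T (y - X beta) + 2 lambda beta
grad = -2 * X.T @ (y - X @ beta) + 2 * lam * beta
print(np.allclose(grad, 0))  # True: beta is a stationary point
```

Using `np.linalg.solve` rather than explicitly inverting $X^TX+\lambda I$ is both faster and numerically safer.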

Collaborative Filtering for Implicit Feedback Datasets is a popular paper in the field of generating recommendations using matrix factorization. The authors proposed the use of alternating least squares to minimize a slightly different cost function.

$$ \min_{x^*, y^*}\ \ \sum_{u,i}c_{ui}(p_{ui}-x_u^Ty_i)^2+\lambda\left(\sum_u\|x_u\|^2+\sum_i\|y_i\|^2\right) $$

The authors state the analytical solution as $$ x_u = (Y^TC^uY + \lambda I)^{-1}Y^TC^up(u) $$ $$ y_i = (X^TC^iX + \lambda I)^{-1}X^TC^ip(i) $$

where $C^u$ and $C^i$ are diagonal $n\times n$ and $m\times m$ matrices.

Can somebody please show me how to derive this? I have tried the same procedure as for standard ridge regression, but I cannot see how or where the $C^u$ and $C^i$ matrices show up.

If you're interested, here's a Python implementation of this algorithm; note that the code does not directly match the expressions in the paper.

Best answer:

From the paper referenced, the cost function to minimise is $$ \ C=\sum_{u,i}c_{ui}(p_{ui}-x_u^Ty_i)^2+\lambda(\sum_u\|x_u\|^2+\sum_i\|y_i\|^2)$$ Differentiating with respect to vector $x_u$ results in (note: as the vectors $x_u$ and $y_i$ are real vectors, their scalar product is commutative, see red)

$$\begin{align}\frac{\partial C}{\partial x_u} &= -2\sum_ic_{ui}(p_{ui}-\color{red}{x_u^Ty_i})y_i+2\lambda x_u\\&=-2\sum_ic_{ui}(p_{ui}-\color{red}{y_i^Tx_u})y_i+2\lambda x_u\\&=-2Y^TC^up(u)+2Y^TC^uYx_u+2\lambda x_u \text{ Eq.(1)}\end{align}$$ where each row of matrix $Y\in \mathbb{R}^{n\times f}$ is $y_i^T$, the diagonal matrix $C^u\in\mathbb{R^{n\times n}}$ has coefficient $c_{ui}$ in row/column $i$, and vector $p(u)\in\mathbb{R^n}$ contains element $p_{ui}$ in row $i$.
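The vectorized form in Eq.(1) can be checked against the elementwise sum it came from. A small NumPy sketch with made-up data (the dimensions $n$, $f$ and the weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, f, lam = 6, 3, 0.5                       # n items, f latent factors
Y = rng.standard_normal((n, f))             # row i of Y is y_i^T
c_u = rng.uniform(1, 5, n)                  # confidence weights c_{ui}
p_u = rng.integers(0, 2, n).astype(float)   # preferences p_{ui}
x_u = rng.standard_normal(f)

Cu = np.diag(c_u)                           # diagonal C^u

# Vectorized gradient, Eq.(1)
grad_vec = -2 * Y.T @ Cu @ p_u + 2 * Y.T @ Cu @ Y @ x_u + 2 * lam * x_u

# The same gradient written as the sum over items i
grad_sum = sum(-2 * c_u[i] * (p_u[i] - Y[i] @ x_u) * Y[i]
               for i in range(n)) + 2 * lam * x_u

print(np.allclose(grad_vec, grad_sum))  # True
```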

Setting $\frac{\partial C}{\partial x_u}=0$ for optimal $x_u$ and rearranging Eq.($1$) leads to $$(Y^TC^uY+\lambda I)x_u = Y^TC^up(u)\\\Rightarrow x_u=(Y^TC^uY+\lambda I)^{-1}Y^TC^up(u)$$ Differentiating the cost function w.r.t. $y_i$ results in:
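In code, this $x_u$ update is one linear solve, and the gradient really does vanish at the solution. A sketch with synthetic data (dimensions and weights made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, f, lam = 6, 3, 0.5
Y = rng.standard_normal((n, f))             # row i is y_i^T
Cu = np.diag(rng.uniform(1, 5, n))          # diagonal C^u
p_u = rng.integers(0, 2, n).astype(float)   # preference vector p(u)

# x_u = (Y^T C^u Y + lambda I)^{-1} Y^T C^u p(u)
x_u = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(f), Y.T @ Cu @ p_u)

# Eq.(1) evaluated at the optimum should be zero
grad = -2 * Y.T @ Cu @ p_u + 2 * Y.T @ Cu @ Y @ x_u + 2 * lam * x_u
print(np.allclose(grad, 0))  # True
```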

$$\begin{align}\frac{\partial C}{\partial y_i}&=-2\sum_uc_{ui}(p_{ui}-x_u^Ty_i)x_u+2\lambda y_i\\&=-2X^TC^ip(i)+2X^TC^iXy_i+2\lambda y_i \text{ Eq.(2)}\end{align}$$
where each row of matrix $X\in \mathbb{R}^{m\times f}$ is $x_u^T$, the diagonal matrix $C^i\in\mathbb{R^{m\times m}}$ has coefficient $c_{ui}$ in row/column $u$, and vector $p(i)\in\mathbb{R^m}$ contains element $p_{ui}$ in row $u$.

Setting $\frac{\partial C}{\partial y_i}=0$ for optimal $y_i$ and rearranging Eq.($2$) leads to $$(X^TC^iX+\lambda I)y_i=X^TC^ip(i)\\\Rightarrow y_i=(X^TC^iX+\lambda I)^{-1}X^TC^ip(i)$$
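Putting both updates together gives the alternating least squares loop: fix $Y$ and solve for every $x_u$, then fix $X$ and solve for every $y_i$. Since each half-step exactly minimizes the cost over its block of variables, the cost is non-increasing. A self-contained sketch on random data (a naive implementation; the paper describes algebraic tricks to avoid forming $C^u$ explicitly):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, f, lam = 8, 6, 3, 0.5                   # m users, n items, f factors
C = rng.uniform(1, 5, (m, n))                 # confidence matrix, entry c_{ui}
P = rng.integers(0, 2, (m, n)).astype(float)  # preference matrix, entry p_{ui}
X = rng.standard_normal((m, f))               # user factors, row u is x_u^T
Y = rng.standard_normal((n, f))               # item factors, row i is y_i^T

def cost(X, Y):
    """The objective: sum c_ui (p_ui - x_u^T y_i)^2 + lambda (||X||^2 + ||Y||^2)."""
    return float((C * (P - X @ Y.T) ** 2).sum()
                 + lam * ((X ** 2).sum() + (Y ** 2).sum()))

costs = [cost(X, Y)]
for _ in range(5):
    # Fix Y, solve for each x_u with its own diagonal C^u
    for u in range(m):
        Cu = np.diag(C[u])
        X[u] = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(f), Y.T @ Cu @ P[u])
    # Fix X, solve for each y_i with its own diagonal C^i
    for i in range(n):
        Ci = np.diag(C[:, i])
        Y[i] = np.linalg.solve(X.T @ Ci @ X + lam * np.eye(f), X.T @ Ci @ P[:, i])
    costs.append(cost(X, Y))

print(all(b <= a for a, b in zip(costs, costs[1:])))  # cost never increases
```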