Gradients for partially symmetric CP decomposition of 3rd order tensor


I am interested in computing a rank-$R$ CP decomposition of a 3rd-order tensor that is partially symmetric in the first two modes. A vanilla CP decomposition is obtained by solving the least squares minimization problem below:

$$ \min_{A,B,C} || \mathcal{X} - [[A,B,C]]||^{2} $$

where $\mathcal{X} \in \mathbb{R}^{L \times M \times N}$ is the tensor to approximate, and $A \in \mathbb{R}^{L \times R}$, $B \in \mathbb{R}^{M \times R}$, and $C \in \mathbb{R}^{N \times R}$ are the factor matrices to estimate. $[[\cdot]]$ denotes the sum of outer products of the corresponding factor matrix columns:

$$ [[A,B,C]] = \sum_{r=1}^{R} a_{r} \otimes b_{r} \otimes c_{r} $$

Where $a_r$ is the $r$-th column of $A$, etc.
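As a concrete illustration, here is a minimal NumPy sketch (dimensions are made up) that forms $[[A,B,C]]$ both as the sum of rank-one outer products above and as a single `einsum` contraction:

```python
import numpy as np

# Minimal sketch with arbitrary small dimensions: build [[A, B, C]] as a
# sum of rank-one outer products, then check it against the equivalent
# single contraction T_ijk = sum_r A_ir * B_jr * C_kr.
rng = np.random.default_rng(0)
L, M, N, R = 4, 5, 6, 3
A = rng.standard_normal((L, R))
B = rng.standard_normal((M, R))
C = rng.standard_normal((N, R))

# Sum over r of the outer products a_r (x) b_r (x) c_r
T_sum = sum(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])
            for r in range(R))

# Same tensor in one einsum call
T = np.einsum('ir,jr,kr->ijk', A, B, C)

assert np.allclose(T, T_sum)
assert T.shape == (L, M, N)
```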

This objective can be rewritten as sub-problems corresponding to each mode:

$$ \min_{A} ||\mathcal{X}_{(1)} - A(C \odot B)^{T}||^{2} \\ \min_{B} ||\mathcal{X}_{(2)} - B(C \odot A)^{T}||^{2} \\ \min_{C} ||\mathcal{X}_{(3)} - C(B \odot A)^{T}||^{2} $$

where the subscript $\mathcal{X}_{(i)}$ denotes the $i$-th mode unfolding of the tensor (the matricization in which the mode-$i$ fibers become the columns of the resulting matrix) and $\odot$ is the Khatri-Rao product. The partial derivatives with respect to each factor matrix (dropping the constant factor of $2$ from differentiating the squared norm) are as follows:

$$ \frac{\partial f}{\partial A} = (\mathcal{T}_{(1)} - \mathcal{X}_{(1)})(C \odot B) \\ \frac{\partial f}{\partial B} = (\mathcal{T}_{(2)} - \mathcal{X}_{(2)})(C \odot A) \\ \frac{\partial f}{\partial C} = (\mathcal{T}_{(3)} - \mathcal{X}_{(3)})(B \odot A) $$

where $\mathcal{T}=[[A,B,C]]$. See here and here for details regarding all of the above derivation/implementation.
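For reference, a hedged NumPy sketch of the mode-1 gradient formula above. The `unfold` and `khatri_rao` helpers are my own illustrative implementations following the Kolda–Bader unfolding convention, and, as in the formulas above, the constant factor of $2$ is dropped:

```python
import numpy as np

def unfold(X, mode):
    # Mode-n unfolding: mode-n fibers become columns (Kolda-Bader convention).
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

def khatri_rao(U, V):
    # Column-wise Kronecker product: column r is kron(u_r, v_r).
    (I, R), J = U.shape, V.shape[0]
    return np.einsum('ir,jr->ijr', U, V).reshape(I * J, R)

rng = np.random.default_rng(1)
L, M, N, R = 4, 5, 6, 3
A = rng.standard_normal((L, R))
B = rng.standard_normal((M, R))
C = rng.standard_normal((N, R))
X = rng.standard_normal((L, M, N))

T = np.einsum('ir,jr,kr->ijk', A, B, C)   # [[A, B, C]]

# Gradient wrt A via the unfolding formula (factor of 2 dropped, as above)
grad_A = (unfold(T, 0) - unfold(X, 0)) @ khatri_rao(C, B)

# Same quantity written as a direct contraction over the residual tensor
grad_A_direct = np.einsum('ijk,jr,kr->ir', T - X, B, C)

assert np.allclose(grad_A, grad_A_direct)
```

The gradients with respect to $B$ and $C$ follow the same pattern with the mode-2 and mode-3 unfoldings.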

In the case of partial symmetry about the first two modes, my problem becomes:

$$ \min_{A,C} || \mathcal{X} - [[A,A,C]]||^{2} $$

where $\mathcal{X} \in \mathbb{R}^{M \times M \times N}$ is the tensor to approximate, $A \in \mathbb{R}^{M \times R}$, and $C \in \mathbb{R}^{N \times R}$.

So, my question is: what are the partial derivatives with respect to $A$ and $C$ in the partially symmetric case above? My actual optimization problem is more complex than this, so I want the gradients for non-linear optimization, and this is the one part of the objective whose gradients I have been unable to compute. Any advice is greatly appreciated; I've been banging my head trying to figure this out. Thank you!


BEST ANSWER

$ \def\d{\delta} \def\o{{\tt1}} \def\E{{\cal E}} \def\T{{\cal T}} \def\H{{\cal H}} \def\X{{\cal X}} \def\Y{{\cal M}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vecc#1{\op{vec}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $First, define two extremely useful zero-one (aka indicator) tensors $$\eqalign{ \H_{ijk} &= \begin{cases} \o\quad{\rm if}\;\;i=j=k \\ 0\quad{\rm otherwise} \\ \end{cases} \\ \E_{ijkl} &= \begin{cases} \o\quad{\rm if}\;\;i=k\;\;{\rm and}\;\;j=l \\ 0\quad{\rm otherwise} \\ \end{cases} \\ }$$ The latter is often written in terms of Kronecker delta symbols as $$ \E_{ijkl} = \d_{ik}\,\d_{jl} $$ Now we can write matrix self-gradients and $\T$ using the Einstein summation convention $$\eqalign{ &\grad{A_{ij}}{A_{kl}} \;=\; \E_{ijkl} = \d_{ik}\,\d_{jl} \\ &\T = [[A,B,C]]\qiq\T_{ijk} = \H_{pqr}\:A_{ip}\,B_{jq}\,C_{kr} \\ }$$ For typing convenience, define the tensor $$ \Y=\T-\X \qiq d\Y=d\T $$ Write the objective function as a triple contraction product and calculate its gradient wrt $A$ $$\eqalign{ f &= \Y_{ijk} \:\: \Y_{ijk} \\ df &= 2\,\Y_{ijk} \:\: d\T_{ijk} \\ &= 2\,\Y_{ijk} \LR{\H_{pqr}\:\c{dA_{ip}}\,B_{jq}\,C_{kr}} \\ \grad{f}{A_{uv}} &= 2\,\Y_{ijk}\H_{pqr}\,\c{\d_{iu}\d_{pv}}\,B_{jq}\,C_{kr} \\ &= 2\,\Y_{ujk}\H_{vqr}\,B_{jq}\,C_{kr} \\ }$$ Suspending the Einstein convention, you would code this as $$\eqalign{ \grad{f}{A_{uv}} &= 2 \sum_{j=\o}^M\sum_{k=\o}^N \Y_{ujk}\,B_{jv}\,C_{kv} \\ }$$ Change the definition of $\T$ as proposed in your post and re-calculate the gradient $$\eqalign{ \T &= [[A,A,C]]\qiq\T_{ijk} = \H_{pqr}\:A_{ip}\,A_{jq}\,C_{kr} \\ df &= 2\,\Y_{ijk} \:\: d\T_{ijk} \\ &= 2\,\Y_{ijk} \LR{\H_{pqr}\:\c{dA_{ip}}\,A_{jq}\,C_{kr}} + 2\,\Y_{ijk} \LR{\H_{pqr}\:A_{ip}\:\c{dA_{jq}}\,C_{kr}} \\ &= 
2\,\Y_{ijk}\H_{pqr}C_{kr} \LR{\:\c{dA_{ip}}\,A_{jq} + A_{ip}\:\c{dA_{jq}}} \\ \grad{f}{A_{uv}} &= 2\,\Y_{ijk}\H_{pqr}C_{kr}\LR{\:\c{\d_{iu}\d_{pv}}A_{jq}+A_{ip}\c{\d_{ju}\d_{qv}}} \\ &= 2\,\Y_{ujk}\H_{vqr}C_{kr}A_{jq} \;+\; 2\,\Y_{iuk}\H_{pvr}C_{kr}A_{ip} \\ }$$ And this would be coded as $$\eqalign{ \grad{f}{A_{uv}} &= 2 \sum_{j=\o}^M\sum_{k=\o}^N \LR{\Y_{ujk}+\Y_{juk}}A_{jv}C_{kv} \\ }$$
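A quick numerical sanity check of this final expression (my own NumPy sketch with arbitrary small dimensions), comparing it against a central finite-difference gradient of $f(A)=\|\mathcal{X}-[[A,A,C]]\|^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, R = 4, 3, 2
A = rng.standard_normal((M, R))
C = rng.standard_normal((N, R))
X = rng.standard_normal((M, M, N))
X = 0.5 * (X + X.transpose(1, 0, 2))  # partially symmetric in modes 1 and 2

def f(A):
    T = np.einsum('ip,jp,kp->ijk', A, A, C)  # [[A, A, C]]
    return np.sum((X - T) ** 2)

# Gradient from the answer: 2 * sum_{j,k} (Y_ujk + Y_juk) A_jv C_kv
T = np.einsum('ip,jp,kp->ijk', A, A, C)
Y = T - X
grad = 2 * np.einsum('ujk,jv,kv->uv', Y + Y.transpose(1, 0, 2), A, C)

# Central finite differences, entry by entry
eps = 1e-6
fd = np.zeros_like(A)
for u in range(M):
    for v in range(R):
        E = np.zeros_like(A)
        E[u, v] = eps
        fd[u, v] = (f(A + E) - f(A - E)) / (2 * eps)

assert np.allclose(grad, fd, atol=1e-5)
```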

ANSWER

The most straightforward way is to use the gradients you have already calculated in the general case. Doing so, you have $$ \frac{\partial \phi}{\partial \mathbf{A}} = \left(\mathcal{T}_{(1)}-\mathcal{X}_{(1)}\right) (\mathbf{C}\odot \mathbf{A}) + \left(\mathcal{T}_{(2)}-\mathcal{X}_{(2)}\right) (\mathbf{C}\odot \mathbf{A}) $$ where $\odot$ again denotes the Khatri-Rao product.

We can check that this result agrees with Greg's answer, up to the constant factor of $2$ that was likewise dropped from the general-case gradients, with $M_{ijk} = \mathcal{T}_{ijk} - \mathcal{X}_{ijk}$: $$ \frac{\partial \phi}{\partial A_{uv}} = \sum_{jk} M_{ujk} (A_{jv}C_{kv})+ \sum_{jk} M_{juk} (A_{jv}C_{kv}) $$
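This agreement can also be verified numerically. Below is a hedged NumPy sketch (my own `unfold` and `khatri_rao` helpers, following the Kolda–Bader conventions) comparing the matricized expression with the element-wise form:

```python
import numpy as np

def unfold(X, mode):
    # Mode-n unfolding: mode-n fibers become columns (Kolda-Bader convention).
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

def khatri_rao(U, V):
    # Column-wise Kronecker product: column r is kron(u_r, v_r).
    (I, R), J = U.shape, V.shape[0]
    return np.einsum('ir,jr->ijr', U, V).reshape(I * J, R)

rng = np.random.default_rng(3)
M, N, R = 4, 3, 2
A = rng.standard_normal((M, R))
C = rng.standard_normal((N, R))
X = rng.standard_normal((M, M, N))

T = np.einsum('ip,jp,kp->ijk', A, A, C)   # [[A, A, C]]

# Matricized form: (T(1) - X(1))(C kr A) + (T(2) - X(2))(C kr A)
KR = khatri_rao(C, A)
grad_matricized = ((unfold(T, 0) - unfold(X, 0)) @ KR
                   + (unfold(T, 1) - unfold(X, 1)) @ KR)

# Element-wise form, same constant-factor convention (no factor of 2)
Y = T - X
grad_elementwise = np.einsum('ujk,jv,kv->uv', Y + Y.transpose(1, 0, 2), A, C)

assert np.allclose(grad_matricized, grad_elementwise)
```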