Solution to a Hadamard product least squares


Given two full-rank matrices $A \in \mathbb{R}^{n \times p}, B \in \mathbb{R}^{n \times k}$ and vectors $y \in \mathbb{R}^n, u \in \mathbb{R}^p, v \in \mathbb{R}^k$, I'd like to solve an optimization problem of the form

$$ \arg\min_{u, v} \|y - (A u) \circ (Bv)\|^2$$

where $\circ$ denotes the Hadamard (elementwise) product and the norm is the Euclidean vector norm.

I can find the solution by alternating minimization (iteratively fixing one variable and solving for the other), but it is terribly slow. In practice I have many of these problems where $y$ varies but $A, B$ are always the same, so ideally I would like to express the solution in terms of a matrix factorization of $A$ and/or $B$.

My question is: has this problem been addressed somewhere in the literature? Is it possible to find a closed form solution in terms of a matrix factorization of $A, B$ analogous to the (ordinary) least squares?
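For concreteness, the objective can be evaluated directly; here is a minimal numpy sketch with illustrative random data (all sizes are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 20, 4, 3  # placeholder sizes

A = rng.standard_normal((n, p))
B = rng.standard_normal((n, k))
y = rng.standard_normal(n)

def objective(u, v):
    # ||y - (A u) ∘ (B v)||^2, with ∘ the elementwise product
    r = y - (A @ u) * (B @ v)
    return r @ r

val = objective(rng.standard_normal(p), rng.standard_normal(k))
```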

There are 2 answers below.

Here I'll write the unknowns as $\alpha \in \mathbb{R}^p, \ \beta \in \mathbb{R}^k$ just for notational coherence.

One way to transform this problem into an ordinary least squares (OLS) problem is as follows.

Our main problem with the Hadamard product is

$$ \begin{equation} \text{argmin}_{\alpha,\beta}\|y - \bar{y} \|_2^2, \ \ \ (1) \end{equation} $$

where $\bar{y} = A\alpha \odot B\beta$ is our approximation vector.

We can rewrite each entry of our approximation vector as:

$$ \bar{y}_i = \left( A_i \alpha \right)\left(B_i \beta\right), $$

where $A_i$ and $B_i$ denote the $i^\text{th}$ rows of the corresponding matrices.

Just regrouping terms, we have $$ \bar{y}_i = \left(\begin{array}{ccc} a_{i,1} B_i & \ldots & a_{i,p} B_i \end{array} \right) \left( \begin{array}{c} \alpha_1 \beta \\ \vdots \\ \alpha_p \beta \end{array} \right), $$

that is, $$ (A\alpha \odot B\beta)_i = (A_i \otimes B_i)(\alpha \otimes \beta), $$

where $\otimes$ denotes the Kronecker product.

Using this equality we can define our new design matrix $H$ row by row as $H_i = A_i \otimes B_i$, and the solution vector as $x = \alpha \otimes \beta$; then we can solve the OLS problem:

$$ \text{argmin}_{x}\|y - Hx\|_2^2, $$

finding the global solution of this relaxed problem, due to its convexity in $x$.
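As a sketch (with illustrative random data), the design matrix $H$ can be built with a row-wise Kronecker product and the OLS solved in numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 4, 3  # illustrative sizes

A = rng.standard_normal((n, p))
B = rng.standard_normal((n, k))
alpha = rng.standard_normal(p)
beta = rng.standard_normal(k)
y = (A @ alpha) * (B @ beta)  # exactly representable, for illustration

# Row-wise Kronecker product: H[i] = kron(A[i], B[i]).
H = np.einsum('ip,ik->ipk', A, B).reshape(n, p * k)

# Ordinary least squares in x = kron(alpha, beta).
x, *_ = np.linalg.lstsq(H, y, rcond=None)
```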

In order to retrieve the solution vectors, we can reshape $x$ into a $p \times k$ matrix, which for $x = \alpha \otimes \beta$ is the outer product of the solution vectors $\alpha$ and $\beta$:

$$ X = \left( \begin{array}{lcr} \alpha_1 \beta_1 & \ldots & \alpha_1 \beta_k \\ \alpha_2 \beta_1 & \ldots & \alpha_2 \beta_k\\ \vdots & \ddots & \vdots \\ \alpha_p \beta_1 & \ldots & \alpha_p \beta_k \end{array} \right) = \alpha \beta^T $$

We use the singular value decomposition (SVD) to represent this matrix as

$$ X=U\Sigma V^T, $$

where ideally $\operatorname{rank}(X)=1$ (in general, we take the best rank-1 approximation); then, using the first left and right singular vectors $u$ and $v$ and the first singular value $\sigma$,

$$ \bar{\alpha} = \gamma\sqrt\sigma u, \ \ \text{and} \ \ \bar{\beta} = \frac{1}{\gamma}\sqrt\sigma v, $$

where $\gamma$, with $|\gamma|>0$, is an arbitrary factor accounting for the scale ambiguity between $\alpha$ and $\beta$.

The solution vectors are only determined up to this scale, so the factorization is not unique and additional assumptions (e.g. a normalization) should be made.
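A sketch of the retrieval step in numpy, assuming the OLS step returned an exact Kronecker product $x = \alpha \otimes \beta$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 4, 3  # illustrative sizes
alpha = rng.standard_normal(p)
beta = rng.standard_normal(k)

# Suppose the OLS step returned x = alpha ⊗ beta exactly.
x = np.kron(alpha, beta)

# Reshape into the p-by-k matrix X = alpha beta^T and take its
# leading singular triplet (the best rank-1 approximation).
X = x.reshape(p, k)
U, s, Vt = np.linalg.svd(X)
alpha_hat = np.sqrt(s[0]) * U[:, 0]   # gamma = 1 here
beta_hat = np.sqrt(s[0]) * Vt[0, :]

# The factors are recovered only up to a shared scale/sign gamma.
assert np.allclose(np.outer(alpha_hat, beta_hat), np.outer(alpha, beta))
```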

Hope this will be useful.


$ \def\p{\partial} \def\LR#1{\left(#1\right)} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $For typing convenience, define the variables $$\eqalign{ &e = Au,\qquad &de = A\,du,\qquad &E = \Diag e = E^T \\ &f = Bv,\qquad &df = B\,dv,\qquad &F = \Diag f = F^T \\ }$$ $$\eqalign{ &w = \LR{e\circ f-y} \;=\; \LR{Ef-y} \;=\; \LR{Fe-y} \\ }$$ The Frobenius product is a convenient way to write the trace and the norm $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$ The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{AB} &= \LR{CB^T}:A \\&= \LR{A^TC}:B \\\\ }$$


Use the above notation to write the objective function and calculate its gradients. $$\eqalign{ \phi &= \frac 12\;{w:w} \\ d\phi &= {w}:{dw} \\ &= {w}:\LR{F\,de + E\;df} \\ &= {Fw}:{de} + {Ew}:{df} \\ &= {Fw}:\LR{A\,du} + {Ew}:\LR{B\,dv} \\ &= {A^TFw}:{du} + {B^TEw}:{dv} \\ \grad{\phi}{u} &= {A^TFw},\qquad\grad{\phi}{v} = {B^TEw} \\ }$$ Set the first gradient to zero and calculate the least squares solution $$\eqalign{ 0 &= A^TFw = A^TF\LR{FAu-y} \qiq u_* = \LR{FA}^+y \\ }$$ where $\LR{FA}^+$ denotes the pseudoinverse of $\LR{FA}$; however, in Matlab code it is more efficient to use the backslash operator: $\;u_* = (FA)\,\backslash\,y$.
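The gradient formulas can be sanity-checked numerically; here is a numpy sketch with illustrative random data, comparing $\partial\phi/\partial u$ against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 30, 4, 3  # illustrative sizes
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, k))
y = rng.standard_normal(n)
u = rng.standard_normal(p)
v = rng.standard_normal(k)

def phi(u, v):
    w = (A @ u) * (B @ v) - y  # w = e∘f - y
    return 0.5 * (w @ w)

# Analytic gradients from the derivation above.
w = (A @ u) * (B @ v) - y
grad_u = A.T @ ((B @ v) * w)  # A^T F w
grad_v = B.T @ ((A @ u) * w)  # B^T E w

# Central finite-difference check of the first component of dφ/du.
h = 1e-6
e1 = np.zeros(p)
e1[0] = h
fd = (phi(u + e1, v) - phi(u - e1, v)) / (2 * h)
assert abs(fd - grad_u[0]) < 1e-4
```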

Depending on the relative dimensions of the variables (which weren't fully specified), it may be possible to drastically simplify this calculation.

In particular if $$p \ge n = {\rm rank}(A) = {\rm rank}(F)$$ then $$\LR{FA}^+ = A^+F^+ \qiq u_* = A^+\LR{y\oslash f}$$ Since $A^+$ can be computed once outside of the main loop and since the Hadamard division $(\oslash)$ of vectors is extremely fast, this approach is even more efficient than the previously mentioned backslash operator.
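A numpy sketch of this shortcut (with illustrative random data, and $p \ge n$ so that $A$ generically has full row rank):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 8  # p >= n, so A generically has full row rank

A = rng.standard_normal((n, p))
y = rng.standard_normal(n)
f = rng.standard_normal(n)  # current iterate f = Bv (nonzero entries assumed)

# Precompute A^+ once, outside the main loop.
A_pinv = np.linalg.pinv(A)

# u* = A^+ (y ⊘ f): a Hadamard division replaces a per-iteration solve.
u_star = A_pinv @ (y / f)

# Agrees with the direct pseudoinverse solve u* = (FA)^+ y.
u_direct = np.linalg.pinv(np.diag(f) @ A) @ y
assert np.allclose(u_star, u_direct)
```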

Analogous manipulation of the second gradient yields $\;v_* = \LR{EB}^+y$.

The conventional Alternating Least Squares (ALS) iterations are then $$\eqalign{ E &= \Diag{Au} \\ v &= \LR{EB}^+y \\ F &= \Diag{Bv} \\ u &= \LR{FA}^+y \\ }$$ However, simple substitution will eliminate one of the variables and yield a nonlinear equation (NLE) in the remaining variable, e.g. $$\eqalign{ {u} &= \LR{\Diag{B\LR{\Diag{A\c{u}}B}^+y}A}^+y \\ u &= G(\c{u}) \\ F(u) &= {u-G(u)} \;=\; 0 \\ }$$ This NLE can be solved using a quasi-Newton (or conjugate gradient) method, which will have a faster rate of convergence than ALS. The trade-off for this increased speed is the loss of the convergence guarantee of the ALS iteration.
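The ALS iteration above can be sketched in numpy (random illustrative data; `lstsq` plays the role of the pseudoinverse):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 50, 4, 3  # illustrative sizes
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, k))
# Construct a y that is exactly representable, for illustration.
y = (A @ rng.standard_normal(p)) * (B @ rng.standard_normal(k))

u = rng.standard_normal(p)  # random initial guess
for _ in range(50):
    e = A @ u
    # v = (EB)^+ y, with E = Diag(e)
    v, *_ = np.linalg.lstsq(e[:, None] * B, y, rcond=None)
    f = B @ v
    # u = (FA)^+ y, with F = Diag(f)
    u, *_ = np.linalg.lstsq(f[:, None] * A, y, rcond=None)

residual = np.linalg.norm(y - (A @ u) * (B @ v))
```

Each half-step solves a linear least-squares problem exactly, so the objective is monotonically non-increasing.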

Furthermore, such an approach requires the Jacobian of $F(u)$, which is far too complicated to calculate manually and will require the use of Automatic Differentiation (AD) software.
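As an alternative to AD, a solver such as `scipy.optimize.root` (whose default `'hybr'` method approximates the Jacobian by finite differences) can be applied directly to $F(u)=u-G(u)$; a sketch with illustrative random data:

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(4)
n, p, k = 50, 4, 3  # illustrative sizes
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, k))
y = (A @ rng.standard_normal(p)) * (B @ rng.standard_normal(k))

def G(u):
    # One full ALS sweep written as a fixed-point map u -> G(u).
    v = np.linalg.lstsq((A @ u)[:, None] * B, y, rcond=None)[0]
    return np.linalg.lstsq((B @ v)[:, None] * A, y, rcond=None)[0]

def F(u):
    return u - G(u)

# MINPACK's hybrid method builds its own finite-difference Jacobian.
sol = root(F, rng.standard_normal(p), method='hybr')
```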