Constrained minimization of a sum of non-symmetric matrices


I am trying to find a closed-form solution to the following problem:

\begin{equation} \begin{array}{cl} \min\limits_{\mathbf{x},\,\mathbf{y}} & \lambda_1\left( \mathbf{y}^T V^{(1)}\mathbf{x} \right)^2 + \lambda_2\left( \mathbf{y}^T V^{(2)}\mathbf{x} \right)^2 \\ \text{s.t.} & \|\mathbf{x}\|_2 = 1, \\ & \|\mathbf{y}\|_2 = 1, \end{array} \end{equation}

where $\mathbf{x} \in \mathbb{R}^{m}$, $\mathbf{y} \in \mathbb{R}^{n}$, and $V^{(i)} \in \mathbb{R}^{n\times m}$. Moreover, $\lambda_1 \geq \lambda_2 \geq 0$.

I know that the left and right singular vectors corresponding to the smallest singular value $\sigma_m^{(i)}$ of $V^{(i)}$ minimize the individual terms in the objective function above.

I have two questions:

  1. How can I prove that this is a convex optimization problem? (I don't have a clue about this.)
  2. Should I use the singular-vector pair that minimizes one of the terms in the objective as a starting point for an iterative algorithm such as steepest descent, or is there another way to arrive at a closed-form solution?

Any constructive suggestion or critique is more than welcome. Thank you.


BEST ANSWER

I would suggest an alternating minimization approach.

  1. Define two matrices: $$W_x \triangleq \begin{bmatrix} \sqrt{\lambda_1}V^{(1)} \\ \sqrt{\lambda_2}V^{(2)} \end{bmatrix}, \quad W_y \triangleq \begin{bmatrix} \sqrt{\lambda_1}V^{(1)} & \sqrt{\lambda_2}V^{(2)} \end{bmatrix}$$
  2. Let $x_0$ be a right singular vector associated with the minimum singular value of $W_x$, and $y_0$ a left singular vector associated with the minimum singular value of $W_y$. My intuition suggests that these are good initial guesses for $x$ and $y$. Here is why for $x$; the argument for $y$ is similar. For any unit $x$ and $y$, the Cauchy-Schwarz inequality gives $$\lambda_1(y^TV^{(1)}x)^2+\lambda_2(y^TV^{(2)}x)^2 \leq \lambda_1\|V^{(1)}x\|^2+\lambda_2\|V^{(2)}x\|^2 = \|W_x x\|^2,$$ so minimizing $\|W_x x\|$ over unit $x$ minimizes an upper bound on the objective, and that minimizer is exactly $x_0$.
  3. For $k=0,1,2,\dots$ until convergence:

  4. Fix $y$ and minimize over $x$ to find $x_+$: $$x_+ = \mathop{\textrm{argmin}}_{\|x\|=1} \left\| \begin{bmatrix} \sqrt{\lambda_1}y_k^T V^{(1)} x \\ \sqrt{\lambda_2}y_k^T V^{(2)} x\end{bmatrix} \right\|$$ $x_+$ is just the right singular vector associated with the minimum singular value of that $2\times m$ matrix.

  5. Update $\bar{x}=x_k+\alpha(x_+-x_k)$, $x_{k+1}=\|\bar{x}\|^{-1}\bar{x}$.
  6. Fix $x$ and minimize over $y$ to find $y_+$: $$y_+ = \mathop{\textrm{argmin}}_{\|y\|=1} \left\| \begin{bmatrix} \sqrt{\lambda_1} y^T V^{(1)} x_{k+1} & \sqrt{\lambda_2} y^T V^{(2)}x_{k+1} \end{bmatrix} \right\|$$ $y_+$ is just the left singular vector associated with the minimum singular value of the $n\times 2$ matrix.
  7. Update $\bar{y}=y_k+\beta(y_+-y_k)$, $y_{k+1}=\|\bar{y}\|^{-1}\bar{y}$.

$\alpha$ and $\beta$ are step sizes you can toy with. Don't be afraid to try $\alpha=\beta=1$ right off the bat, but I am guessing you are going to need to be conservative, or maybe even do line searches. You might also consider multiple steps of $x$, then multiple steps of $y$.
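The steps above can be sketched in numpy as follows. This is only a sketch of the procedure described here; the function name is mine, and the sign alignment before each interpolation step is an added detail (singular vectors are defined only up to sign, so without it the update $x_k+\alpha(x_+-x_k)$ is not meaningful):

```python
import numpy as np

def alternating_min(V1, V2, lam1, lam2, alpha=1.0, beta=1.0, iters=100):
    """Alternating minimization of lam1*(y^T V1 x)^2 + lam2*(y^T V2 x)^2
    over unit vectors x and y, initialized from the stacked matrices W_x, W_y."""
    # Step 1: stacked matrices used for the initial guesses.
    Wx = np.vstack([np.sqrt(lam1) * V1, np.sqrt(lam2) * V2])     # (2n) x m
    Wy = np.hstack([np.sqrt(lam1) * V1, np.sqrt(lam2) * V2])     # n x (2m)
    # Step 2: singular vectors for the smallest singular values.
    x = np.linalg.svd(Wx)[2][-1]       # right singular vector of W_x
    y = np.linalg.svd(Wy)[0][:, -1]    # left singular vector of W_y
    for _ in range(iters):
        # Step 4: fix y, minimize over x (right singular vector of a 2 x m matrix).
        M = np.vstack([np.sqrt(lam1) * (y @ V1), np.sqrt(lam2) * (y @ V2)])
        x_plus = np.linalg.svd(M)[2][-1]
        if x_plus @ x < 0:             # resolve the sign ambiguity
            x_plus = -x_plus
        # Step 5: damped update, renormalized onto the unit sphere.
        x = x + alpha * (x_plus - x)
        x /= np.linalg.norm(x)
        # Step 6: fix x, minimize over y (left singular vector of an n x 2 matrix).
        N = np.column_stack([np.sqrt(lam1) * (V1 @ x), np.sqrt(lam2) * (V2 @ x)])
        y_plus = np.linalg.svd(N)[0][:, -1]
        if y_plus @ y < 0:
            y_plus = -y_plus
        # Step 7: damped update, renormalized.
        y = y + beta * (y_plus - y)
        y /= np.linalg.norm(y)
    obj = lam1 * (y @ V1 @ x) ** 2 + lam2 * (y @ V2 @ x) ** 2
    return x, y, obj
```

With $\alpha=\beta=1$ each update solves its subproblem exactly, so the objective cannot increase; damped steps ($\alpha,\beta<1$) trade that monotonicity for smoother iterates.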

There are no guarantees here. This might work, or it might be lousy. But your problem is non-convex (the unit spheres $\|x\|_2=1$ and $\|y\|_2=1$ are not convex sets), so you cannot expect guarantees.


This is a comment, posted in the answer section because of space limitations.

@Michael: I want to use your answer because it gives the best performance, but I need to understand how you came up with $W_x$ and $W_y$ and the matrices

\begin{equation} \begin{bmatrix} \sqrt{\lambda_1}y_k^T V^{(1)} \\ \sqrt{\lambda_2}y_k^T V^{(2)} \end{bmatrix}, \begin{bmatrix} \sqrt{\lambda_1} V^{(1)} x_{k+1} & \sqrt{\lambda_2} V^{(2)}x_{k+1}\end{bmatrix} \end{equation}

I did some work and came up with my own solution, hoping that it would match yours, but it did not. Can you please take a look and comment?

I recognized \begin{equation} \mathbf{y}^T\mathbf{V}^{(1)}\mathbf{x}=\langle\mathbf{y},\mathbf{V}^{(1)}\mathbf{x} \rangle=\langle\mathbf{x},(\mathbf{V^{(1)}})^{T}\mathbf{y} \rangle=\mathbf{x}^T(\mathbf{V^{(1)}})^{T}\mathbf{y} \tag {*} \end{equation}

Using (*)

\begin{equation} \begin{array}{c} \big(\lambda_1\left( \mathbf{y}^T V^{(1)}\mathbf{x} \right)^2 + \lambda_2\left( \mathbf{y}^T V^{(2)}\mathbf{x} \right)^2\big) = \lambda_1\left( \mathbf{y}^T V^{(1)}\mathbf{x}\mathbf{x}^T({V}^{(1)})^T \mathbf{y}\right) + \lambda_2\left( \mathbf{y}^T V^{(2)}\mathbf{x}\mathbf{x}^T({V}^{(2)})^T \mathbf{y} \right) = \mathbf{y}^T \mathbf{Z}_x \mathbf{y} \end{array} \end{equation}

Similarly

\begin{equation} \begin{array}{c} \big(\lambda_1\left( \mathbf{y}^T V^{(1)}\mathbf{x} \right)^2 + \lambda_2\left( \mathbf{y}^T V^{(2)}\mathbf{x} \right)^2\big) = \lambda_1\left(\mathbf{x}^T ({V}^{(1)})^T \mathbf{y}\mathbf{y}^T{V}^{(1)}\mathbf{x}\right) + \lambda_2\left(\mathbf{x}^T ({V}^{(2)})^T\mathbf{y}\mathbf{y}^T{V}^{(2)} \mathbf{x}\right)= \mathbf{x}^T \mathbf{Z}_y \mathbf{x} \end{array} \end{equation}

where I defined

\begin{equation} \begin{array}{c}\mathbf{Z}_x =\lambda_1 V^{(1)}\mathbf{x}\mathbf{x}^T({V}^{(1)})^T + \lambda_2 V^{(2)}\mathbf{x}\mathbf{x}^T({V}^{(2)})^T \\ \mathbf{Z}_y =\lambda_1({V}^{(1)})^T \mathbf{y}\mathbf{y}^T{V}^{(1)} + \lambda_2 ({V}^{(2)})^T\mathbf{y}\mathbf{y}^T{V}^{(2)} \end{array} \end{equation}
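As a quick numerical sanity check of these two quadratic-form identities (my own addition, with arbitrary dimensions and random data):

```python
import numpy as np

# Arbitrary dimensions and random data, just to exercise the identities.
rng = np.random.default_rng(0)
n, m = 5, 4
V1 = rng.standard_normal((n, m))
V2 = rng.standard_normal((n, m))
lam1, lam2 = 2.0, 1.0
x = rng.standard_normal(m)
x /= np.linalg.norm(x)
y = rng.standard_normal(n)
y /= np.linalg.norm(y)

obj = lam1 * (y @ V1 @ x) ** 2 + lam2 * (y @ V2 @ x) ** 2
# Z_x: the objective as a quadratic form in y (x held fixed).
Zx = lam1 * V1 @ np.outer(x, x) @ V1.T + lam2 * V2 @ np.outer(x, x) @ V2.T
# Z_y: the objective as a quadratic form in x (y held fixed).
Zy = lam1 * V1.T @ np.outer(y, y) @ V1 + lam2 * V2.T @ np.outer(y, y) @ V2
assert np.isclose(obj, y @ Zx @ y)
assert np.isclose(obj, x @ Zy @ x)
```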

At $k=0$, I initialized $\mathbf{x}_0$ with a random vector such that $\|\mathbf{x}_0\|_{2}=1$ and computed $\mathbf{Z}_{x_0}$. I took $\mathbf{y}'$ to be the eigenvector corresponding to the smallest eigenvalue of $\mathbf{Z}_{x_0}$ and set $\mathbf{y}_0 = update(\mathbf{y}')$. I then computed $\mathbf{Z}_{y_0}$, took $\mathbf{x}'$ to be the eigenvector corresponding to its smallest eigenvalue, set $\mathbf{x}_1 = update(\mathbf{x}')$, and iterated until convergence. ($update(\cdot)$ is defined exactly the way Michael did.)

But with my approach the minimum value is always greater than the one obtained using Michael's solution.
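For reference, here is a minimal numpy sketch of the eigenvector-based alternation described above (names are mine; the sign alignment mirrors the $update(\cdot)$ step). Note that for fixed $x$, $\mathbf{Z}_x = NN^T$ with $N = [\sqrt{\lambda_1}V^{(1)}x \;\; \sqrt{\lambda_2}V^{(2)}x]$, so its smallest-eigenvalue eigenvector is exactly the smallest left singular vector in Michael's step; with $\alpha=1$ the two methods solve the same subproblems, which suggests the gap comes from the random initialization rather than from the update itself:

```python
import numpy as np

def eig_alternating_min(V1, V2, lam1, lam2, alpha=1.0, iters=100, seed=0):
    """Alternation using the smallest-eigenvalue eigenvectors of Z_x and Z_y,
    started from a random unit x (as described above)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(V1.shape[1])
    x /= np.linalg.norm(x)
    y = None
    for k in range(iters):
        # Z_x is PSD, so its smallest-eigenvalue eigenvector minimizes y^T Z_x y.
        Zx = lam1 * V1 @ np.outer(x, x) @ V1.T + lam2 * V2 @ np.outer(x, x) @ V2.T
        y_plus = np.linalg.eigh(Zx)[1][:, 0]   # eigh sorts eigenvalues ascending
        if y is None:
            y = y_plus                          # first pass: no previous iterate
        else:
            if y_plus @ y < 0:                  # sign alignment before update(.)
                y_plus = -y_plus
            y = y + alpha * (y_plus - y)
            y /= np.linalg.norm(y)
        # Z_y is PSD, so its smallest-eigenvalue eigenvector minimizes x^T Z_y x.
        Zy = lam1 * V1.T @ np.outer(y, y) @ V1 + lam2 * V2.T @ np.outer(y, y) @ V2
        x_plus = np.linalg.eigh(Zy)[1][:, 0]
        if x_plus @ x < 0:
            x_plus = -x_plus
        x = x + alpha * (x_plus - x)
        x /= np.linalg.norm(x)
    obj = lam1 * (y @ V1 @ x) ** 2 + lam2 * (y @ V2 @ x) ** 2
    return x, y, obj
```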