I am trying to understand why there is a significant difference in the performance of two proposed solutions.
Original question (Constraint minimization of sum of Non-symmetric matrices)
\begin{equation} \begin{array}{c} \min \hspace{4mm} \big(\lambda_1\left( \mathbf{y}^T V_{(1)}\mathbf{x} \right)^2 + \lambda_2\left( \mathbf{y}^T V_{(2)}\mathbf{x} \right)^2\big) \\ \text{s.t.} \hspace{10mm}\|\mathbf{x}\|_2 = 1 \\ \hspace{17mm}\|\mathbf{y}\|_2 = 1, \end{array} \end{equation}
where $\mathbf{x} \in \mathbb{R}^{m}$, $\mathbf{y} \in \mathbb{R}^{n}$, and $V_{(i)} \in \mathbb{R}^{n\times m}$ for $i=1,2$. Also $\lambda_1\geq \lambda_2\geq 0$.
Both solutions are iterative and run the same algorithm; the only difference is in the matrices they construct at each step.
Approach 1:
\begin{equation} \lambda_1\left( \mathbf{y}^T V_{(1)}\mathbf{x} \right)^2 + \lambda_2\left( \mathbf{y}^T V_{(2)}\mathbf{x} \right)^2 = \lambda_1\, \mathbf{y}^T V_{(1)}\mathbf{x}\mathbf{x}^T V^T_{(1)} \mathbf{y} + \lambda_2\, \mathbf{y}^T V_{(2)}\mathbf{x}\mathbf{x}^T V^T_{(2)} \mathbf{y} = \mathbf{y}^T \mathbf{Z}_x \mathbf{y} \end{equation}
\begin{equation} \lambda_1\left( \mathbf{y}^T V_{(1)}\mathbf{x} \right)^2 + \lambda_2\left( \mathbf{y}^T V_{(2)}\mathbf{x} \right)^2 = \lambda_1\, \mathbf{x}^T V^T_{(1)} \mathbf{y}\mathbf{y}^T V_{(1)}\mathbf{x} + \lambda_2\, \mathbf{x}^T V^T_{(2)}\mathbf{y}\mathbf{y}^T V_{(2)} \mathbf{x} = \mathbf{x}^T \mathbf{Z}_y \mathbf{x} \end{equation}
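For concreteness, here is a small NumPy check of the approach-1 matrices (the variable names `Zx`, `Zy` and the random placeholder data are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 4
lam1, lam2 = 2.0, 1.0
V1 = rng.standard_normal((n, m))
V2 = rng.standard_normal((n, m))
x = rng.standard_normal(m); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)

# Objective: lam1*(y^T V1 x)^2 + lam2*(y^T V2 x)^2
f = lam1 * (y @ V1 @ x) ** 2 + lam2 * (y @ V2 @ x) ** 2

# Approach 1: rank-2 symmetric matrices built from the current x (resp. y)
Zx = lam1 * np.outer(V1 @ x, V1 @ x) + lam2 * np.outer(V2 @ x, V2 @ x)   # n x n
Zy = lam1 * np.outer(V1.T @ y, V1.T @ y) + lam2 * np.outer(V2.T @ y, V2.T @ y)  # m x m

# Both quadratic forms reproduce the objective
assert np.isclose(y @ Zx @ y, f)
assert np.isclose(x @ Zy @ x, f)
```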
Approach 2:
\begin{equation} \mathbf{B}_y=\begin{bmatrix} \sqrt{\lambda_1}\,\mathbf{y}_k^T V_{(1)} \\ \sqrt{\lambda_2}\,\mathbf{y}_k^T V_{(2)} \end{bmatrix}, \qquad \mathbf{B}_x=\begin{bmatrix} \sqrt{\lambda_1}\, V_{(1)}\mathbf{x}_{k+1} & \sqrt{\lambda_2}\, V_{(2)}\mathbf{x}_{k+1}\end{bmatrix} \end{equation}
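As far as I can tell (this is my reading, not something stated in the question), the $\mathbf{B}$ matrices are factors of the $\mathbf{Z}$ matrices: $\mathbf{B}_y^T\mathbf{B}_y = \mathbf{Z}_y$ and $\mathbf{B}_x\mathbf{B}_x^T = \mathbf{Z}_x$. A quick NumPy check with random placeholder data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 4
lam1, lam2 = 2.0, 1.0
V1 = rng.standard_normal((n, m))
V2 = rng.standard_normal((n, m))
x = rng.standard_normal(m); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)

# Approach 2: stack scaled rows/columns instead of forming Gram matrices
By = np.vstack([np.sqrt(lam1) * (y @ V1), np.sqrt(lam2) * (y @ V2)])         # 2 x m
Bx = np.column_stack([np.sqrt(lam1) * (V1 @ x), np.sqrt(lam2) * (V2 @ x)])   # n x 2

# The B matrices factor the approach-1 matrices
Zy = lam1 * np.outer(V1.T @ y, V1.T @ y) + lam2 * np.outer(V2.T @ y, V2.T @ y)
Zx = lam1 * np.outer(V1 @ x, V1 @ x) + lam2 * np.outer(V2 @ x, V2 @ x)
assert np.allclose(By.T @ By, Zy)
assert np.allclose(Bx @ Bx.T, Zx)
```

If this reading is right, the singular values of $\mathbf{B}$ are the square roots of the eigenvalues of the corresponding $\mathbf{Z}$, so the two approaches target the same vectors in exact arithmetic.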
Algorithm: Make an initial guess $\mathbf{x}_0, \mathbf{y}_0$ and run for $k=0,1,2,\dots$ until a predefined minimum value is achieved.
Step 1: Use the current $\mathbf{y}_k$ to compute $\mathbf{Z}_y$ (approach 1) or $\mathbf{B}_y$ (approach 2). Compute an SVD and take the singular vector corresponding to the smallest singular value (for $\mathbf{B}_y$, the right singular vector) as $\mathbf{x}_+$.
Step 2: Update $\bar{\mathbf{x}}=\mathbf{x}_k+\alpha(\mathbf{x}_+-\mathbf{x}_k)$ and normalize: $\mathbf{x}_{k+1}=\|\bar{\mathbf{x}}\|^{-1}\bar{\mathbf{x}}$.
Step 3: Use $\mathbf{x}_{k+1}$ from Step 2 to compute $\mathbf{Z}_x$ (approach 1) or $\mathbf{B}_x$ (approach 2). Compute an SVD and take the singular vector corresponding to the smallest singular value (for $\mathbf{B}_x$, the left singular vector) as $\mathbf{y}_+$.
Step 4: Update $\bar{\mathbf{y}}=\mathbf{y}_k+\beta(\mathbf{y}_+-\mathbf{y}_k)$ and normalize: $\mathbf{y}_{k+1}=\|\bar{\mathbf{y}}\|^{-1}\bar{\mathbf{y}}$.
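The four steps above can be sketched in NumPy as follows (approach 2; the function name, step sizes $\alpha,\beta$, iteration cap, and the sign alignment of the singular vectors are my own choices, not part of the original description):

```python
import numpy as np

def alternating_min(V1, V2, lam1, lam2, alpha=0.5, beta=0.5, iters=200, seed=0):
    """Alternately update x and y via SVDs of the stacked B matrices."""
    rng = np.random.default_rng(seed)
    n, m = V1.shape
    x = rng.standard_normal(m); x /= np.linalg.norm(x)
    y = rng.standard_normal(n); y /= np.linalg.norm(y)
    for _ in range(iters):
        # Step 1: right singular vector of By for the smallest singular value
        By = np.vstack([np.sqrt(lam1) * (y @ V1), np.sqrt(lam2) * (y @ V2)])
        _, _, Vt = np.linalg.svd(By)      # full SVD: Vt is m x m
        x_plus = Vt[-1]                   # last row <-> smallest singular value
        if x_plus @ x < 0:                # my addition: singular-vector signs are
            x_plus = -x_plus              # arbitrary, so align with the iterate
        # Step 2: damped update and renormalization
        x = x + alpha * (x_plus - x)
        x /= np.linalg.norm(x)
        # Step 3: left singular vector of Bx for the smallest singular value
        Bx = np.column_stack([np.sqrt(lam1) * (V1 @ x), np.sqrt(lam2) * (V2 @ x)])
        U, _, _ = np.linalg.svd(Bx)       # full SVD: U is n x n
        y_plus = U[:, -1]
        if y_plus @ y < 0:
            y_plus = -y_plus
        # Step 4: damped update and renormalization
        y = y + beta * (y_plus - y)
        y /= np.linalg.norm(y)
    f = lam1 * (y @ V1 @ x) ** 2 + lam2 * (y @ V2 @ x) ** 2
    return x, y, f
```

For approach 1, Steps 1 and 3 would instead form $\mathbf{Z}_y$ and $\mathbf{Z}_x$ and take the singular vector of the smallest singular value of those symmetric matrices; the rest of the loop is identical.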
My observation: Approach 2 always works better than approach 1, but I don't understand how $\mathbf{B}_x, \mathbf{B}_y$ are defined. Is this a standard approach that I can read about?
I would appreciate it if someone could shed some light on this.