Can we obtain the minimizing $\mathbf{x}$ in $\Vert A - B(I\otimes \mathbf{x})\Vert_F^2$ algebraically?


Are we able to obtain the following algebraically? $$\widehat{\mathbf{x}}=\underset{\mathbf{x}}{\operatorname{argmin}}\|A-B(I\otimes \mathbf{x})\|_F^2$$ where $A\in \mathbb{R}^{m\times n}$, $B\in \mathbb{R}^{m\times (m\cdot n)}$, $I\in \mathbb{R}^{n\times n}$ is the identity, $\mathbf{x}\in \mathbb{R}^{m\times 1}$, and $\otimes$ denotes the Kronecker product. If it helps, $m<n$.

Let $f(\mathbf{x})=\|A-B(I\otimes \mathbf{x})\|_F^2$ and let $\operatorname{tr}(\cdot)$ denote the trace. Then\begin{align*}f(\mathbf{x}) & =\operatorname{tr}\left (A^tA\right )+\operatorname{tr}\left ((I\otimes \mathbf{x})^tB^tB(I\otimes \mathbf{x})\right )-2\operatorname{tr}\left (A^tB(I\otimes \mathbf{x})\right ) \\ & =\operatorname{tr}\left ((I\otimes \mathbf{x})(I\otimes \mathbf{x})^tB^tB\right ) -2\operatorname{tr}\left ((I\otimes \mathbf{x})A^tB\right )+\operatorname{constant}. \end{align*}I'm stuck on proceeding beyond this step to find the derivative of the function with respect to $\mathbf{x}$. I have consulted tools like the Matrix Cookbook but couldn't find a matching derivative structure.

Please help with this problem if it can be solved algebraically (not numerically).
Thanks in advance.


There are 3 answers below.

Answer 1

I would do the following: define $W=I\otimes x$ and $h(W)=\|A-BW\|_{F}^{2}$. Writing $f(V)= \|V\|_{F}^{2}$ and $g(W) = A-BW$, we have $h(W)=(f \circ g)(W)$. Calculating the Jacobians of $f$ and $g$, we get: $$ J_{W}(g) = - B\quad\hbox{and}\quad J_{V}(f)=2V^{\top}.$$ Now calculate the Jacobian of $h(W)$ using the chain rule: $$J_{W}(f\circ g)=J_{g(W)}(f)\,J_{W}(g)=-2(A-BW)^{\top}B = 2(BW - A)^{\top} B.$$ Since the function $h(W)$ is convex, the minimum is attained when $J_{W}(f\circ g)=0$. Then I would solve for $W$ like this: $$2(BW - A)^{\top} B = 0 \Rightarrow B^{\top}(BW - A) = 0 \Rightarrow B^{\top}BW = B^{\top}A.$$ Assuming $B^{\top}B$ is invertible, $$W = (B^{\top}B)^{-1}B^{\top}A \Rightarrow I\otimes x = (B^{\top}B)^{-1}B^{\top}A.$$ From the last equation we would obtain the components of $x$.
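A minimal NumPy sketch of this step (my own illustration, not part of the original answer): with the stated dimensions $B^{\top}B$ is $(mn)\times(mn)$ of rank at most $m$, so it is never invertible here, and the pseudoinverse must replace the inverse. Note also that the resulting unconstrained $W$ need not actually have the form $I\otimes x$.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, m * n))

# Unconstrained minimizer of ||A - B W||_F^2; pinv handles the
# rank-deficient B^T B (rank at most m < m*n).
W = np.linalg.pinv(B) @ A

# W satisfies the normal equations B^T B W = B^T A ...
assert np.allclose(B.T @ B @ W, B.T @ A)
# ... but in general W is NOT of the form I ⊗ x, so the constrained
# problem still requires the vectorization approach below.
```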

Answer 2

Vectorize the matrices which appear in the Frobenius norm, i.e. $$\eqalign{ {\rm vec}(A) &= a \\ {\rm vec}\Big(B(I_n\otimes x)\Big) &= (I_n\otimes B)\,{\rm vec}(I_n\otimes x) \\ &= (I_n\otimes B)\Big({\rm vec}(I_n)\otimes I_m\Big)x \\ &= Mx \\ }$$ Write the norm in terms of these vectors, then calculate its gradient: $$\eqalign{ \lambda &= (Mx-a)^T(Mx-a) \\ \frac{\partial \lambda}{\partial x} &= 2M^T(Mx-a) \\ }$$ Set the gradient to zero and solve for the optimal $x$: $$\eqalign{ M^TMx &= M^Ta \\ x &= (M^TM)^{-1}M^Ta \\ &= M^+a \qquad\big({\rm pseudoinverse}\big) \\ }$$
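A short NumPy sketch of this construction (the variable names are mine; `vec` is column-major vectorization to match the ${\rm vec}(\cdot)$ convention used above):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, m * n))

I_n, I_m = np.eye(n), np.eye(m)
vec = lambda X: X.reshape(-1, 1, order="F")  # column-major vec()

# M such that vec(B (I_n ⊗ x)) = M x
M = np.kron(I_n, B) @ np.kron(vec(I_n), I_m)
a = vec(A)

# sanity check of the identity for a random x
x = rng.standard_normal((m, 1))
assert np.allclose(M @ x, vec(B @ np.kron(I_n, x)))

# least-squares solution via the pseudoinverse
x_hat = np.linalg.pinv(M) @ a
```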

Answer 3

The systematic approach to use here is to define the operations with Kronecker products ($\otimes$) working on vectorizations ($\text{vec}$).

First let us split the problem into two:

  1. Express $({\bf R}\otimes {\bf x})$ as a matrix operator acting on $\bf x$ : $({\bf R}\otimes {\bf x})=\bf M_{R\otimes} x$.

  2. Express the subsequent multiplication by $\bf B$ as a matrix operator : $\bf M_{L(B)}$. Notation meaning : (M)ultiply from the (L)eft by $\bf B$.

With these things together we can reformulate :

$${\bf \hat x} = \underset{\mathbf{x}}{\operatorname{argmin}}\|{\bf A}-{\bf M_{L(B)}} {\bf M_{I\otimes}} {\bf x}\|_F^2$$

Realizing that we can treat $\bf M = M_{L(B)} M_{I\otimes}$ as a single new matrix, we now have an ordinary least squares problem:

$${\bf \hat x} = \underset{\mathbf{x}}{\operatorname{argmin}}\|{\bf A}-{\bf M} {\bf x}\|_F^2$$

Standard Kronecker-product identities should help you solve 2, so I will focus on solving 1 in the remainder of this answer.

I will not give a proof, but instead a numerical demonstration with some Octave code:

N = [5,4];
B = rand(2,prod(N));
R = rand(N(1));
x = rand(N(2),1);

M_2 = kron(R(:),eye(N(2)));   % vec(R (x) x) = M_2 * x
M_1 = kron(eye(N(1)),B);      % vec(B*Z)     = M_1 * vec(Z)

err = norm(M_1*M_2*x - vec(B*kron(R,x)))

This gives me $err \approx 1.2\cdot 10^{-15}$, which is reasonable given that we use random numbers in $[0,1]$ and double-precision arithmetic.

Now we have transformed the problem into a usual linear least-squares problem, albeit with a slightly bigger matrix than we might be used to.
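Putting the pieces together, here is a NumPy translation of the Octave demonstration above, extended with the final least-squares solve (my own sketch; with ${\bf R}={\bf I}$ it reduces exactly to the original problem):

```python
import numpy as np

rng = np.random.default_rng(1)
N = (5, 4)                        # R is N[0] x N[0], x has length N[1]
B = rng.standard_normal((2, N[0] * N[1]))
R = rng.standard_normal((N[0], N[0]))
x = rng.standard_normal((N[1], 1))

vec = lambda X: X.reshape(-1, 1, order="F")   # column-major vec()

M_2 = np.kron(vec(R), np.eye(N[1]))           # vec(R ⊗ x) = M_2 x
M_1 = np.kron(np.eye(N[0]), B)                # vec(B Z)   = M_1 vec(Z)

# numerical check of the factorization, as in the Octave demo
err = np.linalg.norm(M_1 @ M_2 @ x - vec(B @ np.kron(R, x)))

# With M = M_1 M_2 the problem is ordinary least squares,
# solved here via the pseudoinverse:
A = rng.standard_normal((2, N[0]))
x_hat = np.linalg.pinv(M_1 @ M_2) @ vec(A)
```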