Optimization Problem with a skew-symmetric matrix as a variable


I'm currently trying to solve the following optimization problem

$$\mathop{\text{minimize }}_{X \in \mathbb{R}^{n \times n}} \left\| A X - B \right\|_F^2 \quad \text{ subject to } X = -X^T$$

in which $A,B \in \mathbb{R}^{1 \times n}$ are row vectors. I tried some MATLAB solvers for constrained linear least-squares problems, but I only ever obtain an approximate solution, never an exact one. I also experimented with Lagrange multipliers by hand, but still couldn't work out a solution.

Is the problem well posed? Do you have any suggestions regarding possible solutions?
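For reference, the constraint can be eliminated exactly: a skew-symmetric $X$ is determined by its $n(n-1)/2$ strictly upper-triangular entries, so the problem reduces to an unconstrained linear least-squares problem in those entries. A minimal sketch in Python/NumPy (the function name is mine, and NumPy is assumed here rather than MATLAB):

```python
import numpy as np

def solve_skew_lstsq(A, B):
    """Minimize ||A @ X - B||_F^2 over skew-symmetric X (A, B row vectors)."""
    n = A.shape[1]
    iu = np.triu_indices(n, k=1)      # strictly upper-triangular positions
    # Design matrix: column j is A @ E_j, where E_j is the skew basis matrix
    # with +1 at the j-th upper-triangular position and -1 at its transpose.
    cols = []
    for r, c in zip(*iu):
        E = np.zeros((n, n))
        E[r, c], E[c, r] = 1.0, -1.0
        cols.append((A @ E).ravel())
    M = np.column_stack(cols)         # shape (n, n*(n-1)//2)
    s, *_ = np.linalg.lstsq(M, B.ravel(), rcond=None)
    X = np.zeros((n, n))
    X[iu] = s
    return X - X.T                    # skew-symmetric by construction

A = np.array([[1.0, 2.0, 3.0]])
B = np.array([[4.0, 5.0, 6.0]])
X = solve_skew_lstsq(A, B)
print(np.allclose(X, -X.T))          # True
print(np.linalg.norm(A @ X - B))     # residual of the least-squares solution
```

The residual is nonzero in general, which explains why a numerical solver never reaches zero: the best achievable value is the distance bound derived in the answers below.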

3 Answers

BEST ANSWER

Vectors will be typeset as rows, and matrices typeset as lists of rows.

Theorem. As we vary $X$ over all skew-symmetric matrices, the smallest possible length for $E = AX - B$ is $\|E\| = \frac{|A\cdot B|}{\|A\|}$.

Proof. Use Gram–Schmidt to construct an orthonormal basis for the space, chosen so that the two given vectors take the form $A=(a,0,0,\ldots,0)$ and $B=(b_1,b_2,0,\ldots,0)$. Any skew-symmetric matrix has the structure $M=((0,s,\ldots);(-s,0,\ldots);(\ldots);\ldots)$ for some scalar $s$. (The suppressed terms denoted by $\ldots$ have no effect on the computation that follows.) Note that the row vector $AM=(0,as,\ldots)$, and thus the error vector is $E=AM-B=(-b_1,as-b_2,\ldots)$. The length of $E$ is minimized when the suppressed terms $\ldots$ are zero and $s=b_2/a$; with this choice the minimum length is $\|E\| = |b_1| = \frac{|A\cdot B|}{\|A\|}$, since $A\cdot B = ab_1$.

P.S. In general, since $AX \perp A$ always holds (proof below), there is no hope that $AX$ can cancel the component of $B$ that is parallel to $A$.

Proof. If $X$ is skew-symmetric, then $\langle AX, A\rangle = \langle A, AX^T\rangle = -\langle A, AX\rangle = -\langle AX, A\rangle$, so $\langle AX, A\rangle = 0$; that is, $AX \perp A$.
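Both claims are easy to check numerically; a small sketch (assuming NumPy and random data) verifying that $AX \perp A$ for every skew $X$, and that $\|AX-B\| \geq |A\cdot B|/\|A\|$:

```python
import numpy as np

# Random data; the claims should hold for any A, B and any skew X.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((1, n))
B = rng.standard_normal((1, n))
bound = abs((A @ B.T).item()) / np.linalg.norm(A)   # |A.B| / ||A||

for _ in range(1000):
    M = rng.standard_normal((n, n))
    X = M - M.T                                     # random skew-symmetric matrix
    assert abs((A @ X @ A.T).item()) < 1e-9         # AX is orthogonal to A
    assert np.linalg.norm(A @ X - B) >= bound - 1e-9
```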

ANSWER

In $\mathcal{M}_{1,n}(\mathbb{R})$, identified with $\mathbb{R}^n$, the usual scalar product is given by $U\cdot V=U\mkern1mu{}^t\mkern-1mu V =V\mkern1mu{}^t\mkern-1mu U$. Let's put $a=\sqrt{A\mkern3mu{}^t\mkern-3mu A}=\Vert A\Vert$. Also let's call $S$ the (real) vector space of all $n\times n$ antisymmetric matrices.
We show that, putting $\color{red}{\frac1{a^2}(\mkern3mu{}^t\mkern-3mu AB-\mkern3mu{}^t\mkern-2mu BA)=X_0}\in S$, then $\min\limits_{X\in S}\Vert AX-B\Vert^2=\Vert AX_0-B\Vert^2$.

First $AX_0=\frac1{a^2} (A\mkern3mu{}^t\mkern-3mu AB-A\mkern3mu{}^t\mkern-2mu BA) =\frac1{a^2} (a^2B-(A\cdot B)A) =B+\omega A$,
where $\omega$ is the real number $-\frac{A\cdot B}{a^2}$.
Second let's observe that, if $H\in S$, then $A\perp AH$: we have $$A\cdot(AH)=A\mkern3mu{}^t\mkern-2mu(AH) =A\mkern3mu{}^t\mkern-2mu H\mkern3mu{}^t\mkern-3mu A =A(-H)\mkern3mu{}^t\mkern-3mu A =-AH\mkern3mu{}^t\mkern-3mu A =-(AH)\cdot A =-A\cdot(AH).$$ This quantity is thus 0.

Now let $X=X_0+H$ be any element of $S$ (so $H$ is antisymmetric); we have \begin{eqnarray*} \Vert AX-B\Vert^2&=&\Vert(AX_0-B)+AH\Vert^2\\ &=&\Vert\omega A + AH\Vert^2\\ &=&\omega^2\Vert A\Vert^2 + \Vert AH\Vert^2\quad \text{(by Pythagoras' theorem)}\\ &\geqslant&\omega^2\Vert A\Vert^2=\Vert AX_0-B\Vert^2. \end{eqnarray*} The expected minimum $\omega^2\Vert A\Vert^2=\frac{(A\cdot B)^2}{\Vert A\Vert^2}$ is thus attained at $X_0$, and also at every matrix $X$ such that $A(X-X_0)=0$ (they form an affine subspace of $S$).
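The formula for $X_0$ lends itself to a quick numerical sanity check; a sketch assuming NumPy, with $A$, $B$ random row vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((1, n))
B = rng.standard_normal((1, n))

a2 = (A @ A.T).item()                  # ||A||^2
X0 = (A.T @ B - B.T @ A) / a2          # the claimed minimizer, antisymmetric
assert np.allclose(X0, -X0.T)

# A X0 - B should equal omega*A with omega = -(A.B)/||A||^2
omega = -(A @ B.T).item() / a2
assert np.allclose(A @ X0 - B, omega * A)

# the minimal value is (A.B)^2 / ||A||^2
minval = (A @ B.T).item() ** 2 / a2
assert np.isclose(np.linalg.norm(A @ X0 - B) ** 2, minval)
```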

P. S.: thanks to @MathWonk for detecting immediately a dreadful error in my first (deleted…) answer.

ANSWER

$ \def\l{\lambda} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\qif{\quad\iff\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} $Let's use uppercase Latin letters to denote matrices, lowercase Latin letters for column vectors, and Greek letters for scalars. With this convention the original problem variables become $(A,B)\to(a^T,b^T)$.

Then, for typing convenience, define the vector $$\eqalign{ w &= X^Ta-b \qif w^T = a^TX-b^T \\ }$$ Now write the objective function and calculate its gradient $$\eqalign{ \phi &= w^Tw \\ d\phi &= 2w^Tdw = 2w^TdX^Ta \\ \grad{\phi}{X} &= 2aw^T \;=\; G \quad\big({\rm gradient}\big) \\ }$$ The constraint $X=\fracLR{X-X^T}2$ means that the constrained gradient is the skew-symmetric part of $G$ $$\eqalign{ H &= \fracLR{G-G^T}2 \\ &= aw^T-wa^T \\ &= \LR{aa^TX - ab^T} - \LR{X^Taa^T - ba^T} \\ &= \LR{aa^TX + Xaa^T} - \LR{ab^T - ba^T} \quad\big(\text{using } X^T=-X\big) \\ }$$ The zero-gradient condition $(H=0)\,$ yields $$\eqalign{ aa^TX + Xaa^T = ab^T - ba^T \\ }$$ Note that the RHS is a skew matrix. Might $X$ be a scalar multiple of it? $$\eqalign{ &X = \l\LR{ab^T-ba^T} \\ &\l aa^T\LR{ab^T-ba^T} + \l\LR{ab^T-ba^T}aa^T = \LR{ab^T - ba^T} \\ &\l\LR{a^Ta}\LR{ab^T-ba^T} = \LR{ab^T - ba^T} \qiq \l = \fracLR{1}{a^Ta} \\ }$$ (in the last step the $aa^T$ cross terms cancel because $a^Tb=b^Ta$). Our ansatz was correct, therefore $$X = \fracLR{ab^T-ba^T}{a^Ta}$$
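As a check, the stationarity condition and the final formula can be verified numerically; a sketch assuming NumPy, with $a$, $b$ random column vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
a = rng.standard_normal((n, 1))
b = rng.standard_normal((n, 1))

# the claimed solution X = (a b^T - b a^T) / (a^T a)
X = (a @ b.T - b @ a.T) / (a.T @ a).item()

# X is skew-symmetric and satisfies the zero-gradient condition
assert np.allclose(X, -X.T)
lhs = a @ a.T @ X + X @ a @ a.T
rhs = a @ b.T - b @ a.T
assert np.allclose(lhs, rhs)
```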