How to minimize this quartic function?


I need to find the vector that minimizes this matrix expression:

$$\bar{v}.M_1.v-(\bar{v}.M_2.v)^2 $$

$v$ is a normalized complex vector and $\bar{v}$ is its complex conjugate, though I can arrange the parameters so that everything is real. $M_1$ and $M_2$ are Hermitian matrices, so the expression above is always a real number.

My idea was to write the vector out in terms of its coefficients and expand the product. This gives a quartic polynomial, but solving it in higher dimensions is a mess.

So I thought maybe it is possible to use some matrix techniques, which usually simplify things. Approximations are also welcome.

Also, are there calculus techniques that would let me take the derivative of this function and find the optimal vector?


There are 4 answers below.


If you didn't have the normalization constraint, this would be exactly solvable in the $n=3$ case by sums-of-squares (SOS) methods.

In general this is hard, but there exist approximation algorithms with approximation ratio $1 - Cn^{-2}$ for quartic optimization with quadratic constraints (which is exactly what you have here).

As far as an explicit form for this problem goes, I'll write $A=M_1$, $B=M_2$, and $x=v$. The Lagrangian becomes $$L(x,\lambda)=x^* A x - (x^* B x)^2 + \lambda(1 - x^* x)$$ Setting the derivative to zero says that at optimality $$ Ax - 2B x x^* B x - \lambda x=0\text{, or } (A - 2Bxx^*B - \lambda I)x = 0 $$ and, necessarily, $x^*x=1$. As far as I can tell this isn't easily solved in closed form.
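In the real symmetric case this first-order condition is easy to sanity-check numerically: the unconstrained part of the gradient of $x^TAx - (x^TBx)^2$ is $2Ax - 4(x^TBx)Bx$, which is twice the bracketed term above. A small pure-Python sketch (the helper names `matvec`/`quad` and the example matrices are made up for illustration):

```python
# Finite-difference check (real symmetric case) that the gradient of
# g(x) = x'Ax - (x'Bx)^2  is  2Ax - 4(x'Bx)Bx.
def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def quad(M, x):                       # the quadratic form x'Mx
    Mx = matvec(M, x)
    return sum(x[i] * Mx[i] for i in range(len(x)))

def g(A, B, x):                       # the unconstrained objective
    return quad(A, x) - quad(B, x) ** 2

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 1.0]]   # symmetric
B = [[1.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 1.0]]   # symmetric
x = [0.3, -0.4, 0.8]

# analytic gradient:  2Ax - 4(x'Bx)Bx
qB = quad(B, x)
analytic = [2 * a - 4 * qB * b for a, b in zip(matvec(A, x), matvec(B, x))]

# compare against central finite differences
h = 1e-6
for k in range(3):
    xp, xm = x[:], x[:]
    xp[k] += h; xm[k] -= h
    fd = (g(A, B, xp) - g(A, B, xm)) / (2 * h)
    assert abs(fd - analytic[k]) < 1e-5
```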


Yes, there is a calculus for functions defined in terms of vector and matrix operations. It takes advantage of the algebraic rules for those operations (which is good), but care must be taken because of the noncommutativity of matrix multiplication.

Here's an example. If $x$ is a real variable, $a$ is a constant, and $f(x) = ax^2$, then $f'(x) = 2ax$. If $\mathbf{x}$ is a vector variable and $A$ is a constant square matrix, then the analogous quadratic function is $f(\mathbf{x}) = \mathbf{x}\cdot A \mathbf{x} = \mathbf{x}^T A \mathbf{x}$. We can express the derivative of $f$ in terms of $\mathbf{x}$.

If the $k$-th component of $\mathbf{x}$ is $x_k$, and the $(i,j)$-entry of $A$ is $a_{ij}$, then $$ f(\mathbf{x}) = \sum_{i,j=1}^n x_i a_{ij} x_j $$ So for each $k$, \begin{align*} \frac{\partial f}{\partial x_k} &= \sum_{i,j=1}^n \frac{\partial}{\partial x_k}(x_i a_{ij} x_j) = \sum_{i,j=1}^n \left(\delta_{ik} a_{ij} x_j + x_i a_{ij} \delta_{jk}\right) \\ &= \sum_{j=1}^n a_{kj}x_j + \sum_{i=1}^n a_{ik}x_i \end{align*} The first summation is equal to the $k$-th component of $A\mathbf{x}$. The second summation is equal to the $k$-th component of $A^T\mathbf{x}$. So we can say: $$ \frac{\partial}{\partial \mathbf{x}}(\mathbf{x}^T A \mathbf{x}) = A \mathbf{x} + A^T \mathbf{x} $$ If $A$ is symmetric, then $\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}^T A \mathbf{x}) = 2A\mathbf{x}$, which is pretty close to $\frac{d}{dx}(ax^2) = 2ax$.
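The identity $\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}^T A \mathbf{x}) = A\mathbf{x} + A^T\mathbf{x}$ is easy to verify with finite differences. A small pure-Python sketch (the helper `matvec` and the example values are made up for illustration), using a deliberately nonsymmetric $A$ so the $A^T$ term matters:

```python
# Finite-difference check of  d/dx (x'Ax) = Ax + A'x  for nonsymmetric A.
def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def f(A, x):                          # the quadratic form x'Ax
    Ax = matvec(A, x)
    return sum(x[i] * Ax[i] for i in range(len(x)))

A = [[1.0, 2.0], [0.0, 3.0]]          # nonsymmetric
At = [[A[j][i] for j in range(2)] for i in range(2)]
x = [0.7, -1.2]

grad = [p + q for p, q in zip(matvec(A, x), matvec(At, x))]   # Ax + A'x

h = 1e-6
for k in range(2):
    xp, xm = x[:], x[:]
    xp[k] += h; xm[k] -= h
    assert abs((f(A, xp) - f(A, xm)) / (2 * h) - grad[k]) < 1e-6
```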

You can read more about this kind of matrix calculus on Wikipedia. The thing is, you almost need to work it out in coordinates first in order to convince yourself that the matrix-algebra shortcuts are valid.


Hint

Write $f(v)=\bar{v}.M_1.v-(\bar{v}.M_2.v)^2$ for the function. Your goal is to find $v$ such that $f^\prime(v)=0$, where $f^\prime$ is the Fréchet derivative.

To do that you need to use the chain rule heavily, writing $f$ as a composition of functions.

For example $\varphi(v)= \bar{v}.M_1.v=\varphi_1(\varphi_2(v),v)$ where $\varphi_2(v)=\bar{v}$ and $\varphi_1(u,v)=u.M_1.v$.

Now using Fréchet derivatives,

$$\begin{cases}\varphi_2^\prime(v).h= \bar{h}\\ \varphi_1^\prime(u,v).(h,k)= h.M_1.v+ u.M_1.k \end{cases}$$ and the chain rule gives

$$\varphi^\prime(v).h=\bar{h}.M_1.v +\bar{v}.M_1.h=2\,{\rm Re}\!\left(\bar{h}.M_1.v\right)$$ using the fact that $M_1$ is Hermitian (in the real case this is simply $2\bar{h}.M_1.v$).

You can then follow a similar path for the function $\psi(v)=(\bar{v}.M_2.v)^2$.
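A sketch of that computation: writing $\psi=\varphi_3\circ\chi$ with $\varphi_3(t)=t^2$ and $\chi(v)=\bar{v}.M_2.v$, the chain rule gives, exactly as for $\varphi$ above (real case),

$$\psi^\prime(v).h = 2(\bar{v}.M_2.v)\big(\bar{h}.M_2.v+\bar{v}.M_2.h\big) = 4(\bar{v}.M_2.v)\,\bar{h}.M_2.v$$

so that $f^\prime(v).h = 2\,\bar{h}.M_1.v - 4(\bar{v}.M_2.v)\,\bar{h}.M_2.v$.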

And finally use Lagrange multipliers to take into consideration the normalization.


Given the matrix $X\in{\mathbb C}^{n\times n},\,\,$ define a function $$F(X) = \begin{bmatrix}{\rm Re}(X) & -{\rm Im}(X)\\{\rm Im}(X) & {\rm Re}(X) \end{bmatrix}\in{\mathbb R}^{2n\times 2n}$$

Similarly, given the vector $x\in{\mathbb C}^{n},\,\,$ define a function $$f(x) = \begin{bmatrix}{\rm Re}(x)\\{\rm Im}(x)\end{bmatrix}\in{\mathbb R}^{2n}$$

Using these functions, your complex problem can be transformed into a real problem, which makes it easier to calculate gradients.
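As a sanity check, these maps satisfy $f(Mv)=F(M)f(v)$ and, for Hermitian $M$, $f(v)^T F(M) f(v) = v^*Mv$. A small pure-Python sketch (the $2\times 2$ example values and the helper `matvec` are made up for illustration):

```python
# Check that the realification maps satisfy  f(Mv) = F(M) f(v)  and,
# for Hermitian M,  f(v)' F(M) f(v) = v* M v   (2x2 example).
def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

M = [[2.0 + 0j, 1.0 - 1.0j],
     [1.0 + 1.0j, 3.0 + 0j]]        # Hermitian: M[i][j] == conj(M[j][i])
v = [0.6 + 0.2j, -0.3 + 0.7j]

# build F(M) (4x4 real) and f(v) (length-4 real)
FM = [[M[i][j].real for j in range(2)] + [-M[i][j].imag for j in range(2)]
      for i in range(2)] + \
     [[M[i][j].imag for j in range(2)] + [M[i][j].real for j in range(2)]
      for i in range(2)]
fv = [z.real for z in v] + [z.imag for z in v]

Mv = matvec(M, v)
lhs = [z.real for z in Mv] + [z.imag for z in Mv]   # f(Mv)
rhs = matvec(FM, fv)                                # F(M) f(v)
assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))

vMv = sum(v[i].conjugate() * Mv[i] for i in range(2))   # v* M v
xAx = sum(fv[i] * rhs[i] for i in range(4))             # f(v)' F(M) f(v)
assert abs(vMv.real - xAx) < 1e-12 and abs(vMv.imag) < 1e-12
```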

Define some new, real variables $$\eqalign{ A &= F(M_1),\,\,\,B = F(M_2),\,\,\,x = f(v) \cr \alpha &= v^*M_1v = \tfrac{x^TAx}{x^Tx} \cr \beta &= v^*M_2v = \tfrac{x^TBx}{x^Tx} \cr }$$ Since the matrices $(M_1,M_2)$ are hermitian, the matrices $(A,B)$ are symmetric. Also note that $v$ was constrained to be a unit vector, whereas $x$ is unconstrained.

Start by finding the differential and gradient of $\alpha$ $$\eqalign{ d\alpha &= \frac{(2Ax)^Tdx}{x^Tx} - \frac{(x^TAx)(2x^T)dx}{(x^Tx)^2} \cr &= \frac{2(Ax-\alpha x)^Tdx}{x^Tx} \cr \frac{\partial\alpha}{\partial x} &= \tfrac{2(A-\alpha I)x}{x^Tx} \cr }$$ Similarly, the gradient of $\beta$ is $\frac{\partial\beta}{\partial x} = \frac{2(B-\beta I)x}{x^Tx}$

The gradient of your target function can be calculated as $$\eqalign{ \phi &= \alpha-\beta^2 \cr \frac{\partial\phi}{\partial x} &= \frac{\partial\alpha}{\partial x} - 2\beta\frac{\partial\beta}{\partial x} = \frac{2\,\big(Ax-2\beta Bx + 2\beta^2x-\alpha x\big)}{x^Tx} \cr }$$ Now use your favorite gradient-based unconstrained minimization method (e.g. Barzilai-Borwein) to solve for $x$ -- and remember that $(\alpha,\beta)$ are not constants, but are functions of $x$.
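This gradient formula is easy to check against finite differences before coding up the solver. A small pure-Python sketch (the helper names and the $2\times 2$ example matrices are made up for illustration; $A,B$ real symmetric):

```python
# Finite-difference check of the gradient formula
#   dphi/dx = 2(Ax - 2b*Bx + 2b^2*x - a*x) / (x'x),
# where a = x'Ax/x'x and b = x'Bx/x'x.
def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]

def quad(M, x):                       # the quadratic form x'Mx
    Mx = matvec(M, x)
    return sum(x[i] * Mx[i] for i in range(len(x)))

def phi(A, B, x):                     # the target  phi = a - b^2
    c = sum(t * t for t in x)
    return quad(A, x) / c - (quad(B, x) / c) ** 2

A = [[2.0, 1.0], [1.0, 3.0]]          # symmetric
B = [[1.0, 0.5], [0.5, 2.0]]          # symmetric
x = [0.9, -0.4]

c = sum(t * t for t in x)
a = quad(A, x) / c
b = quad(B, x) / c
grad = [2 * (p - 2 * b * q + 2 * b * b * t - a * t) / c
        for p, q, t in zip(matvec(A, x), matvec(B, x), x)]

h = 1e-6
for k in range(2):
    xp, xm = x[:], x[:]
    xp[k] += h; xm[k] -= h
    assert abs((phi(A, B, xp) - phi(A, B, xm)) / (2 * h) - grad[k]) < 1e-5
```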

Once you have $x$, you can recover the $v$ vector which solves the original complex problem: the first $n$ components of $x/\|x\|$ give the real part of $v$ and the last $n$ components give the imaginary part.

Here's some Julia code

using LinearAlgebra, Printf

function q(A,x); return sum(x'*A*x); end        # quadratic form  x'Ax
function normF(x); return sqrt(sum(x'*x)); end  # frobenius norm

# objective and gradient of  phi = a - b^2  with  a = x'Ax/x'x,  b = x'Bx/x'x
function F(A,B,x)
  c = sum(x'*x)
  return q(A,x)/c - (q(B,x)/c)^2
end
function gradF(A,B,x)
  c = sum(x'*x)
  a, b = q(A,x)/c, q(B,x)/c
  return (2/c) * ((A*x - a*x) - 2*b*(B*x - b*x))
end

# Barzilai-Borwein
function bbSolve(tol,nMax,A,B,x)
   # initialize: seed (dx,dg) with a small finite-difference step
   g = gradF(A,B,x);  vBest = v = normF(g);  xBest = x .+ 0
   n = 0;  b = 1e-5;  dx = b*(x .+ b)
   dg = g;  g = gradF(A,B,x-dx);  dg = dg - g

   # iterate
   while true
       n += 1
       b = sum(dg'*dx) / sum(dg'*dg)  # barzilai steplength
       dx = -b*g; x += dx
       dg = -g; g=gradF(A,B,x); dg += g
       v = normF(g)

       if v < vBest
          vBest = v; xBest = x
       end

       if isnan(v) || isinf(v)
          break
       elseif v < tol || n > nMax
          break
       end
   end
   return n, xBest
end

# test with randomly generated matrices
n=5;    x=ones(2n);        # initial guess x = 1
M1=rand(n,n)+im*rand(n,n); M1+=M1';  # hermitian
M2=rand(n,n)+im*rand(n,n); M2+=M2';
A=[real(M1) -imag(M1); imag(M1) real(M1)];
B=[real(M2) -imag(M2); imag(M2) real(M2)];
@time k,x = bbSolve(1e-14, 200, A,B,x);
@printf( "k, F(x), |x|, x  =  %s\n",  (k, F(A,B,x), normF(x), x) );