Constrained optimization cannot find null space


I am trying to solve a minimization with a constraint on $x$: \begin{align} \min_{x}\,\,&\frac{x^TAx}{x^Tx},\\ \mbox{s.t.}\,\,& c^Tx=0, \end{align} where $A$ is symmetric but not necessarily positive definite.

I used Lagrange multipliers, following the Wikipedia page on equality constraints, and formed the matrix equation below. The right-hand side is just zero, but I cannot find a nontrivial solution of the matrix equation:

$$\begin{bmatrix}2A & c \\ c^T&0\end{bmatrix}\begin{bmatrix}x\\\lambda\end{bmatrix}=\begin{bmatrix}0\\ 0\end{bmatrix}$$

It seems this matrix never has a nontrivial null space?
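The observation can be checked numerically. A small NumPy example with made-up random data (just to illustrate; not from the original post):

```python
import numpy as np

# Made-up data: a random symmetric (indefinite) A and a random c.
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
A = A + A.T
c = rng.standard_normal((n, 1))

# The bordered matrix [2A, c; c^T, 0] from the Lagrangian conditions.
K = np.block([[2 * A, c],
              [c.T, np.zeros((1, 1))]])

# Generically this matrix has full rank, so its null space is trivial.
print(np.linalg.matrix_rank(K))
```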

On BEST ANSWER

This answer is mostly a summary of the comments to the question. You want to solve

\begin{align*} \min_{x\in\mathbb{R}^{n}}\frac{x^T A x}{x^T x} \end{align*}

subject to the constraint

\begin{align*} c^Tx &= 0. \end{align*}

If $x$ is a solution of this problem, then $\lambda x$ with $\lambda\in\mathbb{R}\setminus\{0\}$ is also a solution: only the direction of $x$ is relevant, not its length. Thus we can restrict ourselves to the representative $x$ with unit length, $x^Tx=1$.
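This scale invariance of the Rayleigh quotient is easy to verify numerically; a small NumPy sketch with made-up data:

```python
import numpy as np

def rayleigh(A, x):
    # Rayleigh quotient x^T A x / x^T x
    return (x @ A @ x) / (x @ x)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = A + A.T                      # symmetric test matrix
x = rng.standard_normal(5)

# Rescaling x by any nonzero factor leaves the quotient unchanged
# (and c^T x = 0 is preserved under scaling as well).
assert np.isclose(rayleigh(A, x), rayleigh(A, 3.7 * x))
```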

User copper.hat proposed to project the matrix $A$ onto the orthogonal complement of $c$ and to solve the unconstrained minimization of the Rayleigh quotient for the projected matrix. The solution of this minimization problem is simply an eigenvector corresponding to the smallest eigenvalue of the projected matrix. This method is implemented in the Octave routine cstrEigen1; the code is quite short and self-descriptive, and beneath it there is some additional code for testing.

The projection method is certainly the method of choice for small and medium sized problems.
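A minimal NumPy sketch of this projection approach (illustrative only, not the original cstrEigen1 listing; the helper name is made up): build an orthonormal basis $B$ of the orthogonal complement of $c$, take the smallest eigenpair of $B^TAB$, and map the eigenvector back.

```python
import numpy as np

def cstr_eigen_proj(A, c):
    """Minimize x^T A x / x^T x subject to c^T x = 0 via projection."""
    # Orthonormal basis of the orthogonal complement of c: the last
    # n-1 columns of a full QR decomposition of the column vector c.
    Q, _ = np.linalg.qr(c.reshape(-1, 1), mode="complete")
    B = Q[:, 1:]                 # B^T c = 0, B^T B = I
    M = B.T @ A @ B              # projected matrix
    w, V = np.linalg.eigh(M)     # eigenvalues in ascending order
    return B @ V[:, 0], w[0]     # smallest eigenpair, vector mapped back

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
A = A + A.T
c = rng.standard_normal(n)
x, lam = cstr_eigen_proj(A, c)   # c @ x is ~0, lam is the constrained minimum
```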

For large problems, iterative schemes may be better. The first-order optimality conditions for the problem (with the normalization $x^Tx=1$ added as an explicit constraint) are

\begin{align*} F(x,\lambda_1,\lambda_2) &= 0 \end{align*}

with

\begin{align*} F(x,\lambda_1,\lambda_2) &= \begin{pmatrix} 2Ax+ c\lambda_1 + 2x\lambda_2\\ c^Tx\\ x^Tx-1 \end{pmatrix} \end{align*}

Newton's algorithm yields a first formula for deriving the new iterate $\bar x,\bar \lambda_1, \bar\lambda_2$ from the old iterate $x,\lambda_1,\lambda_2$:

\begin{align*} 0 &= F(x,\lambda_1,\lambda_2) + F'(x,\lambda_1,\lambda_2) \begin{pmatrix} \bar x - x\\ \bar \lambda_1-\lambda_1\\ \bar \lambda_2-\lambda_2 \end{pmatrix}\\ &= \begin{pmatrix} 2Ax+ c\lambda_1 + 2x\lambda_2\\ c^Tx\\ x^Tx-1 \end{pmatrix} + \begin{pmatrix} 2A& c& 2x\\ c^T& 0& 0\\ 2x^T& 0& 0 \end{pmatrix} \begin{pmatrix} \bar x - x\\ \bar \lambda_1-\lambda_1\\ \bar \lambda_2-\lambda_2 \end{pmatrix}\\ &= \begin{pmatrix} 2A\bar x + c\bar\lambda_1 + 2x\bar\lambda_2\\ c^T\bar x\\ 2x^T\bar x - 1 - x^T x \end{pmatrix} \end{align*}

This can be written as a linear system for the new iterate. (Note that the matrix above is a simplified Jacobian: the exact $F'$ would carry $2A+2\lambda_2\mathbf 1$ in its upper-left block.)

\begin{align*} \begin{pmatrix} 2A& c& 2x\\ c^T& 0& 0\\ 2x^T& 0& 0 \end{pmatrix} \begin{pmatrix} \bar x\\ \bar\lambda_1\\ \bar\lambda_2 \end{pmatrix} &= \begin{pmatrix} 0\\ 0\\ 1+x^Tx \end{pmatrix} \end{align*}

A slight modification is to use the information $x^T x=1$ in advance and normalize $x$ before using it in the matrix equation, i.e., $x:=\frac{x}{|x|}$.

With this normalized $x$ the above equation reads

\begin{align*} \begin{pmatrix} 2A& c& 2x\\ c^T& 0& 0\\ 2x^T& 0& 0 \end{pmatrix} \begin{pmatrix} \bar x\\ \bar\lambda_1\\ \bar\lambda_2 \end{pmatrix} &= \begin{pmatrix} 0\\ 0\\ 2 \end{pmatrix} \end{align*}

(The only difference is the 2 at the end.)

The following list is a pseudo-program for the algorithm with preceding re-normalization and step-size limitation.

  1. Choose a random initial guess $x$
  2. Normalize $x:=\frac{x}{|x|}$ (omitted for pure Newton)
  3. Solve the matrix equation $\begin{pmatrix}2A&c&2x\\c^T&0&0\\2x^T&0&0\end{pmatrix}\begin{pmatrix}\bar x\\\lambda_1\\\lambda_2\end{pmatrix}=\begin{pmatrix}0\\0\\2\end{pmatrix}$. You are done when $\bar x = x$.
  4. Limit the step size $s=\min(0.1,|\bar x - x|)$ and update $x := x + s\frac{\bar x - x}{|\bar x - x|}$. Goto step 2.
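The pseudo-program above can be sketched in NumPy as follows (an illustrative translation, not the author's Octave implementation; it uses the $2A$ form of the system from the derivation, which yields the same $\bar x$ up to a rescaling of the multipliers):

```python
import numpy as np

def cstr_eigen_newton(A, c, x, tol=1e-8, max_iter=500):
    """Damped Newton iteration with preceding re-normalization."""
    n = A.shape[0]
    x = x - c * (c @ x) / (c @ c)          # step 1: feasible initial guess
    for it in range(1, max_iter + 1):
        x = x / np.linalg.norm(x)          # step 2: re-normalize
        # step 3: solve the bordered system for (x_bar, lambda1, lambda2)
        K = np.block([[2 * A, c.reshape(-1, 1), 2 * x.reshape(-1, 1)],
                      [c.reshape(1, -1), np.zeros((1, 2))],
                      [2 * x.reshape(1, -1), np.zeros((1, 2))]])
        rhs = np.zeros(n + 2)
        rhs[-1] = 2.0
        x_bar = np.linalg.solve(K, rhs)[:n]
        dx = x_bar - x
        dx_norm = np.linalg.norm(dx)
        if dx_norm <= tol:                 # done when x_bar is (nearly) x
            break
        # step 4: limited step size, then back to step 2
        x = x + min(0.1, dx_norm) * dx / dx_norm
    return x / np.linalg.norm(x), it

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
A = A + A.T
c = rng.standard_normal(n)
x, it = cstr_eigen_newton(A, c, rng.standard_normal(n))
```

All iterates stay in the orthogonal complement of $c$ by construction, since both the projected initial guess and every $\bar x$ satisfy $c^Tx=0$.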

Newton's algorithm, with and without preceding re-normalization, is implemented in the following Octave/Matlab script together with some code for testing. EDIT: I have also included projected inverse iteration in the code. Further below there is an outline of this algorithm.

## https://math.stackexchange.com/questions/2054141/constrained-optimization-cannot-find-null-space#comment4217410_2054141
## Usage: [xNew,iter] = cstrEigen(A,c,x,tol,maxIter,method)
## Iterative algorithm for minimizing x^TAx subject to the constraints c^Tx=0 and x^Tx=1.
## @param[in] A system matrix
## @param[in] c normal vector for the linear constraint
## @param[in] x initial guess
## @param[in] tol tolerance for the angular deviation of the solution (defaults to 1e-8)
## @param[in] maxIter maximal number of iterations (defaults to 100)
## @param[in] method
##            1: Newton's method (with damping),
##            2: modified Newton's method with damping and preceding normalization (defaults to 2),
##            3: inverse iteration with shift
## @return xNew solution
## @return iter number of iterations (if iter == maxIter the algorithm did probably not converge)
function [xNew,iter] = cstrEigen(A,c,x,tol,maxIter,method)
  n = size(A,1);
  if ~exist("tol","var")
    tol = 1e-8;
  end
  if ~exist("maxIter","var")
    maxIter = 100;
  end
  if ~exist("method","var")
    method=2;
  end
  x = x - c*(c'*x) / (c'*c);
  x = x/norm(x);
  if method == 3 # inverse iteration
    ## We just start up with power iteration to get an upper
    ## bound for the largest modulus of the eigenvalues.
    xNew = x/norm(x);
    for iter=1:maxIter
      xOld = xNew;
      xNew = A*xOld;
      lambda = xNew'*xOld;
      xNew = xNew/norm(xNew);
      if norm(xNew-xOld) < 0.01 || norm(xNew+xOld) < 0.01
        break;
      end
    end
    ## initial guess for the shift:
    lambda = -1.1*abs(lambda);
    for iter=1:maxIter
      ## projected inverse iteration
      ## xNew = [A-lambda*eye(n,n),c;c',0]\[x;0];
      [xNew,~,~,~,rank,kernel]=luRook([A-lambda*eye(n,n),c;c',0],[x;0]);
      if rank <= n
        xNew = kernel(1:n,1);
      else
        xNew = xNew(1:n);
      end
      lambda = lambda + xNew'*x/(xNew'*xNew);
      xNew = xNew/norm(xNew);
      if rank <= n || norm(xNew-x) < tol
        break;
      end
      x = xNew;
    end
  else
    for iter=1:maxIter
      if method == 2 # Newton's method with preceding normalization
        x = x/norm(x);
        res2 = 2;
      else # Newton's method
        res2 = 1+x'*x;
      end
      ## Using A instead of 2A from the derivation only rescales the
      ## multipliers; the solution component xNew is unchanged.
      xNew = [A,c,2*x;c',0,0;2*x',0,0]\[zeros(n+1,1);res2];
      xNew = xNew(1:n);
      dx = xNew-x;
      dxNorm = norm(dx);
      if dxNorm <= tol
        break;
      end
      x = x + min(0.1,dxNorm)*dx/dxNorm;
    end
  end
endfunction

## Usage: [c]=rayleigh(A,x)
## @input A system matrix
## @input x eigen vector approximation
## @return c eigen value approximation (Rayleigh-quotient)
function [c]=rayleigh(A,x)
  c = (x'*A*x)/(x'*x);
endfunction

## Usage: [x,LU,pr,pc,rank,kernel,res]=luRook(A,y=[],absTol=1e-30,relTol=100*eps,pr,pc,rank)
## LU-decomposition with Rook pivoting
## @param[in] A system matrix
## @param[in] y right-hand side
## @param[in] absTol absolute tolerance for detecting singular rows
## @param[in] relTol relative tolerance
## @param[in] pr row permutation, A already decomposed if pr is given
## @param[in] pc column permutation, A already decomposed if pc is already given
## @param[in] rank
## @return x solution of A*x = y
## @return LU L: lower left triangle without the diagonal U: upper right triangle including the diagonal
## @return pr row permutation
## @return pc column permutation
## @return rank 
## @return kernel basis of the null space, i.e., of the vectors x≠0 with A*x = 0
## @return res Residual = A*x-y
function [x,LU,pr,pc,rank,kernel,res]=luRook(A,y=[],absTol=1e-30,relTol=100*eps,pr,pc,rank)
  nr = size(A,1);
  nc = size(A,2);
  n = min(nr,nc);
  ny = size(y,2);
  yOrig = y;
  if ~exist("rank","var")
    rank = n;
  end
  if exist("pr","var")
    ## forward substitution
    LU = A;
    if isargout(7)
      ## Reconstruction of the original matrix from L and U
      ## for the calculation of the residual.
      A(pr,pc) = (tril(LU(:,1:n),-1)+eye(nr,n))*triu(LU(1:n,:));
    end
    y = y(pr,:);
    for i=2:rank
      y(i,:) -= LU(i,1:(i-1))*y(1:(i-1),:);
    end
  else
    maxA = 0;
    rank = 0;
    LU = [A,y];
    pr = 1:nr;
    pc = 1:nc;
    for i=1:n
      ## Search for first non-zero element
      firstNonzeroCol = 0;
      for col=i:nc
        [ma,rowPiv] = max(abs(LU(i:nr,col)),[],1);
        rowPiv += i-1;
        if ma > absTol + relTol*maxA
          firstNonzeroCol = col;
          break;
        end
      end
      if ~firstNonzeroCol
        break;
      end
      ++rank;
      colPiv = firstNonzeroCol;
      do
        colPivOld = colPiv;
        [ma,colPiv] = max(abs(LU(rowPiv,firstNonzeroCol:nc)),[],2);
        colPiv += firstNonzeroCol-1;
        if colPiv == colPivOld
          break;
        end
        [ma,rowPiv] = max(abs(LU(i:nr,colPiv)),[],1);
        rowPiv += i-1;
      until 0;
      if ma > maxA
        maxA = ma;
      end
      ## Row and column exchange:
      if i ~= rowPiv
        LU([i,rowPiv],:) = LU([rowPiv,i],:);
        pr([i,rowPiv]) = pr([rowPiv,i]);
      end
      if i ~= colPiv
        LU(:,[i,colPiv]) = LU(:,[colPiv,i]);
        pc([i,colPiv]) = pc([colPiv,i]);
      end
      ## LU transformation step
      piv = LU(i,i);
      LU((i+1):end,i) /= piv;
      LU((i+1):end,(i+1):end) -= LU((i+1):end,i)*LU(i,(i+1):end);
    end
    y = LU(:,(nc+1):end); # transformed right-hand side
    LU = LU(:,1:nc);
  end
  ## Back-substitution
  if size(y,2) > 0 || isargout(6)
    if isargout(6) # kernel
      ## UReg*xReg + USing*xSing = 0
      ## UReg*xReg = -USing*xSing = -LUSing with xSing = id
      defect = nc - rank;
      nRHS = size(y,2) + defect;
      y = [y,-LU(:,(rank+1):nc)];
    else
      nRHS = ny;
    end
    x = zeros(nc,nRHS);
    for i=rank:(-1):1
      x(i,:) = (y(i,:) - LU(i,(i+1):rank)*x((i+1):rank,:))./LU(i,i);
    end
    kernel(pc,:) = [x(1:rank,ny+1:end);eye(nc-rank)]; # empty array if kernel is not an output
    x = x(:,1:ny);
    x(pc,:) = x;
  else
    x = [];
  end
  if isargout(7)
    res = A*x-yOrig;
  end
endfunction

## Test for cstrEigen.
## The parameters are defined at the beginning of the function.
## There you can modify them when needed.
## The output seed for the random number generator
## can be reused to compare both algorithms.
function test_cstrEigen()
  method = 3;
  nTest = 10;
  nMin = 1;
  nMax = 1000;
  maxIter = 200;
  tol = 1e-8; ## Tolerance for stopping iteration
  delta = 0.01; ## Relative disturbation of the eigenvalue for checking the minimality
  if 1
    seed = rand("state");
  elseif 0
    rand("state",8504035269840);
  end

  printf("seed=");
  printf("%d,",seed);

  for i=1:nTest
    n = randi([nMin,nMax]);
    A = rand(n,n);
    A = A+A';
    c = rand(n,1);
    x = rand(n,1);
    [x,iter] = cstrEigen(A,c,x,tol,maxIter,method);
    printf("Test %d, n=%d, iter=%d\n",i,n,iter);
    if iter == maxIter
      printf("Test failed.\n");
    end
    ## Testing orthogonality
    if abs(c'*x) > tol
      printf("abs(c'*x)= %g > %g = tol\n", abs(c'*x), tol);
    end
    ## Testing minimality
    dx = delta*norm(x)*rand(n,1);
    dx = dx-c*(c'*dx)/(c'*c);
    lambdaMin = rayleigh(A,x);
    if rayleigh(A,x+dx) < lambdaMin || rayleigh(A,x-dx) < lambdaMin
      printf("Not smallest eigenvalue.\n");
    end
  end
endfunction

Test result generated by Newton's method with preceding re-normalization (and step-size limitation):

seed=8504035269840
Test 1, n=158, iter=20
Test 2, n=196, iter=200
Test failed.
Test 3, n=721, iter=30
Test 4, n=467, iter=30
Test 5, n=268, iter=23
Test 6, n=314, iter=200
Test failed.
Test 7, n=122, iter=22
Test 8, n=881, iter=29
Test 9, n=365, iter=102
Test 10, n=922, iter=23

Test result with Newton's method with step-size limitation only (generated with the same seed for the random number generator):

Test 1, n=158, iter=104
Test 2, n=196, iter=129
Test 3, n=721, iter=200
Test failed.
Test 4, n=467, iter=200
Test failed.
Test 5, n=268, iter=143
Test 6, n=314, iter=200
Test failed.
Test 7, n=122, iter=91
Test 8, n=881, iter=200
Test failed.
Test 9, n=365, iter=200
Test failed.
Test 10, n=922, iter=200
Test failed.

As one sees, Newton's algorithm performs better with preceding re-normalization than without.

Other methods mentioned in the comments are:

  • (user copper.hat): projected gradient method
  • projected inverse matrix iteration with shift for finding the eigenvector corresponding to the smallest eigenvalue of the projected matrix

In my opinion, projected inverse matrix iteration with shift looks most promising. Maybe I will try it when I find some spare time.

EDIT: Projected inverse matrix iteration is now also included in the code. This method is reliable and performs best for larger systems. The inverse iteration with shift exploits the fact that the system matrix becomes singular as the shift approaches an eigenvalue. In the singular case we do not need the minimum-norm solution (which most Octave solvers return) but the kernel of the system matrix. I have included a linear solver luRook which returns the kernel if the system becomes singular.

Output for the test from the above script with method in test_cstrEigen set to 3 (projected inverse iteration):

Test 1, n=670, iter=8
Test 2, n=427, iter=6
Test 3, n=25, iter=7
Test 4, n=626, iter=8
Test 5, n=746, iter=8
Test 6, n=712, iter=7
Test 7, n=78, iter=7
Test 8, n=803, iter=8
Test 9, n=194, iter=8
Test 10, n=491, iter=6

Some notes on how this works follow:

The inverse matrix iteration for a matrix $M$ with shift starts with some approximation $\lambda$ for the smallest eigenvalue of $M$ and some initial guess $y$ for the corresponding eigenvector. The system \begin{align*} (M-\lambda\mathbf1)\bar y = y \end{align*} is solved for $\bar y$. A new estimate for the eigenvalue can be calculated by exploiting the previous equation: \begin{align*} \frac{\bar y^T y}{\bar y^T \bar y} = \frac{\bar y^T M \bar y}{\bar y^T \bar y} - \lambda. \end{align*} The Rayleigh quotient in this equation is used as the new estimate for the eigenvalue, i.e.: \begin{align*} \lambda := \lambda + \frac{\bar y^T y}{\bar y^T \bar y} \end{align*} The new estimate for the eigenvector is obtained by re-normalizing $\bar y$: \begin{align*} y := \frac{\bar y}{|\bar y|} \end{align*} The special aspect of our task is that the matrix $M$ is the projection of $A$ onto $\operatorname{ker}c^T$. Instead of building an orthonormal base transformation $Q$ with $Q(:,1)=\frac{c}{|c|}$ and setting $B:=Q(:,2:\text{end})$ such that $M = B^TAB$, we can also iterate with the original matrix by projecting onto the orthogonal complement of $c$. That means we iterate with $x := By$ instead of $y$. In the (inverse) power iteration the projector $P:=BB^T=\mathbf1-\frac{cc^T}{c^Tc}$ appears between two applications of $A$: \begin{align*} M^k y_k &= y_0\\ B^T A \underline{B\cdot B^T} A B\cdot \ldots \cdot B^T A B y_k &= y_0 \end{align*} We multiply by $B$, substitute $P=BB^T$ as well as $x = By$, and obtain \begin{align*} B\cdot M^k y_k = BB^T A B\cdot B^T A B\cdot \ldots \cdot B^T A B y_k &= B y_0\\ P A\cdot \ldots\cdot P A x_k &= x_0. \end{align*} This system has to be solved under the restriction that $x_k$ lies within the image of $P$, i.e., $c^T x_k=0$, because of $x_{k-1} = PAx_k$.
For reaching that goal we have one additional degree of freedom, for which we introduce a new variable $\gamma_k$: the $c$-component of $Ax_k$, which is nullified by $P$. This leaves us with the following system \begin{align*} \begin{pmatrix} A&-c\\ -c^T&0 \end{pmatrix} \begin{pmatrix} x_k\\ \gamma_k \end{pmatrix} &= \begin{pmatrix} x_{k-1}\\ 0 \end{pmatrix} \end{align*} to be solved for the iterate $x_k$.

A shift in the projected system for $y_k$ translates directly into a shift in the system for $x_k$: \begin{align*} (B^T A B - \lambda\mathbf1)y_k &= y_{k-1}\\ P A x_k -\lambda x_k &= x_{k-1} \end{align*} The estimate for the new shift in the projected system can also be translated directly into the original variables: \begin{align*} \lambda_{k+1} &= \lambda_k + \frac{y_k^T y_{k-1}}{y_k^T y_k}\\ &=\lambda_k + \frac{x_k^TB B^T x_{k-1}}{x_k^TB B^T x_k}\\ &=\lambda_k + \frac{x_k^T P x_{k-1}}{x_k^T P x_k}\\ &=\lambda_k + \frac{x_k^T x_{k-1}}{x_k^T x_k} \end{align*}

The last identity holds since the $x_k$ lie in the image of the projector $P$.
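The update equations above condense into a short NumPy sketch (an illustrative re-implementation; it uses a plain dense solve and a simple spectral-norm bound for the initial shift instead of the power-iteration startup and the luRook kernel handling):

```python
import numpy as np

def projected_inverse_iteration(A, c, x, max_iter=100, tol=1e-10):
    """Eigenpair of A restricted to the orthogonal complement of c."""
    n = A.shape[0]
    P = np.eye(n) - np.outer(c, c) / (c @ c)   # projector onto ker c^T
    x = P @ x                                  # feasible initial guess
    x = x / np.linalg.norm(x)
    lam = -1.1 * np.linalg.norm(A, 2)          # shift below the spectrum
    for _ in range(max_iter):
        # Solve [[A - lam*I, -c], [-c^T, 0]] [x_new; gamma] = [x; 0]
        K = np.block([[A - lam * np.eye(n), -c.reshape(-1, 1)],
                      [-c.reshape(1, -1), np.zeros((1, 1))]])
        x_new = np.linalg.solve(K, np.append(x, 0.0))[:n]
        # Shift update lam += x_k^T x_{k-1} / x_k^T x_k, then re-normalize.
        lam = lam + (x_new @ x) / (x_new @ x_new)
        x_new = x_new / np.linalg.norm(x_new)
        # Converged up to an eigenvector sign flip?
        if min(np.linalg.norm(x_new - x), np.linalg.norm(x_new + x)) < tol:
            x = x_new
            break
        x = x_new
    return x, lam

rng = np.random.default_rng(3)
n = 7
A = rng.standard_normal((n, n))
A = A + A.T
c = rng.standard_normal(n)
x, lam = projected_inverse_iteration(A, c, rng.standard_normal(n))
```

At convergence $x$ lies in the image of $P$ and satisfies $PAx = \lambda x$, i.e., it is an eigenvector of the projected operator.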