Relaxing the elements of a matrix

106 Views Asked by At

I try to understand a specific part of the paper "Consistent shape maps via semidefinite programming", where a binary symmetric Input matrix $X^{in}$ is given with $X^{in} \in \{0,1\}^{nm \times nm}$ and an output matrix $X$, which has to fulfill certain constraints, should be found. They introduce an objective function $f$, summing up the element wise L1-norms between the input maps and output maps, where $X_{i,j}^{(in)} \in \{0,1\}^{m \times m}$ are again matrices:
$$ f=\sum_{i,j} \|X_{i,j}^{in}-X_{i,j}\| $$ Now they rewrite $f$ and formulate out of this an optimization problem (which they then solve with convex relaxation). My problem is to understand why they introduce the rewriting with the following sentence: " As $X_{i,j}^{in}$ are binary matrices, it turns out whenever the elements of $X_{i,j}$are between $0$ and $1$ (i.e. when they are relaxed), we can rewrite $f$ as a linear function over $X$: $$ f=\sum_{i,j}(\sum_{X_{i,j}^{in}(s,s')=0}X_{i,j}(s,s') + \sum_{X_{i,j}^{in}(s,s')=1}(1-X_{i,j}(s,s')) ) $$ So they allow the output matrix to take values between $0$ and $1$, but I don't get why this is needed! I would be happy about any kind of help, thanks a lot!

1

There are 1 best solutions below

0
On

The reason they allow the output matrix to take values between 0 and 1 is simple: the problem is intractable if they do not. Binary programming problems are non-convex, and in the worst case solving them would require examining all $2^{n^2m^2}$ possible $\{0,1\}$ combinations. That quickly becomes intractable for reasonable values of $m$ and $n$.

One approach is to relax the problem by allowing the binary values to take any value in the interval $[0,1]$. The resulting problem is tractable. In some cases, you will find that the answer happens to satisfy the original $\{0,1\}$ requirements anyway. If so, congratulations, you won the lottery! You have found the solution to the original, non-relaxed problem.

In all likelihood, this will not happen; some of the values of $X_{ij}$ will be fractional. In that case, you have a few options:

  • Rounding. You can round those values to the nearest binary value, and accept that answer as a suboptimal solution.
  • Random sampling. Treat these non-integer $X_{ij}$ values as probabilities. Generate a bunch of random matrices using these values as the probability of obtaining a 1. Pick the one that gives the best value of $f$. The solution will still likely be suboptimal, but you're likely to do better than simple rounding. (And if you include the rounded solution among the alternatives, you're guaranteed to do no worse.)
  • Cutting plane/branch and bound/branch and cut. Modern binary programming engines solve a sequence of relaxed problems, with additional constraints inserted to guide the model to a binary solution. These methods will eventually terminate, and often work well in practice, but the problem remains theoretically intractable, so there are no guarantees.

As for why $f$ can be rewritten the way they've done it, that is straightforward. After all, if $y\in[0,1]$, $$x=0\quad\Longrightarrow\quad |x-y|=|-y| = y$$ $$x=1\quad\Longrightarrow\quad |x-y|=|1-y| = 1-y$$ Note that these simple rewrite rules don't actually require the relaxation! They are true even for the original binary problem. Since it's generally good to remove unnecessary nonlinearities from any optimization problem (even simple, polyhedral ones), this is a step worth taking.