$n$-sphere enclosing the Birkhoff polytope

404 Views Asked by At

I am not a mathematician by training, so please feel free to correct my logic or descriptions when necessary:

Let $P$ denote a $\textit{permutation matrix}$: $$ \begin{equation} P := \{X \in \{0,1\}^{n\times n} : X \mathbf{1}_n = \mathbf{1}_n\,,\,\mathbf{1}_n^T X = \mathbf{1}_n^T\}. \end{equation} $$ where $\mathbf{1}_n$ denotes an $n$-dimensional ones vector.

When doing optimizations, to circumvent the discrete nature, such matrices are generally relaxed using the doubly-stochastic matrices $\mathcal{DP}_n$, that live on the Birkhoff Polytope, an $(n - 1)^2$-dimensional convex submanifold of the ambient $\mathbb{R}^{n\times n}$, defined as: $$ \begin{align} \mathcal{DP}_n = \{\,\,X :& X_{ij}>0 \,,\, i,j \in\{1,...,n\} \,\,\, \wedge \sum\limits_{i=1}^n X_{ij}=1 \,\wedge\, \sum\limits_{j=1}^n X_{ij}=1 \,\,\} \end{align} $$ In other words, row and column sums equate to $1$ and entries can take real values between $0$ and $1$.

The permutation matrices live on the vertices of Birkhoff Polytope. One can also describe the discrete set of permutation matrices as the intersection of Birkhoff Polytope $\mathcal{DP}_n$ with the orthogonal group (special case of Stiefel manifold when $m=n$): $$ P=\{X \in \mathcal{DP}_n : X X^T=\mathbf{I}\} $$ The orthogonal group $O(n)$ can be regarded as an $(n(n—1)/2)$-dimensional surface in the $n^2$-dimensional Euclidean space. This surface is a subset of the sphere of radius $n^{1/2}$.

So, the orthogonal group touches the polytope exactly on the permutation matrices. In low-dimensions, one can probably think this as a tessellated shape, approximating the sphere. Or, in the other way around, we can speak of an encapsulating sphere of the polytope (though for higher dimensions this might not be true).

Now comes my question: As $n \rightarrow \infty$, does the sphere become a good approximation of the polytope, or does the gap grow? In other words, instead of using the Birkhoff polytope itself as a relaxation of the discrete permutations, can we live with using just the sphere? And, if possible, would it be possible to bound the error of approximation?

Thanks!

1

There are 1 best solutions below

8
On BEST ANSWER

The answer to the question literally asked is $\infty$. I will argue that the answer to the question the OP probably meant to ask is $(n-1)$. I'll write $J$ for the $n \times n$ matrix of all $1$'s, and $\vec{j}$ for the length $n$ vector of all $1$'s.

Let $D$ be the affine linear space of $(n-1) \times (n-1)$ matrices whose rows and columns sum to $1$. (So, doubly stochastic but with no condition on the signs.) The Birkhoff polytope $P$ is contained in $D$, whereas the orthogonal group $O(n)$ is not. So, if $\lambda$ is a linear functional which is $0$ on $D$ but not on $O(n)$, then $\lambda$ is $0$ on $P$ but not on $O(n)$, and the ratio of $\max_{X \in O(n)} |\lambda(X)|$ to $\max_{X \in P} |\lambda(X)|$ is infinite. An example of such a $\lambda$ is $\lambda(X) = \sum_{j=1}^n (X_{1j} - X_{2j})$. Another way to put this is that, if we travel on a ray starting at $\tfrac{1}{n} J$ not in the plane $D$, we leave $P$ immediately but stay in the convex hull of $O(n)$ for a nonzero amount of time.

Let $DO(n) = D \cap O(n)$. It seems more natural to ask about the difference between optimizing a linear functional on $P$ and on $DO(n)$. However, $D$ is an affine linear space so we can only speak of affine linear functionals on it, not linear functionals. It seems natural to me to solve this issue by normalizing our functionals to be $0$ on $\tfrac{1}{n} J$, the center of $P$. Another way to phrase our question is, if we travel on a ray from $\tfrac{1}{n} J$ within $D$, how large can the ratio be between the time we leave $P$ and the time we leave $DO(n)$?

Let us see that this ratio can be as large as $n-1$. Consider the line through $\tfrac{1}{n} J$ and $\mathrm{Id}$, and extend it in the opposite direction through $\tfrac{1}{n} J$. So a general point on this line is of the form $\tfrac{1+x}{n} J - x \mathrm{Id}$. This ray leaves $P$ when $x=\tfrac{1}{n-1}$ and the diagonal entries become $0$. However, when $x=1$, this matrix is orthgonal (it is the transformation which fixes $\vec{j}$ and negates $\vec{j}^{\perp}$). So, for all $x \in [-1,1]$, the matrix $\tfrac{1+x}{n} J - x \mathrm{Id}$ is in the convex hull of $DO(n)$. So the ray can be in $DO(n)$ for $n-1$ times as long as in $P$. In terms of optimizing linear functionals, the linear functional $\mathrm{Tr}(JX-nX)$ can only be as large as $n$ on $P$ (achieved when $X$ is a permutation with no fixed points), but is as large as $n^2-n$ at $2\tfrac{1}{n} J - \mathrm{Id}$.

We now prove this ratio is optimal. Suppose that $Z$ is in the convex hull of $DO(n)$. We must show that $\tfrac{n-2}{n-1}J+\tfrac{1}{n-1} Z$ is in $P$. In other words, we must show that each entry of $\tfrac{n-2}{n-1}J+\tfrac{1}{n-1} Z$ is nonnegative. Write $e_i$ for the $i$-th standard basis vector; we must show that $(\tfrac{n-2}{n-1}J+\tfrac{1}{n-1} Z) e_i$ has nonnegative entries. We will use two key properties of $Z$: We have $Z \vec{j} = \vec{j}$, and we have $|Z \vec{w}| \leq |\vec{w}|$ for any vector $\vec{w}$. Both properties are clearly true on $DO(n)$, and preserved by convex combinations.

Write $e_i = \tfrac{1}{n} \vec{j} + \vec{w}$ and put $\vec{x} = Z \vec{w}$. We note that $\vec{w}$ is perpendicular to $\vec{j}$ and has length $\sqrt{(1-1/n)^2 + (n-1) (1/n)^2} = \sqrt{1-1/n}$, so $\vec{x}$ is perpendicular to $\vec{j}$ and has length $\leq \sqrt{1-1/n}$. We have $$\begin{array}{rcl} (\tfrac{n-2}{n-1}J+\tfrac{1}{n-1} Z) e_i &=& (\tfrac{n-2}{n-1}J+\tfrac{1}{n-1} Z) (\tfrac{1}{n} \vec{j}+\vec{w}) \\ &=& \tfrac{1}{n} \vec{j} + \tfrac{1}{n-1} Z \vec{w} \\ &=& \tfrac{1}{n} \vec{j} + \tfrac{1}{n-1} \vec{x} . \\ \end{array}$$ We have used that $J \vec{j} = Z \vec{j} = \vec{j}$ and $J \vec{w}=0$ since $\vec{w}$ is in $\vec{j}^{\perp}$. We want to show that every element of $\tfrac{1}{n} \vec{j} + \tfrac{1}{n-1} \vec{x}$ is nonnegative or, in other words, that each entry of $\vec{x}$ is $\geq -1+1/n$.

Now, $|\vec{x}| \leq \sqrt{1-1/n}$, so each entry of $\vec{x}$ is at most $-\sqrt{1-1/n}$, but we need a better bound. We can get one by using that $\vec{x}$ and $\vec{j}$ are perpindicular. Let $\vec{x} = (x_1, \ldots, x_n)$. So we know $\sum x_j = 0$ and $\sum x_j^2 \leq 1-1/n$. So we have $$x_n^2 \leq 1-\tfrac{1}{n} - \sum_{j=1}^{n-1} x_j^2 \leq 1-\tfrac{1}{n} - \tfrac{1}{n-1} \left( \sum_{j=1}^{n-1} x_j \right)^2 = 1-\tfrac{1}{n} - \tfrac{x_n^2}{n-1}$$ where we have used the root-mean-square inequality in the middle step. Rearranging, $x_n^2 \tfrac{n}{n-1} \leq \tfrac{n-1}{n}$ or $|x_n| \leq \tfrac{n-1}{n}$. So in fact each entry of $\vec{x}$ is $\geq -1+1/n$, as desired.


As a side note, if we instead focus on the intersection of $D$ with the sphere of radius $\sqrt{n}$, the approximation ratio is much worse. Look at the matrix $$\tfrac{1}{n} J + x \begin{bmatrix} -1 & 1/(n-1) & 1/(n-1) & \cdots & 1/(n-1) \\ 1/(n-1) & -1/(n-1)^2 & -1/(n-1)^2 & \cdots & -1/(n-1)^2 \\ 1/(n-1) & -1/(n-1)^2 & -1/(n-1)^2 & \cdots & -1/(n-1)^2 \\ \vdots & & & \ddots & \\ 1/(n-1) & -1/(n-1)^2 & -1/(n-1)^2 & \cdots & -1/(n-1)^2 \\ \end{bmatrix}.$$ This exits the Birkhoff polytope at $x=1/n$ (when the upper left entry becomes $0$) but stays in the sphere of radius $\sqrt{n}$ until $x = \pm \tfrac{(n-1)^{3/2}}{n}$. So the ratio is as bad as $(n-1)^{3/2}$ if we use the sphere instead of the orthogonal group. I also get that this ray exits the convex hull of $DO(n)$ at $x = (n-1)/n$, so this is another case of the $n-1$ ratio.