It is known that the permutation matrices lie at the intersection of the orthogonal group $\mathbb{O}^N$ with the Birkhoff polytope $\mathbb{DS}^N$. It is also known that any non-negative matrix $X\in\mathbb{R}^{N \times N}$ can be projected onto $\mathbb{DS}^N$ by the Sinkhorn algorithm. Finally, any matrix in $\mathbb{O}^N$ has to be a permutation matrix, if all the entries are positive (matrix is non-negative). This also means that, any orthogonal matrix $O$ will contain negative entries if it is not a permutation.
How can I project an orthogonal matrix, which is allowed to contain negative entries onto $\mathbb{DS}$?
I see that this is a relatively uncharted territory. My answer to this will be Randomized Rounding [1]. I am not sure whether the cited resource completely describes the algorithm I came up with, so I will explain it here:
Let $\mathbf{O}$ denote the input orthogonal matrix, which we wish to project onto a point on the Birkhoff Polytope $\mathbf{X} \in \mathcal{DS}$. Let $(\mathbf{x}, \mathbf{y})$ be vectors in $\mathbb{R}^d$ with $\mathbf{y}=\mathbf{O}\mathbf{x}$ containing distinct coordinates. Let $\mathbf{x}$ be sampled at random from the standard Gaussian measure on $\mathbb{R}^d$. Barvinok observed that the action of a given orthogonal matrix on a random vector $\mathbf{x}$ with high probability is very close to a permutation of the coordinates:
The rounding of $\mathbf{O}$ at $\mathbf{x}$ is defined as the permutation $\mathbf{P}_{(\mathbf{O},\mathbf{x})}=\sigma(\mathbf{O}, \mathbf{x})$ s.t. $\mathbf{P}_{(\mathbf{O},\mathbf{x})}$ matches the $k^{th}$ smallest coordinate of $\mathbf{x}$ with the $k^{th}$ smallest coordinate of $\mathbf{y}=\mathbf{O}\mathbf{x}$. Yet this permutation varies with $\mathbf{x}$. To arrive at the $\mathbf{x}$-invariant projection onto $\mathcal{DS}_d$, one can easily construct small approximate non-commutative convex combinations using the Birkhoff von Neumann theorem [2]: \begin{equation} (\mathbf{X} \in \mathcal{DS}_d) =\frac{1}{T}\sum\limits_i^{T}\mathbf{P}_{(\mathbf{O},\mathbf{x}_i)} \end{equation} where $T$ is the number of iterates - the higher the more accurate. Finally, the permutation matrix corresponding to $\mathbf{X}\in\mathcal{DP}_d$ can be found via the famous Hungarian algorithm.
I am providing a sample MATLAB code of the procedure summarized above:
References
[1] Approximating orthogonal matrices by permutation matrices, Alexander Barvinok, 2005
[2] A SHORT PROOF OF THE BIRKHOFF-VON NEUMANN THEOREM, GLENN HURLBERT, 2008 http://www.people.vcu.edu/~ghurlbert/papers/SPBVNT.pdf