We are working with discrete optimal transport.
Let $P$ be a matrix and let $H(P) =- \sum_{i,j} P_{i,j} (\log(P_{i,j})-1)$.
Let $C$ be the cost matrix, and let $\langle C,P\rangle$ denote the Frobenius inner product.
We introduce the regularized optimal transport problem $\min_{P \in U(a,b)} \langle C,P\rangle - \epsilon H(P)$, where $U(a,b)$ is the set of coupling matrices with marginals $a$ and $b$.
We want to prove that as $\epsilon \to 0$, $P_\epsilon$ converges to an optimal solution of the original Kantorovich problem with maximal entropy.
I understand the proof up to the point where it says for any subsequence of $P_\epsilon$, we can choose a sub-subsequence of it that converges to an optimal transport plan with maximum entropy.
Question 1) The part I don't get is where it says that by strict convexity of $-H$ we get $P^* = P_0^*$. It is clear that $-H$ is strictly convex, but strict convexity only gives uniqueness of a minimizer over a convex set. It seems we are only minimizing over the optimal points of the Kantorovich problem, and I don't see why that is a convex set.
Question 2) It says that as $\epsilon \to \infty$, $P_\epsilon$ becomes less sparse, but I would have expected the opposite, since more entropy means more uncertainty.
Thank you!

Let $\mathcal{X}$ be the discrete set on which $a,b$ are measures.
With respect to your first question, consider a sequence $\epsilon_l\to 0$ with $\epsilon_l>0$. From what I gather, you can see that some subsequence (which for clarity I will relabel) $P_{\epsilon_{l_{k}}}$ converges; call the limit $P^*$. Moreover, the limit satisfies $P^*=\operatorname{argmin}\{-H(P)~:~P\in U(a,b),\ \langle P,C\rangle=L_C(a,b)\}.$ It is also clear that $-H$ is strictly convex.
Now the constraint set in that argmin is convex. First, $U(a,b)$ is convex: let $P,Q \in U(a,b)$ and let $\mathcal{Y}$ be any subset of $\mathcal{X}$; then $\lambda P(\mathcal{X}\times \mathcal{Y})+(1-\lambda) Q(\mathcal{X}\times \mathcal{Y})=\lambda b(\mathcal{Y})+(1-\lambda) b(\mathcal{Y})=b(\mathcal{Y})$, and the same holds for the other marginal. The set of optimal plans is then the intersection of $U(a,b)$ with the hyperplane $\{P : \langle P,C\rangle = L_C(a,b)\}$, hence convex. A strictly convex function has at most one minimizer over a convex set, so the minimizer $P_0^*$ of $-H$ over the optimal plans is unique. Every convergent subsequence of $P_{\epsilon_l}$ therefore has the same limit, so the full sequence converges to $P_0^*$. In particular, since $\epsilon_{l_{k}}$ is a subsequence of $\epsilon_{l}$, it must be that $P^*=P_0^*$.
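To make the limit tangible, here is a small numerical sketch (my own illustration, not part of the proof) that solves the regularized problem with the standard Sinkhorn scaling iterations for shrinking $\epsilon$; the function name `sinkhorn` and the specific $2\times 2$ example are mine:

```python
import numpy as np

def sinkhorn(C, a, b, eps, n_iter=2000):
    """Solve min <C,P> - eps*H(P) over U(a,b) by Sinkhorn scaling.

    The regularized optimum has the form diag(u) @ K @ diag(v) with
    K = exp(-C/eps); we alternately rescale v and u to match the
    column and row marginals b and a.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Moving mass along the diagonal is free, off-diagonal costs 1,
# so the unique unregularized optimum is diag(0.5, 0.5).
C = np.array([[0.0, 1.0], [1.0, 0.0]])
a = b = np.array([0.5, 0.5])

for eps in [1.0, 0.3, 0.1]:
    P = sinkhorn(C, a, b, eps)
    print(eps, np.round(P, 4))
# As eps shrinks, the off-diagonal entries decay toward 0,
# so P_eps approaches the optimal plan diag(0.5, 0.5).
```

In this example the limit plan is unique, so the max-entropy selection is invisible; the selection only matters when the Kantorovich problem has several optimal plans.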
For question 2): the sparsity of a matrix describes the number of zeros it has; the more zeros, the more sparse. Adding entropy is like adding diffusion (for a PDE, a Laplacian term). It blurs the optimal transport plan, spreading mass out and thereby decreasing the number of zeros in the optimal plan, i.e. it becomes less sparse.
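One way to see this concretely (a standard first-order-conditions computation, not part of the proof above): introducing Lagrange multipliers $f_i, g_j$ for the two marginal constraints and setting the gradient of the regularized objective to zero forces

$$C_{ij} + \epsilon \log P_{ij} = f_i + g_j \quad\Longrightarrow\quad P_{ij} = e^{f_i/\epsilon}\, e^{-C_{ij}/\epsilon}\, e^{g_j/\epsilon} > 0,$$

so the entropic plan has no zero entries at all for any $\epsilon>0$. Moreover, as $\epsilon \to \infty$ the kernel $e^{-C/\epsilon}$ tends to the all-ones matrix, and $P_\epsilon \to a b^\top$, the fully dense independent coupling.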