I’m reading this paper here, and the authors make the following assertion:
“Unfortunately, their algorithm cannot be readily applied to any other OT cost, because the famous Kantorovich duality holds only for W1”
So, they claim that the Kantorovich Duality is valid only for the 1-Wasserstein distance. I think they are actually speaking of the Kantorovich-Rubinstein duality. But is it really only valid for the 1-Wasserstein?
Yes, they are referring to the Kantorovich-Rubinstein duality. The paper they reference involves using neural networks to approximate the Lipschitz potentials.
In general (with suitable assumptions on $c$, especially for powers of the euclidean norm) you have the duality $$ \inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times Y} c(x,y) d \pi = \sup_{\substack{\varphi \in C_0(X) \\ \varphi(x) + \varphi^c(y) \leq c(x,y)} }\int_X \varphi d\mu + \int_Y \varphi^c d\nu $$ where $$ \varphi^c(y) = \inf_x c(x,y) - \varphi(x) $$ for a reference you can see Santambrogio's Optimal Transport for Applied Mathematicians - Chapter 1.2.
In the case of $c(x,y) = d(x,y)$, this reduces to lipschitz functions since $\varphi^c = - \varphi$.