In the Wasserstein GAN paper (https://arxiv.org/abs/1701.07875), weight clipping is used to keep the weights $w$ in a compact space $\mathcal{W}$ so that the parameterised discriminator function $f_w$ is $K$-Lipschitz. I've seen this statement in many places, either rephrased or quoted verbatim, but I don't know which exact theorems or concepts lead to this conclusion.
Is the definition of a compact space in the paper the same as the definition of a compact metric space? There seem to be many definitions of compactness.
Intuitively, I can imagine that the Lipschitz constant of $f_w$ is bounded by the product of the Lipschitz norms of the layer weights, i.e. $\Vert f_w \Vert_{lip} \leq \prod_i \Vert w_i \Vert_{lip}$ (assuming the activations are 1-Lipschitz). However, I am missing the link between the Lipschitz norm of the weights and compactness. What is this missing link, and which theory justifies it?
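To make that intuition concrete, here is a minimal numerical sketch of what I have in mind. It is only an illustration under my own assumptions, not the paper's code: a small bias-free ReLU network, an illustrative clipping constant `c = 0.01`, and the Frobenius norm of each weight matrix used as an easy-to-compute upper bound on its operator (Lipschitz) norm. The helper names (`clip_weights`, `lipschitz_upper_bound`, `forward`) and the layer sizes are hypothetical.

```python
import math
import random


def clip_weights(layers, c):
    """Clamp every weight entry to [-c, c] (WGAN-style weight clipping)."""
    return [[[max(-c, min(c, w)) for w in row] for row in W] for W in layers]


def frobenius_norm(W):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(w * w for row in W for w in row))


def lipschitz_upper_bound(layers):
    """Product of per-layer Frobenius norms.

    The Frobenius norm upper-bounds the spectral norm, so this product
    upper-bounds prod_i ||w_i||_lip when activations are 1-Lipschitz.
    """
    bound = 1.0
    for W in layers:
        bound *= frobenius_norm(W)
    return bound


def forward(layers, x):
    """Bias-free MLP: ReLU between layers, no activation after the last one."""
    for i, W in enumerate(layers):
        x = [sum(w * xj for w, xj in zip(row, x)) for row in W]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def l2_distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


random.seed(0)
c = 0.01             # illustrative clipping constant (assumption)
sizes = [4, 8, 8, 1]  # hypothetical layer widths

# Random weights, then clipped into the box [-c, c]^d.
layers = [
    [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
layers = clip_weights(layers, c)

bound = lipschitz_upper_bound(layers)

# Empirically estimate the Lipschitz ratio on random input pairs.
worst_ratio = 0.0
for _ in range(1000):
    x = [random.gauss(0.0, 1.0) for _ in range(sizes[0])]
    y = [random.gauss(0.0, 1.0) for _ in range(sizes[0])]
    ratio = l2_distance(forward(layers, x), forward(layers, y)) / l2_distance(x, y)
    worst_ratio = max(worst_ratio, ratio)

print("empirical worst ratio:", worst_ratio)
print("product-of-norms bound:", bound)
```

In this sketch the empirical ratio never exceeds the product-of-norms bound, which matches my intuition, but it does not tell me how compactness of $\mathcal{W}$ enters the argument.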
The statement I'm referring to:
