Can anyone give an informative example of two distributions which have a low Wasserstein distance but high relative entropy (or the other way around)? I find the Wasserstein distance, defined (for some $p\ge1$) as
$$ W_p(\mu,\nu)=\left(\inf_{\pi\in\Pi(\mu,\nu)}\int d^p(x,y)\pi(dx,dy)\right)^{1/p},$$
an intuitive, reasonable way to calculate the distance (or displacement) between two probability measures.
However, I'm struggling to see what relative entropy really tells us. I know from Sanov's theorem that it can be used to control the exponential rate of decay of the probability of a rare event, but I still don't have an intuitive feel for how it works, and I would really appreciate a concrete example to compare against the Wasserstein distance. I have also heard that relative entropy controls the fluctuation of one distribution with respect to another, but I haven't quite understood what this means.
For an example, look at the point masses $\delta_0$ and $\delta_h$ supported at $0$ and $h$, respectively. The Wasserstein distance between them is exactly $h$ (the only coupling moves the unit mass a distance $h$), which is small when $h$ is small. But for $h\ne0$ the relative entropy is infinite, since the two measures are mutually singular.
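A quick numerical check of this example, as a sketch using SciPy (the value $h=0.1$ is just an illustrative choice):

```python
import math
from scipy.stats import wasserstein_distance
from scipy.special import rel_entr

h = 0.1  # illustrative choice; any h != 0 behaves the same way

# W_1 between the empirical measures delta_0 and delta_h:
# the unit mass moves a distance h, so the distance is h
w = wasserstein_distance([0.0], [h])

# KL(delta_0 || delta_h), viewing both as discrete distributions
# on the support {0, h}: rel_entr(1, 0) = +inf, so the divergence
# blows up whenever the measures are mutually singular
kl = sum(rel_entr([1.0, 0.0], [0.0, 1.0]))

print(w)   # 0.1 -- shrinks linearly with h
print(kl)  # inf -- infinite for every h != 0
```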
For an example the other way around, let $f(x)$ be the density of a $U[0,N]$ random variable, and for small $\epsilon>0$ let $$g(x)=(1-\epsilon)f(x)$$ for $x\in[0,N/2]$ and $$g(x)=(1+\epsilon)f(x)$$ otherwise.
The Wasserstein distance is of order $O(N\epsilon)$ (we have to move about $\epsilon$ of the mass a distance of order $N/2$), while the relative entropy is only $O(\epsilon)$, since $\log\big(f(x)/g(x)\big)=O(\epsilon)$ pointwise (in fact it is $O(\epsilon^2)$, because the first-order contributions from the two halves cancel). By a suitable choice of $N$ and $\epsilon$, we can make the Wasserstein distance big while keeping the relative entropy small.
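One can verify these orders numerically by discretizing the two densities on a fine grid; here is a sketch (the values $N=10$, $\epsilon=0.1$ and the grid size are my illustrative choices, and the exact constants $N\epsilon/4$ and $\epsilon^2/2$ follow from a short computation with the CDFs and the Taylor expansion of $\log$):

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.special import rel_entr

N, eps = 10.0, 0.1   # illustrative choices, not from the answer
M = 100_000          # number of grid cells on [0, N]
dx = N / M
x = (np.arange(M) + 0.5) * dx   # cell midpoints

f = np.full(M, 1.0 / N)                                  # U[0, N] density
g = np.where(x <= N / 2, (1 - eps) * f, (1 + eps) * f)   # tilted density

# W_1 between the two discretized measures (weights = cell masses)
w1 = wasserstein_distance(x, x, u_weights=f * dx, v_weights=g * dx)

# KL(f || g), computed cell by cell on the grid
kl = np.sum(rel_entr(f * dx, g * dx))

print(w1)  # ~ N*eps/4   = 0.25  (grows with N)
print(kl)  # ~ eps**2/2 ~= 0.005 (independent of N)
```

Making $N$ large with $\epsilon$ fixed (or $\epsilon$ shrinking slowly) drives the Wasserstein distance up while the relative entropy stays small, exactly as claimed.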
The intuitive picture I have in mind: superimpose the graphs of the densities of the two measures (pretending that they have densities). The relative entropy measures how much the graphs differ in the vertical sense only, whereas the Wasserstein metric also allows for sideways nudgings of the two graphs.