I am not sure this is the best place to ask about the content of a paper, but here is my question.
There is a well-known paper, "Estimating Mutual Information", Physical Review E 69, 066138 (2004) (http://arxiv.org/pdf/cond-mat/0305641.pdf).
Why do the authors use the maximum norm (instead of the $L_2$ norm) for combining $X$ and $Y$?
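To make the question concrete: as I read the paper, the distance between two points $z = (x, y)$ and $z' = (x', y')$ in the joint space is defined as

$$\|z - z'\| = \max\{\|x - x'\|,\ \|y - y'\|\},$$

whereas the alternative I have in mind would be the Euclidean combination (written in my own notation, not the paper's):

$$\|z - z'\|_2 = \sqrt{\|x - x'\|^2 + \|y - y'\|^2}.$$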
I think most people will have this question if they have ever read this paper. Thank you!