Why does the sup norm make the results of approximation theory independent from the unknown distribution of the input data?


I was reading the paper "Why and When Can Deep – but Not Shallow – Networks Avoid the Curse of Dimensionality: a Review" and I was trying to understand the following statement in section 3.1:

On the other hand, our main results on compositionality require the sup norm in order to be independent from the unknown distribution of the input data. This is important for machine learning.

My questions are:

  1. What does it mean that the sup norm $\| f \|_{\infty} = \sup_{x \in X} |f(x)|$ makes the results of the paper independent of the input distribution?
  2. Additionally, why does the sup norm make the results of the paper independent of the input distribution?
  3. Why is this important for machine learning?

I don't know the answers, perhaps because of my lack of experience in functional analysis and approximation theory, but my guesses at what the answers might be are:

  1. I think it means the following: since the paper is concerned with proving bounds on the smallest distance between a target function and a space of functions (the space of neural networks), measured by the degree of approximation $\operatorname{dist}(f,V_N) = \inf_{P \in V_N} \|f - P \|_{\infty}$, the claim is that upper bounds on this quantity are independent of (not a function of) the probability distribution on the input space $X$, where $f:X \to Y$. Does this matter because it means the bounds apply to any distribution on $X$? I find this confusing because I don't see a problem with the bounds depending on the data distribution. What matters more, I think, is that the bound on the degree of approximation is not vacuous, i.e. that it is not infinity. If it were infinity for some distribution, the results would be useless. However, I don't see why independence from the distribution would matter; I'd assume that boundedness or compactness, rather than the distribution, is what matters (since that is what keeps things from blowing up).
  2. I don't understand why the sup norm makes things not explode. The reason things don't explode should be boundedness or compactness, not anything specific to the sup norm. Obviously the sup norm implies things don't explode if it is bounded, but that happens because of an a priori boundedness/compactness assumption, not because of the sup norm itself, right?
  3. I guess it's important for machine learning because we care that results hold for any probability/data distribution. But as I've said, I don't understand why the data distribution matters here. In my opinion it doesn't, since that's not what makes things explode; what matters is the boundedness/compactness of $X$, $f(X)$, and $Y$.
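For what it's worth, here is the inequality that I suspect underlies the distribution-independence claim (this is my own reconstruction, not something stated in the paper): for any probability measure $\mu$ on $X$, the sup-norm error dominates the $L^2(\mu)$ error, so a single sup-norm bound holds simultaneously for every input distribution:

```latex
% For any probability measure \mu on X (so \mu(X) = 1):
\|f - P\|_{L^2(\mu)}^2
  = \int_X |f(x) - P(x)|^2 \, d\mu(x)
  \le \int_X \|f - P\|_{\infty}^2 \, d\mu(x)
  = \|f - P\|_{\infty}^2 .
% Taking the infimum over P \in V_N:
\operatorname{dist}_{L^2(\mu)}(f, V_N)
  \le \operatorname{dist}_{\infty}(f, V_N)
  = \inf_{P \in V_N} \|f - P\|_{\infty} .
```

If this is the right reading, then a sup-norm bound on the degree of approximation automatically yields the same bound on the expected (squared) error under any unknown data distribution $\mu$, which would explain the relevance to machine learning. But I'm not sure this is what the authors mean.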

Is this on the right track? Or am I misunderstanding the paper a lot?