Does something "magical" happen at 20 Dimensions?


In the context of Bayesian Optimization, I have often heard that Bayesian Optimization tends to perform poorly on functions with more than 20 dimensions. However, I have never been able to understand (from a mathematical perspective) why these problems are said to manifest themselves precisely at the 20-dimension mark (e.g. why not at 15 or 25?).

For example:

1) "High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces" (Eriksson et al., 2021)

"High-dimensional BO presents a particular challenge, in part because the curse of dimensionality makes it difficult to define—as well as do inference over—a suitable class of surrogate models." "While BO has become a workhorse algorithm that is employed in a wide variety of settings, successful applications are often limited to low-dimensional problems, e.g. fewer than twenty dimensions [Frazier, 2018]. Applying BO to high-dimensional problems remains a significant challenge. The difficulty can be traced to both of the algorithm components mentioned above, although we postulate that suitable function priors are especially important for good performance. In particular, in order for BO to be sample-efficient in high-dimensional spaces, it is crucial to define surrogate models that are sufficiently parsimonious that they can be inferred from a small number of query points. An overly flexible class of models is likely to suffer from overfitting, which severely limits its effectiveness in decision-making. Likewise, an overly rigid class of models is unlikely to capture enough features of the objective function. A compromise between flexibility and parsimony is essential."

2) "High-dimensional Bayesian optimization using low-dimensional feature spaces" (Moriconi et al., 2020)

"However, BO (Bayesian Optimization) is practically limited to optimizing 10–20 parameters. To scale BO to high dimensions, we usually make structural assumptions on the decomposition of the objective and/or exploit the intrinsic lower dimensionality of the problem, e.g. by using linear projections. We could achieve a higher compression rate with nonlinear projections, but learning these nonlinear embeddings typically requires much data. This contradicts the BO objective of a relatively small evaluation budget."

3) "A Tutorial on Bayesian Optimization" (Frazier, 2018)

"It (Bayesian Optimization) is best-suited for optimization over continuous domains of less than 20" "The input x is in R-d for a value of d that is not too large. Typically d ≤ 20 in most successful applications of BayesOpt."

I can understand the general argument: functions with more dimensions tend to be more challenging to optimize than functions with fewer dimensions. For example, higher-dimensional functions can exhibit properties that are undesirable for optimization, such as saddle points and local minima, with greater frequency than lower-dimensional functions.
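To make the curse-of-dimensionality intuition behind this argument concrete, here is a back-of-the-envelope sketch (my own illustration, not from the papers above): the number of evaluations needed to cover the unit hypercube [0, 1]^d at a fixed per-axis resolution grows exponentially in d, which is one reason a surrogate model fitted from a small evaluation budget quickly runs out of coverage. The resolution of 10 points per axis is an arbitrary choice for illustration.

```python
def grid_points_needed(d, points_per_axis=10):
    """Number of points in a regular grid covering [0, 1]^d
    at a fixed resolution of `points_per_axis` per dimension."""
    return points_per_axis ** d

for d in (2, 5, 20):
    # Coverage cost grows exponentially: 10^d points for dimension d.
    print(f"d = {d:2d}: {grid_points_needed(d):.2e} points")
```

At d = 2 a hundred points suffice for this resolution; at d = 20 you would need 10^20, which dwarfs any realistic BO evaluation budget. Note, though, that this alone does not explain why 20 specifically is quoted rather than 15 or 25; the cutoff is a rule of thumb, not a sharp threshold.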

My Question: But why exactly does Bayesian Optimization tend to "suffer" on functions with more than 20 dimensions? And do we have reason to believe that other optimization algorithms (e.g. Gradient Descent) would "equally suffer" on such functions?

Thanks!
