Many theoretical convergence-rate results are known for various (possibly stochastic) convex optimization problems. For example, the popular survey on convex optimization algorithms for machine learning [https://arxiv.org/abs/1405.4980] collects many such rigorous results.
For example, suppose the constraint set has diameter $R$, and we have stochastic access to an objective function whose gradient estimates are bounded in norm by $G$. Then one can rigorously prove that projected SGD, with properly chosen step sizes, converges at a rate $O(RG/\sqrt{T})$.
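To make the setting concrete, here is a minimal sketch of projected SGD on a toy problem, using the step size $\eta = R/(G\sqrt{T})$ that the theory prescribes. Everything here (the quadratic objective, the noise scale, the values of $R$ and $G$) is an illustrative assumption, not part of any particular theorem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: minimize f(x) = ||x - x_star||^2 / 2 over the
# Euclidean ball of radius R, given noisy gradients with norm <= G.
R = 2.0                          # radius of the feasible ball (assumed known)
x_star = np.array([1.0, -0.5])   # true optimum, inside the ball
T = 20000                        # number of SGD steps
G = 5.0                          # assumed bound on gradient-estimate norms

def noisy_grad(x):
    # true gradient of f plus bounded noise; a stand-in for "stochastic access"
    return (x - x_star) + rng.normal(scale=0.1, size=x.shape)

def project_ball(x, radius):
    # Euclidean projection onto {x : ||x|| <= radius}
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

eta = R / (G * np.sqrt(T))       # the step size requires knowing both R and G
x = np.zeros(2)
avg = np.zeros(2)
for t in range(T):
    x = project_ball(x - eta * noisy_grad(x), R)
    avg += x / T                 # the O(RG/sqrt(T)) guarantee is for the averaged iterate

print(np.linalg.norm(avg - x_star))  # small, roughly on the order of RG/sqrt(T)
```

Note that the step size is where $R$ and $G$ enter: change either and $\eta$ changes, which is exactly the dependence the question below is about.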
I am wondering how relevant theoretical results such as this are in practice, where the relevant parameters may be unknown. In the example above, the step sizes of SGD must be chosen wisely (as functions of $R$ and $G$) to attain the provable convergence rate. But suppose my optimization problem is actually complicated and nonconvex, and I run some SGD algorithm on it. Say I start SGD in some convex region containing a local minimum, at distance $R$ from the true optimum. My algorithm, however, doesn't know that it is at distance $R$ from the optimum, and I might not know a priori a bound $G$ on the gradient estimates I receive. In practice, can I still run some SGD algorithm and converge to the optimum at a rate $O(RG/\sqrt{T})$?
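Concretely, the practical scenario I have in mind looks something like the following sketch: the same kind of problem, but with no projection (since $R$ is unknown) and a guessed step-size schedule $\eta_t = c/\sqrt{t+1}$ with an arbitrary constant $c$, since the theoretically prescribed $\eta$ cannot be computed without $R$ and $G$. The problem instance and the choice $c = 0.5$ are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same toy quadratic as before, but now pretend R and G are unknown:
# no projection step, and a guessed schedule eta_t = c / sqrt(t + 1).
x_star = np.array([1.0, -0.5])
T = 20000
c = 0.5                          # guessed constant; the theory gives no value for it

def noisy_grad(x):
    return (x - x_star) + rng.normal(scale=0.1, size=x.shape)

x = np.zeros(2)
avg = np.zeros(2)
for t in range(T):
    x = x - (c / np.sqrt(t + 1.0)) * noisy_grad(x)
    avg += x / T
```

This often works fine on easy instances, but it is exactly the gap I am asking about: nothing in the $O(RG/\sqrt{T})$ theorem tells me how to pick $c$, or what rate (if any) this heuristic schedule provably attains.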
The above paragraph is a specific example, but I'm generally curious about how well provable convergence rates carry over in practice to complicated nonconvex optimization problems (where, say, one is satisfied to converge to any local minimum to high precision).