What does "with probability $1 - \delta$" mean in optimization theorems for algorithms?


I'm reading a paper on second-order stochastic optimization. Many of its theorems state that the convergence rate or other inequalities hold "with probability $1 - \delta$" or "with probability at least $1 - \delta$".

For example, one theorem states: Algorithm 1 returns a point $x_t$ such that, with probability at least $1 - \delta$, $f(x_t) \leq f(x^{*}) + \epsilon$.

How do I show that it holds "with probability at least $1 - \delta$"? Do I have to show that $\Pr\left(f(x_t) \leq f(x^{*}) + \epsilon\right) \geq 1 - \delta$? How can I do that exactly?

Or is there material I can consult on how to prove such statements?

For more context if needed, the above theorem is given on page 11 here: https://arxiv.org/pdf/1602.03943.pdf.

Any answer is appreciated!


The algorithms are described as "stochastic", which means there is inherent randomness in the algorithm itself. As such, it is possible for the algorithm to fail, but the probability of failure is at most $\delta$. So yes, you need to show exactly the probability inequality you stated: $\Pr\left(f(x_t) \leq f(x^{*}) + \epsilon\right) \geq 1 - \delta$. Typically this is done by applying concentration inequalities to the random quantities the algorithm samples. Without reading the paper more thoroughly I can't say where exactly this probability arises.
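To make the shape of such a guarantee concrete, here is a minimal sketch (not the paper's algorithm) of the standard concentration-inequality argument. For bounded i.i.d. samples, Hoeffding's inequality says a sample mean of $n \geq \log(2/\delta)/(2\epsilon^2)$ values in $[0,1]$ satisfies $\Pr(|\hat{\mu} - \mu| \leq \epsilon) \geq 1 - \delta$; the toy estimator, $\epsilon$, and $\delta$ below are illustrative choices, and the simulation just checks the bound empirically:

```python
import math
import random

def estimate_mean(n, rng):
    # "Stochastic algorithm": estimate the mean of a Bernoulli(0.5)
    # variable by averaging n random samples (each in {0, 1}).
    return sum(rng.random() < 0.5 for _ in range(n)) / n

eps, delta = 0.05, 0.1
# Hoeffding: this n guarantees Pr(|estimate - 0.5| <= eps) >= 1 - delta.
n = math.ceil(math.log(2 / delta) / (2 * eps**2))

rng = random.Random(0)
trials = 2000
# Run the randomized procedure many times and count how often it "succeeds".
successes = sum(abs(estimate_mean(n, rng) - 0.5) <= eps for _ in range(trials))
print(n, successes / trials)  # empirical success rate should be >= 1 - delta
```

The proof of a "with probability at least $1 - \delta$" theorem is exactly the non-empirical version of this: bound the failure probability of each random event by a concentration inequality, then (if there are several such events) combine them with a union bound.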


Edit As pointed out by Clement C. in the comments below, the randomness comes from the middle step of the algorithm, which states "Sample $\tilde{\nabla}^2 f_{[i,j]}(\mathbf{x}_t)$ uniformly from [...]". See in particular Lemma 3.6 and its proof, where it explicitly states $$ \Pr\left[ \| \tilde{\nabla}^{-2} f(\mathbf{x}_t) - \nabla^{-2} f(\mathbf{x}_t)\| \leq \gamma \right] \geq 1- \delta. $$ (The information in this edit comes from Clement C.'s comments; I am collecting it here for readability.)
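The flavor of that lemma can also be illustrated numerically: approximate a fixed matrix by uniformly sampling and averaging its components, and estimate how often the spectral-norm error stays below a threshold $\gamma$. This is a hedged toy sketch, not the paper's estimator (which concerns the *inverse* Hessian); the dimensions, sample size, and $\gamma$ below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples, gamma, trials = 5, 400, 0.5, 500

# "True Hessian": the average of many rank-one components a_i a_i^T.
A = rng.normal(size=(50, d))
H = (A.T @ A) / len(A)

def sampled_hessian(n):
    # Uniformly sample n component Hessians (with replacement) and average.
    idx = rng.integers(0, len(A), size=n)
    S = A[idx]
    return (S.T @ S) / n

# Empirical frequency of the event ||H_tilde - H|| <= gamma (spectral norm),
# i.e. a Monte Carlo estimate of the probability appearing in the lemma.
hits = sum(np.linalg.norm(sampled_hessian(n_samples) - H, 2) <= gamma
           for _ in range(trials))
print(hits / trials)
```

In the paper the analogous probability is not estimated but lower-bounded analytically, via matrix concentration applied to the uniformly sampled terms.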