I was trying to better understand the intuition behind the Probability of Improvement (PI) acquisition function in the field of Bayesian Optimization.
This acquisition function chooses the next point as the one with the highest probability of improvement over the current maximum $f(x^+)$, and we can write the selection of the next point as follows:
$$x_{t+1} = \arg\max_x \alpha_{PI}(x) = \arg\max_x P\big(f(x) \ge f(x^+)+\epsilon\big)$$
where $x^+$ is the current optimum point.
Typically, given an initial dataset of evaluated points $\mathcal{D} = \{(x_i, f(x_i))\}_{i=1}^n$, one fits a Gaussian process regression as a surrogate model for the black-box function $f$. With this surrogate, it is possible to rewrite the original equation as
$$x_{t+1} = \arg\max_{x} \Phi\left(\frac{\mu_t(x) - f(x^+)-\epsilon}{\sigma_t(x)}\right)$$
with $\Phi(\cdot)$ denoting the CDF of the standard normal distribution.
It's not yet clear to me how to get to the second equation starting from the first one.
You can get the result by a change of variables. Let's fix some notation.
At time $t+1$ you have observed $(x_1, y_1), \dots, (x_t, y_t)$, from which you build the posterior mean $\mu_t$ and standard deviation $\sigma_t$. You can also define $x^+ = \arg\max_{i=1,\dots,t} f(x_i)$. Notice that if there is no noise in your observations (which I believe to be your case), then $f(x^+)$ does not depend on the chosen surrogate model, as it coincides with an observed value $y_i$ for some $i = 1, \dots, t$.
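For concreteness, here is a minimal sketch of this setup, using scikit-learn's `GaussianProcessRegressor` as the surrogate (the toy data and kernel choice are my own, purely illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy noiseless observations (x_1, y_1), ..., (x_t, y_t)
X = np.array([[0.1], [0.4], [0.7], [0.9]])
y = np.sin(3 * X).ravel()

# Surrogate model: mu_t and sigma_t come from the GP posterior
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X, y)

# Incumbent: x^+ is the observed point with the largest evaluation,
# so f(x^+) is just one of the y_i regardless of the fitted model
i_best = np.argmax(y)
x_best, f_best = X[i_best], y[i_best]

# Posterior mean and standard deviation at candidate points
X_cand = np.linspace(0, 1, 5).reshape(-1, 1)
mu_t, sigma_t = gp.predict(X_cand, return_std=True)
```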
Since the right-hand side of $f(x) \geq f(x^+) + \varepsilon$ is fixed (not a random variable), and the GP posterior gives $f(x) \sim \mathcal{N}(\mu_t(x), \sigma_t(x)^2)$, $$\begin{aligned}\mathbb{P}(f(x) \geq f(x^+) + \varepsilon) &= \frac{1}{\sqrt{2 \pi\sigma_t(x)^2}} \int^{\infty}_{f(x^+) + \varepsilon}\exp\left(-\frac{(r-\mu_t(x))^2}{2\sigma_t(x)^2}\right)dr\\ &= \frac{1}{\sqrt{2 \pi}} \int^{\frac{\mu_t(x) -f(x^+) - \varepsilon}{\sigma_t(x)}}_{-\infty}\exp\left(-\frac{s^2}{2}\right)ds,\end{aligned}$$ where the change of variables is $s = -\frac{r - \mu_t(x)}{\sigma_t(x)}$, so that $dr = -\sigma_t(x)\,ds$: the factor $\sigma_t(x)$ cancels the one in the normalizing constant, and the sign flip swaps the limits of integration. The rightmost element in the chain of equalities is the CDF of the standard Gaussian evaluated at $\frac{\mu_t(x) - f(x^+) - \varepsilon}{\sigma_t(x)}$, which is exactly the argument of $\Phi$ in your second equation.
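As a numerical sanity check, you can compare the closed form $\Phi\left(\frac{\mu_t(x) - f(x^+) - \varepsilon}{\sigma_t(x)}\right)$ against a Monte Carlo estimate of $\mathbb{P}(f(x) \geq f(x^+) + \varepsilon)$ under the Gaussian posterior (the specific values of $\mu_t$, $\sigma_t$, $f(x^+)$, and $\varepsilon$ below are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical posterior at a candidate x, incumbent value, and margin
mu, sigma = 1.2, 0.5      # posterior mean/std of f(x) under the GP
f_best, eps = 1.0, 0.1    # f(x^+) and the improvement margin epsilon

# Closed form: alpha_PI(x) = Phi((mu - f_best - eps) / sigma)
pi_closed = norm.cdf((mu - f_best - eps) / sigma)

# Monte Carlo: draw f(x) ~ N(mu, sigma^2) and count improvements
samples = rng.normal(mu, sigma, size=1_000_000)
pi_mc = np.mean(samples >= f_best + eps)

print(pi_closed, pi_mc)  # the two estimates agree closely
```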