Bayesian optimisation basics (illustrative example)

90 Views Asked by At

I'm trying to follow a basic example of Bayesian optimisation described in the paper by Močkus, J. "On Bayesian methods for seeking the extremum." (Optimization Techniques IFIP Technical Conference. Springer Berlin Heidelberg, 1975, page 401). A better formatted text is available here (see page 8).


We want to minimise the following function in example 2.2.1 in the book: $f(x) = (x - \omega)^2$, where $x \in A, A = [-1, 1]$; $\omega \in \Omega, \Omega=[-1, 1]$. In general, $A$ is a compact set and $\Omega$ is a set of indices $\omega$ corresponding to all continuous functions of $x \in A$. $N=1$, that is, we have a single observation. The a priori density function is $p(\omega) = 1/2$, $\omega \in \Omega$.

To proceed with the recurrent equations (2.2.1 in the book):

\begin{align} u_1(z_1) = \inf_{x\in A}E\{f(x)\mid z_1\}\\ u_0 = \inf_{x \in A}\{u(x, f(x))\} \end{align} And here is the next step which I have troubles with: \begin{align} E\{f(x) \mid z_1\} = \begin{cases} (x - \omega_1)^2, & \omega_1 \in \Omega, \omega_2 \in^- \Omega\\ (x - \omega_2)^2, & \omega_1 \in^- \Omega, \omega_2 \in \Omega\\ 1/2(x-\omega_1)^2 + 1/2(x-\omega_2)^2, & \omega_1 \in \Omega, \omega_2 \in \Omega. \end{cases} \end{align} , where $z_1 = (f(x_1), x_1)$ is our observation.


1) I have difficulties understanding why we have three cases. Perhaps, the hint lies within the notation $\in^-$ I've never encountered before. Does it mean "negative element of"?

2) To me, the notation $[a, b]$ is an interval from $a$ to $b$. How come $p(\omega)=1/2$ for $\omega \in [-1, 1]$? Surely, it should be a set $\{-1, 1\}$, shouldn't it? Yet, it's still in $[a, b]$ in both manuscripts.

Any hints would be appreciated.

1

There are 1 best solutions below

0
On

Starting afresh sometimes helps. I'll write down my interpretation of the example and would be really grateful to anyone who could glance through it if it makes sense.


We first note that

  1. $\in^-$ means $\not\in$;
  2. $[a, b]$ is the usual interval $\{x: a\leq x \leq b \}$. Since, $\int_{-1}^{1} \frac{1}{2}dx = 1$, $p(\omega):=\frac{1}{2}$ is a valid density function.

So, for our first observation $z_1 = (f(x_1), x_1)$, we have two possibilities for $\omega$, namely $\omega_{1,2} = x_1 \pm \sqrt{f(x_1)}$. We have three cases for the $E\{f(x_1) \mid z_1 \}$: a) $\omega_1 \in \Omega$; b) $\omega_2 \in \Omega$; or c) both.

Following the rest of the example from the book is quite straightforward. We arrive at $E\{u_1(x, f(x))\} = \int_{\Omega_x} (x-\omega)^2 d\omega$, where $\Omega_x = [max(2x-1, -1), min(2x+1, 1)]$, that looks like: enter image description here

Hence, to minimise it we take $x_1 := \pm 1$ as out first observation. In the second round we choose: \begin{equation} x_2 := \begin{cases} 1 - \sqrt{f(1)}, & x_1 = 1 \\ -1+\sqrt{f(-1)}, & x_1 = -1 \end{cases} \end{equation}