No Free Lunch Theorem 1 Proof

205 Views Asked by At

I was reading the original "No Free Lunch Theorems for Optimization" paper and I'm trying to understand a specific step in the inductive step of the proof of theorem 1. To provide some background, theorem 1 states

$$\sum_f P(d_m^y|f, m, a_1) = \sum_f P(d_m^y|f, m, a_2)$$

where

  • $f$ is a cost function that maps the input space $\mathcal{X}$ to the cost space $\mathcal{Y}$ (both spaces are finite)
  • $m$ is the number of points visited by the algorithm $a_i$ producing a sequence of input and cost values $d^x_m$ and $d^y_m$ respectively
  • $d_m$ is the ordered set of pairs $\{(d^x_m(1), d^y_m(1)), ..., (d^x_m(m), d^y_m(m))\}$
  • $P(d_m^y|f, m, a_i)$ is the conditional probability of producing the sequence of cost values $d_m^y$ with cost function $f$ and algorithm $a_i$ by visiting $m$ points.

In the final steps of the proof (see Appendix A in the paper for the rest of the proof), we say

$$\sum_f P(d^y_{m+1}|f, m+1, a) = |\mathcal{Y}|^{|\mathcal{X}| - m - 1} \sum_{f(x \in d_m^x), d_m^x} P(d_m|f, m, a)$$ $$= \frac{1}{|\mathcal{Y}|} \sum_{f, d_m^x} P(d_m|f, m, a)$$ $$= \frac{1}{|\mathcal{Y}|} \sum_f P(d_m^y|f, m, a)$$

I don't understand how the authors turn the first equivalence into the second by replacing the $|\mathcal{Y}|^{|\mathcal{X}| - m - 1}$ factor with $\frac{1}{|\mathcal{Y}|}$ and changing the sum to sum over all $f$, rather than just the $f$ defined on $x \in d^x_m$.

1

There are 1 best solutions below

0
On BEST ANSWER

As $P(d_m^y\vert f,m,a)$ depends solely on $\vert \mathcal{Y}\vert^m$ possibilities for $f$ on $d_m$,

\begin{align} \sum_{f(x\in d^x_m),d_m^x} P(d_m\vert f,m,a)&=\frac{1}{\vert \mathcal{Y}\vert^{\vert \mathcal{X}\vert-m}}\sum_{f(x\in d^x_m),d_m^x} \vert \mathcal{Y}\vert^{\vert \mathcal{X}\vert-m}P(d_m\vert f,m,a)\\ &=\frac{1}{\vert \mathcal{Y}\vert^{\vert \mathcal{X}\vert-m}}\sum_{f(x\in d^x_m),d_m^x} \sum_{f(x\notin d_m^x)}P(d_m\vert f,m,a)\\ &=\frac{1}{\vert \mathcal{Y}\vert^{\vert \mathcal{X}\vert-m}}\sum_{f,d_m^x}P(d_m\vert f,m,a). \end{align}

The inner sum in the second line follows because it does not depend on the $\vert \mathcal{Y}\vert^{\vert \mathcal{X}\vert-m}$ possibilities for $f$ outside of $d_m^x$, and the last line is because any function $f$ is completely specified by the values on $d_m^x$ and outside $d_m^x$.