I'm looking at the following proof:
Where:
Note: I'm new to this, but I think I now understand the variables below correctly. Mistakes are still possible, though.
- $f$ is an ideal function with perfect results with respect to some sample set
- $\sum_{h}$ denotes a sum over the hypothesis space
- $\sum_{x \in \mathcal{X}-X}$ denotes a sum over the entire sample space excluding the training data
- $P(x)$ is the probability distribution over the sample space
- The binary function in the middle is an indicator function that equals $1$ whenever the selected hypothesis disagrees with the ideal function
- $P(h | X, L_a)$ is the probability of selecting hypothesis $h$ given that algorithm/learned model $L_a$ is used on training data $X$
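Putting these definitions together, the expression in question (my reconstruction from the definitions above, since the original rendering is not shown here) is:

$$E_{ote}(L_a \mid X, f) = \sum_{h} \sum_{x \in \mathcal{X}-X} P(x)\,\mathbb{I}\big(h(x) \neq f(x)\big)\,P(h \mid X, L_a)$$

where $\mathbb{I}(\cdot)$ is the indicator ("binary") function described above.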
The author says that for a target function $\mathcal{X} \to \{0,1\}$, its function space would be $\{0,1\}^{|\mathcal{X}|}$. He furthermore says
if $f$ is uniformly distributed, half of the predictions of $f$ on $X$ are not consistent with $h(X)$
I do not understand what is meant by this nor how he arrived at that conclusion. Why would it be the case that if $f$ is uniformly distributed that half of its predictions are not consistent with $h(X)$?
After showing this proof, he goes on to say that the test error, $E_{ote}$, is independent of the chosen algorithm ($L_a$).

Thanks to @Joe for suggesting the resources to answer this. The below two pictures from "Learning from Data - A Short Course" (which I would suggest) are what made this problem clear to me.
Deriving the 1/2
In the picture, $g$ is the selected hypothesis. Let's say that $g$ matches $f_1$. Now we calculate how much disagreement there is between $g$ and all the other $f$s.
The average disagreement is then $(0+1+1+2+1+2+2+3)/8=1.5$. There are 3 points outside the training set, so in this case our hypothesis disagrees on average with exactly $1.5/3=1/2$ of the other possible hypotheses. Now to my original question: why does it matter that $f$ is uniformly distributed? Notice that the only reason the above works is that every $f$ completely disagrees with every other $f$ outside the training data. If there were agreement between some of the $f$s, the math would no longer hold up.
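To make this concrete, here is a small Python sketch (choosing $g$ as the all-zeros labeling is my own illustration; any choice gives the same numbers) that enumerates all $2^3 = 8$ possible $f$s on the 3 off-training-set points and reproduces the $1/2$:

```python
from itertools import product

# 3 points lie outside the training set, so each candidate target
# function f is one of the 2^3 = 8 possible bit-strings on them.
off_training_fs = list(product([0, 1], repeat=3))

# Let the selected hypothesis g agree with one particular f (here all zeros).
g = (0, 0, 0)

# Hamming disagreement between g and every possible f.
disagreements = [sum(gi != fi for gi, fi in zip(g, f)) for f in off_training_fs]

avg = sum(disagreements) / len(off_training_fs)  # (0+1+1+2+1+2+2+3)/8 = 1.5
fraction = avg / 3                               # 1.5 / 3 = 1/2
print(avg, fraction)                             # prints 1.5 0.5
```

By symmetry, swapping $g$ for any other bit-string permutes the disagreement counts but leaves the average, and hence the $1/2$, unchanged.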
Deriving the $2^n$
This is really less of a derivation and more just requires some logical thought about the nature of binary. Look at the chart of the various $f$s mapped out. Notice again, assuming that all the $f$s completely disagree with each other, there are exactly $2^n$ functions which agree with the training data, where $n$ is the number of points outside the training set. Think about it logically: if all $f$s must disagree and you only have 3 bits, then there are only $2^3 = 8$ possibilities. It does not matter what the training data is; the points outside the training data have only 8 possibilities before they would start to overlap. This scales to any number of points: $n$ could be a billion and this would still be true.
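As a sanity check, here is a Python sketch (the sample-space size of 5 and the particular training labels are my own arbitrary choices) that enumerates every $f: \mathcal{X} \to \{0,1\}$ and counts how many agree with the training data:

```python
from itertools import product

# A sample space of 5 points, the first 2 of which are training points.
# Every target function f: X -> {0,1} is a 5-bit string, 2^5 = 32 in total.
all_fs = list(product([0, 1], repeat=5))

train_labels = (1, 0)  # labels of the 2 training points (arbitrary choice)

# Count the functions that agree with the training data on those 2 points.
consistent = [f for f in all_fs if f[:2] == train_labels]
print(len(consistent))  # prints 8, i.e. 2^(5-2): one f per bit-string
                        # on the 3 off-training-set points
```

Whatever training labels you pick, the count comes out to $2^n$ with $n$ the number of off-training-set points, which is the scaling claimed above.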