How to Show that AdaBoost Weighted Error is Exactly 1/2

I am trying to prove that, in an AdaBoost model with labels $t^{(i)} \in \{-1,+1\}$,

$err_t'= \frac{\sum_{i=1}^{N}w'_i1\{h_t(x^{(i)})\neq t^{(i)}\}}{\sum_{i=1}^{N}w'_i} = \frac{1}{2}$

here, $w_i' = w_i \exp(-\alpha_t t^{(i)}h_t(x^{(i)}))$ is the reweighted weight at step $t+1$, while $h_t$ is the classifier at step $t$. In other words, I am trying to show that the total error, weighted with respect to the updated (next-step) weights, is exactly $1/2$. This assumes AdaBoost uses the usual rule $\alpha_t= \frac{1}{2}\log(\frac{1-err_t}{err_t})$.
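Before the proof, here is a quick numerical sanity check of the claim — a standalone Python sketch where the labels, predictions, and weights are random made-up data, not from any particular model:

```python
import math
import random

random.seed(0)
N = 1000

# Hypothetical random instance: labels, weak-learner predictions, and
# current (unnormalized) weights -- the claim should hold for any of them.
t = [random.choice([-1, 1]) for _ in range(N)]
h = [random.choice([-1, 1]) for _ in range(N)]
w = [random.random() for _ in range(N)]

# Weighted error of h_t under the current weights
err = sum(wi for wi, ti, hi in zip(w, t, h) if hi != ti) / sum(w)

# Usual AdaBoost step size
alpha = 0.5 * math.log((1 - err) / err)

# Reweighting: w'_i = w_i * exp(-alpha * t_i * h_t(x_i))
w_new = [wi * math.exp(-alpha * ti * hi) for wi, ti, hi in zip(w, t, h)]

# Weighted error of the SAME classifier h_t under the NEW weights
err_new = sum(wi for wi, ti, hi in zip(w_new, t, h) if hi != ti) / sum(w_new)
print(err_new)  # -> 0.5 up to floating-point rounding
```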

Here is what I have so far

Break into two scenarios: error and no error.

$err_t' = \frac{\sum_{i:h_t(x^{(i)})\neq t^{(i)}}w_i'\cdot 1}{\sum_{i=1}^{N}w'_i} + \frac{\sum_{i:h_t(x^{(i)}) = t^{(i)}}w_i'\cdot 0}{\sum_{i=1}^{N}w'_i} =$

The second term is multiplied by zero, so it vanishes:

$= \frac{\sum_{i:h_t(x^{(i)})\neq t^{(i)}}w_i'}{\sum_{i=1}^{N}w'_i} =$

Plug in the weight update; for misclassified points, $t^{(i)}h_t(x^{(i)}) = -1$:

$= \frac{\sum_{i:h_t(x^{(i)})\neq t^{(i)}}w_i \exp(-\alpha_t(-1))}{\sum_{i=1}^{N}w'_i} =$

Plug in what we know about $\alpha_t$:

$= \frac{\sum_{i:h_t(x^{(i)})\neq t^{(i)}}w_i \exp\left(\frac{1}{2}\log\left(\frac{1-err_t}{err_t}\right)\right)}{\sum_{i=1}^{N}w'_i} =$

The $\exp$ and $\log$ cancel, and the error factor comes out of the sum since it is constant w.r.t. $i$:

$= \frac{\sum_{i:h_t(x^{(i)})\neq t^{(i)}}w_i }{\sum_{i=1}^{N}w'_i}*(\frac{1-err_t}{err_t})^\frac{1}{2} = ??$

I am stuck here.

On BEST ANSWER

I figured it out. I am sharing this because I spent many hours looking for a solution online, and mostly found people discussing this fact or showing the derivation of $\alpha$, but I could not find a detailed proof. Here we go.

$\DeclareMathOperator{\err}{err}$

To prove:

$\err_t'= \frac{\sum_{i=1}^{N}w'_i1\{h_t(x^{(i)})\neq t^{(i)}\}}{\sum_{i=1}^{N}w'_i} = \frac{1}{2}$

Given:

$w_i' = w_i \exp(-\alpha t^{(i)}h_t(x^{(i)}))$, writing $\alpha$ for $\alpha_t$ throughout.

$\alpha_t= \frac{1}{2}\log(\frac{1-\err_t}{\err_t})$.

Proof:

Break into two scenarios: error and no error. To make things easier to write down, define the index sets

$E = \{i : h_t(x^{(i)})\neq t^{(i)}\}, \qquad C = E^c = \{i : h_t(x^{(i)}) = t^{(i)}\}.$

$\err_t' = \frac{\sum_{i \in E}w_i'\cdot 1}{\sum_{i=1}^{N}w'_i} + \frac{\sum_{i \in C}w_i'\cdot 0}{\sum_{i=1}^{N}w'_i} =$

The second term is multiplied by zero, so it vanishes:

$= \frac{\sum_{i \in E}w_i'}{\sum_{i=1}^{N}w'_i} =$

Plug in the weight update; for $i \in E$ the point is misclassified, so $t^{(i)}h_t(x^{(i)}) = -1$:

$= \frac{\sum_{i \in E}w_i \exp(-\alpha_t(-1))}{\sum_{i=1}^{N}w'_i} =$

Move the $\exp(\alpha)$ from the numerator into the denominator, break the denominator sum into $E$ and $C$, substitute for $w'_i$, and simplify:

$= \frac{\sum_{i \in E}w_i}{\sum_{i = 1}^N \frac{w'_i}{\exp(\alpha)}} = \frac{\sum_{i \in E}w_i}{\sum_{i \in E} w_i\frac{\exp(\alpha)}{\exp(\alpha)}+\sum_{i \in C} w_i\frac{\exp(-\alpha)}{\exp(\alpha)}} = \frac{\sum_{i \in E}w_i}{\sum_{i \in E} w_i +\exp(-2\alpha)\sum_{i \in C} w_i}$

Now we pause this chain of equalities to note two facts:

  1. $\alpha = \frac{1}{2}\log(\frac{1-\err_t}{\err_t}) \Rightarrow e^{2\alpha} = \frac{1-\err_t}{\err_t}$

  2. $\frac{\sum_{i \in E}w_i}{\sum_{i \in E}w_i+\sum_{i \in C}w_i} = \err_t \Rightarrow \sum_{i \in C}w_i = \frac{\sum_{i \in E}w_i}{\err_t} - \sum_{i \in E}w_i $

Returning to the chain of equalities and plugging in the two facts above:

$\frac{\sum_{i \in E}w_i}{\sum_{i \in E}w_i + e^{-2\alpha}(\frac{\sum_{i \in E}w_i}{\err_t} - \sum_{i \in E}w_i)} = \frac{\sum_{i \in E}w_i}{\sum_{i \in E}w_i + \frac{\err_t}{1-\err_t}(\frac{\sum_{i \in E}w_i}{\err_t} - \sum_{i \in E}w_i)} = $

$ = \frac{\sum_{i \in E}w_i}{\sum_{i \in E}w_i + \frac{\sum_{i \in E}w_i}{1 - \err_t} - \frac{\err_t}{1-\err_t} \sum_{i \in E}w_i} = $

Notice that at this point $\sum_{i \in E}w_i$ simply cancels out:

$ = \frac{1}{1 + \frac{1}{1 - \err_t} - \frac{\err_t}{1-\err_t} } = \frac{1}{1 + \frac{1-\err_t}{1 - \err_t} } = \frac{1}{1 + 1 } = \frac{1}{2}$
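As a final sanity check on the algebra, here is a small numerical verification of the chain above; the values for the two weight sums are arbitrary made-up positives:

```python
import math

# Arbitrary (hypothetical) positive weight sums:
A = 2.7   # A = sum of w_i over E (misclassified points)
C = 9.1   # C = sum of w_i over C (correctly classified points)

err = A / (A + C)                         # err_t, as in fact 2
alpha = 0.5 * math.log((1 - err) / err)   # usual AdaBoost step size

# The expression reached after breaking the denominator into E and C:
err_new = A / (A + math.exp(-2 * alpha) * C)
print(err_new)  # -> 0.5 up to floating-point rounding
```

Since $e^{-2\alpha} = \frac{\err_t}{1-\err_t} = \frac{A}{C}$, the denominator is exactly $2A$ regardless of the values chosen.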