Why can I write this function in terms of its previous values?

227 Views Asked by At

From Understanding Machine Learning: From Theory to Algorithms

In the red box below, why is $F(\theta ' ) = F(\theta ) -D_i 1_{[y_i = 1]} + D_i 1_{[y_1=-1]}$?


enter image description here

1

There are 1 best solutions below

0
On BEST ANSWER

As discussed in comments, let's define $F(\theta)$ by

$$ F(\theta) = \sum_{i:y_i=-1} D_i \mathbb{1}_{[x_{i,j}\leq\theta]} + \sum_{i:y_i=1} D_i \mathbb{1}_{[x_{i,j}>\theta]}.$$

I swapped the left and right sums here because it helped me understand the formula better. But addition is commutative, so it's the same function that we find inside the parentheses in $(10.1).$

As far as I can tell, what is going on here is that we have (temporarily) assigned a fixed value to $j$ (because we are in one iteration of the loop over $j$), and then for each of the (sorted) values $x_{1,j}, \ldots, x_{i,j}, \ldots, x_{m,j}$ we compute a value $y_i,$ which is either $1$ or $-1.$

I visualize this as $m$ points at coordinates $x_{1,j}, \ldots, x_{m,j}$ arranged in that order from left to right along the number line (because they have been sorted), with each of these $m$ points labeled either $y_i=1$ or $y_i=-1.$

So $F(\theta)$ is a weighted count (weighted by $D_i$) of all the times the label $y_i = -1$ occurs to the left of $\theta$ (or exactly at $\theta$) and all the times the label $y_i = 1$ occurs to the right of $\theta.$

Since $\theta \in (x_{i-1,j},x_{i,j})$ and $\theta \in (x_{i,j},x_{i+1,j}),$ we know that $\theta < x_{i,j} < \theta',$ and we also know that $x_{i,j}$ is the only one of the "$x$" values between $\theta$ and $\theta'.$ So when we go from $F(\theta)$ to $F(\theta'),$ one of two things happens as we pass over $x_{i,j}$:

  • If $y_i = -1,$ we get one more label of this kind on the left side of $\theta'$ than on the left of $\theta.$ Therefore the function value increases by $D_i,$ which is the weight at that point.

  • If $y_i = 1,$ we get one fewer label of this kind on the right side of $\theta'$ than on the right of $\theta.$ Therefore the function value decreases by $D_i,$ which is the weight at that point.

The notation $ - D_i \mathbb{1}_{[y_i=1]} + D_i \mathbb{1}_{[y_i=-1]}$ says the same thing: if $y_i=-1$ we add $D_i$, but if $y_i=1$ we subtract $D_i.$

If you work out the value of $-D_i y_i$ in both cases, it turns out to be the amount by which the function value changes as we go from $\theta$ to $\theta'.$