Boosting: confusion about functional gradient

32 Views Asked by At

In section 3 "A gradient descent view of voting methods" in the paper

https://proceedings.neurips.cc/paper/1999/file/96a93ba89a5b5c6c226e49b88973f46e-Paper.pdf

they define the inner product $$\langle F, G \rangle = \frac{1}{n} \sum_{i = 1}^{n} F(x_i) G(x_i)$$ and the functional $$C(F) = \frac{1}{n} \sum_{i = 1}^{n} c(y_i F(x_i))$$ They then compute $\nabla C(F)$ via $$\nabla C(F)(x) := \frac{C(F + \epsilon 1_x) - C(F)}{\epsilon} = \frac{1}{n} \sum_{i = 1}^{n} c'(y_i F(x_i)) y_i 1_x(x_i)$$ and then plug it into the definition of the inner product to obtain $$\langle f, \nabla C(F) \rangle = \frac{1}{n^2}\sum_{i = 1}^{n}c'(y_i F(x_i))y_i f(x_i)$$ I am confused because for small $\epsilon$, $$C(F + \epsilon f) - C(F) \approx \langle \epsilon f, \nabla C(F) \rangle = \frac{\epsilon}{n^2}\sum_{i = 1}^{n}c'(y_i F(x_i))y_i f(x_i)$$ but computing the difference directly we obtain $$C(F + \epsilon f) - C(F) \approx \frac{\epsilon}{n} \sum_{i = 1}^{n} c'(y_i F(x_i))y_i f(x_i)$$ So the first expression is divided by $n^2$ whereas the second expression is divided by $n$. Where am I going wrong?