I'm studying information criteria and came across the following proof in my reference material (page 34):
On the functional form of Kullback-Leibler information
If the differentiable function $F$ defined on $(0, \infty)$ satisfies the relationship
$$\sum_{i=1}^k g_iF(f_i) \leq \sum_{i=1}^k g_iF(g_i)$$
for any two probability functions $\{g_1, ..., g_k\}$ and $\{f_1, ..., f_k\}$, then $F(g)=\alpha + \beta\log g$ for some $\alpha, \beta$ with $\beta>0$.
Proof: In order to demonstrate that $F(g) = \alpha + \beta \log g$, it suffices to show that $gF'(g)=\beta>0$, and hence that $\partial F / \partial g =\beta/g$.
Let $h = (h_1, ..., h_k)^T$ be an arbitrary vector that satisfies $\sum_{i=1}^k h_i=0$ and $|h_i| \leq \max\{g_i , 1 − g_i\}$. Since $g+\lambda h$ is a probability distribution, it follows from the assumption that
$$\phi(\lambda)\equiv \sum_{i=1}^k g_iF(g_i+\lambda h_i) \leq \sum_{i=1}^k g_iF(g_i)=\phi(0).$$
Therefore, since
$$\phi'(\lambda)= \sum_{i=1}^k g_iF'(g_i+\lambda h_i)h_i,\;\;\;\;{\color{red}{\phi'(0)= \sum_{i=1}^k g_iF'(g_i)h_i=0}}$$
are always true, by writing $h_1=C, h_2=-C, h_i=0\;(i=3, ..., k)$, we have
$$g_1F'(g_1)=g_2F'(g_2)=\text{const}=\beta.$$
The equality for the other values of $i$ can be shown in a similar manner. Q.E.D.
QUESTION 1: What happens starting from the equation I marked in red? Why does it equal zero?
QUESTION 2: It is not stated in my material, but am I right to assume that $\lambda$ and $C$ are constants?
To answer the first question: the red equation uses the fact, noted above, that $$\phi(\lambda) \leq \phi(0)$$ for all $\lambda$ (at least those for which $\phi$ is well defined). This means that $\phi$ attains a maximum at the interior point $\lambda = 0$, and therefore its derivative there must vanish: $\phi'(0)=0$.
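This is easy to check numerically. Below is a small sanity check with the (assumed, illustrative) choice $F = \log$ — which does satisfy the hypothesis, by Gibbs' inequality — and arbitrary example vectors $g$ and $h$: $\phi(\lambda) \leq \phi(0)$ over the admissible range, and a central-difference estimate of $\phi'(0)$ is essentially zero.

```python
import numpy as np

# Illustrative example (not from the source): F = log, a fixed probability
# vector g, and a perturbation h with sum(h) = 0, small enough that
# g + lam*h stays strictly positive for |lam| < 1.
g = np.array([0.2, 0.3, 0.5])
h = np.array([0.1, -0.1, 0.0])

def phi(lam):
    """phi(lam) = sum_i g_i * F(g_i + lam * h_i) with F = log."""
    return np.sum(g * np.log(g + lam * h))

# phi(lam) <= phi(0) for every admissible lam ...
lams = np.linspace(-0.99, 0.99, 201)
assert all(phi(l) <= phi(0.0) + 1e-12 for l in lams)

# ... so lam = 0 is an interior maximum, and the derivative there vanishes.
eps = 1e-6
deriv0 = (phi(eps) - phi(-eps)) / (2 * eps)
print(deriv0)  # approximately 0
```

Here $\phi'(0) = \sum_i g_i F'(g_i) h_i = \sum_i h_i = 0$ exactly, since $F'(g) = 1/g$ for this choice of $F$.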
To answer the second question: no, $\lambda$ is not a constant. Once the vector $h$ is fixed, we define the function $$\lambda\mapsto \sum_{i=1}^k g_iF(g_i+\lambda h_i).$$ Since we required $h$ to satisfy the properties above, $(g_1+\lambda h_1, \ldots, g_k+\lambda h_k)$ is a probability function for all sufficiently small $\lambda$ (at least for $\vert\lambda\vert<1$; otherwise there is the risk that $g_i+\lambda h_i<0$ for some $i$, in which case we cannot evaluate $F$ there). So we can treat $\lambda$ as a variable in $(-1,1)$.
All of this reasoning was done for a fixed suitable vector $h$ (i.e. a different choice of $h$ would determine a different function $\phi$). On the other hand, $C$ is a constant that we choose (again, small enough for everything to be well defined), and it determines a vector $h$ of the form $(C,-C,0,\ldots,0)$. They then substitute this $h$ into the equation $$\sum_{i=1}^k g_iF'(g_i)h_i =0,$$ which gives $$Cg_1 F'(g_1) - Cg_2 F'(g_2)=0,$$ and, since $C\neq 0$, the conclusion follows. I hope this clarifies things.
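As a final consistency check, here is a short script (with hypothetical values of $\alpha$, $\beta$, $g$, and $C$, chosen only for illustration) verifying that for $F(g)=\alpha+\beta\log g$ the quantity $gF'(g)$ is indeed the constant $\beta$, so the substituted identity holds for every $h$ of the form $(C,-C,0,\ldots,0)$:

```python
import numpy as np

# Hypothetical constants, chosen only for illustration.
alpha, beta = 1.5, 2.0

def Fprime(g):
    """Derivative of F(g) = alpha + beta*log(g)."""
    return beta / g

g = np.array([0.1, 0.2, 0.3, 0.4])   # a probability vector
print(g * Fprime(g))                 # every entry equals beta

# Substituting h = (C, -C, 0, ..., 0) into sum_i g_i F'(g_i) h_i:
C = 0.05
h = np.array([C, -C, 0.0, 0.0])
print(np.sum(g * Fprime(g) * h))     # C*beta - C*beta = 0
```

Because $g_1F'(g_1)$ and $g_2F'(g_2)$ are equal for *any* choice of the two indices (just move the $C$ and $-C$ to other positions), all the products $g_iF'(g_i)$ share the common value $\beta$.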