Let $\left\{L_k\right\}_{k=1}^m$ be convex and $L$-smooth, where $L$-smoothness is defined as follows:
$$ \left\|\nabla L_k(\boldsymbol{x})-\nabla L_k(\boldsymbol{y})\right\| \leq L\|\boldsymbol{x}-\boldsymbol{y}\| \quad \forall \boldsymbol{x}, \boldsymbol{y}. $$
Define $$ \ell(\boldsymbol{x})=\frac{1}{m} \sum_{k=1}^m L_k(\boldsymbol{x}), $$ and let $\boldsymbol{x}_*$ be a point with $\nabla \ell\left(\boldsymbol{x}_*\right)=\mathbf{0}$.
The claim is that $$ \frac{1}{2 L m} \sum_{k \in[m]}\left\|\nabla L_k(\boldsymbol{x})-\nabla L_k\left(\boldsymbol{x}_*\right)\right\|^2 \leq \ell(\boldsymbol{x})-\ell\left(\boldsymbol{x}_*\right) \quad \forall \boldsymbol{x} $$.
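Before hunting for a proof, it can be reassuring to check the claim numerically. The sketch below (my own sanity check, not part of the question) uses convex quadratics $L_k(\boldsymbol{x})=\tfrac12\|A_k\boldsymbol{x}-\boldsymbol{b}_k\|^2$ with randomly generated $A_k, \boldsymbol{b}_k$; the dimensions and the smoothness constant $L=\max_k \lambda_{\max}(A_k^\top A_k)$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 3
A = rng.standard_normal((m, d, d))
b = rng.standard_normal((m, d))

# L_k(x) = 0.5 * ||A_k x - b_k||^2 is convex and L-smooth
# with L = max_k lambda_max(A_k^T A_k).
def grad_k(k, x):
    return A[k].T @ (A[k] @ x - b[k])

L = max(np.linalg.eigvalsh(A[k].T @ A[k]).max() for k in range(m))

# x_* solves grad ell(x_*) = 0, i.e. the normal equations of the averaged loss.
H = sum(A[k].T @ A[k] for k in range(m)) / m
g = sum(A[k].T @ b[k] for k in range(m)) / m
x_star = np.linalg.solve(H, g)

def ell(x):
    return sum(0.5 * np.linalg.norm(A[k] @ x - b[k]) ** 2 for k in range(m)) / m

# Check the claimed inequality at random points.
for _ in range(100):
    x = rng.standard_normal(d)
    lhs = sum(np.linalg.norm(grad_k(k, x) - grad_k(k, x_star)) ** 2
              for k in range(m)) / (2 * L * m)
    rhs = ell(x) - ell(x_star)
    assert lhs <= rhs + 1e-9
print("inequality verified at 100 random points")
```

Of course this is only evidence, not a proof, but it rules out a sign or constant error in the statement.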
I first plugged the definition of $\ell$ into the claim; the $1/m$ cancels on both sides, so it would suffice to prove the inequality for each summand. However, I couldn't find any theorem that bounds the squared norm of a gradient difference by the difference of function values.
I tried to apply Lipschitz continuity and the mean value theorem, but neither helps when the desired upper bound is in terms of the difference of function values (usually such upper bounds are in terms of $\boldsymbol{x}-\boldsymbol{x}_*$).
I also tried writing the squared norm on the LHS as an inner product and expanding it, but that didn't go far either, since I don't see how to relate it to the function values on the RHS.
I admit this is a little tricky. The subtlety is that $\nabla L_k(\boldsymbol{x}_*)$ need not vanish for an individual summand, so first define $$ \phi_k(\boldsymbol{x}) = L_k(\boldsymbol{x}) - \nabla L_k(\boldsymbol{x}_*)^\top \boldsymbol{x}. $$ Each $\phi_k$ is convex and $L$-smooth, and $\nabla \phi_k(\boldsymbol{x}_*) = \nabla L_k(\boldsymbol{x}_*) - \nabla L_k(\boldsymbol{x}_*) = \mathbf{0}$, so by convexity $\boldsymbol{x}_*$ is a global minimizer of $\phi_k$. By the descent lemma (a consequence of $L$-smoothness), for any $\boldsymbol{y}$, $$ \phi_k(\boldsymbol{y}) \le \phi_k(\boldsymbol{x}) + \nabla \phi_k(\boldsymbol{x})^\top(\boldsymbol{y}-\boldsymbol{x}) + \frac{L}{2}\|\boldsymbol{y}-\boldsymbol{x}\|^2. $$ Now let $\boldsymbol{y} = \boldsymbol{x} - \nabla \phi_k(\boldsymbol{x})/L$ and use $\phi_k(\boldsymbol{x}_*) \le \phi_k(\boldsymbol{y})$ to get $$ \phi_k(\boldsymbol{x}_*) \le \phi_k(\boldsymbol{x}) - \frac{1}{L}\|\nabla \phi_k(\boldsymbol{x})\|^2 + \frac{L}{2L^2}\|\nabla \phi_k(\boldsymbol{x})\|^2 = \phi_k(\boldsymbol{x}) - \frac{1}{2L}\|\nabla \phi_k(\boldsymbol{x})\|^2, $$ i.e. $$ \frac{1}{2L}\left\|\nabla L_k(\boldsymbol{x}) - \nabla L_k(\boldsymbol{x}_*)\right\|^2 \le \phi_k(\boldsymbol{x}) - \phi_k(\boldsymbol{x}_*). $$ Averaging over $k$ and using $\frac{1}{m}\sum_k \nabla L_k(\boldsymbol{x}_*) = \nabla \ell(\boldsymbol{x}_*) = \mathbf{0}$, the linear terms cancel and the right-hand side becomes $\ell(\boldsymbol{x}) - \ell(\boldsymbol{x}_*)$, which is the claim. Note that the descent-lemma step does not need convexity; convexity is only used to conclude that $\boldsymbol{x}_*$ globally minimizes $\phi_k$. In particular, for a single $L$-smooth (possibly non-convex) function $f$ with global minimizer $\boldsymbol{x}^*$, the same argument gives $\frac{1}{2L}\|\nabla f(\boldsymbol{x})\|^2 \le f(\boldsymbol{x}) - f(\boldsymbol{x}^*)$.
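To illustrate the closing remark about non-convex functions, here is a quick numerical check; the function $f(x)=x^2+3\sin^2 x$ and the constant $L=8$ are my own illustrative choices. Since $f''(x)=2+6\cos(2x)$ changes sign, $f$ is non-convex, yet $|f''|\le 8$ so $f$ is $8$-smooth, and its global minimizer is $x^*=0$ with $f(0)=0$, so the bound $\frac{1}{2L}f'(x)^2 \le f(x)-f(x^*)$ should hold everywhere:

```python
import numpy as np

# Non-convex but L-smooth example: f(x) = x^2 + 3*sin(x)^2.
# f''(x) = 2 + 6*cos(2x) lies in [-4, 8], so |f''| <= 8 and we may take L = 8.
# The global minimum is at x* = 0, where f(0) = 0.
L = 8.0
f = lambda x: x**2 + 3 * np.sin(x)**2
df = lambda x: 2 * x + 3 * np.sin(2 * x)  # derivative of f

# Check (1/(2L)) f'(x)^2 <= f(x) - f(0) on a dense grid.
xs = np.linspace(-10, 10, 2001)
assert np.all(df(xs)**2 / (2 * L) <= f(xs) - f(0.0) + 1e-12)
print("descent-lemma bound holds at all sampled points")
```

Near $x=0$ the two sides agree up to $O(x^4)$, so the bound is nearly tight there, which is a good sign the constant $\frac{1}{2L}$ cannot be improved in general.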