From Boyd and Vandenberghe's Convex Optimization:
The book presents the following inequality under the assumption that the objective function $f$ is strongly convex with parameter $m > 0$:
$$ f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{m}{2} \lVert y-x \rVert_{2}^{2} $$
Later, the authors use this inequality to bound $f(x) - p^{\star}$, where $p^{\star} = \inf_{x} f(x)$ is the optimal value.
They write:
"The RHS of the inequality is a convex quadratic function of $y$ (for fixed $x$). Setting the gradient with respect to $y$ equal to zero we find that...
$$ \hat{y} = x - (1/m) \nabla f(x) $$
minimizes the RHS... "
I do not understand how this optimal $\hat{y}$ is obtained. What confuses me is the phrase "setting the gradient with respect to $y$": I tried taking the gradient of the whole inequality, but I can't make the algebra work out.
The gradient of $g(y) = \frac12 \|y - x\|_2^2$ with respect to $y$ is $\nabla g(y) = y - x$, and the gradient of $h(y) = u^T(y-x)$ is $\nabla h(y) = u$. The minimizer of $h(y) + m\, g(y)$ is therefore the solution of $$ \nabla h(y) + m\nabla g(y) = 0 \iff u + m(y-x) = 0 \iff y = x - \frac1m u. $$ With $u = \nabla f(x)$, this is exactly $\hat{y} = x - (1/m)\nabla f(x)$. The authors are setting the gradient of the RHS to zero (treating $x$ as a constant), not taking the gradient of the whole inequality.
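As a numerical sanity check (my own sketch, not from the book), you can verify that the candidate $\hat{y} = x - (1/m)\nabla f(x)$ minimizes the RHS. Up to the constant $f(x)$, the RHS is $q(y) = u^T(y-x) + \frac{m}{2}\|y-x\|_2^2$ with $u = \nabla f(x)$; random perturbations of $\hat{y}$ should never decrease $q$. The values of `m`, `x`, and `g` below are arbitrary choices for illustration:

```python
import numpy as np

# Sketch: verify numerically that y_hat = x - (1/m) * g minimizes
#   q(y) = g^T (y - x) + (m/2) * ||y - x||^2,
# where g stands in for grad_f(x); f(x) itself only shifts q by a constant.
rng = np.random.default_rng(0)
m = 0.5                       # strong-convexity parameter (assumed value)
x = rng.standard_normal(3)    # arbitrary fixed point
g = rng.standard_normal(3)    # plays the role of grad_f(x)

def q(y):
    # the RHS of the inequality, minus the constant f(x)
    return g @ (y - x) + 0.5 * m * np.sum((y - x) ** 2)

y_hat = x - g / m             # candidate from setting the gradient to zero

# no random perturbation should produce a smaller value of q
for _ in range(1000):
    y = y_hat + rng.standard_normal(3)
    assert q(y) >= q(y_hat) - 1e-12

# at the minimizer, q(y_hat) = -||g||^2 / (2m)
print(q(y_hat))
```

Plugging $\hat{y}$ back in gives $q(\hat{y}) = -\frac{1}{2m}\|u\|_2^2$, which is where the book's bound $p^{\star} \geq f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2$ comes from.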