I am using a set of popular online notes on convex optimization. A function is $\beta$-smooth if its gradient is $\beta$-Lipschitz. Can anyone help me with the following line in the proof:
How does one justify combining the two terms:
$\int\limits_{0}^t \nabla f(y+t(x-y))^T(x-y)dt$ and $\nabla f(y)^T(x-y)$ together?
It seems to me you need to do the following:
$\int\limits_{0}^t \nabla f(y+t(x-y))^T(x-y)dt - \nabla f(y)^T(x-y)$ = $\int\limits_{0}^t \nabla f(y+t(x-y))^T(x-y)dt - \frac{1}{t}\int\limits_{0}^t \nabla f(y)^T(x-y) dt$ =
$\int\limits_{0}^t \nabla f(y+t(x-y))^T(x-y) - \frac{1}{t} \nabla f(y)^T(x-y) dt$
= $\int\limits_{0}^t (\nabla f(y+t(x-y)) - \frac{1}{t} \nabla f(y))^T(x-y) dt$
But no such $\frac{1}{t}$ term is used in the derivation. Can someone check?
(Assume that the $t$ inside the integrand is a dummy variable of integration.)

Note that the upper limit of the definite integral $\int_0^t \nabla f(y+t(x-y))^\intercal(x-y)\,\mathrm{d}t$ is not the integration variable; it is fixed at $1$. Since $\int_0^1 \mathrm{d}t = 1$, the constant $\nabla f(y)^\intercal(x-y)$ equals $\int_0^1 \nabla f(y)^\intercal(x-y)\,\mathrm{d}t$ and enters the integral without any change, so no $\frac{1}{t}$ factor appears. This gives
$$\begin{split} & \left| \int_0^1 \nabla f(y+t(x-y))^\intercal (x-y) \,\mathrm{d}t - \nabla f(y)^\intercal (x-y) \right| \\ &= \left| \int_0^1 [ \nabla f(y+t(x-y)) - \nabla f(y) ]^\intercal (x-y) \,\mathrm{d}t \right| \\ &\le \int_0^1 | [ \nabla f(y+t(x-y)) - \nabla f(y) ]^\intercal (x-y) | \,\mathrm{d}t \\ &\le \int_0^1 \lVert \nabla f(y+t(x-y)) - \nabla f(y) \rVert \cdot \lVert x - y \rVert \,\mathrm{d}t \quad \text{(Cauchy--Schwarz)} \\ &\le \int_0^1 \beta \lVert t(x - y) \rVert \cdot \lVert x - y \rVert \,\mathrm{d}t \quad (f \text{ is } \beta\text{-smooth}) \\ &= \int_0^1 \beta t \lVert x - y \rVert^2 \,\mathrm{d}t \\ &= \frac{\beta}{2} \lVert x - y \rVert^2 \end{split}$$
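If it helps, the chain of inequalities can be sanity-checked numerically. The sketch below is only an illustration under assumptions that are not part of the original notes: it takes $f(v) = \sum_i \log(1 + e^{v_i})$ (whose gradient is $\tfrac14$-Lipschitz, so $\beta = \tfrac14$), picks two arbitrary points `x` and `y`, and approximates the integral over $[0, 1]$ with a midpoint rule. The helper names (`line_integral`, `grad_f`, and so on) are made up for this example.

```python
import math

def f(v):
    # Assumed test function: sum of softplus terms, f(v) = sum_i log(1 + exp(v_i)).
    return sum(math.log1p(math.exp(vi)) for vi in v)

def grad_f(v):
    # Gradient of f: the logistic sigmoid applied coordinate-wise.
    return [1.0 / (1.0 + math.exp(-vi)) for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def line_integral(x, y, n=10_000):
    # Midpoint rule for int_0^1 grad f(y + t(x - y))^T (x - y) dt.
    # The upper limit is the constant 1; t is only the dummy variable.
    d = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        point = [yi + t * di for yi, di in zip(y, d)]
        total += dot(grad_f(point), d)
    return total / n

BETA = 0.25                 # Lipschitz constant of grad_f for this particular f
x = [1.0, -2.0, 0.5]        # arbitrary illustrative points
y = [-0.5, 0.3, 2.0]
d = [xi - yi for xi, yi in zip(x, y)]

integral = line_integral(x, y)
lhs = abs(integral - dot(grad_f(y), d))   # left-hand side of the chain
rhs = 0.5 * BETA * dot(d, d)              # (beta / 2) * ||x - y||^2

print("integral vs f(x) - f(y):", integral, f(x) - f(y))
print("lhs <= rhs:", lhs, rhs, lhs <= rhs)
```

The first printed pair should agree up to quadrature error, since the integral is $f(x) - f(y)$ by the fundamental theorem of calculus, and the second line compares the left-hand side against $\frac{\beta}{2}\lVert x - y\rVert^2$.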