Consider a smooth function $f$ on $[0,1]$ and its Bernstein polynomial of power $n$: $$B_n(f)=\sum_{k=0}^n f\left(\frac{k}{n}\right) b_{n,k}(x)$$ with $$b_{n,k}(x) = \binom{n}{k}x^k (1-x)^{n-k}.$$
It is well known that $$\sup_{x \in[0,1]} |B_n(f)(x)-f(x)|=O\left(\frac{1}{\sqrt{n}}\right).$$
But what if one considers the $L^2$ norm instead of $L^{\infty}$ norm? Will the approximation power be better in $L^2$ norm? In particular, is it possible to conclude that $$\left(\int_{0}^1 (B_n(f)(x)-f(x))^2 dx \right)^{\frac{1}{2}}=O\left(\frac{1}{n}\right)?$$
Page 206 of the book Bernstein Operators and Their Properties states for $f\in C^2[0,1]$: $$B_n(f,x)-f(x)=\frac{x(1-x)}{2n}f''(x)+o\left(\frac{1}{n}\right)$$
and cites