Let $f: \mathbb R \rightarrow \mathbb R$ continuous with compact support $[0,1]$. Assume $|f(x)| \leq M$ for all $x \in [0,1]$. Let $\epsilon > 0$. Then let $0< \delta <1$ s.t. $$ \forall x,y \in \mathbb R: |x-y| < \delta \rightarrow |f(x)-f(y)| <\epsilon $$ The existence of such a $\delta > 0$ I have already proven. Now let $g:[-1,1] \rightarrow \mathbb R$ be an $(\delta,\epsilon)$-approximation to the identity. I want to show that for all $x \in [0,1]$ we have $$ |(f*g)(x)-f(x)| \leq \epsilon(3M +2\delta) $$ But I don't get the right bound. Here is what I did:
Let $x \in [0,1]$. Then
\begin{align} |(f*g)(x)-f(x)|& = \left | \int_{-1}^1 f(x-y)g(y) dy - \int_{-1}^1 f(x)g(y) dy \right| \\ & = \left | \int_{-1}^1 g(y)(f(x-y)-f(x)) dy \right | \\ &\leq \left | \int_{-\delta}^\delta (f(x-y)-f(x))g(y) dy \right | + \left | \int_{|x| \geq \delta} (f(x-y)-f(x))g(y) dy \right | \\ &\leq \int_{-\delta}^\delta \epsilon |g(y)| dy + \int_{|x| \geq \delta} 2M |g(y)| dy \\ &\leq \epsilon + 2M \int_{|x| \geq \delta} \epsilon dy \\ &\leq \epsilon + 2M \cdot 2(1-\delta) \epsilon \\ &= \epsilon(1+4M(1-\delta)) = \epsilon(1+4M - 4M\delta) \end{align} Where is the mistake ? I used in the first line that $\int_{[-1,1]} g = 1$ further the property given in the question and the $|x| \geq \delta \rightarrow |g(x)| \leq \epsilon$.
I'm not surprised that you don't manage to show
$$\lvert (f \ast g)(x) - f(x)\rvert \leqslant \epsilon(3M + 2\delta)$$
under the given hypotheses.
Because in general, it does not hold.
If such an estimate would hold for all $(\epsilon,\delta)$-approximations of the identity, it would, by taking limits, also hold for the Dirac measures located in $z \in [-\delta,\, \delta]$, and that would mean you'd have
$$\sup_{\lvert y\rvert \leqslant \delta} \lvert f(x-y) - f(x)\rvert \leqslant \epsilon(3M+2\delta).$$
When $\delta$ is chosen as large as possible for the given $\epsilon$, the left hand side becomes $\epsilon$ for some $x$, and that would imply $3M + 2\delta \geqslant 1$. But we can easily construct examples where that does not hold.
If, in your computation, you use the fact that $f(x-y) = 0$ when $x \leqslant y \leqslant 1$ or $-1 \leqslant y \leqslant x-1$, so on these intervals you can estimate $\lvert f(x-y)-f(x)\rvert$ by $M$ instead of $2M$, you get an estimate
$$\lvert (f\ast g)(x) - f(x)\rvert \leqslant \epsilon(1+3M - 2M\delta),$$
which is closer to the stated goal.