Consider the problem of minimizing $f(x)+g(x)$ over $\mathbb{R}^n$, where $f$ and $g$ are both smooth, and consider the following first-order method: $$ x_{k+1}= \operatorname{argmin}_x\left \{ \nabla f(x_k)^T(x-x_k) + \frac{1}{\alpha^{p-1} p} \|x-(x_k - \alpha \nabla g(x_k))\|_p^p \right \} $$
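Since the subproblem is coordinate-separable, it has a closed form that may help identify the method. Writing $z_k = x_k - \alpha \nabla g(x_k)$ and setting the $i$-th partial derivative to zero gives, for $p>1$ (unless I have slipped in the algebra),

$$ \nabla_i f(x_k) + \frac{1}{\alpha^{p-1}} |x_i - (z_k)_i|^{p-2} (x_i - (z_k)_i) = 0, $$

so that, elementwise,

$$ x_{k+1} = x_k - \alpha \nabla g(x_k) - \alpha \operatorname{sign}(\nabla f(x_k)) \odot |\nabla f(x_k)|^{1/(p-1)}. $$

That is, a plain gradient step on $g$ combined with a signed, power-rescaled gradient step on $f$.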
Is it a special case of some known method?
What about the stochastic setting, when $f(x) = \frac{1}{m} \sum_{i=1}^m f_i(x)$, replacing $\nabla f$ by $\nabla f_i$, where $i$ is chosen at random?
Note that it is easy to see that for $p=2$, we obtain exactly the (stochastic) gradient method with step-size $\alpha$.
The method above was coded by mistake; the more natural method we actually aimed to test was $$ x_{k+1}= \operatorname{argmin}_x \left\{ [\nabla f(x_k) + \nabla g(x_k)]^T(x-x_k) + \frac{1}{\alpha^{p-1} p} \|x-x_k\|_p^p \right \} $$ with $p=3$. Surprisingly, the stochastic version of the 'mistake' performed much better than the stochastic version of this more natural method.
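For concreteness, here is a minimal sketch of both deterministic updates on a toy quadratic, using the coordinate-wise closed forms of the two argmins (my own derivation, so treat it as an assumption; the problem $f(x)=\tfrac12\|x\|^2$, $g(x)=\tfrac12\|x-\mathbf{1}\|^2$ and the step-size are arbitrary choices for illustration):

```python
import numpy as np

def accidental_step(x, grad_f, grad_g, alpha, p):
    """One step of the 'mistake' method: gradient step on g plus a
    signed, power-rescaled step on f (closed form of the argmin, assuming p > 1)."""
    gf = grad_f(x)
    return x - alpha * grad_g(x) - alpha * np.sign(gf) * np.abs(gf) ** (1.0 / (p - 1))

def natural_step(x, grad_f, grad_g, alpha, p):
    """One step of the intended method: signed, power-rescaled step
    on the full gradient of f + g."""
    s = grad_f(x) + grad_g(x)
    return x - alpha * np.sign(s) * np.abs(s) ** (1.0 / (p - 1))

# Toy problem: f(x) = 0.5*||x||^2, g(x) = 0.5*||x - 1||^2.
grad_f = lambda x: x
grad_g = lambda x: x - 1.0
F = lambda x: 0.5 * np.sum(x ** 2) + 0.5 * np.sum((x - 1.0) ** 2)

x_a = x_n = np.array([5.0, -3.0])
for _ in range(200):
    x_a = accidental_step(x_a, grad_f, grad_g, alpha=0.1, p=3)
    x_n = natural_step(x_n, grad_f, grad_g, alpha=0.1, p=3)

print(F(x_a), F(x_n))  # both far below the initial value F = 33
```

Note that on this problem the two methods converge to different points: the 'mistake' balances $\nabla g$ against $\operatorname{sign}(\nabla f)|\nabla f|^{1/(p-1)}$, so its fixed point is not a stationary point of $f+g$ in general.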