It is well known that, for an appropriate step size, $E[g^t] \in \partial f(x^t)$ is a sufficient condition for the subgradient method to converge. What I'm wondering is whether this requirement has to hold conditioned on the previous subgradients, i.e. whether we need $E[g^t \mid g^{t-1}, \dots, g^1] \in \partial f(x^t)$.
Say we alternate between $k$ directions sampled randomly without replacement, some of which are not subgradients but whose average is. Then $E[g^t]$ is always a subgradient, but $E[g^t \mid g^{t-1}]$ need not be.
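To make the scheme concrete, here is a minimal sketch of what I have in mind for $f(x) = |x|$ with $k = 2$: neither direction is itself a subgradient (each is perturbed by $\pm c$), but the perturbations cancel on average. The function, the perturbation size, and the step-size schedule are all my own illustrative choices, not taken from the references below.

```python
import random

def subgrad(x):
    # a subgradient of f(x) = |x|
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def shuffled_subgradient(x0, epochs=2000, c=0.5):
    """Each epoch, visit the k = 2 perturbed directions in a random
    order without replacement (random reshuffling).  Neither direction
    alone is a subgradient, but their average is, so E[g^t] is a
    subgradient while E[g^t | g^{t-1}] within an epoch is not."""
    x = x0
    t = 1
    for _ in range(epochs):
        offsets = [c, -c]        # perturbations that cancel on average
        random.shuffle(offsets)  # without-replacement sampling order
        for off in offsets:
            g = subgrad(x) + off  # biased direction, unbiased on average
            x -= (1.0 / t) * g    # diminishing step size 1/t
            t += 1
    return x

random.seed(0)
x_final = shuffled_subgradient(5.0)
```

In this toy run the iterates still approach the minimizer $x^\star = 0$, which matches the empirical observation that the lack of per-step unbiasedness doesn't seem to hurt in practice.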
It turns out I ran into an open problem. The theoretical properties of this kind of scheme are poorly understood, yet it appears superior in practice; see https://arxiv.org/pdf/1202.4184.pdf and http://www.optimization-online.org/DB_FILE/2014/12/4679.pdf for instance.
Violating independence breaks several of the assumptions used to prove convergence, although this doesn't seem to matter much in practice.