Proof of convergence of binary gradient-projected stochastic gradient descent


I have an algorithm that I like to call stochastic binary projected-gradient descent, which looks like: $$ W_{k+1}=\Pi\left(W_k-\alpha_k \Pi\left(\nabla_{W_k} E\left(W_k\right)\right) \right)$$ where $E(\cdot)$ is an objective (loss) function, $\alpha_k$ is the step size and $\Pi(\cdot)$ is a projection onto the binary matrices, e.g. $\Pi:\mathbb{R}^{M\times N}\to\{-1,1\}^{M\times N}$ or equivalently $\Pi:\mathbb{R}^{M\times N}\to\{0,1\}^{M\times N}$, defined as the 'closest' binary matrix: $$ \Pi\left(\mathbf{G}\right) := \underset{{\mathbf{R} \in \{-1,1 \}^{M\times N}}}{\arg\!\min} \|\mathbf{G} - \mathbf{R} \|_F^2 $$
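For concreteness, here is a minimal NumPy sketch of one update step. It assumes the $\{-1,1\}$ formulation, where the argmin decouples entrywise so that $\Pi(\mathbf{G}) = \mathrm{sign}(\mathbf{G})$ (with ties at zero broken toward $+1$ here); the function names are my own, not from any reference implementation:

```python
import numpy as np

def project_binary(G):
    # Entrywise closest matrix in {-1, 1}^(M x N): the Frobenius-norm
    # argmin decouples per entry, so Pi(G) = sign(G).
    # Ties at exactly 0 are broken toward +1 (an arbitrary choice).
    return np.where(G >= 0, 1.0, -1.0)

def binary_projected_sgd_step(W, grad, alpha):
    # One update: W_{k+1} = Pi( W_k - alpha_k * Pi( grad E(W_k) ) )
    return project_binary(W - alpha * project_binary(grad))
```

A quick check on a $2\times 2$ example: `project_binary(np.array([[0.3, -2.0], [0.0, 1.5]]))` returns `[[1., -1.], [1., 1.]]`, and the output of `binary_projected_sgd_step` always has entries in $\{-1, 1\}$ regardless of the step size.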

Assuming that the gradients are stochastic, the algorithm tends to converge in practice, and I am investigating a possible proof of its convergence. Note that the outer projection in the first equation (the one on the left) matters less, as its input is already quantized. Even a convergence proof of the deterministic version without that outer projection would suffice.

Also, I'm not sure if this helps, but another (stochastic) variant of this algorithm has been proven to converge in a number of papers, including *SIGNSGD: Compressed Optimisation for Non-Convex Problems*: $$ W_{k+1}= W_k-\alpha_k \mathrm{sign}\left(\nabla_{W_k} E\left(W_k\right)\right). $$
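The signSGD update only quantizes the gradient, not the iterate, so $W_k$ stays real-valued. A minimal NumPy sketch of one step (function name is my own; note that `np.sign(0) == 0`, unlike the tie-breaking projection above):

```python
import numpy as np

def signsgd_step(W, grad, alpha):
    # signSGD update: W_{k+1} = W_k - alpha_k * sign( grad E(W_k) )
    # W remains real-valued; only the gradient is quantized to {-1, 0, 1}.
    return W - alpha * np.sign(grad)
```

For example, starting from `W = np.zeros(2)` with `grad = np.array([3.0, -2.0])` and `alpha = 0.5`, the step returns `[-0.5, 0.5]`.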