Adam optimizer notation

In the Adam paper, the authors explain how the optimizer works in Algorithm 1. In the last step of the while loop, they update the parameters with

$$ \theta_{t} = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\alpha \in \mathbb{R}$ and I suppose $\theta, \hat{m}_t, \hat{v}_t,\epsilon \in \mathbb{R}^n$ for some $n$. They do not specify how to compute $\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$. I suppose everything is done element-wise, but I'm not sure: they state explicitly that the squared gradient is computed element-wise, but say nothing about this last step.

Best answer

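You are correct: at the top of Algorithm 1 they state that "All operations on vectors are element-wise". So in the last step the square root and the division are each applied component by component, and every parameter is updated using only its own entries of $\hat{m}_t$ and $\hat{v}_t$.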
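To make the element-wise reading concrete, here is a minimal sketch of that last step in plain Python. The function name `adam_step` and the example vectors are made up for illustration. The signs follow Algorithm 1 as printed in the paper (the step is subtracted and $\epsilon$ is added to the square root), and $\epsilon$ is treated as a scalar applied to every component, which is an assumption of this sketch.

```python
import math

def adam_step(theta, m_hat, v_hat, alpha=0.001, eps=1e-8):
    """Hypothetical helper: one element-wise parameter update.

    theta, m_hat and v_hat are equal-length lists of floats;
    alpha and eps are scalars shared by all components.
    """
    # Component i of the result depends only on theta[i], m_hat[i], v_hat[i]:
    # square root, addition of eps and division are all done per component.
    return [
        th - alpha * m / (math.sqrt(v) + eps)
        for th, m, v in zip(theta, m_hat, v_hat)
    ]

# Illustrative values only (not taken from the paper).
theta = [1.0, -2.0, 0.5]
m_hat = [0.1, -0.2, 0.0]
v_hat = [0.01, 0.04, 0.0]

new_theta = adam_step(theta, m_hat, v_hat, alpha=0.1)
print(new_theta)
```

With an array library the same update is typically written as a single vectorized expression, since arithmetic on arrays is element-wise by default; the explicit loop here just spells out what happens to each component.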