A question on the projection step of the Generic Adaptive Method: $x_{t+1} = \Pi_{\mathcal{F},\sqrt{V_t}}(\hat{x}_{t+1})$


I am reading the paper "On the Convergence of Adam and Beyond", in which the authors propose the following framework for adaptive methods:

[Image: the paper's generic adaptive method framework]
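For readers without the paper at hand, here is a minimal Python sketch of one iteration. It assumes, as in Adam-type methods, that $V_t = \mathrm{diag}(v_t)$ is diagonal with positive entries, that the step is $\hat{x}_{t+1} = x_t - \alpha_t m_t/\sqrt{V_t}$ (coordinate-wise), and that $\mathcal{F}$ is a box $[\text{lo}, \text{hi}]^d$; the function name and the box are hypothetical choices for illustration. In this special case the weighted projection reduces to clipping, so the weight has no effect on the result.

```python
import math
from typing import List


def generic_adaptive_step(x: List[float], m: List[float], v: List[float],
                          alpha: float, lo: float, hi: float) -> List[float]:
    """One iteration of the generic update, assuming V_t = diag(v) with v > 0
    and a box feasible set F = [lo, hi]^d (hypothetical names and setup)."""
    # Preconditioned step: x_hat_{t+1} = x_t - alpha_t * m_t / sqrt(V_t), per coordinate
    x_hat = [xi - alpha * mi / math.sqrt(vi) for xi, mi, vi in zip(x, m, v)]
    # Projection Pi_{F, sqrt(V_t)}: the objective sum_i sqrt(v_i) * (x_i - x_hat_i)^2
    # separates by coordinate on a box, so the weighted projection is plain clipping
    return [min(max(yi, lo), hi) for yi in x_hat]


# x_hat = [0.0, 2.5], which clips to [0.0, 1.0]
print(generic_adaptive_step([0.5, 0.5], [1.0, -2.0], [4.0, 1.0], alpha=1.0, lo=0.0, hi=1.0))
```

For sets that do not separate by coordinate (half-spaces, balls, simplices), the weighted and Euclidean projections generally differ.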

I am confused by the last step: $x_{t+1} = \Pi_{\mathcal{F},\sqrt{V_t}}(\hat{x}_{t+1})$.

The projection operator is defined as follows:

For $A \in \mathcal{S}^+_d$ (the set of positive definite $d \times d$ matrices) and $y \in \mathbb{R}^d$, $\Pi_{\mathcal{F},A}(y) = \arg\min_{x \in \mathcal{F}} \|A^{1/2}(x - y)\|$.
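To see concretely what $A$ changes, take as an illustrative assumption a half-space $\mathcal{F} = \{x : c^\top x \le b\}$ and a point $y$ with $c^\top y > b$. Minimizing the squared objective $(x-y)^\top A (x-y)$ with a KKT multiplier $\mu \ge 0$ gives a closed form, worked out here for a small hypothetical example:

$$
\begin{aligned}
&2A(x - y) + \mu c = 0 \;\Rightarrow\; x = y - \tfrac{\mu}{2} A^{-1} c, \qquad c^\top x = b \;\Rightarrow\; \tfrac{\mu}{2} = \frac{c^\top y - b}{c^\top A^{-1} c},\\
&\Pi_{\mathcal{F},A}(y) = y - \frac{c^\top y - b}{c^\top A^{-1} c}\, A^{-1} c.\\[4pt]
&\text{Example: } y = (2,2),\; c = (1,1),\; b = 1 \;\Rightarrow\; c^\top y - b = 3,\\
&A = I:\; \Pi_{\mathcal{F},I}(y) = (2,2) - \tfrac{3}{2}(1,1) = (0.5,\, 0.5),\\
&A = \mathrm{diag}(1,4):\; A^{-1}c = (1, \tfrac14),\; c^\top A^{-1} c = \tfrac54,\; \Pi_{\mathcal{F},A}(y) = (2,2) - 2.4\,(1, \tfrac14) = (-0.4,\, 1.4).
\end{aligned}
$$

The displacement is proportional to $A^{-1}c$, so with $A = \sqrt{V_t}$ the coordinates with large $v_{t,i}$ are moved less, and the two projections land at different points of $\mathcal{F}$.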

My understanding is that this is a projection of $\hat{x}_{t+1}$ onto the feasible set $\mathcal{F}$. But I don't understand why we need $A$, i.e. $\sqrt{V_t}$.
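One way to see what the weight buys: for a convex $\mathcal{F}$, a projection taken in the norm $\|A^{1/2}\cdot\|$ never increases the $A$-weighted distance to any point $z \in \mathcal{F}$, i.e. $\|A^{1/2}(\Pi_{\mathcal{F},A}(y) - z)\| \le \|A^{1/2}(y - z)\|$. This is the standard non-expansiveness property of a projection in a fixed norm, and convergence proofs for adaptive methods commonly rely on it with $A = \sqrt{V_t}$ and $z$ a comparator $x^*$; a Euclidean projection carries no such guarantee in the $A$-norm. The Python sketch below checks this on one hand-picked example, assuming a diagonal $A$, a half-space $\mathcal{F}$, and the hypothetical helper names shown.

```python
from typing import List


def a_norm_sq(u: List[float], a: List[float]) -> float:
    """Squared weighted norm ||A^{1/2} u||^2 for diagonal A = diag(a)."""
    return sum(ai * ui * ui for ai, ui in zip(a, u))


def project_halfspace(y: List[float], c: List[float], b: float, a: List[float]) -> List[float]:
    """argmin over {x : c.x <= b} of ||A^{1/2}(x - y)||, with A = diag(a), a > 0."""
    excess = sum(ci * yi for ci, yi in zip(c, y)) - b
    if excess <= 0:
        return list(y)  # y is already feasible, so it is its own projection
    # Active constraint: x = y - lam * A^{-1} c, with lam chosen so that c.x = b
    lam = excess / sum(ci * ci / ai for ci, ai in zip(c, a))
    return [yi - lam * ci / ai for yi, ci, ai in zip(y, c, a)]


def diff(u: List[float], v: List[float]) -> List[float]:
    return [ui - vi for ui, vi in zip(u, v)]


a = [1.0, 4.0]          # hypothetical diagonal of A = sqrt(V_t)
c, b = [1.0, 1.0], 1.0  # F = {x : x1 + x2 <= 1}
y = [2.0, 2.0]          # plays the role of x_hat_{t+1}
z = [-10.0, 11.0]       # a point of F, e.g. a comparator x*

p_weighted = project_halfspace(y, c, b, a)       # approximately [-0.4, 1.4]
p_euclid = project_halfspace(y, c, b, [1.0, 1.0])  # [0.5, 0.5]

d_y = a_norm_sq(diff(y, z), a)
d_w = a_norm_sq(diff(p_weighted, z), a)
d_e = a_norm_sq(diff(p_euclid, z), a)
# roughly 468.0, 460.8, 551.25: d_w <= d_y, but d_e > d_y
print(d_y, d_w, d_e)
```

With the weighted projection the new iterate is no farther from $z$ (in the $A$-norm) than $\hat{x}_{t+1}$ was; the Euclidean projection ends up farther away than $\hat{x}_{t+1}$ itself.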

Can someone help me understand it? Thank you in advance.