I am perplexed by the fact that many sources online, such as this website, refer to mirror descent as "non-Euclidean gradient descent".
Why non-Euclidean?
- The constraint set $\mathcal{C} \subset \mathbb{R}^n$ is perfectly Euclidean.
- The vectors $x \in \mathbb{R}^n$ and $g_k \in \mathbb{R}^n$ both live in Euclidean space.
- $D_h(x,x_k)$ is not the Euclidean norm; in fact, it is not a norm at all, nor even a metric (see the quick check after this list). Even if we refer to it as a "distance", it is a distance on what set/space? I would not call $D_h$ a non-Euclidean distance, because it is not a distance at all.
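To make the last point concrete, here is a quick numerical check (my own sketch; I take $h(x) = \sum_i x_i \log x_i$, the negative entropy, so that on the simplex $D_h$ is the KL divergence) showing that $D_h$ fails both symmetry and the triangle inequality:

```python
import numpy as np

def kl(p, q):
    # Bregman divergence of h(x) = sum_i x_i log x_i restricted to the
    # probability simplex: D_h(p, q) = sum_i p_i log(p_i / q_i), i.e. KL.
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
r = np.array([0.4, 0.4, 0.2])

print(kl(p, q), kl(q, p))              # ~1.102 vs ~1.002: not symmetric
print(kl(p, q), kl(p, r) + kl(r, q))   # ~1.102 > ~0.634: triangle inequality fails
```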
I expected that "non-Euclidean gradient descent" would mean equipping a subset of $\mathbb{R}^n$ with some non-Euclidean norm, such as $\|\cdot\|_1$, and performing the minimization on that space. But it seems that we are still performing minimization on Euclidean space. Even the inner product $\langle \cdot, \cdot\rangle$ is the standard "Euclidean" dot product, which induces the Euclidean norm. Why non-Euclidean?
Can someone please clarify what "non-Euclidean" means in this context?

One comment is that the projected gradient descent iteration can be expressed as $$ x_{k+1} = \arg \min_{x \in C} \quad f(x_k) + \langle \nabla f(x_k), x - x_k \rangle + \frac{1}{2t} \|x - x_k \|_2^2. $$ Omitting terms that don't depend on $x$, we have $$ x_{k+1} = \arg \min_{x \in C} \quad \langle \nabla f(x_k), x \rangle + \frac{1}{2t} \|x - x_k \|_2^2. $$ This looks quite like the mirror descent iteration $$ x_{k+1} = \arg \min_{x \in C} \quad \langle \nabla f(x_k), x \rangle + \frac{1}{t} D_h(x, x_k), $$ except that the squared Euclidean distance $\frac{1}{2}\| x - x_k \|_2^2$ has been replaced by the "non-Euclidean distance" $D_h(x,x_k)$. Indeed, choosing $h(x) = \frac{1}{2}\|x\|_2^2$ gives $D_h(x, x_k) = \frac{1}{2}\|x - x_k\|_2^2$ and recovers projected gradient descent exactly. You might object that $D_h$ is not a metric, which is a valid point -- I think they are just speaking loosely.
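To make the analogy concrete, here is a minimal numerical sketch (my own illustration, not from the question; the toy objective, step size, and constraint set are arbitrary choices) running both iterations on the probability simplex. Taking $h$ to be the negative entropy turns the $D_h$ step into a simple multiplicative (exponentiated-gradient) update, while the Euclidean step requires a genuine projection:

```python
import numpy as np

def proj_simplex(v):
    # Euclidean projection onto the probability simplex
    # (the standard sort-based algorithm).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

# Toy objective f(x) = 0.5 * ||x - b||_2^2, with gradient x - b;
# its minimizer b lies in the simplex, so both methods should find it.
b = np.array([0.2, 0.5, 0.3])
grad = lambda x: x - b

t = 0.5                    # step size (arbitrary for this illustration)
x_pgd = np.ones(3) / 3.0   # projected gradient descent iterate
x_md = np.ones(3) / 3.0    # mirror descent iterate, h = negative entropy

for _ in range(200):
    # PGD: argmin_{x in C} <grad, x> + (1/2t) ||x - x_k||_2^2
    x_pgd = proj_simplex(x_pgd - t * grad(x_pgd))
    # MD:  argmin_{x in C} <grad, x> + (1/t) D_h(x, x_k); for negative
    # entropy this has the closed-form multiplicative update below.
    x_md = x_md * np.exp(-t * grad(x_md))
    x_md /= x_md.sum()

print(x_pgd)  # ~ [0.2, 0.5, 0.3]
print(x_md)   # ~ [0.2, 0.5, 0.3]
```

Both loops instantiate the same template; only the notion of "distance" in the per-step subproblem changes.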
"One of the miseries of life is that everybody names things a little bit wrong, and so it makes everything a little harder to understand." -- Richard Feynman