I have a question about two versions of the algorithm for dual averaging.
The first one is
$$x_{t+1}=x_t+\nabla f(w_t)\to w_{t+1}=P_C\left(-\frac{1}{\sqrt{t+1}}x_{t+1}\right)=\arg\min_{w\in C}\left\|w+\frac{1}{\sqrt{t+1}}x_{t+1}\right\|_2$$
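For concreteness, here is how I would run this first algorithm on a toy instance (taking $C$ to be the unit $\ell_2$ ball and $f(w)=\|w-a\|_2^2$, both my own choices purely for illustration):

```python
import numpy as np

def project_ball(v):
    # Euclidean projection onto C = {w : ||w||_2 <= 1}
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

# Toy instance: f(w) = ||w - a||_2^2, so grad f(w) = 2*(w - a);
# over the unit ball the minimizer is a/||a|| whenever ||a|| > 1.
a = np.array([3.0, 4.0])
grad_f = lambda w: 2.0 * (w - a)

x = np.zeros(2)      # x_0: running sum of gradients
w = project_ball(x)  # w_0
for t in range(1000):
    x = x + grad_f(w)                      # x_{t+1} = x_t + grad f(w_t)
    w = project_ball(-x / np.sqrt(t + 1))  # w_{t+1} = P_C(-x_{t+1}/sqrt(t+1))

print(w)  # -> close to a/||a|| = [0.6, 0.8]
```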
The second one is
$$x_{k+1}=x_k+\gamma_k\nabla d(\lambda_k)\to\lambda_{k+1}=\pi_{\beta_k}(x_{k+1})=\arg\min_{\lambda\in\mathbb{R}^n}\left\{\beta_k p(\lambda)-\langle x_{k+1},\lambda\rangle\right\}$$
where $p$ is a strongly convex proximity function.
How should we define $d$, $\beta$, and $p$ so that the two algorithms are equivalent? The objective is $\min_{w\in C}f(w)$, where $C$ is compact convex and $f$ is closed convex. I have been given a smoothed dual $\min_s f^*(s)-\min_{w\in C}\langle w,s\rangle+\frac{1}{2\mu}\|w\|_2^2$, where $f^*$ is the Fenchel conjugate, but I do not know where it comes from or whether it will help in establishing the equivalence.
Source: Here
The first algorithm is the simple (unweighted) dual averaging method, a "lazy" variant of the projected subgradient method in which the accumulated subgradients, scaled by the standard diminishing factor $\frac{1}{\sqrt{t+1}}$, are projected onto the feasible set. It solves the set-constrained minimization problem $\boxed{\min_{w\in C} f(w)}$.
The second algorithm is a more advanced variant of the same method, but for solving an unconstrained maximization problem: $\boxed{\max_{\lambda\in\mathbb{R}^n} d(\lambda)}$.
Note that the second algorithm was originally presented as a method for the dual of another constrained minimization problem, and that dual happened to be unconstrained. However, it handles constrained problems $\boxed{\max_{\lambda\in C} d(\lambda)}$ just as well, with a small adjustment of the projection step: \begin{equation} \pi_{\beta}(x) = \arg\min_{\lambda\in C} \{\beta p(\lambda) - \langle x, \lambda\rangle\}. \end{equation} Indeed, this is equivalent to the unconstrained projection: it suffices to replace $p(\lambda)$ with $p(\lambda) + \mathcal{I}_{C}(\lambda)$, where $\mathcal{I}_{C}$ is the indicator function of the set $C$. If $C$ is closed and convex and $p$ is strongly convex, then the new proximity function $p + \mathcal{I}_{C}$ is also strongly convex, so the assumptions of the second (unconstrained) algorithm still hold.
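For the Euclidean proximity function $p(\lambda)=\frac{1}{2}\|\lambda\|_2^2$, this constrained projection has a closed form: completing the square gives $\beta p(\lambda)-\langle x,\lambda\rangle=\frac{\beta}{2}\|\lambda-\frac{x}{\beta}\|_2^2-\frac{\|x\|_2^2}{2\beta}$, so $\pi_\beta(x)=P_C(x/\beta)$. A small numerical sketch (taking $C$ to be the unit $\ell_2$ ball, my choice for illustration) checking the closed form against the defining minimization:

```python
import numpy as np

rng = np.random.default_rng(0)

def project_ball(v):
    # Euclidean projection onto C = {w : ||w||_2 <= 1}
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

def pi_beta(x, beta):
    # With p(lam) = 0.5*||lam||^2, completing the square shows that
    # argmin_{lam in C} { beta*p(lam) - <x, lam> } = P_C(x / beta).
    return project_ball(x / beta)

beta, x = 3.0, rng.normal(size=5)
obj = lambda lam: beta * 0.5 * (lam @ lam) - x @ lam

lam_star = pi_beta(x, beta)
# lam_star should beat every other feasible point:
for _ in range(1000):
    lam = project_ball(rng.normal(size=5) * 2.0)
    assert obj(lam_star) <= obj(lam) + 1e-9
```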
Now comes your main question: How to obtain the first algorithm from the second one?
The answer is simple: just make the following change of notation:
\begin{align} \lambda &\to w \\ d &\to -f \\ x &\to -x \\ k &\to t \\ \gamma_k &\to 1 \\ \beta_k &\to \sqrt{k+1} = \sqrt{t+1} \\ p(\lambda) &\to p(w) = \frac{1}{2}\|w\|_2^2. \end{align}
To check the projection step: with $p(w)=\frac{1}{2}\|w\|_2^2$ and $x\to -x$, completing the square gives $\pi_{\beta}(-x)=\arg\min_{w\in C}\{\frac{\beta}{2}\|w\|_2^2+\langle x,w\rangle\}=\arg\min_{w\in C}\|w+\frac{1}{\beta}x\|_2=P_C(-\frac{1}{\beta}x)$, which matches the first algorithm precisely when $\beta_t=\sqrt{t+1}$ (the effective step size applied to the accumulated gradients is then $\frac{1}{\sqrt{t+1}}$).
Note that I have used the constrained variant of the second algorithm. If you want to use the unconstrained one, just replace the last step with $p(\lambda) \to p(w) = \frac{1}{2}\|w\|_2^2 + \mathcal{I}_{C}(w)$.
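A quick numerical check of the equivalence, on a toy instance of my own choosing ($C$ the unit $\ell_2$ ball, $f(w)=\|w-a\|_2^2$). Note that matching the scaling $P_C(-x_{t+1}/\sqrt{t+1})$ requires $\beta_k=\sqrt{k+1}$ in the projection $\arg\min_{\lambda\in C}\{\beta p(\lambda)-\langle x,\lambda\rangle\}$, which is what the code uses:

```python
import numpy as np

def project_ball(v):
    # Euclidean projection onto C = {w : ||w||_2 <= 1}
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

a = np.array([3.0, 4.0])
grad_f = lambda w: 2.0 * (w - a)  # f(w) = ||w - a||^2 (toy choice)

# Algorithm 1: x_{t+1} = x_t + grad f(w_t), w_{t+1} = P_C(-x_{t+1}/sqrt(t+1))
x1 = np.zeros(2); w1 = project_ball(x1)
# Algorithm 2 after the substitution lambda -> w, d -> -f, x -> -x,
# gamma_k = 1, p = 0.5*||.||^2, beta_k = sqrt(k+1):
# ascent step x_{k+1} = x_k + grad d(lam_k) = x_k - grad f(w_k), and
# pi_beta(x) = argmin_{lam in C} { beta*p(lam) - <x, lam> } = P_C(x/beta).
x2 = np.zeros(2); w2 = project_ball(x2)

for k in range(100):
    x1 = x1 + grad_f(w1)
    w1 = project_ball(-x1 / np.sqrt(k + 1))

    x2 = x2 - grad_f(w2)                    # note x2 = -x1 at every step
    w2 = project_ball(x2 / np.sqrt(k + 1))  # pi_{beta_k} with beta_k = sqrt(k+1)

    assert np.allclose(w1, w2)  # identical iterates
```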