I have a question about two versions of the algorithm for dual averaging.
The first one is
$$x_{t+1}=x_t+\nabla f(w_t)\to w_{t+1}=P_C\left(-\frac{1}{\sqrt{t+1}}x_{t+1}\right)=\arg\min_{w\in C}\left\|w+\frac{1}{\sqrt{t+1}}x_{t+1}\right\|_2$$
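For concreteness, here is how I would run this first algorithm on a toy instance (taking $C$ to be the unit $\ell_2$ ball and $f(w)=\|w-a\|_2^2$, both my own choices purely for illustration):

```python
import numpy as np

def project_ball(v):
    # Euclidean projection onto C = {w : ||w||_2 <= 1}
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

# Toy instance: f(w) = ||w - a||_2^2, so grad f(w) = 2*(w - a);
# over the unit ball the minimizer is a/||a|| whenever ||a|| > 1.
a = np.array([3.0, 4.0])
grad_f = lambda w: 2.0 * (w - a)

x = np.zeros(2)      # x_0: running sum of gradients
w = project_ball(x)  # w_0
for t in range(1000):
    x = x + grad_f(w)                      # x_{t+1} = x_t + grad f(w_t)
    w = project_ball(-x / np.sqrt(t + 1))  # w_{t+1} = P_C(-x_{t+1}/sqrt(t+1))

print(w)  # -> close to a/||a|| = [0.6, 0.8]
```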
The second one is
$$x_{k+1}=x_k+\gamma_k\nabla d(\lambda_k)\to\lambda_{k+1}=\pi_{\beta_k}(x_{k+1})=\arg\min_{\lambda\in\mathbb{R}^n}\left\{\beta_k p(\lambda)-\langle x_{k+1},\lambda\rangle\right\}$$
where $p$ is a strongly convex proximity function.
How should we define $d$, $\beta$, and $p$ so that the two algorithms are equivalent? The objective is $\min_{w\in C}f(w)$, where $C$ is compact convex and $f$ is closed convex. I have been given a smoothed dual $\min_s f^*(s)-\min_{w\in C}\langle w,s\rangle+\frac{1}{2\mu}\|w\|_2^2$, where $f^*$ is the Fenchel conjugate, but I do not know where it comes from or whether it will help in establishing the equivalence.
Source: Here
The first algorithm is the simple (unweighted) dual averaging method, a "lazy" variant of the projected subgradient method in which the accumulated subgradients, scaled by the standard diminishing factor $\frac{1}{\sqrt{t+1}}$, are projected onto the feasible set. It solves the set-constrained minimization problem $\boxed{\min_{w\in C} f(w)}$.
The second algorithm is a more advanced variant of the same method, but for solving an unconstrained maximization problem: $\boxed{\max_{\lambda\in\mathbb{R}^n} d(\lambda)}$.
Note that the second algorithm was originally presented as a method for the dual of another constrained minimization problem, and that dual happened to be unconstrained. However, it handles constrained problems $\boxed{\max_{\lambda\in C} d(\lambda)}$ just as well, with a small adjustment of the projection step: \begin{equation} \pi_{\beta}(x) = \arg\min_{\lambda\in C} \{\beta p(\lambda) - \langle x, \lambda\rangle\}. \end{equation} Indeed, this is equivalent to the unconstrained projection: it suffices to replace $p(\lambda)$ with $p(\lambda) + \mathcal{I}_{C}(\lambda)$, where $\mathcal{I}_{C}$ is the indicator function of the set $C$. If $C$ is closed and convex and $p$ is strongly convex, then the new proximity function $p + \mathcal{I}_{C}$ is also strongly convex, so the assumptions of the second (unconstrained) algorithm still hold.
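For the Euclidean proximity function $p(\lambda)=\frac{1}{2}\|\lambda\|_2^2$, this constrained projection has a closed form: completing the square gives $\beta p(\lambda)-\langle x,\lambda\rangle=\frac{\beta}{2}\|\lambda-\frac{x}{\beta}\|_2^2-\frac{\|x\|_2^2}{2\beta}$, so $\pi_\beta(x)=P_C(x/\beta)$. A small numerical sketch (taking $C$ to be the unit $\ell_2$ ball, my choice for illustration) checking the closed form against the defining minimization:

```python
import numpy as np

rng = np.random.default_rng(0)

def project_ball(v):
    # Euclidean projection onto C = {w : ||w||_2 <= 1}
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

def pi_beta(x, beta):
    # With p(lam) = 0.5*||lam||^2, completing the square shows that
    # argmin_{lam in C} { beta*p(lam) - <x, lam> } = P_C(x / beta).
    return project_ball(x / beta)

beta, x = 3.0, rng.normal(size=5)
obj = lambda lam: beta * 0.5 * (lam @ lam) - x @ lam

lam_star = pi_beta(x, beta)
# lam_star should beat every other feasible point:
for _ in range(1000):
    lam = project_ball(rng.normal(size=5) * 2.0)
    assert obj(lam_star) <= obj(lam) + 1e-9
```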
Now comes your main question: How to obtain the first algorithm from the second one?
The answer is simple: just make the following change of notation:
\begin{align} \lambda &\to w \\ d &\to -f \\ x &\to -x \\ k &\to t \\ \gamma_k &\to 1 \\ \beta_k &\to \sqrt{k+1} = \sqrt{t+1} \\ p(\lambda) &\to p(w) = \frac{1}{2}\|w\|_2^2. \end{align}
To check the projection step: with $p(w)=\frac{1}{2}\|w\|_2^2$ and $x\to -x$, completing the square gives $\pi_{\beta}(-x)=\arg\min_{w\in C}\{\frac{\beta}{2}\|w\|_2^2+\langle x,w\rangle\}=\arg\min_{w\in C}\|w+\frac{1}{\beta}x\|_2=P_C(-\frac{1}{\beta}x)$, which matches the first algorithm precisely when $\beta_t=\sqrt{t+1}$ (the effective step size applied to the accumulated gradients is then $\frac{1}{\sqrt{t+1}}$).
Note that I have used the constrained variant of the second algorithm. If you want to use the unconstrained one, just replace the last step with $p(\lambda) \to p(w) = \frac{1}{2}\|w\|_2^2 + \mathcal{I}_{C}(w)$.
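A quick numerical check of the equivalence, on a toy instance of my own choosing ($C$ the unit $\ell_2$ ball, $f(w)=\|w-a\|_2^2$). Note that matching the scaling $P_C(-x_{t+1}/\sqrt{t+1})$ requires $\beta_k=\sqrt{k+1}$ in the projection $\arg\min_{\lambda\in C}\{\beta p(\lambda)-\langle x,\lambda\rangle\}$, which is what the code uses:

```python
import numpy as np

def project_ball(v):
    # Euclidean projection onto C = {w : ||w||_2 <= 1}
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

a = np.array([3.0, 4.0])
grad_f = lambda w: 2.0 * (w - a)  # f(w) = ||w - a||^2 (toy choice)

# Algorithm 1: x_{t+1} = x_t + grad f(w_t), w_{t+1} = P_C(-x_{t+1}/sqrt(t+1))
x1 = np.zeros(2); w1 = project_ball(x1)
# Algorithm 2 after the substitution lambda -> w, d -> -f, x -> -x,
# gamma_k = 1, p = 0.5*||.||^2, beta_k = sqrt(k+1):
# ascent step x_{k+1} = x_k + grad d(lam_k) = x_k - grad f(w_k), and
# pi_beta(x) = argmin_{lam in C} { beta*p(lam) - <x, lam> } = P_C(x/beta).
x2 = np.zeros(2); w2 = project_ball(x2)

for k in range(100):
    x1 = x1 + grad_f(w1)
    w1 = project_ball(-x1 / np.sqrt(k + 1))

    x2 = x2 - grad_f(w2)                    # note x2 = -x1 at every step
    w2 = project_ball(x2 / np.sqrt(k + 1))  # pi_{beta_k} with beta_k = sqrt(k+1)

    assert np.allclose(w1, w2)  # identical iterates
```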