What is meant by a "distributed method" in the context of solving (convex) optimization problems?


Wanted to double-check my understanding of the following:

"one of the theoretical advantages of formulating a problem as a convex optimization problem is that the associated dual problem sometimes leads to an efficient or distributed method for solving it"

What is a distributed method? Is it the same as a distributed algorithm? https://en.wikipedia.org/wiki/Distributed_algorithm


Yes, "distributed method" and "distributed algorithm" mean the same thing here. Roughly speaking, such algorithms split the work across multiple devices and combine the partial results through some coordination mechanism.

Let me give a canonical example in optimization. Consider the following problem: \begin{equation} \min_{x,z}\; f(x) + g(z) \quad \text{s.t.}\quad Ax + Bz = c. \end{equation} The Lagrangian is \begin{equation} L(x, z, y) = f(x) + g(z) + y^\top(Ax + Bz - c), \end{equation} and the dual function is given by \begin{equation} q(y) = \min_{x,z} L(x, z, y) = \min_{x,z}\left\{ f(x) + g(z) + y^\top(Ax + Bz - c)\right\} \end{equation} (written $q$ rather than $g$ to avoid a clash with the objective term $g(z)$). Now assume that strong duality holds; then instead of solving the original problem, one can solve the dual problem $\max_y q(y)$. Starting from an initial point $y_0$, the dual subgradient method solves this problem by performing the following updates for $k = 0,1,2,\ldots$: \begin{align} x_{k+1}, z_{k+1} &\gets \arg\min_{x,z} L(x,z,y_k),\tag{1} \\ y_{k+1} &\gets y_k + t_k(Ax_{k+1} + Bz_{k+1} - c). \tag{2} \end{align} In this algorithm, the most expensive step is solving the subproblem $(1)$. Observe that $(1)$ is separable in $x$ and $z$, so we can minimize over each variable independently: \begin{align} x_{k+1} &\gets \arg\min_{x} \left\{f(x) + y_k^\top Ax \right\}, \tag{1a}\\ z_{k+1} &\gets \arg\min_{z} \left\{g(z) + y_k^\top Bz \right\}. \tag{1b} \end{align}

Now, suppose our computer has only one processor; then we must solve $(1a)$, $(1b)$, and $(2)$ sequentially, e.g. $(1a) \to (1b) \to (2)$ (or $(1b) \to (1a) \to (2)$). On the other hand, if our computer has two processors, we can solve $(1a)$ and $(1b)$ in parallel (one on each processor), which means the task $(1)$ has been done in a distributed manner.
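To make the iteration concrete, here is a minimal sketch in NumPy. The quadratic choices $f(x) = \tfrac12\|x-a\|^2$ and $g(z) = \tfrac12\|z-b\|^2$, and the particular $A$, $B$, $c$, $a$, $b$ below, are illustrative assumptions (not from the question) chosen so that the subproblems $(1a)$ and $(1b)$ have closed-form solutions:

```python
import numpy as np

# Dual decomposition sketch for  min f(x) + g(z)  s.t.  Ax + Bz = c,
# with f(x) = 0.5*||x - a||^2 and g(z) = 0.5*||z - b||^2, so that
#   (1a): argmin_x 0.5*||x - a||^2 + y^T A x  =  a - A^T y
#   (1b): argmin_z 0.5*||z - b||^2 + y^T B z  =  b - B^T y

A = np.array([[1.0, 0.0]])   # the single constraint couples x[0] and z[1]
B = np.array([[0.0, 1.0]])
c = np.array([2.0])
a = np.array([1.0, 2.0])     # data defining f
b = np.array([3.0, 4.0])     # data defining g

y = np.zeros(1)              # initial dual variable y_0
t = 0.4                      # fixed step size t_k

for _ in range(100):
    # Steps (1a) and (1b) are independent given y_k; on a two-processor
    # machine these two lines could run in parallel.
    x = a - A.T @ y
    z = b - B.T @ y
    # Step (2): ascent on the dual using the constraint residual.
    y = y + t * (A @ x + B @ z - c)

residual = A @ x + B @ z - c
print(y, residual)           # residual tends to 0 as y converges
```

With this data the dual iterates converge geometrically, and the constraint residual $Ax + Bz - c$ shrinks to zero, so the two independently computed pieces end up consistent with the coupling constraint.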

The algorithm described in the above example is often referred to as dual decomposition: we solve the dual problem (instead of the primal one), and in addition, each iteration decomposes into smaller subproblems that are easier to solve.