Not to be confused with Dijkstra's algorithm!
Can anyone help me understand why Dykstra's projection algorithm works, at-least in $\mathbb{R}^d$ ? I looked at his paper$^\dagger$ and can read what the algorithm says ( not the proof yet ) but cant get my head around why it would work. Does anyone have a video or figure which helps explain this? The image on wikipedia does show the convergence but doesn't give me a feel as to whats going on...
$\dagger$ James P. Boyle, Richard L. Dykstra, A method for finding projections onto the intersection of convex sets in Hilbert spaces, Advances in Order Restricted Statistical Inference, Springer, 1986.
Let $C_1,\ldots,C_n$ be nonempty closed convex subsets of $X$.
Set $Y := X^n$ and $A\colon X\to Y \colon x\mapsto (x,x,\ldots,x)$.
Set $C := C_1\cap \cdots \cap C_n\subset X$ and set $S := C_1\times\cdots \times C_n \subseteq Y$.
Finally, let $z\in X$.
Then the projection of $z$ onto $C$ is the unique solution to the optimization problem:
$$\min_{x\in X} \tfrac{1}{2}\|x-z\|^2 + \iota_{S}(Ax),$$ where $\iota_S$ is the indicator function of $S$.
Now set $f := x\mapsto \tfrac{1}{2}\|x-z\|^2$ and $g := \iota_S$. Then the above problem can be written as
$$\min_{x\in X} f(x) + g(Ax).$$
Next, consider the Fenchel dual of the last problem which is
$$\min_{y\in Y} f^*(-A^*y) + g^*(y).$$
Note that this dual lives in $Y=X^n$. Now if you apply cyclic descent to this dual problem, then you obtain Dykstra's algorithm. For more details, see the paper by Gaffke-Mathar on the wikipedia page you linked to.
Finally, to @littleO : Dykstra $\neq$ Douglas-Rachford. The opposite was claimed in some paper by Boyd and quashed in Bauschke and Koch's paper "Projection methods: Swiss Army knives for solving feasibility and best approximation problems with halfspaces", in Infinite Products and Their Applications, pp. 1-40, AMS, 2015.
Relevant references are the books by Bauschke and Combettes, and by Beck.