(All arrows point downward.)
For the poset $P: 2 < 4, 1 < 3, 1 < 4, 3 < 5$ we get the graph:

A linear extension of this poset is $1,2,3,4,5$.
"A downset or ideal of a poset $(P, ≤)$ is a subset $D ⊆ P$ such that $x ∈ D, y ∈ P$ and $y ≤ x$ imply $y ∈ D$
If we consider the set of ideals $I(P)$ and equip it with the set inclusion $⊆$, which is a partial order relation, a new poset $(I(P), ⊆)$ is obtained. Moreover, it is more than a poset: it is a distributive lattice."

A linear extension of the original poset $P$ corresponds to a max length path in the lattice. Where traversing each node gives {NChild \ NParent}. Going down the middle path we get: $2, 1, 3, 5, 4$ which is a valid linear extension of $P$.
Now the goal is to compute all linear extensions, in parallel on a multicore processor (a GPU to be specific).
For $P$ there are 9 linear extensions:
{{1, 2, 3, 4, 5},
{1, 2, 3, 5, 4},
{1, 2, 4, 3, 5},
{1, 3, 2, 4, 5},
{1, 3, 2, 5, 4},
{1, 3, 5, 2, 4},
{2, 1, 3, 4, 5},
{2, 1, 3, 5, 4},
{2, 1, 4, 3, 5}}
We can just perform a depth first search of the graph to obtain these, but this doesn't parallelize well. What structural properties of distributive lattices can be used to open up more possibilities?
The number of paths through a lattice describe many known sequences of numbers: Catalan etc. Can this information be used to generate an efficient unranking function? Where each thread is assigned an id from $ {1,2...n} $ and the function $Unrank(id) = {L,R,L,R,R,L}$ generates the corresponding branch decisions.
An efficient encoding system for all maximal paths exists. This unranking function maps nicely onto GPU threads. See: https://cs.stackexchange.com/a/16437/10979