optimal swarm plot packing

104 Views Asked by At

A swarm plot is a recently popular form of a scatter plot for one-dimensional data. Its basis is a set of $n$ real numbers, and each value is visualized by a marker, usually a circle of given radius. The swarm plot is two-dimensional, where one (e.g. horizontal) coordinate corresponds directly to the given values, and the second (vertical) coordinate is chosen by an algorithm, such that the markers do not overlap, but they are as close to a central line as possible. Here is an example from the seaborn.swarmplot documentation:

seaborn swarmplot

My question is whether there is some insight from mathematics in how to come up with an optimal arrangement. Optimality could be defined e.g. as the smallest possible maximum absolute distance of markers from the central line or as the smallest possible mean squared distance from the central line.

Given an optimality criterion, as far as I can see the following would be an effective procedure:

  • Iterate over all $n!$ possible orderings of the values.

  • For each ordering, position markers one by one following the ordering, and choose each marker position (above or below) such that it is far enough from the center to avoid overlap, but no more than that.

  • Evaluate the criterion for each ordering, and pick the one with the optimum criterion value.

That is obviously not a viable algorithm because the number of orderings increases very quickly with $n$.

Are you aware of an algorithm to find the optimal ordering and resulting packing, or probably rather an ordering / packing close to optimality, in an efficient way?

I suspect that the plot above was generated from sorting the values in ascending order, which leads to the apparent wing- or branch-like structure. It is clearly not optimal according to either of the above-mentioned criteria, because markers at the ends of wings appear like they could be moved closer to the center. The various implementations of swarm plots for R, Python, etc. often provide different algorithms, but to my knowledge none of them are based on optimizing a criterion.


Formally: For a set of real numbers $\{x_1, x_2, \ldots, x_n\}$, find a mapping for each element $x_i \to (x_i, y_i)$ under the constraint $$ \forall_{i, j\neq i} \quad (x_i - x_j)^2 + (y_i - y_j)^2 \geq d^2 $$ where $d$ is the marker diameter, such that $$ \sum_i y_i^2 \quad \text{or} \quad \max_i |y_i| $$ is (close to) minimal.