While reading Weisfeiler-Lehman Neural Machine for Link Prediction (M. Zhang and Y. Chen), I got to a part which seems quite unclear and is not really explained by the authors (maybe because it's trivial, but it certainly isn't to me). I'm trying to reproduce/re-implement what they did, hence I need a good understanding of this part before proceeding.
Brief summary of Palette-WL
It's an algorithm proposed by the authors which, given some initial coloring of a graph, will produce a more refined one while guaranteeing that the order of the colors is preserved, as per Definition 3.1
An iterative graph labeling algorithm is color- order preserving if: given any two vertices $v_a$ and $v_b$, if $v_a$ has a smaller color than $v_b$ at an iteration, then $v_a$ gets a smaller color than $v_b$ in the next iteration.
This property is important, because the end goal is to extract structural information about each edge in a large network which will help train a classifier for deciding the existence of new/unseen edges. For a subgraph induced by an edge $E=(x,y)$, the property lets us preserve the information that the subgraph is centered at $E$ even after all iterations of color refinement. This can be done by setting the initial colors $c_0(x)=c_0(y)=1$ and $c_0(v) > 1$ for all other vertices $v$.
The problem
The reason color refinement is run in the first place is to impose an ordering on the vertex set and sort the adjacency matrix which is input to some ML model, however Palette-WL doesn't guarantee strict ordering i.e. some vertices can have the same color. The authors only briefly outline the solution, without much further explanation:
Finally, we sort the vertices in VK using their Palette-WL colors in ascending order. If there are vertices with the same color, we use Nauty, a graph canonization tool, to break the ties [20].
[20] ... Practical graph isomorphism, {II}. Journal of Symbolic Computation, 60(0):94 – 112, 2014. ISSN 0747-7171.
I read about canonical labeling and even calculated it for an example induced subgraph using PyNauty, but I don't really know what's happening behind the function call and I couldn't find anything related to using canonical labeling for "tie-braking" in this sense.
Would really appreciate it if someone with more insight into Graph theory and isomorphism problems can help explain what the intended meaning of tie breaking is here and what steps are taken during it (for example, how we would recolor the yellow nodes in the graph from the picture below).
$$H \text{ after Palette-}WL$$
Here is what I guess it is.
From what I understand, Palette-WL gives a valid labeling $l$ (if two vertices are identical, they must have the same label, but it not necessarily reciprocal) that may not distinguish all the vertices but satisfy some additional requirements of the authors regarding distance to the target link. To obtain a full canonical labeling, they run Nauty to find a canonical labeling $l'$. They break the ties by ordering vertices with a same label $l$ by considering their order in the labeling $l'$. This will give a labeling that is both canonical and satisfy the additional requirements of the authors.
For example, take the path $P_6$, with the center edge as the target link.
A palette-WL coloring could be
3-2-1-1-2-3. The labels increase while moving away from the target link. Two identical vertices have identical labels.A canonical labeling given by Nauty could be
1-2-3-4-5-6. We can relabel palette-WL by taking into account the order in the Nauty coloring. For example, for the two vertices labelled3in palette-WL, the first one(in Nauty's order) will be relabeled3.0and the second one3.5.So we get the canonical labeling
3.0-2.0-1.0-1.5-2.5-3.5, and the vertices are visited in order5-3-1-2-4-6