TLDR (simple case): Is there an analogue of Kendall's W for multiple rankings produced by the same judges?
This question is motivated by a real-world problem: in ballroom dancing, we have multiple competitors, each of whom dances in multiple events. In each event, a few judges (raters) evaluate the competitors to produce an ordinal ranking. The judges and competitors may differ from event to event, though both are drawn from a small pool.
Intuitively, we expect the different judges' rankings within a single event to be correlated. For a single event, it appears that we can compute Kendall's W (the coefficient of concordance) to check this.
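To make the single-event case concrete, here is a minimal sketch of the computation I have in mind. It assumes every judge ranks all n competitors with no ties, and uses the standard untied formula W = 12S / (m^2 (n^3 - n)), where m is the number of judges and S is the sum of squared deviations of the competitors' rank totals from their mean. The function name `kendalls_w` and the list-of-lists layout are just my own choices for illustration.

```python
def kendalls_w(rankings):
    """Kendall's W for one event.

    rankings: list of m lists (one per judge); each inner list gives the
    rank (1..n) that judge assigned to competitors 0..n-1.
    Assumes complete rankings with no ties.
    """
    m = len(rankings)       # number of judges
    n = len(rankings[0])    # number of competitors
    # Total rank received by each competitor across all judges.
    totals = [sum(judge[i] for judge in rankings) for i in range(n)]
    # Every judge hands out ranks 1..n, so the mean total is m(n+1)/2.
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

W is 1 when all judges give identical rankings and 0 when the rank totals are all equal (no agreement at all).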
Simple case (do the judges agree?): is there an equivalent statistic for multiple events, if we assume the events are independent? If it makes things easier, we can assume the same set of judges for all events (the set of competitors will still differ).
Follow-up (is there a biased judge?): is there a good statistical notion of whether a particular judge is "biased"? Strawman example: if we remove a particular judge, the W coefficient increases substantially.
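The strawman can be sketched as a leave-one-judge-out comparison for a single event. This is only an illustration of the idea, not a proposed test: it assumes complete, untied rankings, and the names `kendalls_w` and `leave_one_judge_out` are hypothetical. It also says nothing about how large a change in W would count as significant, which is really what I am asking.

```python
def kendalls_w(rankings):
    """Kendall's W for m judges ranking n competitors (no ties assumed)."""
    m = len(rankings)
    n = len(rankings[0])
    totals = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def leave_one_judge_out(rankings):
    """Return (W for all judges, list of changes in W when each judge is dropped).

    A large positive change for judge j means the remaining judges agree
    with each other much more once judge j is removed.
    Needs at least three judges so that two remain after dropping one.
    """
    full = kendalls_w(rankings)
    deltas = []
    for j in range(len(rankings)):
        rest = rankings[:j] + rankings[j + 1:]  # all judges except judge j
        deltas.append(kendalls_w(rest) - full)
    return full, deltas
```

For example, with three judges who all submit `[1, 2, 3, 4]` and a fourth who submits the reverse order, dropping the fourth judge raises W to 1, while dropping any of the other three lowers it.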