I am attempting to sort rows and columns of a 2-dimensional matrix to minimize a global cost, and I'm having trouble finding an algorithm which I'm convinced is likely to succeed.
To calibrate the conversation, I don't have much experience with constrained sorting and regression algorithms, and I wouldn't be surprised if there is a canonical answer that I just don't know about.
I have a 2-dimensional matrix of, e.g., players on one axis vs. challenges on the other. If a player has succeeded at a specific challenge, there is an entry in the matrix. Failure and "never tried" are indistinguishable in the data, and any given player will generally have only accomplished a small fraction of challenges (sparse matrix).
I want to sort rows and columns to discover the orderings of players by strength, and challenges by difficulty.
To simplify, I'm assuming:
- player strength and challenge difficulty are both measured on the same one-dimensional scale.
- A player's "true" strength is marginally higher than the "true" difficulty of the hardest challenge accomplished by that player.
- A challenge's "true" difficulty is marginally below the "true" strength of the weakest player to accomplish that challenge. Maybe redundant to 2?
I don't know much about the distributions of strengths and difficulties a' priori, but some early exploration of the data is gesturing towards gaussian shapes for both.
Conceptually, the optimized ordering has a curve in the 2-d space representing the points at which the player strength approximately matches the challenge difficulty. Entries on the "wrong" side of this curve would represent underestimates of player strengths, or/and overestimates of challenge difficulty. The optimized ordering will minimize the number of entries on the wrong side of the curve, as well as their distance from the curve.
The problem seems similar to the simultaneous regression of student aptitudes and question difficulties in Item Response Theory, but I can't find any clear recipes or tools for that regression.
I'll take whatever advice I can get, but there's extra credit if it is straightforward to implement in Python.
Thanks!
edit - hardmath formalized the question: "Some clarifying points (feel free to edit the Question if you agree). There are $m$ players and $n$ challenges. Our data is an $m×n$ binary (zero-one) matrix $A$ in which $Aij=1$ denotes the success of player $i$ at challenge $j$, so that rows correspond to players and columns to challenges. You ask for permutation matrices $P,Q$ so that $PAQ$ satisfies some unstated optimization criterion related to monotonicity of player strength and challenge difficulty."
That's all correct, with the additional clarification that not all players have attempted all challenges, so a "0" cell is, effectively, no information.
I don't know the correct choice of optimization criterion for sure, but I know that the optimized sort will minimize "1"s in regions of the matrix where the challenge difficulty is estimated to be higher than the player strengths, with higher priority on entries farther outside the boundary representing challenge difficulty = player strength .