The Wikipedia description of the Hungarian algorithm states:
Step 3
All zeros in the matrix must be covered by marking as few rows and/or columns as possible. The following procedure is one way to accomplish this:
First, assign as many tasks as possible.
Row 1 has one zero, so it is assigned. The 0 in row 3 is crossed out because it is in the same column.
Row 2 has one zero, so it is assigned.
Row 3's only zero has been crossed out, so nothing is assigned.
Row 4 has two uncrossed zeros. Either one can be assigned, and the other zero is crossed out.
Alternatively, the 0 in row 3 may be assigned, causing the 0 in row 1 to be crossed instead.0' a2' a3' a4' b1' b2' b3' 0' 0 c2' c3' c4' d1' 0' 0 d4'
Although the description explains the solution to the example provided, I have some problems with the algorithm described in general.
Firstly, it seems to imply that with any row with two zeros you can assign either one. This is not always true: we can construct a case where there is a a row with two zeros and one with one zero overlapping. In this case, it is important that the row with one zero is processed first (after which, in this case, the row with the two zeros would only have one zero, but we could construct a case where more zeros would remain).
However, this seems to be a special case of a more general problem whereby a row can have $n$ zeros and another row $n + 1$ zeros of which $n$ overlap the first row. If $n > 1$ then neither row has a single zero. Furthermore, you could construct a case where there are no other single-zero rows and therefore no columns are eliminated and so multiple-passes would not help.
So what is a way to complete this algorithm? I have considered two possible approaches:
- Scan columnns for single-zero columns after scanning rows. This seems like it would be an improvement but I don't think it would solve for all cases.
- Sort the rows or columns by number of zeros and processes those having the smallest number first. This seems like it might work but would add the complexity of a sort which would change the performance profile of the algorithm.
I've looked at some other resources, but they mostly seem to similar.
Is there a better explanation of how to assign the zeros in this step?