I have come across this question quite a few times while studying operations research, but I could not find a satisfactory answer.
I understand why both methods provide a better basic feasible solution (BFS) than the North-West corner method: it seems quite obvious that allocating to the minimum-cost cell is better than allocating to the north-west cell regardless of its cost.
But for Vogel's Approximation Method (VAM), how does choosing the row or column with the maximum penalty (the difference between its minimum and second-minimum costs) and allocating to the minimum-cost cell of that row or column provide a better BFS than allocating to the global minimum-cost cell?
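To make the comparison concrete, here is a minimal Python sketch of the two rules on a small balanced problem. The cost matrix, the function names (`least_cost`, `vam`, `penalty`, `_allocate`), and the tie-breaking are my own assumptions for illustration, not taken from any textbook. When only one cell is left in a line, I take its cost as the penalty, which is just one possible convention. The sketch only tracks total cost, so it drops a row and a column together when both are exhausted at once instead of keeping a zero allocation.

```python
def _allocate(costs, supply, demand, rows, cols, i, j):
    """Ship as much as possible through cell (i, j); return the cost incurred."""
    qty = min(supply[i], demand[j])
    supply[i] -= qty
    demand[j] -= qty
    if supply[i] == 0:
        rows.remove(i)  # row exhausted
    if demand[j] == 0:
        cols.remove(j)  # column satisfied
    return qty * costs[i][j]


def least_cost(costs, supply, demand):
    """Repeatedly allocate to the cheapest remaining cell (global minimum)."""
    supply, demand = list(supply), list(demand)
    rows, cols = list(range(len(supply))), list(range(len(demand)))
    total = 0
    while rows and cols:
        i, j = min(((i, j) for i in rows for j in cols),
                   key=lambda c: costs[c[0]][c[1]])
        total += _allocate(costs, supply, demand, rows, cols, i, j)
    return total


def penalty(values):
    """Second-smallest minus smallest cost; assumed convention for a single cell."""
    s = sorted(values)
    return s[1] - s[0] if len(s) > 1 else s[0]


def vam(costs, supply, demand):
    """Pick the line with the largest penalty, allocate to its cheapest cell."""
    supply, demand = list(supply), list(demand)
    rows, cols = list(range(len(supply))), list(range(len(demand)))
    total = 0
    while rows and cols:
        row_pen = [(penalty([costs[i][j] for j in cols]), "row", i) for i in rows]
        col_pen = [(penalty([costs[i][j] for i in rows]), "col", j) for j in cols]
        _, kind, k = max(row_pen + col_pen, key=lambda p: p[0])
        if kind == "row":
            i = k
            j = min(cols, key=lambda c: costs[i][c])
        else:
            j = k
            i = min(rows, key=lambda r: costs[r][j])
        total += _allocate(costs, supply, demand, rows, cols, i, j)
    return total


# Hypothetical 2x2 balanced instance, chosen so the two rules disagree.
costs = [[1, 2],
         [3, 10]]
supply = [10, 10]
demand = [10, 10]

print(least_cost(costs, supply, demand))  # 110
print(vam(costs, supply, demand))         # 50
```

Here the global-minimum rule takes the cell with cost 1 first and is then forced into the cell with cost 10, while VAM starts in the second column (penalty $10 - 2 = 8$) and avoids it. So I can see that it happens on this instance; what I am asking is why the penalty rule should tend to work better in general.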