What to consider when choosing weightings for bank account modulus checks


UK bank accounts and sort codes are validated using a couple of rules, including modulus 10 and modulus 11. The details on how this works are given here: https://www.vocalink.com/customer-support/modulus-checking/
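To make the question concrete, here is a sketch of the weighted-sum check the Vocalink document describes: each digit of the sort code and account number is multiplied by its weight, the products are summed, and the sum must be divisible by the modulus. The weights and account details below are made up for illustration, not a real entry from the weightings table.

```python
def modulus_check(sort_code: str, account: str, weights, modulus: int) -> bool:
    """Weight each digit of sort code + account number, sum the products,
    and test divisibility by the modulus."""
    digits = [int(d) for d in sort_code + account]
    total = sum(w * d for w, d in zip(weights, digits))
    return total % modulus == 0

# Hypothetical weights: 6 sort-code positions (here all zero, i.e. ignored)
# followed by 8 account-number positions.
weights = [0, 0, 0, 0, 0, 0, 7, 5, 8, 3, 4, 6, 2, 1]
print(modulus_check("089999", "66374951", weights, 11))  # True: sum is 198 = 18 * 11
```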

I understand the reason for these checks is to catch common keying errors when bank account numbers are entered. I'm assuming different variations of the validation rules catch different numbers or classes of keying mistakes.

Each sort code has a set of sort and account digit weightings that are used in the calculations. They are listed here: https://www.vocalink.com/media/3036/valacdos.txt

The question is, what considerations are there when these weighting numbers are chosen?

  • Are some better than others, or do some perform better in specific circumstances?
  • If a zero is used as a weight, does that take a digit of the bank account (or sort code) out of the validation check, increasing the range of possible valid account numbers?
  • If I were tasked to create a set of weights for a new sort code, how would I go about doing that?
  • Would it make sense to try to use a different set of weightings to other banks to lower the number of bank account numbers that overlap with other banks? (That may be more a policy decision, but is that what it would effectively do?)

There are patterns in the weightings table linked above, but none seem to be used more than any other, so none seem to win out as "the best".


I'm hoping this is considered a mathematics question. I have no idea where the answer lies between "just throw a die" and "here are good mathematical reasons to choose certain numbers". I have a feeling the answer may be in this paper http://www.scielo.br/scielo.php?script=sci_arttext&pid=S2179-84512017000100105 but as a non-mathematician, this is pretty dense reading to me.

Best answer

First of all, there's no inherent reason why different sort codes should have different weights or different moduli. That's just a historical accident -- at the time banks introduced computer systems (such that having check digits became relevant and feasible) different banks decided how their check digits would work separately and happened to come up with different schemes. By the time it was recognized it would have been a good idea to use the same kind of checksum everywhere, it was too late -- millions of customers had already been issued account numbers that validated differently between different banks.

If you're designing a check digit system from scratch you can ignore much of the complexity in the document you link to; it is only there to cope with historical accidents.

Down to the actual weights: The most important features to have are

  • No digit position that you want any validation for can have weight $0$ -- that would mean that mistyping a digit in that position would never get caught. But as long as every weight is coprime to the modulus, all single-digit mistypes will get caught.
  • Two neighboring weights should be different; otherwise a mistype where you swap two neighboring digits cannot be caught by the modulus check.
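Both conditions can be verified by brute force. The quick check below (my own sketch, not part of any standard) confirms that a weight catches every single-digit substitution exactly when it is coprime to the modulus, which is why modulus 10 only leaves the weights 1, 3, 7, 9 usable while modulus 11 leaves all of them usable:

```python
def weight_is_safe(w: int, m: int) -> bool:
    # Mistyping digit d2 as d1 at a position with weight w changes the
    # weighted sum by w * (d1 - d2); the error is caught iff that change
    # is never 0 mod m for any pair of distinct digits.
    return all((w * (d1 - d2)) % m != 0
               for d1 in range(10) for d2 in range(10) if d1 != d2)

print([w for w in range(1, 10) if weight_is_safe(w, 10)])  # [1, 3, 7, 9]
print([w for w in range(1, 11) if weight_is_safe(w, 11)])  # every weight works mod 11
```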

Beyond this, it is mostly a matter of arbitrary choice. Checks where the weights alternate between just two values can never catch errors where you interchange abcd to cdab, which is rather undesirable. But they are slightly faster to check than codes with all-different weights, which mattered back in the 1960s. On the other hand, you can't catch all abcd-to-cdab errors with a single check digit, so it's a trade-off.
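The abcd-to-cdab blind spot is easy to see in a toy four-digit example (not a real scheme): when the weights alternate between just two values, moving the pair (a, b) past (c, d) pairs every digit with the same weight as before, so both orderings give the same weighted sum.

```python
def weighted_sum(digits, weights, m):
    return sum(w * d for w, d in zip(weights, digits)) % m

w = [1, 3, 1, 3]  # weights alternating between just two values
# Swapping the digit pairs leaves each digit under the same weight,
# so the check cannot tell 2749 from 4927:
print(weighted_sum([2, 7, 4, 9], w, 10) == weighted_sum([4, 9, 2, 7], w, 10))  # True
```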

The larger the modulus, the fewer digit strings will be valid account numbers. It should be at least 10, of course -- otherwise there's again a risk that single-digit errors will not get caught.

Modulus-10 has trouble, though: since the weights need to be coprime to the modulus, there are only 4 different weights that work with 10 (namely $1$, $3$, $7$, $9$). And that again means that some neighbor transpositions cannot be caught -- e.g. 38 to 83 cannot be caught with modulus-10 for any combination of 1, 3, 7, 9 as weights.

This is why modulus-11 is popular: any difference between two unequal digits, as well as any difference between two unequal weights, is invertible modulo $11$, so one can guarantee catching all neighbor transpositions. The cost is that not every sequence of $n-1$ digits has a valid check digit.
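A sketch of how a modulus-11 check digit would be assigned (hypothetical weights, with the trailing check digit carrying weight 1) makes that cost visible: whenever the required value comes out as 10, no single decimal digit works, so that payload has no valid account number.

```python
def mod11_check_digit(payload, weights):
    # Choose the trailing check digit (weight 1) so the full weighted sum
    # is divisible by 11. A required value of 10 cannot be written as one
    # decimal digit, so that payload gets no valid check digit.
    s = sum(w * d for w, d in zip(weights, payload))
    c = (-s) % 11
    return None if c == 10 else c

weights = [8, 7, 6, 5, 4, 3, 2]  # hypothetical weights for a 7-digit payload
print(mod11_check_digit([1, 2, 3, 4, 5, 6, 7], weights))  # 9    (112 + 9 = 121 = 11^2)
print(mod11_check_digit([1, 2, 3, 4, 5, 6, 1], weights))  # None (would need digit 10)
```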

Overlap of the valid account number ranges between sort codes is not really a relevant problem -- the totality of all valid numbers for a sort code is not what matters in practice. What arguably matters a bit is the risk that a single given account number from a different bank is also valid at yours -- but with one check digit, that risk averages out to $1/10$ no matter how you might scramble the entire account number population by sort code.