UK bank accounts and sort codes are validated using a couple of rules, including modulus 10 and modulus 11. The details on how this works are given here: https://www.vocalink.com/customer-support/modulus-checking/
I understand the reason for these checks is to catch common keying errors when bank account numbers are entered. I'm assuming different variations of the validation rules catch different numbers or classes of keying mistakes.
Each sort code has a set of sort and account digit weightings that are used in the calculations. They are listed here: https://www.vocalink.com/media/3036/valacdos.txt
The question is, what considerations are there when these weighting numbers are chosen?
- Are some better than others, or do some perform better in specific circumstances?
- If a zero is used as a weight, does that take a digit of the bank account (or sort code) out of the validation check, increasing the range of possible valid account numbers?
- If I were tasked to create a set of weights for a new sort code, how would I go about doing that?
- Would it make sense to try to use a different set of weightings to other banks to lower the number of bank account numbers that overlap with other banks? (That may be more a policy decision, but is that what it would effectively do?)
There are patterns in the weightings table linked above, but none seem to be used more than any other, so none seem to win out as "the best".
I'm hoping this is considered a mathematics question. I have no idea where the answer lies between "just throw a die" and "here are good mathematical reasons to choose certain numbers". I have a feeling the answer may be in this paper http://www.scielo.br/scielo.php?script=sci_arttext&pid=S2179-84512017000100105 but as a non-mathematician, this is pretty dense reading to me.
First of all, there's no inherent reason why different sort codes should have different weights or different moduli. That's just a historical accident -- at the time banks introduced computer systems (such that having check digits became relevant and feasible) different banks decided how their check digits would work separately and happened to come up with different schemes. By the time it was recognized it would have been a good idea to use the same kind of checksum everywhere, it was too late -- millions of customers had already been issued account numbers that validated differently between different banks.
If you're designing a check digit system from scratch you can ignore much of the complexity in the document you link to; it is only there to cope with historical accidents.
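Down to the actual weights: The most important features to have are
- Every weight should be coprime to the modulus; otherwise some single-digit errors will not be caught.
- Neighboring weights should differ; otherwise swapping two adjacent digits will not be caught.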
Beyond this, it is mostly a matter of arbitrary choice. Checks where the weights alternate between just two values can never catch errors where you interchange `abcd` to `cdab`, which is rather undesirable. But they are slightly faster to check than codes with all-different weights, which mattered back in the 1960s. On the other hand, you can't catch all `abcd`-to-`cdab` errors with a single check digit, so it's a trade-off.
The larger the modulus, the fewer digit strings will be valid account numbers. It should be at least 10, of course -- otherwise there's again a risk that single-digit errors will not get caught.
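The remark about `abcd`-to-`cdab` swaps can be checked by brute force. The following is a minimal sketch, not taken from any bank's specification: the function names and the two example weight tuples are made up for illustration, and only the four affected digit positions are modelled, since the other digits contribute equally before and after the swap.

```python
from itertools import product

def weighted_sum(digits, weights):
    """Sum of digit * weight over matching positions."""
    return sum(d * w for d, w in zip(digits, weights))

def undetected_block_swaps(weights, modulus):
    """Count swaps abcd -> cdab that leave the check residue unchanged.

    Returns (missed, total), where total counts every 4-digit block
    for which the swap actually changes the digits.
    """
    missed = total = 0
    for a, b, c, d in product(range(10), repeat=4):
        if (a, b) == (c, d):
            continue  # swapping identical halves is not an error
        total += 1
        before = weighted_sum((a, b, c, d), weights) % modulus
        after = weighted_sum((c, d, a, b), weights) % modulus
        if before == after:
            missed += 1
    return missed, total

# Hypothetical weight choices, both with modulus 11.
alternating = undetected_block_swaps((1, 2, 1, 2), 11)
all_different = undetected_block_swaps((1, 2, 3, 4), 11)
print("alternating weights:  ", alternating)
print("all-different weights:", all_different)
```

With alternating weights every such swap goes unnoticed; with all-different weights most are caught, but some still slip through, which is the trade-off described above.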
Modulus-10 has trouble, though: Since the weights need to be coprime to the modulus there are only 4 different weights that work with 10 (namely $1$, $3$, $7$, $9$). And that again means that some neighbor transpositions cannot be caught -- e.g. `38` to `83` cannot be caught with modulus-10 for any combination of 1, 3, 7, 9 as weights (the difference between any two of these weights is even and the digits differ by 5, so the checksum changes by a multiple of 10).
This is why modulus-11 is popular: any difference between non-equal digits, as well as any difference between non-equal weights will be invertible modulo 11, so one can guarantee catching all neighbor transpositions. The cost is that not all sequences of $n-1$ digits have a valid check digit.
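Both claims are small enough to verify exhaustively. The sketch below is illustrative only: `missed_transpositions` and `check_digit` are hypothetical helper names, and the weights `(2, 1)` in the last part are an arbitrary choice used to show a prefix that has no valid check digit under modulus 11.

```python
from itertools import product

def missed_transpositions(w1, w2, modulus):
    """Pairs of unequal neighbor digits (x, y) whose swap is not detected
    when the two positions carry weights w1 and w2."""
    return [
        (x, y)
        for x, y in product(range(10), repeat=2)
        if x != y and (w1 * x + w2 * y) % modulus == (w1 * y + w2 * x) % modulus
    ]

def check_digit(prefix, weights, modulus):
    """Return the digit 0-9 that makes the weighted sum divisible by
    modulus, or None if no single digit works."""
    for c in range(10):
        digits = tuple(prefix) + (c,)
        if sum(d * w for d, w in zip(digits, weights)) % modulus == 0:
            return c
    return None

# Modulus 10: whatever weights are picked from {1, 3, 7, 9}, 38 <-> 83 is missed.
mod10_always_misses_38 = all(
    (3, 8) in missed_transpositions(w1, w2, 10)
    for w1, w2 in product((1, 3, 7, 9), repeat=2)
)

# Modulus 11: any two different weights in 1..10 catch every neighbor swap.
mod11_catches_all = all(
    not missed_transpositions(w1, w2, 11)
    for w1, w2 in product(range(1, 11), repeat=2)
    if w1 != w2
)

print(mod10_always_misses_38, mod11_catches_all)

# The cost of modulus 11: some prefixes would need the "digit" 10.
print(check_digit((5,), (2, 1), 11))  # 2*5 + 1*1 = 11, so 1 works
print(check_digit((6,), (2, 1), 11))  # would need 10, so None
```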
Overlap of the valid account number ranges between sort codes is not really a relevant problem -- the totality of all valid numbers for a sort code is not really what matters in practice. What arguably matters a bit in practice is the risk that a single given account number from a different bank is also valid at yours -- but with one check digit, that risk is going to average out to $1/10$ no matter how you might scramble the entire account number population by sort code.
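To get a feel for that last figure, one can enumerate a toy example. This is only a sketch under made-up assumptions: five-digit "account numbers", modulus 11, and two hypothetical weight sets `WEIGHTS_A` and `WEIGHTS_B` standing in for two banks. It measures what fraction of numbers valid for bank A would also pass bank B's check.

```python
from itertools import product

def is_valid(digits, weights, modulus=11):
    """True if the weighted digit sum is divisible by the modulus."""
    return sum(d * w for d, w in zip(digits, weights)) % modulus == 0

# Hypothetical weight sets for two imaginary banks (not from any real table).
WEIGHTS_A = (1, 2, 3, 4, 5)
WEIGHTS_B = (5, 4, 3, 2, 1)

def cross_validity_rate(weights_a, weights_b, length=5):
    """Fraction of numbers valid under weights_a that also pass weights_b."""
    valid_a = [
        n for n in product(range(10), repeat=length) if is_valid(n, weights_a)
    ]
    also_b = sum(1 for n in valid_a if is_valid(n, weights_b))
    return also_b / len(valid_a)

rate = cross_validity_rate(WEIGHTS_A, WEIGHTS_B)
print(f"{rate:.3f}")
```

With modulus 11 the rate comes out near $1/11$, in the same ballpark as the $1/10$ quoted above; choosing a different pair of unrelated weight sets should move it very little.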