I have a weighted hash:
CHOICES : PROBABILITY
choice_a: 0.2
choice_b: 0.7
choice_c: 0.1
Given a string of arbitrary length, say foobar, how do I map it to one of these choices with their respective probabilities, and make the selection reproducible?
Say, if string foobar maps to choice_a the first time around, it should do so consistently for all subsequent calls.
The string can be assumed to be alphanumeric if that helps.
A naive approach
One can sum up the numerical value of each character (A = 1, B = 2, ...) and then take the result modulo the number of choices.
foobar => (6 + 15 + 15 + 2 + 1 + 18) = 57
57 % 3 = 0 (choice_a)
The problem with this is that the probabilities are split equally between the choices instead of following the given weights.
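For reference, the naive approach above can be sketched in Python. The function name `naive_choice` is made up for illustration, and skipping non-letter characters is an assumption, since the question only defines values for letters.

```python
def naive_choice(s, choices):
    # A = 1, B = 2, ... (case-insensitive).
    # Assumption: characters outside a-z are skipped.
    total = sum(ord(c) - ord("a") + 1 for c in s.lower() if "a" <= c <= "z")
    # The modulo treats every choice the same, so the weights are ignored.
    return choices[total % len(choices)]


choices = ["choice_a", "choice_b", "choice_c"]
print(naive_choice("foobar", choices))  # 57 % 3 == 0, so choice_a
```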
If you have a hash function that maps strings uniformly into the set $\{0,1,\dots,N-1\}$, you can then map this to your distribution by taking $0\leq n\leq N-1$ to:
choice_a if $n < 0.2N$,
choice_b if $0.2N \leq n < 0.9N$,
choice_c if $n \geq 0.9N$.
This will be an exact mapping if $N$ is a multiple of $10$, and the error will be small for large $N$: the total variation distance is at most $2/N$. This is a special case of inverse transform sampling, i.e. inverting a cumulative distribution function.
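A minimal Python sketch of this inverse-CDF mapping follows. It assumes SHA-256 (truncated to 64 bits, so $N = 2^{64}$) as a stand-in for the uniform hash; that choice, the name `pick`, and the `WEIGHTS` table are illustrative and not part of the answer above. Exact fractions are used so the thresholds are not affected by floating-point rounding.

```python
import hashlib
from fractions import Fraction

# Weights from the question, kept as exact fractions.
WEIGHTS = [
    ("choice_a", Fraction(2, 10)),
    ("choice_b", Fraction(7, 10)),
    ("choice_c", Fraction(1, 10)),
]

N = 2 ** 64  # size of the hash range (assumption: 64-bit truncation)


def pick(s, weights=WEIGHTS):
    # Assumption: SHA-256 output behaves like a uniform hash on the inputs.
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big")  # 0 <= n <= N - 1
    cumulative = Fraction(0)
    for choice, p in weights:
        cumulative += p
        # Return the first choice whose cumulative threshold exceeds n / N.
        if n < cumulative * N:
            return choice
    return weights[-1][0]  # only reached if the weights sum to less than 1


print(pick("foobar") == pick("foobar"))  # same input, same choice
```

Because the result depends only on the bytes of the string, the same input maps to the same choice across calls, processes, and machines.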
The uniform distribution is always an assumption. An omniscient adversary can always choose inputs to give your hash table worst-case behaviour. In practice, a good choice of string hash function is "SipHash". See here for more discussion of various hash functions: https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/316350#316350.
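One common way to make adversarial inputs harder to construct is to use a keyed hash with a secret key, which is how SipHash is meant to be used. Python's standard library does not offer SipHash as a reproducible keyed function (the built-in `hash()` on strings is randomized per process by default), so the sketch below swaps in keyed BLAKE2b from `hashlib`. The name `keyed_hash64` and the example keys are hypothetical.

```python
import hashlib


def keyed_hash64(s, key):
    # Keyed BLAKE2b with an 8-byte digest, used here in place of SipHash.
    # The key must be kept secret and fixed for results to stay reproducible.
    h = hashlib.blake2b(s.encode("utf-8"), digest_size=8, key=key)
    return int.from_bytes(h.digest(), "big")  # integer in [0, 2**64)


secret = b"example-secret-key"  # placeholder; use a random secret in practice
n = keyed_hash64("foobar", secret)
print(0 <= n < 2 ** 64)
```

The resulting integer can be fed into the same threshold mapping as any other uniform hash value, with $N = 2^{64}$.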