Weighted random choice using a string


I have a weighted hash:

  CHOICES : PROBABILITY

  choice_a: 0.2
  choice_b: 0.7
  choice_c: 0.1

Given a string of arbitrary length, say foobar, how do I map it to one of these choices with their respective probabilities, and make the selection reproducible?

Say, if string foobar maps to choice_a the first time around, it should do so consistently for all subsequent calls.

The string can be assumed to be alphanumeric if that helps.

A naive approach

One can sum up the numerical value of each character (A = 1, B = 2, ...) and then take the result modulo the number of choices.

foobar => (6 + 15 + 15 + 2 + 1 + 18) = 57

57 % 3 = 0 (choice_a)

The problem is that this splits the probability equally between the choices instead of following the given weights.
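To make the issue concrete, here is a minimal Python sketch of the naive approach above. The names `CHOICES` and `naive_choice` are made up for illustration, and the sketch assumes that only letters contribute to the sum (digits and other characters are skipped).

```python
from collections import Counter
from itertools import product
from string import ascii_lowercase

CHOICES = ["choice_a", "choice_b", "choice_c"]

def naive_choice(s):
    # Sum letter positions (a/A = 1, ..., z/Z = 26); non-letters are ignored in this sketch.
    total = sum(ord(c) - ord("a") + 1 for c in s.lower() if "a" <= c <= "z")
    # The index depends only on the number of choices, never on their weights.
    return CHOICES[total % len(CHOICES)]

print(naive_choice("foobar"))  # choice_a, because 57 % 3 == 0

# Tally the result over all 676 two-letter lowercase strings.
counts = Counter(naive_choice(a + b) for a, b in product(ascii_lowercase, repeat=2))
print(counts)
```

Each choice ends up with roughly a third of the strings, nowhere near the desired 0.2 / 0.7 / 0.1 split.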

Best answer

If you have a hash function that maps strings uniformly into the set $\{0,1,\dots,N-1\}$, you can then map this to your distribution by taking $0\leq n\leq N-1$ to:

  • Choice A if $0\leq n/N < 0.2$
  • Choice B if $0.2\leq n/N < 0.2+0.7$
  • Choice C if $0.2+0.7\leq n/N < 1$

This will be an exact mapping if $N$ is a multiple of $10$, and the error will be small for large $N$: the total variation distance is at most $2/N$. This is a special case of inverse transform sampling, i.e. inverting a cumulative distribution function.
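The mapping above can be sketched in Python as follows. This is one possible reading, not a definitive implementation: SHA-256 from `hashlib` stands in for the uniform hash, $N = 2^{53}$ is chosen so that $n/N$ is an exact floating-point number below 1, and the names `WEIGHTS`, `N` and `weighted_choice` are assumptions introduced here.

```python
import hashlib

WEIGHTS = {"choice_a": 0.2, "choice_b": 0.7, "choice_c": 0.1}
N = 2**53  # assumption: 53 bits, so n / N is represented exactly and stays in [0, 1)

def weighted_choice(s, weights=WEIGHTS):
    # Hash the string and keep the top 53 bits of the first 8 bytes: n is in {0, ..., N-1}.
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big") >> 11
    u = n / N
    # Walk the cumulative distribution and return the first choice whose upper bound exceeds u.
    cumulative = 0.0
    for choice, p in weights.items():
        cumulative += p
        if u < cumulative:
            return choice
    # Guard: the running float sum can end slightly below 1.
    return list(weights)[-1]

# The same string always yields the same choice.
print(weighted_choice("foobar") == weighted_choice("foobar"))  # True
```

Because the result depends only on the bytes of the string and the fixed weights, it is reproducible across calls, processes and machines. Python's built-in `hash()` is deliberately avoided here, since string hashes are randomized per process unless `PYTHONHASHSEED` is set.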

Uniformity of the hash output is always an assumption: an omniscient adversary can always choose inputs that give your hash table worst-case behaviour. In practice, "SipHash" is a good choice of string hash function. For more discussion of various hash functions, see https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/316350#316350.
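If adversarial inputs are a concern, a keyed hash makes the output unpredictable to anyone who does not know the key. As far as I know, Python's standard library does not expose SipHash as a directly callable function, so the sketch below substitutes keyed BLAKE2b from `hashlib`; this is a swapped-in technique, not SipHash itself. The key value and the name `keyed_hash_int` are hypothetical.

```python
import hashlib

SECRET_KEY = b"replace-with-a-random-secret"  # hypothetical key; keep it private and fixed

def keyed_hash_int(s, key=SECRET_KEY, bits=53):
    # Keyed BLAKE2b with an 8-byte digest; the key must be at most 64 bytes long.
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8, key=key).digest()
    # Keep the top `bits` bits, giving an integer n in {0, ..., 2**bits - 1}.
    return int.from_bytes(digest, "big") >> (64 - bits)

n = keyed_hash_int("foobar")
print(n == keyed_hash_int("foobar"))  # True: reproducible as long as the key is unchanged
```

The integer `n` can then be divided by `2**bits` and compared against the cumulative thresholds exactly as in the answer. Changing the key reshuffles which strings map to which choice, so the key has to stay the same for selections to remain consistent.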