I am trying to develop a scheme for generating ids in a distributed application that are unique with high probability (within acceptable bounds). I want each id to be easily remembered, spoken, and read. I chose base32 as the encoding, since it gives a large character set that is universally recognised and avoids confusion between similar-looking characters.
So, if I have a set of 32 characters and an id of length 4, I have $32^4$ ($1{,}048{,}576$) possible ids, so the chance of a collision when a new id is created is something like one in $32^4/\text{(total existing ids)}$, which is too much of a chance.
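To make that collision estimate concrete, here is a small sketch (the function names are mine, not part of any scheme) that computes the chance a single new id collides with the existing ones, plus the standard birthday-problem approximation for the chance of any collision among $n$ ids:

```python
import math

SPACE = 32 ** 4  # 1,048,576 possible 4-character base32 ids

def new_id_collision_chance(existing: int, space: int = SPACE) -> float:
    """Chance that one freshly generated id collides with any existing id."""
    return existing / space

def any_collision_chance(n: int, space: int = SPACE) -> float:
    """Birthday-problem approximation: chance that n random ids
    contain at least one collision among themselves."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))

print(new_id_collision_chance(10_000))  # ~0.0095, i.e. about 1 in 105
print(any_collision_chance(1_000))      # ~0.38 already with only 1,000 ids
```

The birthday approximation shows the problem is even worse than the one-new-id view: with only a thousand ids in a 4-character space, a collision somewhere is already more likely than not to be close.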
I think $32^6$ is a reasonable trade-off between memorability and chance of collision, but six random base32 characters are harder to remember and read.
I find it much easier if all the numbers and letters are grouped: where an id might have been e4hu8a, I find ehua48 much easier to remember, read, and speak. Likewise k3c5f0 becomes kcf350.
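The rearrangement described above (letters first, then digits, each group keeping its original relative order) can be sketched as a stable partition:

```python
def rearrange(id_str: str) -> str:
    """Move letters before digits, each group keeping its original order."""
    letters = [c for c in id_str if c.isalpha()]
    digits = [c for c in id_str if c.isdigit()]
    return "".join(letters + digits)

print(rearrange("e4hu8a"))  # ehua48
print(rearrange("k3c5f0"))  # kcf350
```

Note that this map is many-to-one: every interleaving of the same letter sequence and digit sequence produces the same output, which is exactly where the entropy loss comes from.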
So, finally, my question...
How much entropy do I lose by my scheme of rearranging totally random base32 strings so that letters come before numbers (but remain in their original order amongst themselves)?
How can I calculate the entropy for different length strings using this method?
This answer assumes you use all $36$ letters and numbers ($26$ letters plus $10$ digits); I don't know which characters are among your $32$. You should be able to update the calculation to reflect your alphabet.
To calculate the entropy, you just need to count the number of distinct ids and take the base-$2$ log. If all six characters are letters, you have $26^6=308{,}915{,}776$ ids. To include digits, note that the number of ids with five letters and one digit is $\frac{10}{26}$ of that (one position has $10$ choices instead of $26$), so the total is a geometric series: $26^6\sum_{i=0}^6 \left(\frac{10}{26}\right)^i=26^6\,\frac{1-(\frac{10}{26})^7}{1-\frac{10}{26}}\approx 5\times 10^8$. The base-$2$ log of this is about $28.9$. This should be compared with $\log_2 36^6 \approx 31.0$, so you are losing about $2.1$ bits of entropy. I assumed you always have letters before numbers. You can reduce the loss a bit if you allow both orders of numbers and letters, though that at most doubles the count, so it recovers less than one bit.
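The counting argument above can be checked numerically. This sketch (assuming, as in the answer, $26$ letters and $10$ digits) sums $26^{6-i} \cdot 10^i$ over the possible digit counts $i$ and compares the result against the full $36^6$ space:

```python
import math

LETTERS, DIGITS = 26, 10

def distinct_ids(length: int) -> int:
    """Count strings of the given length with all letters before all
    digits: sum over i (the digit count) of LETTERS^(length-i) * DIGITS^i."""
    return sum(LETTERS ** (length - i) * DIGITS ** i for i in range(length + 1))

def entropy_loss(length: int) -> float:
    """Bits lost versus a fully random string over LETTERS + DIGITS symbols."""
    rearranged_bits = math.log2(distinct_ids(length))
    full_bits = length * math.log2(LETTERS + DIGITS)
    return full_bits - rearranged_bits

print(distinct_ids(6))   # 501363136, i.e. about 5e8
print(entropy_loss(6))   # about 2.1 bits
```

The same two functions answer the second question as well: calling `entropy_loss` with other lengths gives the loss for different id sizes under the same scheme.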