Is there a hashing algorithm that can be done on paper?

259 Views Asked by At

I often want to explain to non-computer literates about MD5, or why any sort of hashing is done. Is there any type of simple hashing algorithm that can be done with a pen and piece of paper (possibly a calculator) that I can use to help explain hashing?

Like a regular hashing function, it needs to be non-reversible (or rather, can reverse back to many different possible values, so that finding the "original" one is improbable.) Either string or integer hashing is fine.

EDIT: To be clear, I am neither looking for checksum style use of hashes to verify data integrity or typos in a string of numbers, nor am I looking for encryption that can be reversed with a key. What I mean is the way hashing is used on by software developers on passwords to "hide" or "obfuscate" the original value before storing it in a database.

2

There are 2 best solutions below

10
On

The simplest algorithms that can be done on paper are things like ISBN numbers, and things like 'this number is a multiple of 11 in base 13'. I have seen the latter being used: it's very efficient at picking up keying errors.

The whole point about hashing is that you can derive the last digit from the rest of the numbers.

The purpose of hashing is to check the more obvious keying errors. Multiples of 7 are a good example here. So, for example 1211 is a multiple of 7, and no other miskey of this, eg 1121, is a multople. Likewise, you could have eg 3024 works, but 3042 or 3224 or 3302 and 3002 are all not multiples of 7, but all common keying errors.

My suspicion is that OP has confused hashing with encrypting. If someone sends me a file, and an MD5 for that file, i can run a program that generates the md5 myself, and compare it to what's in the sent file. But nothing stops me opening the file without access to the md5 data.

You can have hashed data in plain text, eg 13.000.8, would mean group 13, function 000, hash 8. A program can read the number 130008 and find it has a valid hash, without knowing that it consists of subfields.

Encryption is entirely different. It's sort of like a code, except that it's much fancier than even the enigma stuff from WW2. Basically, the simple root for encryption is letter substitution (eg A-B-C-D-...), so abc becomes bcd.

You can't have encrypted data in palin text, except by having a lot of text. For example, FILE might becoem "fred Ingles left early". You could make file into something like GJMF or GHMD, by way of taking the next letter, or alternating forwards and backwards. Rot13 is an example of a self-decoding code.

Registration Key Hashes

A good number of different approaches exist for providing copy control on software. This is pretty much a spy-vs-spy game too, as vendors try different tricks, and people get around them.

In essence, the latest strategies are to build a certain amount of information, and then provide a third 'validation key' on it. The validation key is generally a "public key" algorithm, where the user has the programs to validate, but not create, such a key. This key is then used to encrypt a word made up from the user-name and validation key.

A simple example, might be something like a modulus against a large prime, usually larger than the word. An example is to pick a big prime, like 991, (usually larger numbers are picked). If we take our mod-7 example and look at a key like 422-1, the 1 can be calculated from the 422 bit, so we feed just 422 into the program.

We now suppose to multiply by 31, and take away multiples of 991. We get

 31 * 422 mod 991  gives 199    (this is the validation key)

The user program knows to multiply the validation key by 32, and compare it with the first three digits: eg

 32 * 199 mod 991  gives 422.  

The progran then knows that the two numbers 422 and 199 make a correct licence.

Note that the user's program does not know how to create a key. What it does instead, is to create a hash based on what is held locally about the user/reg key, and compare it with the decrypted value held in registry. But these do not have to be plain-text stuff, and exist in memory only as long as to see that 199 gives the number found in registry.

It's basically an encrypted hash key, but the hash key is often bigger than the information it is a hash of.

The trapdoor function works on this process, but relies on the difficulty of factorising very large numbers. The thing is an 'open' and 'close' operation, where only one of the keys is made public. In encrypted messages, the 'open' key is public, so anyone can send a message, but the 'close' key is held by the intended reader only.

In software registration, the 'open' key is held private, since only the vendor can make valid copies, while the 'shut' key is in the registration program, allowing the program to create a key, and decript the validation string for an equal hash.

9
On

For people to understand why we are using hash functions they need to understand two main concepts of them:

  1. They have no inverse function. (non-injective, surjective)
  2. Universally distributed values in the codomain/target-set.

Explaining 1.:

I would just go with mod 10. Take example counting on fingers. If above ten, you start again and mostly just keep count of tens in the head. Would you forgot the count you can not tell the exact count by looking on your hand.

Okay, then they will ask why is this useful. Then you tell them about authentication. Imagine a system where user password can be only a natural number x.

The safe way - maybe you can say a spy has access what we store in the system - to store passwords is actually no to store them, but have the ability to decide if its the same code when the user registered and chose x. Then you just say, we store the mod 10 of x and check if it is the same when the user wants to log in.

Now if they following you, they gonna argue that it has only 10 possible values. THIS is wat you've been waiting for to introduce them to 2., otherwise they've already lost interest. [:


Explaining 2.:

If they've followed you this way, they already sensing the problem why mod 10 is a bad hashing function for authentication, however they are getting a grasp what a non-invertable function is. So you can say that instead of mod 10 we do a lot of other calculations after each other which can have a lot more possible values (big codomain).

Now if they want to hear more, you can tell them that authentication is only just one use case. Sometimes to keep the secret is not that important but the uniqueness of the result.

You can just bring up peoples signatures and the concept of contracts. Sometimes you can read out the normal names of people from theire signatures, but it does not matter. The fact however, that every people has a different signature and - in an ideal word - every time the same is the real deal.

Now just tell them that this is how a hard-drive knows if its file is corrupt or none of them blocks gone bad.