Intro
Hashing refers to the one-way (irreversible) process of producing a unique sequence of characters from some given input. The most basic and familiar example is hashing passwords, your passwords shouldn’t be stored unhashed anywhere other than that one piece of paper in your bedroom that you know of… Most programs or services that require you to have a password, won’t store it, they’ll feed your password into an algorithm that will produce a hash and store the hash and cross reference every time you log in. // I've simplified the process here, but no one should store a hashed password that's not salted several times.
This works because a hashing algorithm will Always produce the same hash for a given input every time.
How it works
Hashing is a complex topic that often involves algorithms that aren’t even publicly disclosed, the ones we know about are deprecated or so robust no one worries about cracking them, for now…
But the basic gist is any input, string or integer, can be mapped to a unique, fixed-size sequence of characters and numbers.
Super basic hash
To drive this concept home, imagine a simple hash algorithm that maps every alphabetical character to an integer, then that integer is multiplied by 4 and added to 7 and voila. That’s your hash.
Salting
Is the process of using randomly generated data as another input to a hashing algorithm. It’s a common practice in the field of Cryptography and used in many different one-way algorithms.
By using one or more salting rounds, collisions can be avoided while also bolstering security and obscuring the input further. However, the salt values need to be stored along with the final hash, in order to be able to produce the same hash again in the future.
Collisions
Collisions happen when 2+ inputs produce the same hash value, this is a serious flaw.
There are two main ways of fixing this issue that I know of, both are used for addressing collisions in Hashtables.
Chaining
Utilizing Linked Lists where the collisions occur. So if 2+ inputs produce the same hash value, add a Linked List to point at different colliding elements.
This is used in Hash Tables and Hashmaps where hash functions produce the index, but if collisions occur, add a LL to that index to house the colliding elements, and then search for the desired element in that list.
// Yes, in the worst case you'll need to traverse the entire list...
Closed Hashing
The hash is computed as usual, but instead of using a list for collisions a series of decisions is made instead. Process:
- Compute the hash
- If the slot in the table is empty, store the value.
- If there’s a collision, linearly scan (probe) the next slots until a free one is found and store the value there.
==// Must treat the table as a circular array and the table must be at least
n
elements large for it to have nough space.==
Probing
- Linear probing: In which the interval between probes is fixed — often set to 1.
- Quadratic probing: In which the interval between probes increases linearly (hence, the indices are described by a quadratic function).
- Double hashing: In which the interval between probes is fixed for each record but is computed by another hash function.
Common Algorithms
Look these up, I can’t do any of the justice here nor do I want to dive into it. You should be able to figure out why the first two are bad…
- Division (modulo)
- Multiplication
- Universal Hashing
- Dynamic Perfect Hashing
- Static Perfect Hashing