CS210 - Week 13 - Comprehensive Reference
Hash tables provide constant-time average case performance for search, insert, and delete operations by using a hash function to compute an array index directly from the key.
A map is a data structure that models a searchable collection of key-value pairs, where:
get(key), put(key, value), remove(key)Save items in a key-indexed table where the index is computed as a function of the key.
| Scenario | Strategy | Issue |
|---|---|---|
| No space limitation | Trivial hash function with key as index | Wastes enormous amounts of memory |
| No time limitation | Sequential search through all items | Very slow for large datasets |
| Real world | Hashing with collision resolution | Balance space and time efficiently |
Scramble the keys uniformly to produce a table index that is:
public int hashCode() {
return value;
}
Simply return the integer value itself.
public int hashCode() {
if (value) return 1231;
else return 1237;
}
Use two different prime numbers for true and false.
public int hashCode() {
long bits = doubleToLongBits(value);
return (int) (bits ^ (bits >>> 32));
}
Convert to 64-bit representation and XOR the two halves.
public int hashCode() {
int hash = 0;
for (int i = 0; i < length(); i++)
hash = s[i] + (31 * hash);
return hash;
}
hash = 99·31³ + 97·31² + 108·31 + 108
= 108 + 31Ā·(108 + 31Ā·(97 + 31Ā·99))
= 3,045,982
Convert hash code (any 32-bit integer) to table index (0 to M-1):
private int hash(Key key) {
return key.hashCode() % M;
}
Can return negative values!
private int hash(Key key) {
return (key.hashCode() & 0x7fffffff) % M;
}
Masks sign bit to ensure positive result
Key Assumption: Each key is equally likely to hash to an integer between 0 and M-1.
This assumption is crucial for performance analysis but is difficult to achieve in practice.
Use an array of M < N linked lists. Each array position points to a chain of items that hash to that index.
Array Index Linked List Chain
0 ā [A] ā [B] ā [E] ā null
1 ā null
2 ā [X] ā [S] ā null
3 ā null
4 ā [L] ā [P] ā null
5 ā [M] ā [H] ā [C] ā [R] ā null
public class SeparateChainingHashST<Key, Value> {
private int M = 97; // number of chains
private Node[] st = new Node[M]; // array of chains
private static class Node {
private Object key;
private Object val;
private Node next;
}
private int hash(Key key) {
return (key.hashCode() & 0x7fffffff) % M;
}
public Value get(Key key) {
int i = hash(key);
for (Node x = st[i]; x != null; x = x.next)
if (key.equals(x.key))
return (Value) x.val;
return null;
}
public void put(Key key, Value val) {
int i = hash(key);
for (Node x = st[i]; x != null; x = x.next)
if (keys[i].equals(key)) {
x.val = val;
return;
}
st[i] = new Node(key, val, st[i]);
}
}
Under uniform hashing assumption, the probability that the number of keys in a list is within a constant factor of N/M is extremely close to 1.
| Operation | Average Case | Worst Case |
|---|---|---|
| Search Hit | ~3-5 probes | N probes |
| Search Miss | ~3-5 probes | N probes |
| Insert | ~3-5 probes | N probes |
Goal: Keep average chain length N/M ā constant
When a collision occurs, probe the next array position (i+1, i+2, ...) until an empty slot is found.
Critical: Array size M must be greater than number of key-value pairs N.
Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[P] [M] [ ] [ ] [A] [C] [S] [H] [L] [ ] [E] [ ] [ ] [ ] [R] [X]
Insert K: hash(K) = 5
Position 5 occupied ā try 6 ā occupied ā try 7 ā occupied
ā try 8 ā occupied ā try 9 ā EMPTY! Insert at 9
public class LinearProbingHashST<Key, Value> {
private int M = 30001;
private Value[] vals = (Value[]) new Object[M];
private Key[] keys = (Key[]) new Object[M];
private int hash(Key key) {
return (key.hashCode() & 0x7fffffff) % M;
}
public Value get(Key key) {
for (int i = hash(key); keys[i] != null; i = (i+1) % M)
if (key.equals(keys[i]))
return vals[i];
return null;
}
public void put(Key key, Value val) {
int i;
for (i = hash(key); keys[i] != null; i = (i+1) % M)
if (keys[i].equals(key))
break;
keys[i] = key;
vals[i] = val;
}
}
Analogy: Cars arrive at one-way street with M parking spaces. Each wants random space i; if taken, try i+1, i+2, etc.
Search Hit: ~½(1 + 1/(1-α))
Search Miss/Insert: ~½(1 + 1/(1-α)²)
where α = N/M (load factor)
| Load Factor α | Search Hit | Search Miss |
|---|---|---|
| ½ | ~1.5 | ~2.5 |
| ā | ~2.0 | ~5.0 |
| ¾ | ~3.0 | ~8.5 |
Keep α = N/M ā ½ for constant-time operations
Cannot simply delete an entry and set it to null! This would break search for items that had collided with the deleted item.
Solution: After deletion, must rehash all keys in the same cluster that comes after the deleted key.
| Implementation | Search | Insert | Delete | Ordered? | Key Interface |
|---|---|---|---|---|---|
| Sequential Search | N | N | N | No | equals() |
| Binary Search | lg N | N | N | Yes | compareTo() |
| BST | 1.39 lg N | 1.39 lg N | āN | Yes | compareTo() |
| Red-Black BST | 1.0 lg N | 1.0 lg N | 1.0 lg N | Yes | compareTo() |
| Separate Chaining | 3-5* | 3-5* | 3-5* | No | equals(), hashCode() |
| Linear Probing | 3-5* | 3-5* | 3-5* | No | equals(), hashCode() |
* Under uniform hashing assumption
Hash to two positions and insert in the shorter chain.
Result: Reduces expected length of longest chain to log log N
Use linear probing but skip a variable amount (not just 1) each time.
h(x) = h1(x) + jĀ·h2(x) where j = 0, 1, 2, ...
Advantages: Effectively eliminates clustering, table can be nearly full
Hash key to two positions. If both occupied, displace one key and reinsert it.
Result: Constant worst-case time for search!
A malicious adversary who knows your hash function can craft inputs that all hash to the same bucket, causing performance to degrade to O(N).
Examples:
Due to Java's base-31 hash function, it's possible to generate 2N strings of length 2N that all hash to the same value!
"Aa" and "BB" both hash to 2112
"AaAa", "AaBB", "BBAa", "BBBB" all hash to -540425984
Special hash functions where it's computationally infeasible to:
Examples: SHA-256, SHA-3, BLAKE3
Uses: Password storage, digital signatures, blockchain
Note: Too expensive for general symbol table use
hashCode() that uses all significant fieldsequals() and hashCode() are consistent| Use Case | Best Choice | Reason |
|---|---|---|
| Need ordered operations | Red-Black BST | Hash tables don't maintain order |
| Simple, fast lookups | Hash Table (either variant) | Constant-time average case |
| Unpredictable load | Separate Chaining | Degrades gracefully |
| Memory constrained | Linear Probing | No pointer overhead |
| Cache-sensitive code | Linear Probing | Better locality of reference |
// Template for user-defined types
public int hashCode() {
int hash = 17; // nonzero constant
hash = 31 * hash + field1.hashCode(); // reference type
hash = 31 * hash + ((Integer) field2).hashCode(); // primitive
hash = 31 * hash + field3.hashCode(); // reference type
return hash;
}
You now have a comprehensive understanding of hash tables. Practice implementing both separate chaining and linear probing to solidify these concepts.
Study Guide Created from CS210 Week 13 Materials
Algorithms by Robert Sedgewick & Kevin Wayne