šŸ“š Hash Tables Study Guide

CS210 - Week 13 - Comprehensive Reference

šŸ“– Table of Contents

1. Introduction to Hash Tables

šŸŽÆ Main Concept

Hash tables provide constant-time average case performance for search, insert, and delete operations by using a hash function to compute an array index directly from the key.

What is a Map?

A map is a data structure that models a searchable collection of key-value pairs, where:

Real-World Examples

2. Hashing: Basic Plan

Core Idea

Save items in a key-indexed table where the index is computed as a function of the key.

šŸ”‘ Key Components

  1. Hash Function: Method for computing array index from key
  2. Equality Test: Method for checking if two keys are equal
  3. Collision Resolution: Strategy for handling keys that hash to the same index

The Space-Time Tradeoff

Scenario Strategy Issue
No space limitation Trivial hash function with key as index Wastes enormous amounts of memory
No time limitation Sequential search through all items Very slow for large datasets
Real world Hashing with collision resolution Balance space and time efficiently

3. Hash Functions

⚔ Idealistic Goal

Scramble the keys uniformly to produce a table index that is:

Computing Hash Codes

For Integers

public int hashCode() { 
    return value; 
}

Simply return the integer value itself.

For Booleans

public int hashCode() {
    if (value) return 1231;
    else return 1237;
}

Use two different prime numbers for true and false.

For Doubles

public int hashCode() {
    long bits = doubleToLongBits(value);
    return (int) (bits ^ (bits >>> 32));
}

Convert to 64-bit representation and XOR the two halves.

For Strings (Horner's Method)

public int hashCode() {
    int hash = 0;
    for (int i = 0; i < length(); i++)
        hash = s[i] + (31 * hash);
    return hash;
}
h = s[0]Ā·31L-1 + s[1]Ā·31L-2 + ... + s[L-2]Ā·31 + s[L-1]

String Hash Example: "call"

hash = 99·31³ + 97·31² + 108·31 + 108
     = 108 + 31Ā·(108 + 31Ā·(97 + 31Ā·99))
     = 3,045,982

Modular Hashing

Convert hash code (any 32-bit integer) to table index (0 to M-1):

āŒ Bug Version

private int hash(Key key) {
    return key.hashCode() % M;
}

Can return negative values!

āœ… Correct Version

private int hash(Key key) {
    return (key.hashCode() & 0x7fffffff) % M;
}

Masks sign bit to ensure positive result

Uniform Hashing Assumption

Key Assumption: Each key is equally likely to hash to an integer between 0 and M-1.

This assumption is crucial for performance analysis but is difficult to achieve in practice.

Birthday Problem & Load Balancing

4. Separate Chaining

šŸ’” Core Idea

Use an array of M < N linked lists. Each array position points to a chain of items that hash to that index.

How It Works

  1. Hash: Map key to integer i between 0 and M-1
  2. Insert: Put at front of ith chain (if not already there)
  3. Search: Search only the ith chain

Visual Representation

Array Index          Linked List Chain
    0         →     [A] → [B] → [E] → null
    1         →     null
    2         →     [X] → [S] → null
    3         →     null
    4         →     [L] → [P] → null
    5         →     [M] → [H] → [C] → [R] → null
            

Java Implementation

public class SeparateChainingHashST<Key, Value> {
    private int M = 97;  // number of chains
    private Node[] st = new Node[M];  // array of chains
    
    private static class Node {
        private Object key;
        private Object val;
        private Node next;
    }
    
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }
    
    public Value get(Key key) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)
            if (key.equals(x.key))
                return (Value) x.val;
        return null;
    }
    
    public void put(Key key, Value val) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)
            if (keys[i].equals(key)) { 
                x.val = val; 
                return; 
            }
        st[i] = new Node(key, val, st[i]);
    }
}

Performance Analysis

āœ… Proposition

Under uniform hashing assumption, the probability that the number of keys in a list is within a constant factor of N/M is extremely close to 1.

Average # of probes = N/M
Operation Average Case Worst Case
Search Hit ~3-5 probes N probes
Search Miss ~3-5 probes N probes
Insert ~3-5 probes N probes

Resizing Strategy

Goal: Keep average chain length N/M ā‰ˆ constant

5. Linear Probing

šŸ’” Core Idea (Open Addressing)

When a collision occurs, probe the next array position (i+1, i+2, ...) until an empty slot is found.

How It Works

  1. Hash: Map key to integer i between 0 and M-1
  2. Insert: Put at table index i if free; if not, try i+1, i+2, etc.
  3. Search: Search table index i; if occupied but no match, try i+1, i+2, etc.

Critical: Array size M must be greater than number of key-value pairs N.

Visual Example

Index:  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
       [P] [M] [ ] [ ] [A] [C] [S] [H] [L] [ ] [E] [ ] [ ] [ ] [R] [X]

Insert K: hash(K) = 5
          Position 5 occupied → try 6 → occupied → try 7 → occupied
          → try 8 → occupied → try 9 → EMPTY! Insert at 9
            

Java Implementation

public class LinearProbingHashST<Key, Value> {
    private int M = 30001;
    private Value[] vals = (Value[]) new Object[M];
    private Key[] keys = (Key[]) new Object[M];
    
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }
    
    public Value get(Key key) {
        for (int i = hash(key); keys[i] != null; i = (i+1) % M)
            if (key.equals(keys[i]))
                return vals[i];
        return null;
    }
    
    public void put(Key key, Value val) {
        int i;
        for (i = hash(key); keys[i] != null; i = (i+1) % M)
            if (keys[i].equals(key))
                break;
        keys[i] = key;
        vals[i] = val;
    }
}

Knuth's Parking Problem

Analogy: Cars arrive at one-way street with M parking spaces. Each wants random space i; if taken, try i+1, i+2, etc.

Performance Analysis

Search Hit: ~½(1 + 1/(1-α))

Search Miss/Insert: ~½(1 + 1/(1-α)²)

where α = N/M (load factor)

Load Factor α Search Hit Search Miss
½ ~1.5 ~2.5
ā…” ~2.0 ~5.0
¾ ~3.0 ~8.5

āš™ļø Typical Choice

Keep α = N/M ā‰ˆ ½ for constant-time operations

Deletion in Linear Probing

āš ļø Critical Issue

Cannot simply delete an entry and set it to null! This would break search for items that had collided with the deleted item.

Solution: After deletion, must rehash all keys in the same cluster that comes after the deleted key.

6. Performance Comparison

Separate Chaining vs Linear Probing

āœ… Separate Chaining

  • Performance degrades gracefully
  • Less sensitive to poorly-designed hash functions
  • Easy deletion
  • Can exceed load factor of 1

āœ… Linear Probing

  • Less wasted space (no pointers)
  • Better cache performance
  • Simpler implementation
  • Faster when load factor is low

Complete Symbol Table Implementations Comparison

Implementation Search Insert Delete Ordered? Key Interface
Sequential Search N N N No equals()
Binary Search lg N N N Yes compareTo()
BST 1.39 lg N 1.39 lg N √N Yes compareTo()
Red-Black BST 1.0 lg N 1.0 lg N 1.0 lg N Yes compareTo()
Separate Chaining 3-5* 3-5* 3-5* No equals(), hashCode()
Linear Probing 3-5* 3-5* 3-5* No equals(), hashCode()

* Under uniform hashing assumption

7. Advanced Topics

Hash Function Variants

Two-Probe Hashing

Hash to two positions and insert in the shorter chain.

Result: Reduces expected length of longest chain to log log N

Double Hashing

Use linear probing but skip a variable amount (not just 1) each time.

h(x) = h1(x) + jĀ·h2(x)  where j = 0, 1, 2, ...

Advantages: Effectively eliminates clustering, table can be nearly full

Cuckoo Hashing

Hash key to two positions. If both occupied, displace one key and reinsert it.

Result: Constant worst-case time for search!

Algorithmic Complexity Attacks

🚨 Security Concern

A malicious adversary who knows your hash function can craft inputs that all hash to the same bucket, causing performance to degrade to O(N).

Examples:

Java String Hash Collision

Due to Java's base-31 hash function, it's possible to generate 2N strings of length 2N that all hash to the same value!

"Aa" and "BB" both hash to 2112
"AaAa", "AaBB", "BBAa", "BBBB" all hash to -540425984
            

One-Way Hash Functions

Cryptographic Hash Functions

Special hash functions where it's computationally infeasible to:

Examples: SHA-256, SHA-3, BLAKE3

Uses: Password storage, digital signatures, blockchain

Note: Too expensive for general symbol table use

8. Summary & Best Practices

šŸŽÆ Key Takeaways

Best Practices

āœ… DO:

āŒ DON'T:

When to Use What

Use Case Best Choice Reason
Need ordered operations Red-Black BST Hash tables don't maintain order
Simple, fast lookups Hash Table (either variant) Constant-time average case
Unpredictable load Separate Chaining Degrades gracefully
Memory constrained Linear Probing No pointer overhead
Cache-sensitive code Linear Probing Better locality of reference

Common Pitfalls

  1. Hash Code Overflow: Forgetting to mask sign bit can cause negative indices
  2. Poor Hash Functions: Using only part of the key leads to clustering
  3. Inconsistent equals/hashCode: If a.equals(b) but a.hashCode() ≠ b.hashCode(), table breaks
  4. Ignoring Load Factor: Letting table get too full degrades performance
  5. Security: Using predictable hash functions in adversarial environments

Quick Reference: Hash Code Implementation

// Template for user-defined types
public int hashCode() {
    int hash = 17;  // nonzero constant
    hash = 31 * hash + field1.hashCode();  // reference type
    hash = 31 * hash + ((Integer) field2).hashCode();  // primitive
    hash = 31 * hash + field3.hashCode();  // reference type
    return hash;
}

šŸŽ“ You're Ready!

You now have a comprehensive understanding of hash tables. Practice implementing both separate chaining and linear probing to solidify these concepts.

Study Guide Created from CS210 Week 13 Materials

Algorithms by Robert Sedgewick & Kevin Wayne