C in 100 Seconds: Hash Table Chaining
Video: C in 100 Seconds: Hash Table — Chaining, Insert, Lookup, Delete | Episode 49 by Taught by Celeste AI - AI Coding Coach
C Hash Table: Chaining, Insert, Lookup, Delete
Each bucket is a linked list. Insert prepends; lookup walks; delete unlinks. The classic implementation that handles collisions gracefully.
Episode 48 covered the hash-table concept. Today: a working implementation with chaining — each bucket holds a linked list of entries, so collisions just become chain nodes.
The basic shape
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 10
typedef struct Entry {
char *key;
int value;
struct Entry *next;
} Entry;
Entry *table[SIZE] = {NULL};
int hash(char *key) {
int sum = 0;
while (*key) sum += *key++;
return sum % SIZE;
}
void insert(char *key, int value) {
int idx = hash(key);
Entry *e = malloc(sizeof(Entry));
e->key = key;
e->value = value;
e->next = table[idx]; // prepend to the chain
table[idx] = e;
}
int lookup(char *key) {
int idx = hash(key);
for (Entry *e = table[idx]; e != NULL; e = e->next) {
if (strcmp(e->key, key) == 0) return e->value;
}
return -1;
}
void delete(char *key) {
int idx = hash(key);
Entry *e = table[idx];
Entry *prev = NULL;
while (e != NULL) {
if (strcmp(e->key, key) == 0) {
if (prev == NULL) table[idx] = e->next;
else prev->next = e->next;
free(e);
return;
}
prev = e;
e = e->next;
}
}
Insert: prepend to the chain
void insert(char *key, int value) {
int idx = hash(key);
Entry *e = malloc(sizeof(Entry));
e->key = key;
e->value = value;
e->next = table[idx];
table[idx] = e;
}
Always prepend. O(1) regardless of chain length.
A subtlety: this version doesn't check for duplicate keys. Inserting the same key twice creates two entries; lookup returns the most recent (because it's at the head). For "update or insert" semantics:
void insert_or_update(char *key, int value) {
int idx = hash(key);
for (Entry *e = table[idx]; e != NULL; e = e->next) {
if (strcmp(e->key, key) == 0) {
e->value = value; // update existing
return;
}
}
// insert new
Entry *e = malloc(sizeof(Entry));
e->key = key;
e->value = value;
e->next = table[idx];
table[idx] = e;
}
Lookup: walk the chain
int lookup(char *key) {
int idx = hash(key);
for (Entry *e = table[idx]; e != NULL; e = e->next) {
if (strcmp(e->key, key) == 0) return e->value;
}
return -1;
}
Hash to find the bucket, walk the chain, strcmp each key. Return on match.
Average chain length: n / SIZE (load factor). Average lookup: O(1 + load_factor). Good hash + reasonable load → near-constant time.
Worst case (all keys collide): O(n).
Delete: walk and splice
void delete(char *key) {
int idx = hash(key);
Entry *e = table[idx];
Entry *prev = NULL;
while (e != NULL) {
if (strcmp(e->key, key) == 0) {
if (prev == NULL) table[idx] = e->next;
else prev->next = e->next;
free(e);
return;
}
prev = e;
e = e->next;
}
}
Walk the chain tracking prev. On match, splice the entry out (update prev->next or table[idx]), free.
Same logic as linked-list delete from episode 43.
Iterating all entries
void print_table() {
for (int i = 0; i < SIZE; i++) {
printf("[%d]", i);
Entry *e = table[i];
while (e != NULL) {
printf(" -> %s:%d", e->key, e->value);
e = e->next;
}
printf("\n");
}
}
Walk every bucket; for each, walk its chain. O(SIZE + n) — visits every bucket plus every entry.
For "give me all entries":
void each(void (*fn)(const char *, int)) {
for (int i = 0; i < SIZE; i++) {
for (Entry *e = table[i]; e != NULL; e = e->next) {
fn(e->key, e->value);
}
}
}
Function-pointer callback for visitor pattern.
Memory ownership
Entry *e = malloc(sizeof(Entry));
e->key = key; // does the table own this string?
Two policies:
Borrow: the caller owns the key, the table just holds a pointer. Inserting a stack-buffer pointer is dangerous — the buffer reused, the table now points at garbage.
Own: the table copies the key on insert.
void insert(char *key, int value) {
// ...
e->key = strdup(key); // copy
e->value = value;
// ...
}
void delete(char *key) {
// ...
free(e->key); // free the copy
free(e);
}
Stricter; safer; costs an extra malloc per entry. Typical for production code.
Resizing
When load factor exceeds a threshold (e.g., 0.75), double the table:
void resize() {
int old_size = SIZE;
Entry **old_table = table;
table = calloc(new_size, sizeof(Entry *));
for (int i = 0; i < old_size; i++) {
Entry *e = old_table[i];
while (e != NULL) {
Entry *next = e->next;
// rehash with new size
int idx = hash(e->key);
e->next = table[idx];
table[idx] = e;
e = next;
}
}
free(old_table);
}
Walk every old entry, rehash into the new table. O(n) — but amortized over many inserts, it's still O(1) average.
For a fixed-size table, the load factor only grows. Eventually lookups slow to O(n).
Free the whole table
void free_table() {
for (int i = 0; i < SIZE; i++) {
Entry *e = table[i];
while (e != NULL) {
Entry *next = e->next;
free(e->key); // if owned
free(e);
e = next;
}
table[i] = NULL;
}
}
Walk each bucket, free every entry. Set buckets to NULL for safety.
Open addressing variant (briefly)
For comparison, open addressing stores entries directly in the array:
typedef struct {
char *key;
int value;
int occupied;
} Entry;
Entry table[SIZE];
void insert(char *key, int value) {
int idx = hash(key);
while (table[idx].occupied) {
idx = (idx + 1) % SIZE;
}
table[idx].key = key;
table[idx].value = value;
table[idx].occupied = 1;
}
No malloc per entry — all slots in the array. Cache-friendly. But: deletion is tricky (can't just clear a slot or you break probe sequences for other keys; use "tombstone" markers instead).
Most production hash tables use open addressing for performance. Chaining is conceptually simpler.
Common mistakes
Comparing keys with ==. Pointer equality. Use strcmp for strings, custom equals for structs.
Borrowed key going out of scope. insert(stack_buf, 5); then stack_buf reused — table holds a stale pointer. Either copy keys or document the lifetime requirement.
Leak on delete. Free the entry; if you're storing a copied key, free that too.
Not resizing. Fixed-size table eventually has long chains. Double when load_factor > 0.75.
Hash function dependent on insertion order. Some weak hashes cluster predictably. djb2 or FNV are robust.
Iterating while deleting. Modifying the table during iteration confuses pointers. Collect keys first, then delete.
What's next
Episode 50: binary tree — create, insert, traverse. Hierarchical data with two children per node.
Recap
Chaining: each bucket is a linked list. Insert: hash, prepend. Lookup: hash, walk chain, strcmp. Delete: hash, walk, splice, free. Iteration: walk every bucket and every chain — O(SIZE + n). Decide if the table owns or borrows the key strings. Resize when load factor exceeds threshold. Open addressing is faster but more complex; chaining is simpler and good for learning.
Next episode: binary trees.