C in 100 Seconds: Merge Sort
Video: C in 100 Seconds: Merge Sort | Episode 54 by CelesteAI
C Merge Sort
Recursive divide and conquer. Split the array in half, sort each half, merge. O(n log n) worst case. Stable, but uses O(n) extra memory.
Merge sort is a classic divide-and-conquer algorithm. It guarantees O(n log n) — no degenerate cases. The merge step is the heart.
The basic shape
#include <stdio.h>
void merge(int arr[], int left, int mid, int right) {
int i = left, j = mid + 1, k = 0;
int temp[right - left + 1];
while (i <= mid && j <= right) {
if (arr[i] <= arr[j])
temp[k++] = arr[i++];
else
temp[k++] = arr[j++];
}
while (i <= mid) temp[k++] = arr[i++];
while (j <= right) temp[k++] = arr[j++];
for (int i = 0; i < k; i++)
arr[left + i] = temp[i];
}
void merge_sort(int arr[], int left, int right) {
if (left >= right) return;
int mid = (left + right) / 2;
merge_sort(arr, left, mid);
merge_sort(arr, mid + 1, right);
merge(arr, left, mid, right);
}
int main() {
int nums[] = {38, 27, 43, 3, 9, 82, 10};
int n = sizeof(nums) / sizeof(nums[0]);
merge_sort(nums, 0, n - 1);
for (int i = 0; i < n; i++) printf("%d ", nums[i]);
printf("\n");
// 3 9 10 27 38 43 82
}
Divide and conquer
void merge_sort(int arr[], int left, int right) {
if (left >= right) return; // base: single element or empty
int mid = (left + right) / 2;
merge_sort(arr, left, mid); // sort left half
merge_sort(arr, mid + 1, right); // sort right half
merge(arr, left, mid, right); // combine
}
Three steps:
- Divide. Split at the middle.
- Conquer. Recursively sort each half.
- Combine. Merge the two sorted halves.
The base case is left >= right — a single element (or empty range) is trivially sorted.
For an array of 8: 8 → 4+4 → 2+2+2+2 → 1+1+1+1+1+1+1+1 → merge upward. log₂(8) = 3 levels of recursion.
merge — the key step
void merge(int arr[], int left, int mid, int right) {
int i = left, j = mid + 1, k = 0;
int temp[right - left + 1];
while (i <= mid && j <= right) {
if (arr[i] <= arr[j])
temp[k++] = arr[i++];
else
temp[k++] = arr[j++];
}
while (i <= mid) temp[k++] = arr[i++];
while (j <= right) temp[k++] = arr[j++];
for (int i = 0; i < k; i++)
arr[left + i] = temp[i];
}
Two sorted subranges (arr[left..mid] and arr[mid+1..right]) are merged into a temp buffer in sorted order, then copied back.
The main loop: compare the front of each subrange; take the smaller; advance that index.
The two follow-up loops: drain whichever range still has items.
The copy-back: put the result into place.
The <= (not <) preserves stability — when equal, we take from the left subrange first.
Walkthrough on [38, 27, 43, 3]
Split: [38, 27] and [43, 3].
Recursive sort on [38, 27]:
- Split: [38] and [27].
- Each is sorted (length 1).
- Merge: [27, 38].
Recursive sort on [43, 3]:
- Split: [43] and [3].
- Each is sorted.
- Merge: [3, 43].
Merge [27, 38] and [3, 43]:
- Compare 27, 3 → take 3.
- Compare 27, 43 → take 27.
- Compare 38, 43 → take 38.
- Drain right: take 43.
- Result: [3, 27, 38, 43].
Complexity
| Aspect | Bubble Sort | Merge Sort |
|---|---|---|
| Best | O(n) | O(n log n) |
| Average | O(n²) | O(n log n) |
| Worst | O(n²) | O(n log n) |
| Memory | O(1) | O(n) |
| Stable | Yes | Yes |
Merge sort wins on speed (O(n log n) vs O(n²)) — for 1,000,000 elements, ~20 million ops vs 500 billion. The cost is O(n) extra memory for the temp buffer.
VLA caveat
int temp[right - left + 1]; // Variable-Length Array
This is a C99 feature. Some compilers (MSVC) don't support VLAs. C11 made them optional. For portable code, use malloc:
int *temp = malloc((right - left + 1) * sizeof(int));
// ... use temp ...
free(temp);
Or allocate one big buffer at the top of merge_sort and pass it down — avoids malloc per recursion.
In-place mergesort exists
Standard mergesort uses O(n) extra memory. There are in-place variants (e.g., block merge) — much harder to implement and slower in practice. The O(n) memory is usually fine.
Why O(n log n)
Each level of recursion processes n elements (across all subarrays). There are log₂ n levels. Total: n × log n.
The merge step at each level is linear in the size of the subarrays it merges. Sum of all merges at level k: O(n).
This is the fundamental lower bound for comparison-based sorting — you can't do better than O(n log n) in the worst case (proven via decision-tree arguments).
When to use
- Linked lists. Mergesort works without random access. No O(n) extra memory needed (you reuse the list nodes).
- External sorts. Sort huge files that don't fit in memory by sorting chunks and merging.
- Stability matters.
qsortis not guaranteed stable;mergesortis. - Worst-case guarantees. For real-time systems, the no-bad-input property of mergesort matters.
For general-purpose sorting, quicksort (next episode) is usually faster in practice — better cache locality, no extra memory.
Mergesort on linked lists
typedef struct Node { int data; struct Node *next; } Node;
Node *merge_sort_list(Node *head) {
if (head == NULL || head->next == NULL) return head;
// split into halves
Node *slow = head, *fast = head, *prev = NULL;
while (fast != NULL && fast->next != NULL) {
prev = slow;
slow = slow->next;
fast = fast->next->next;
}
prev->next = NULL; // sever
Node *left = merge_sort_list(head);
Node *right = merge_sort_list(slow);
// merge ...
}
Linked-list mergesort doesn't need extra memory — splice nodes during merge.
Common mistakes
Off-by-one in mid. (left + right) / 2 for inclusive bounds; (left + right + 1) / 2 shifts the split. Be consistent.
Stable vs not. arr[i] <= arr[j] for stability; arr[i] < arr[j] is unstable.
Memory in temp. Forgetting that VLA isn't always available, or forgetting to free if using malloc.
Inclusive vs exclusive right. This implementation uses inclusive right (so the array length is right - left + 1). right = n - 1 for the full array.
Recursion depth. For 4-billion-element arrays, log₂ ≈ 32 stack frames — fine. For deeper recursion, iterative mergesort exists (bottom-up).
What's next
Episode 55: quicksort. Faster in practice, in-place, but worst-case O(n²) on bad pivots.
Recap
Mergesort: divide and conquer. Split at mid, recursively sort halves, merge. O(n log n) guaranteed — no bad inputs. Stable if you use <= in the merge comparison. Costs O(n) extra memory for the temp buffer. For linked lists, no extra memory needed. Use over bubble sort when n is large; choose between mergesort and quicksort based on stability needs and memory budget.
Next episode: quicksort.