Heap Sort
Description
Heap Sort is a comparison-based sorting algorithm that uses a binary heap data structure. It divides its input into a sorted and an unsorted region, and iteratively shrinks the unsorted region by extracting the largest element and moving it to the sorted region.
What is Heap Sort?
Heap Sort works by visualizing the elements of the array as a special kind of complete binary tree called a heap. The algorithm first builds a max heap from the input data, then repeatedly extracts the maximum element from the heap and reconstructs the heap until all elements are sorted.
Key Characteristics
- Comparison-Based: Elements are sorted by comparing them with each other
- In-Place: Requires only a constant amount of additional memory space
- Not Stable: Does not preserve the relative order of equal elements
- Non-Adaptive: Performance doesn't improve on partially sorted data
How Does It Differ from Other Sorting Algorithms?
Unlike Quick Sort, Heap Sort guarantees O(n log n) time complexity in the worst case. Unlike Merge Sort, it doesn't require additional memory space. The algorithm is particularly useful when you need consistent performance and have strict memory constraints.
History
Heap Sort was invented by J.W.J. Williams in 1964, marking a significant milestone in the history of sorting algorithms. The algorithm was first published in the paper "Algorithm 232 - Heapsort" in the Communications of the ACM. The development of Heap Sort was particularly notable because it was the first sorting algorithm to achieve optimal time complexity O(n log n) in the worst case while using only O(1) auxiliary space. This was a significant improvement over Quick Sort's worst-case time complexity of O(n²). The binary heap data structure, which is fundamental to Heap Sort, was also introduced in this same paper. Robert W. Floyd later improved the algorithm in 1964 with his "bottom-up" heap construction method, which reduced the complexity of building the initial heap from O(n log n) to O(n).
Understanding Heaps
A binary heap is a complete binary tree that forms the foundation of the Heap Sort algorithm. Understanding heaps is crucial to comprehending how Heap Sort achieves its efficiency.
Binary Heap Structure
A binary heap is a complete binary tree with two fundamental properties:
- Structure Property: A binary heap is a complete binary tree, which means all levels are fully filled except possibly the last level, which is filled from left to right. This property ensures the tree remains balanced.
- Heap Property: In a max-heap, for any given node I, the value of I is greater than or equal to the values of its children. The opposite is true for a min-heap, where parent nodes are smaller than their children.
Array Representation
Binary heaps are typically implemented using arrays, which provides efficient memory usage and access patterns. For an array-based implementation, given an index i:
Parent(i) = ⌊(i - 1) / 2⌋
LeftChild(i) = 2i + 1
RightChild(i) = 2i + 2
This mathematical relationship allows us to navigate the tree structure without storing explicit pointers, making heap operations very efficient.
Max-Heap vs Min-Heap
Max-Heap: The parent node is always greater than or equal to its children. The maximum element is always at the root. This is what Heap Sort uses for ascending order sorting.
Min-Heap: The parent node is always less than or equal to its children. The minimum element is always at the root. This can be used for descending order sorting.
How It Works
Heap Sort operates in two main phases: building the heap and extracting elements in sorted order.
Phase 1: Build Max Heap
The first step is to convert the input array into a max heap structure. This is done by calling the heapify operation on all non-leaf nodes, starting from the last non-leaf node and moving up to the root.
The last non-leaf node is at index ⌊n/2⌋ - 1, where n is the array length. We work backwards because heapifying assumes the children are already valid heaps.
Time Complexity: Although it appears to be O(n log n), building a heap from an unsorted array actually takes O(n) time due to the mathematical properties of the heap structure.
Phase 2: Heapify Operation
Heapify is the core operation that maintains the heap property. When called on a node, it:
- Compares the node with its left and right children
- Identifies the largest element among the three
- If the largest is not the current node, swaps them
- Recursively calls heapify on the affected child subtree
This process continues down the tree until the heap property is satisfied or a leaf node is reached.
Phase 3: Extract and Sort
Once the max heap is built, we repeatedly perform the following steps:
- Swap: Exchange the root (maximum element) with the last element in the unsorted portion of the array. This moves the maximum element to its correct sorted position.
- Reduce Heap Size: Decrease the heap size by 1, effectively removing the sorted element from consideration.
- Heapify Root: Call heapify on the root to restore the heap property, as the swap may have violated it.
- Repeat: Continue this process until the heap contains only one element.
After n-1 iterations, all elements will be in sorted order. The array is sorted in-place, with the maximum elements gradually accumulating at the end.
Step-by-Step Example
Consider sorting [4, 10, 3, 5, 1]:
- Build Max Heap: [10, 5, 3, 4, 1] - The array is rearranged into a max heap
- First Extraction: Swap 10 and 1, then heapify: [5, 4, 3, 1 | 10]
- Second Extraction: Swap 5 and 1, then heapify: [4, 1, 3 | 5, 10]
- Third Extraction: Swap 4 and 3, then heapify: [3, 1 | 4, 5, 10]
- Fourth Extraction: Swap 3 and 1: [1 | 3, 4, 5, 10]
- Result: [1, 3, 4, 5, 10] - Fully sorted array
Visualization
Click Start to begin visualization
Implementation
class HeapSort:
@staticmethod
def heapify(arr, n, i):
largest = i
left = 2 * i + 1
right = 2 * i + 2
# Compare with left child
if left < n and arr[left] > arr[largest]:
largest = left
# Compare with right child
if right < n and arr[right] > arr[largest]:
largest = right
# If largest is not root
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i]
HeapSort.heapify(arr, n, largest)
@staticmethod
def sort(arr):
n = len(arr)
# Build max heap
for i in range(n // 2 - 1, -1, -1):
HeapSort.heapify(arr, n, i)
# Extract elements from heap one by one
for i in range(n - 1, 0, -1):
arr[0], arr[i] = arr[i], arr[0]
HeapSort.heapify(arr, i, 0)
return arr
# Example usage
if __name__ == "__main__":
arr = [64, 34, 25, 12, 22, 11, 90]
print("Original array:", arr)
HeapSort.sort(arr)
print("Sorted array:", arr)
#include <iostream>
#include <vector>
class HeapSort {
private:
static void heapify(std::vector<int>& arr, int n, int i) {
int largest = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && arr[left] > arr[largest])
largest = left;
if (right < n && arr[right] > arr[largest])
largest = right;
if (largest != i) {
std::swap(arr[i], arr[largest]);
heapify(arr, n, largest);
}
}
public:
static void sort(std::vector<int>& arr) {
int n = arr.size();
// Build max heap
for (int i = n / 2 - 1; i >= 0; i--)
heapify(arr, n, i);
// Extract elements from heap one by one
for (int i = n - 1; i > 0; i--) {
std::swap(arr[0], arr[i]);
heapify(arr, i, 0);
}
}
};
// Example usage
int main() {
std::vector<int> arr = {64, 34, 25, 12, 22, 11, 90};
std::cout << "Original array: ";
for (int num : arr) std::cout << num << " ";
std::cout << std::endl;
HeapSort::sort(arr);
std::cout << "Sorted array: ";
for (int num : arr) std::cout << num << " ";
std::cout << std::endl;
return 0;
}
public class HeapSort
{
private static void Heapify(int[] arr, int n, int i)
{
int largest = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && arr[left] > arr[largest])
largest = left;
if (right < n && arr[right] > arr[largest])
largest = right;
if (largest != i)
{
(arr[i], arr[largest]) = (arr[largest], arr[i]);
Heapify(arr, n, largest);
}
}
public static void Sort(int[] arr)
{
int n = arr.Length;
// Build max heap
for (int i = n / 2 - 1; i >= 0; i--)
Heapify(arr, n, i);
// Extract elements from heap one by one
for (int i = n - 1; i > 0; i--)
{
(arr[0], arr[i]) = (arr[i], arr[0]);
Heapify(arr, i, 0);
}
}
// Example usage
public static void Main()
{
int[] arr = { 64, 34, 25, 12, 22, 11, 90 };
Console.WriteLine("Original array: " + string.Join(" ", arr));
Sort(arr);
Console.WriteLine("Sorted array: " + string.Join(" ", arr));
}
}
Complexity Analysis
| Operation | Time Complexity |
|---|---|
| Build Heap | O(n) |
| Heapify | O(log n) |
| Overall Sorting | O(n log n) |
Advantages and Disadvantages
Advantages
- Guaranteed O(n log n) time complexity
- In-place sorting algorithm
- No extra space required
- Great for partially sorted arrays
Disadvantages
- Not stable sort
- Poor cache performance
- Generally slower than quicksort
- Not adaptive