Learnitweb

Timsort Algorithm

1. Introduction

Timsort is a hybrid sorting algorithm that combines the strengths of Merge Sort and Insertion Sort. It was invented in 2002 by Tim Peters for Python and was later adopted by Java (for object arrays) and other languages because of its real-world efficiency.

Unlike pure theoretical algorithms, Timsort is designed for real-world data, which is often partially ordered. It takes advantage of existing order (known as runs) to minimize comparisons and merges, making it faster than traditional O(n log n) algorithms on many practical datasets.


2. Key Idea (Intuition)

Real-world data isn’t completely random — it often contains sorted subsequences.
Timsort identifies these naturally occurring sorted sections called runs, and then merges them intelligently to obtain a fully sorted array.

Example:

Input: [2, 3, 4, 1, 5, 6, 7, 0]

Here, we already have partially sorted subsequences:

[2, 3, 4] and [1, 5, 6, 7]

Timsort will:

  1. Detect these “runs”.
  2. Sort each small run using Insertion Sort.
  3. Merge them together using an optimized Merge Sort strategy.

This reduces the total number of comparisons and data movements.


3. Characteristics of Timsort

FeatureDescription
Algorithm TypeHybrid (Merge Sort + Insertion Sort)
StableYes (keeps equal elements in same order)
AdaptiveYes (takes advantage of existing order)
In-PlacePartially (uses temporary arrays for merging)
Best CaseO(n) — if data is already sorted
Average CaseO(n log n)
Worst CaseO(n log n)

4. Step-by-Step Working of Timsort

Let’s break it into stages:

Step 1: Split the Array into Runs

A run is a contiguous sequence that is already sorted (either ascending or descending).

Timsort scans the array to find such runs.
If a run is descending, it reverses it to make it ascending.

Why?

  • Real-world data often contains ascending/descending sorted sections.
  • Sorting them again would be wasteful.

Step 2: Ensure Minimum Run Length

To optimize merging, Timsort ensures that each run has at least a minimum size (usually between 32 and 64).

If a run is shorter than this, Timsort extends it by sorting additional elements using Insertion Sort (efficient for small segments).

This “minimum run” size is calculated based on array length.

For array length n,

MIN_RUN = determineMinRun(n);

where MIN_RUN is chosen so that merging is balanced.

Step 3: Sort Individual Runs

Each run (of size ≤ MIN_RUN) is sorted using Insertion Sort, since Insertion Sort performs well for small datasets.

Step 4: Merge Runs

Once runs are sorted, Timsort merges them like Merge Sort, but intelligently.

It uses a stack to track runs and merges only when certain size relationships are satisfied (to ensure balanced merges).

Merge conditions ensure:

  • Merges are balanced in size.
  • The algorithm doesn’t degenerate to O(n²).

5. Determining Minimum Run Size

The minimum run size depends on the length of the array.
In Python’s implementation, it’s calculated as:

def min_run_length(n):
    r = 0
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r

For an array of length 1,000,000,
the typical minimum run size is 32.


6. Visual Example

Consider:

Input: [5, 21, 7, 23, 19, 10, 11, 1, 3]

Step 1: Find runs
Runs found:

[5, 21] → ascending run
[7, 23] → ascending run
[19, 10, 11, 1, 3] → descending → reversed to ascending [3, 1, 11, 10, 19]

Step 2: Sort small runs
Each run is sorted using Insertion Sort.

Step 3: Merge runs
Merge pairs of runs progressively:

[5, 21] + [7, 23] → [5, 7, 21, 23]
[5, 7, 21, 23] + [3, 10, 11, 19] → [3, 5, 7, 10, 11, 19, 21, 23]

Final output:

[1, 3, 5, 7, 10, 11, 19, 21, 23]

7. Pseudocode for Timsort

Timsort(arr):
    n = length(arr)
    minRun = determineMinRun(n)
    
    # Step 1: Sort small runs using Insertion Sort
    for i in range(0, n, minRun):
        insertionSort(arr, i, min((i + minRun - 1), n - 1))
    
    # Step 2: Merge runs
    size = minRun
    while size < n:
        for left in range(0, n, 2 * size):
            mid = left + size - 1
            right = min((left + 2 * size - 1), (n - 1))
            if mid < right:
                merge(arr, left, mid, right)
        size = 2 * size

8. Java Implementation Example

Here’s a simplified Timsort implementation in Java (illustrative version):

public class TimSortExample {

    static int MIN_MERGE = 32;

    public static void timSort(int[] arr) {
        int n = arr.length;
        int minRun = minRunLength(MIN_MERGE);

        // Step 1: Sort small runs with insertion sort
        for (int start = 0; start < n; start += minRun) {
            int end = Math.min(start + minRun - 1, n - 1);
            insertionSort(arr, start, end);
        }

        // Step 2: Merge sorted runs
        for (int size = minRun; size < n; size = 2 * size) {
            for (int left = 0; left < n; left += 2 * size) {
                int mid = left + size - 1;
                int right = Math.min((left + 2 * size - 1), (n - 1));

                if (mid < right)
                    merge(arr, left, mid, right);
            }
        }
    }

    private static void insertionSort(int[] arr, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int temp = arr[i];
            int j = i - 1;
            while (j >= left && arr[j] > temp) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = temp;
        }
    }

    private static void merge(int[] arr, int left, int mid, int right) {
        int len1 = mid - left + 1, len2 = right - mid;
        int[] leftArr = new int[len1];
        int[] rightArr = new int[len2];

        for (int x = 0; x < len1; x++) leftArr[x] = arr[left + x];
        for (int x = 0; x < len2; x++) rightArr[x] = arr[mid + 1 + x];

        int i = 0, j = 0, k = left;
        while (i < len1 && j < len2) {
            if (leftArr[i] <= rightArr[j]) arr[k++] = leftArr[i++];
            else arr[k++] = rightArr[j++];
        }
        while (i < len1) arr[k++] = leftArr[i++];
        while (j < len2) arr[k++] = rightArr[j++];
    }

    private static int minRunLength(int n) {
        int r = 0;
        while (n >= MIN_MERGE) {
            r |= n & 1;
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        int[] arr = {5, 21, 7, 23, 19, 10, 11, 1, 3};
        timSort(arr);
        for (int num : arr)
            System.out.print(num + " ");
    }
}

Output:

1 3 5 7 10 11 19 21 23

9. Time Complexity Analysis

CaseDescriptionTime Complexity
Best CaseAlready sorted arrayO(n)
Average CaseRandom dataO(n log n)
Worst CaseReversely sorted dataO(n log n)
Space ComplexityTemporary arrays for mergingO(n)

Why O(n) in the best case?
Because Timsort can detect existing runs and skip redundant work when the data is already sorted.


10. Advantages of Timsort

  1. Adaptive:
    Performs better when the input has partially sorted sections.
  2. Stable:
    Maintains relative order of equal elements.
  3. Real-world Efficiency:
    Excellent for complex data with mixed order.
  4. Hybrid Strategy:
    Combines the fast merging of Merge Sort with the simplicity of Insertion Sort.
  5. Widely Used:
    Default sorting algorithm in Python (sorted(), .sort()) and Java (Arrays.sort(Object[])).

11. Disadvantages of Timsort

  1. Memory Overhead:
    Uses extra space for merging (not fully in-place).
  2. Complex Implementation:
    More complicated than standard Merge or Quick sort.
  3. Not ideal for small arrays if they are completely unsorted and tiny.

12. Comparison with Other Algorithms

AlgorithmStableAdaptiveSpaceAverage TimeNotes
QuicksortNoNoO(log n)O(n log n)Fast but unstable
Merge SortYesNoO(n)O(n log n)Good stability
Insertion SortYesYesO(1)O(n²)Best for small arrays
TimsortYesYesO(n)O(n log n)Real-world efficient

13. Practical Use

  • Python:
    Default in list.sort() and sorted().
  • Java:
    Used in Arrays.sort(Object[]) since Java 7.
  • Kotlin:
    Uses Timsort under the hood for its sorted() function.

14. Summary

PropertyDescription
Algorithm TypeHybrid (Merge + Insertion Sort)
StableYes
AdaptiveYes
Best CaseO(n)
Worst CaseO(n log n)
Space ComplexityO(n)
Typical UseDefault sorting in Java and Python

15. Conclusion

Timsort is a real-world sorting algorithm that blends theoretical efficiency with practical adaptability. By combining Insertion Sort for small runs and Merge Sort for merging runs, it provides both speed and stability.
That’s why it has become the default sorting algorithm for high-level languages — balancing performance, stability, and adaptiveness.