Learnitweb

Category: Data structures and algorithms

  • Timsort Algorithm

    1. Introduction Timsort is a hybrid sorting algorithm that combines the strengths of Merge Sort and Insertion Sort. It was invented in 2002 by Tim Peters for Python and was later adopted by Java (for object arrays) and other languages because of its real-world efficiency. Unlike pure theoretical algorithms, Timsort is designed for real-world data,…

  • How to sort an array with million records?

    1. Understanding the Problem You have: Since this array easily fits in memory (a million integers take only ~4 MB), you don’t need any external or distributed sorting approach.All sorting can be done in-memory — which allows for O(n log n) performance and sub-second execution on modern CPUs. 2. Java Sorting Algorithms Overview Java provides…

  • Dual-Pivot Quicksort

    1. Introduction Quicksort is one of the fastest and most widely used sorting algorithms due to its efficiency and in-place sorting behavior. Traditionally, Quicksort uses one pivot to partition an array into two parts: However, in Dual-Pivot Quicksort, the idea is extended to use two pivots instead of one. This allows partitioning the array into…

  • MinHash

    Introduction to MinHash MinHash, short for “Min-wise Independent Permutations Hashing”, is a probabilistic algorithm used to estimate the similarity between two sets. Unlike exact similarity measures, which require comparing every element in both sets, MinHash provides a fast and memory-efficient way to approximate similarity, making it extremely useful for large-scale datasets. MinHash is primarily used…

  • Cuckoo Filter

    Introduction to Cuckoo Filter A Cuckoo Filter is a probabilistic data structure used to efficiently check whether an element exists in a set. It is similar to a Bloom Filter, but provides additional benefits like support for deletions and better space efficiency in many cases. Cuckoo Filters are part of the approximate membership query (AMQ)…

  • Count-Min Sketch

    Introduction to Count-Min Sketch Count-Min Sketch (CMS) is a probabilistic data structure used for frequency estimation of elements in a data stream. It allows you to determine how many times an element has appeared in a large dataset or streaming data without storing every element explicitly. Count-Min Sketch is part of the streaming algorithms family,…

  • HyperLogLog

    Introduction to HyperLogLog HyperLogLog (HLL) is a probabilistic algorithm used to efficiently estimate the number of unique elements (cardinality) in a very large dataset. Unlike traditional counting methods, which require storing every single element to ensure accurate counting, HyperLogLog achieves this using a fixed and very small amount of memory, making it extremely suitable for…

  • Raft and Paxos

    Introduction to Consensus Algorithms In a distributed system, multiple nodes (servers) often need to agree on a single value or system state, even when some nodes fail, messages are delayed, or nodes crash. This is called the consensus problem, and solving it correctly is critical for ensuring fault-tolerant and consistent systems. Consensus is fundamental in…

  • Merkle Tree

    Introduction to Merkle Tree A Merkle Tree, also known as a hash tree, is a fundamental data structure used for verifying data integrity and consistency in distributed and decentralized systems. It is named after Ralph Merkle, who introduced the concept in 1979. Unlike standard data structures that store complete information, a Merkle Tree stores only…

  • Bloom Filter

    Introduction to Bloom Filter A Bloom Filter is a probabilistic data structure used to test whether an element is possibly in a set or definitely not in a set. Unlike traditional data structures such as arrays, lists, or hash sets, Bloom Filters do not store the actual elements. Instead, they use a bit array and…