Learnitweb

Joins and Unions in ClickHouse

This tutorial explains how Joins and Unions work in ClickHouse, how they combine data from multiple tables, and how to use them correctly in real scenarios. The goal is not only to show syntax, but to build a clear mental model of how ClickHouse processes relational queries.

We will move step-by-step from concepts to syntax to working examples.

1. Understanding Joins in ClickHouse

A join is a SQL operation that combines rows from two or more tables based on a related column between them. This related column is often called the join key and represents a logical relationship between datasets.

Joins are essential whenever your data is normalized across multiple tables and you want to query it as a single logical dataset. For example, employee details may be in one table while salary records are in another, and joins allow you to combine them.

1.1 Types of Joins Supported in ClickHouse

ClickHouse supports the following join types:

  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN
  • FULL JOIN
  • CROSS JOIN

Each join type determines how matching and non-matching rows are handled, and selecting the correct type is critical for accurate results.

2. General Join Syntax in ClickHouse

The basic syntax looks like this:

SELECT column_list
FROM left_table
JOIN_TYPE right_table
ON join_condition;

The table before the JOIN clause is called the left table, and the table after the JOIN clause is called the right table. This distinction matters because some join types treat left and right tables differently.

The ON clause defines the relationship and contains one or more expressions specifying how rows should match. Without a correct ON condition, the join will not produce meaningful results.

3. Practical Setup

Let us create two tables:

3.1 Employee Table

CREATE TABLE employee
(
    Name String,
    City String
)
ENGINE = MergeTree
ORDER BY Name;

3.2 Salary Table

CREATE TABLE salary
(
    Salary Int32,
    Month String,
    Name String
)
ENGINE = MergeTree
ORDER BY Name;

Notice that the column Name is present in both tables and acts as the join key. This shared column is what allows us to relate records.

3.3 Insert Sample Data

INSERT INTO employee VALUES
('A', 'New York'),
('B', 'London'),
('C', 'Paris');
INSERT INTO salary VALUES
(5000, 'Jan', 'A'),
(6000, 'Jan', 'B'),
(7000, 'Jan', 'D');

Observe that some names exist in both tables while others do not, which helps demonstrate how each join behaves.

4. INNER JOIN

An INNER JOIN returns only rows that have matching values in both tables. Any non-matching rows are excluded entirely.

SELECT *
FROM employee
INNER JOIN salary
ON employee.Name = salary.Name;

Result Explanation

  • Only A and B appear
  • C is excluded (no salary match)
  • D is excluded (no employee match)

This join is best used when you only care about records that exist in both datasets.

5. LEFT JOIN

A LEFT JOIN returns all rows from the left table and matching rows from the right table. When no match exists, right-table columns are filled with NULLs.

SELECT *
FROM employee
LEFT JOIN salary
ON employee.Name = salary.Name;

Result Explanation

  • A and B have full data
  • C appears with NULL salary values

This join is ideal when the left table is your primary dataset and missing matches are acceptable.

6. RIGHT JOIN

A RIGHT JOIN returns all rows from the right table and matching rows from the left table. When no match exists, left-table columns are NULL.

SELECT *
FROM employee
RIGHT JOIN salary
ON employee.Name = salary.Name;

Result Explanation

  • A and B match normally
  • D appears with NULL City

This join is useful when the right table represents the primary dataset.

7. FULL JOIN

A FULL JOIN returns all rows from both tables, filling NULLs where matches do not exist. It is the most inclusive join type.

SELECT *
FROM employee
FULL JOIN salary
ON employee.Name = salary.Name;

Result Explanation

  • A and B match
  • C appears with NULL salary
  • D appears with NULL city

This join is powerful when you want to see all possible records regardless of matches.

8. CROSS JOIN

A CROSS JOIN produces the Cartesian product of two tables, meaning every row from the first table is paired with every row from the second.

SELECT *
FROM employee
CROSS JOIN salary;

If employee has 3 rows and salary has 3 rows, the result will have 9 rows. This join should be used carefully because it can grow very large.

9. Understanding Unions in ClickHouse

A UNION combines the result sets of multiple SELECT statements into a single result set. Unlike joins, unions stack results vertically instead of merging columns horizontally.

9.1 UNION ALL Syntax

SELECT column_list FROM table1
UNION ALL
SELECT column_list FROM table2;

The number of columns must match and their data types must be compatible in corresponding positions. Otherwise, ClickHouse will raise an error.

9.2 Example

SELECT Name FROM employee
UNION ALL
SELECT Name FROM salary;

This returns all names from both tables, including duplicates.

10. Getting Distinct Values

UNION ALL does not remove duplicates by default, so DISTINCT must be used if uniqueness is required.

SELECT DISTINCT Name FROM
(
    SELECT Name FROM employee
    UNION ALL
    SELECT Name FROM salary
);

This ensures each name appears only once in the final result.