1. Introduction
In many business applications, you may have tables containing sequential values, such as:
- Transaction IDs that should be continuous (e.g., invoice numbers: 1001, 1002, 1003, …)
- Dates that should appear daily without missing days (e.g., sales date series)
- Employee sequence numbers, order numbers, or batch IDs
However, due to missing records, data corruption, or load errors, some IDs or dates may be missing. Detecting these gaps is a very common data quality check.
In Oracle, there are several ways to identify these gaps using SQL.
2. Example Setup
Let’s create a simple example table representing a list of transactions.
CREATE TABLE transactions (
txn_id NUMBER PRIMARY KEY,
txn_date DATE
);
INSERT INTO transactions VALUES (1, TO_DATE('01-JAN-2025','DD-MON-YYYY'));
INSERT INTO transactions VALUES (2, TO_DATE('02-JAN-2025','DD-MON-YYYY'));
INSERT INTO transactions VALUES (3, TO_DATE('03-JAN-2025','DD-MON-YYYY'));
INSERT INTO transactions VALUES (5, TO_DATE('05-JAN-2025','DD-MON-YYYY'));
INSERT INTO transactions VALUES (6, TO_DATE('06-JAN-2025','DD-MON-YYYY'));
INSERT INTO transactions VALUES (8, TO_DATE('08-JAN-2025','DD-MON-YYYY'));
COMMIT;
Here:
- We are missing IDs 4 and 7.
- We are missing dates 04-JAN-2025 and 07-JAN-2025.
3. Detecting Gaps in Sequential IDs
Let’s start with ID gaps.
Method 1: Using LAG() Analytic Function
Oracle’s analytic functions can compare each row with its previous row, making them perfect for gap detection.
SELECT
txn_id,
LAG(txn_id) OVER (ORDER BY txn_id) AS prev_txn,
txn_id - LAG(txn_id) OVER (ORDER BY txn_id) AS diff
FROM transactions;
Output:
| TXN_ID | PREV_TXN | DIFF |
|---|---|---|
| 1 | NULL | NULL |
| 2 | 1 | 1 |
| 3 | 2 | 1 |
| 5 | 3 | 2 |
| 6 | 5 | 1 |
| 8 | 6 | 2 |
Explanation:
- The
LAG()function retrieves the previous transaction ID. DIFF = txn_id - prev_txnshows the gap size between consecutive IDs.- Wherever
DIFF > 1, it indicates a missing sequence.
Now we can find only the missing gaps:
SELECT
LAG(txn_id) OVER (ORDER BY txn_id) + 1 AS missing_start,
txn_id - 1 AS missing_end
FROM transactions
WHERE txn_id - LAG(txn_id) OVER (ORDER BY txn_id) > 1;
Output:
| MISSING_START | MISSING_END |
|---|---|
| 4 | 4 |
| 7 | 7 |
Interpretation:
- Between 3 and 5 → Missing 4
- Between 6 and 8 → Missing 7
This is the most efficient and commonly used approach.
Method 2: Using Recursive CTE to Generate the Full Range
If you want to list all missing IDs explicitly, not just the ranges, use a recursive CTE.
WITH RECURSIVE all_ids (id) AS (
SELECT MIN(txn_id) FROM transactions
UNION ALL
SELECT id + 1 FROM all_ids
WHERE id < (SELECT MAX(txn_id) FROM transactions)
)
SELECT id AS missing_id
FROM all_ids
WHERE id NOT IN (SELECT txn_id FROM transactions);
Output:
| MISSING_ID |
|---|
| 4 |
| 7 |
Explanation:
- The recursive CTE generates a list of all numbers between the minimum and maximum ID.
- Then we subtract the IDs that actually exist.
- The remaining ones are the gaps.
Method 3: Using CONNECT BY (Non-Recursive)
Older versions of Oracle (before 11gR2) didn’t support recursive CTEs. You can use the CONNECT BY hierarchical query instead:
SELECT level + (SELECT MIN(txn_id) FROM transactions) - 1 AS id FROM dual CONNECT BY level <= (SELECT MAX(txn_id) - MIN(txn_id) + 1 FROM transactions) MINUS SELECT txn_id FROM transactions ORDER BY id;
Output:
| ID |
|---|
| 4 |
| 7 |
Explanation:
- The
CONNECT BYclause generates a continuous sequence using Oracle’s pseudo-columnLEVEL. - The
MINUSoperator filters out existing IDs, leaving only the missing ones.
4. Detecting Gaps in Sequential Dates
Now let’s look at missing dates — another common case, especially for time-series data like sales or attendance.
We’ll reuse the same data where some dates are missing.
Method 1: Generate Continuous Dates Using CONNECT BY
SELECT
start_date + LEVEL - 1 AS expected_date
FROM (
SELECT MIN(txn_date) AS start_date, MAX(txn_date) AS end_date FROM transactions
)
CONNECT BY LEVEL <= end_date - start_date + 1
MINUS
SELECT txn_date FROM transactions
ORDER BY expected_date;
Output:
| EXPECTED_DATE |
|---|
| 04-JAN-2025 |
| 07-JAN-2025 |
Explanation:
CONNECT BYgenerates all dates between the minimum and maximum.- We then subtract the dates that actually exist.
- The remaining rows are missing dates.
Method 2: Using Recursive CTE
If you prefer standard SQL instead of Oracle’s hierarchical syntax, use a recursive CTE:
WITH dates (d) AS (
SELECT MIN(txn_date) FROM transactions
UNION ALL
SELECT d + 1 FROM dates
WHERE d < (SELECT MAX(txn_date) FROM transactions)
)
SELECT d AS missing_date
FROM dates
WHERE d NOT IN (SELECT txn_date FROM transactions);
Output:
| MISSING_DATE |
|---|
| 04-JAN-2025 |
| 07-JAN-2025 |
Explanation:
- The recursive CTE builds every date in the range.
- We then exclude the ones already present.
This is clean, ANSI-standard SQL and works in modern Oracle versions (12c+).
Method 3: Using LAG() for Date Gaps
SELECT
txn_date,
LAG(txn_date) OVER (ORDER BY txn_date) AS prev_date,
txn_date - LAG(txn_date) OVER (ORDER BY txn_date) AS diff
FROM transactions;
Output:
| TXN_DATE | PREV_DATE | DIFF |
|---|---|---|
| 01-JAN-25 | NULL | NULL |
| 02-JAN-25 | 01-JAN-25 | 1 |
| 03-JAN-25 | 02-JAN-25 | 1 |
| 05-JAN-25 | 03-JAN-25 | 2 |
| 06-JAN-25 | 05-JAN-25 | 1 |
| 08-JAN-25 | 06-JAN-25 | 2 |
To display missing date ranges:
SELECT
prev_date + 1 AS missing_start,
txn_date - 1 AS missing_end
FROM (
SELECT
txn_date,
LAG(txn_date) OVER (ORDER BY txn_date) AS prev_date
FROM transactions
)
WHERE txn_date - prev_date > 1;
Output:
| MISSING_START | MISSING_END |
|---|---|
| 04-JAN-2025 | 04-JAN-2025 |
| 07-JAN-2025 | 07-JAN-2025 |
5. Detecting Gaps in Real Data Scenarios
Example: Missing Invoices
WITH invoices AS (
SELECT 1001 AS inv_no FROM dual UNION ALL
SELECT 1002 FROM dual UNION ALL
SELECT 1004 FROM dual UNION ALL
SELECT 1005 FROM dual
)
SELECT
prev_inv + 1 AS missing_start,
inv_no - 1 AS missing_end
FROM (
SELECT
inv_no,
LAG(inv_no) OVER (ORDER BY inv_no) AS prev_inv
FROM invoices
)
WHERE inv_no - prev_inv > 1;
Output:
| MISSING_START | MISSING_END |
|---|---|
| 1003 | 1003 |
6. Performance and Best Practices
- For large datasets, prefer
LAG()orLEAD()analytic queries over recursive CTEs—they are faster and more memory efficient. - Always index your sequential column (like ID or DATE) to improve join and filter performance.
- When using recursive CTE or
CONNECT BY, always limit recursion to avoid infinite loops. MINUSperforms a set-based subtraction and can be slower on large tables; preferNOT EXISTSif performance matters.- For date-based gap detection, store data in DATE or TIMESTAMP columns, not VARCHAR.
7. Summary Table
| Purpose | Best Approach | Example Function |
|---|---|---|
| Detect gaps in numeric IDs | LAG() analytic function | txn_id - LAG(txn_id) |
| Detect gaps in dates | LAG() or recursive CTE | txn_date - LAG(txn_date) |
| Generate complete sequence | CONNECT BY or recursive CTE | SELECT LEVEL |
| Show missing ranges | LAG() + WHERE diff > 1 | Yes |
| Show each missing value | Recursive or hierarchical query | Yes |
8. Conclusion
Detecting gaps in sequential data (IDs or dates) is crucial for ensuring data completeness and correctness.
Oracle offers multiple techniques to handle this:
- Analytic functions (
LAG,LEAD) for efficient difference detection - Recursive CTEs for generating and comparing sequences (standard SQL)
CONNECT BY LEVELfor fast number/date generation (Oracle-specific)
