Duplicate records are common in data due to data entry errors, improper joins, or integration issues. Oracle provides multiple SQL techniques to detect and remove duplicates efficiently.
1. What is a Duplicate Row?
In SQL, a row is considered duplicate if two or more rows have the same values in one or more columns, and only one of those rows needs to be retained.
For example:
ID | NAME | AGE ---+--------+----- 1 | John | 25 2 | John | 25 -- duplicate of row above 3 | Alice | 30
Here, rows 1 and 2 are duplicates based on NAME
and AGE
, even though their ID
is different.
2. Sample Table and Data
Let’s create a sample table with duplicate records:
CREATE TABLE persons ( id NUMBER, name VARCHAR2(50), age NUMBER ); INSERT INTO persons VALUES (1, 'John', 25); INSERT INTO persons VALUES (2, 'John', 25); INSERT INTO persons VALUES (3, 'Alice', 30); INSERT INTO persons VALUES (4, 'Bob', 35); INSERT INTO persons VALUES (5, 'Alice', 30); COMMIT;
3. Method 1: Delete Duplicates Using ROWID
ROWID
is a unique identifier for each row in the database and is extremely useful for identifying duplicates.
Query:
DELETE FROM persons WHERE ROWID NOT IN ( SELECT MIN(ROWID) FROM persons GROUP BY name, age );
How It Works:
GROUP BY name, age
: Groups duplicate rows.MIN(ROWID)
: Keeps the first occurrence.NOT IN
: Deletes all other rows except the one with the minimumROWID
.
Output:
After deletion, the table will contain only one record per (name, age)
pair.
SELECT * FROM persons; -- Output: ID | NAME | AGE ---+--------+----- 1 | John | 25 3 | Alice | 30 4 | Bob | 35
4. Method 2: Using DELETE
with Common Table Expression (CTE)
Oracle 12c+ supports CTEs with the DELETE
statement.
WITH duplicates AS ( SELECT name, age, ROWID AS rid, ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY ROWID) AS rn FROM persons ) DELETE FROM persons WHERE ROWID IN ( SELECT rid FROM duplicates WHERE rn > 1 );
Explanation:
ROW_NUMBER()
assigns a sequence number to duplicates.rn = 1
: keeps the first occurrence.rn > 1
: identifies duplicates to delete.
Note: This method is cleaner and more readable, especially for large queries.
5. Method 3: Using RANK()
or DENSE_RANK()
in a Subquery
This works similarly to ROW_NUMBER()
but with subtle differences if values are truly identical.
DELETE FROM persons WHERE ROWID IN ( SELECT rid FROM ( SELECT ROWID AS rid, RANK() OVER (PARTITION BY name, age ORDER BY ROWID) AS rnk FROM persons ) WHERE rnk > 1 );
6. Deleting Duplicates Based on a Subset of Columns
You may want to retain uniqueness only based on specific columns, e.g., name
and age
, while ignoring the id
.
Just modify the GROUP BY
or PARTITION BY
clause accordingly.
7. Preserving a Specific Record (Using ID)
If you want to keep the record with the smallest or largest ID, modify the ORDER BY
in ROW_NUMBER()
:
WITH duplicates AS ( SELECT id, name, age, ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY id) AS rn FROM persons ) DELETE FROM persons WHERE id IN ( SELECT id FROM duplicates WHERE rn > 1 );
8. How to Find Duplicates Before Deleting
Preview duplicates before deleting them to be sure.
SELECT name, age, COUNT(*) FROM persons GROUP BY name, age HAVING COUNT(*) > 1;
9. Create a Backup Before Deleting
Always keep a backup before destructive operations:
CREATE TABLE persons_backup AS SELECT * FROM persons;