Learnitweb

How to Delete Duplicate Records from a Table in Oracle SQL

Duplicate records are common in data due to data entry errors, improper joins, or integration issues. Oracle provides multiple SQL techniques to detect and remove duplicates efficiently.

1. What is a Duplicate Row?

In SQL, a row is considered duplicate if two or more rows have the same values in one or more columns, and only one of those rows needs to be retained.

For example:

ID | NAME   | AGE
---+--------+-----
1  | John   | 25
2  | John   | 25   -- duplicate of row above
3  | Alice  | 30

Here, rows 1 and 2 are duplicates based on NAME and AGE, even though their ID is different.

2. Sample Table and Data

Let’s create a sample table with duplicate records:

CREATE TABLE persons (
  id   NUMBER,
  name VARCHAR2(50),
  age  NUMBER
);

INSERT INTO persons VALUES (1, 'John', 25);
INSERT INTO persons VALUES (2, 'John', 25);
INSERT INTO persons VALUES (3, 'Alice', 30);
INSERT INTO persons VALUES (4, 'Bob', 35);
INSERT INTO persons VALUES (5, 'Alice', 30);
COMMIT;

3. Method 1: Delete Duplicates Using ROWID

ROWID is a unique identifier for each row in the database and is extremely useful for identifying duplicates.

Query:

DELETE FROM persons
WHERE ROWID NOT IN (
  SELECT MIN(ROWID)
  FROM persons
  GROUP BY name, age
);

How It Works:

  • GROUP BY name, age: Groups duplicate rows.
  • MIN(ROWID): Keeps the first occurrence.
  • NOT IN: Deletes all other rows except the one with the minimum ROWID.

Output:

After deletion, the table will contain only one record per (name, age) pair.

SELECT * FROM persons;

-- Output:
ID | NAME   | AGE
---+--------+-----
1  | John   | 25
3  | Alice  | 30
4  | Bob    | 35

4. Method 2: Using DELETE with Common Table Expression (CTE)

Oracle 12c+ supports CTEs with the DELETE statement.

WITH duplicates AS (
  SELECT name, age,
         ROWID AS rid,
         ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY ROWID) AS rn
  FROM persons
)
DELETE FROM persons
WHERE ROWID IN (
  SELECT rid FROM duplicates WHERE rn > 1
);

Explanation:

  • ROW_NUMBER() assigns a sequence number to duplicates.
  • rn = 1: keeps the first occurrence.
  • rn > 1: identifies duplicates to delete.

Note: This method is cleaner and more readable, especially for large queries.

5. Method 3: Using RANK() or DENSE_RANK() in a Subquery

This works similarly to ROW_NUMBER() but with subtle differences if values are truly identical.

DELETE FROM persons
WHERE ROWID IN (
  SELECT rid FROM (
    SELECT ROWID AS rid,
           RANK() OVER (PARTITION BY name, age ORDER BY ROWID) AS rnk
    FROM persons
  )
  WHERE rnk > 1
);

6. Deleting Duplicates Based on a Subset of Columns

You may want to retain uniqueness only based on specific columns, e.g., name and age, while ignoring the id.

Just modify the GROUP BY or PARTITION BY clause accordingly.

7. Preserving a Specific Record (Using ID)

If you want to keep the record with the smallest or largest ID, modify the ORDER BY in ROW_NUMBER():

WITH duplicates AS (
  SELECT id, name, age,
         ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY id) AS rn
  FROM persons
)
DELETE FROM persons
WHERE id IN (
  SELECT id FROM duplicates WHERE rn > 1
);

8. How to Find Duplicates Before Deleting

Preview duplicates before deleting them to be sure.

SELECT name, age, COUNT(*)
FROM persons
GROUP BY name, age
HAVING COUNT(*) > 1;

9. Create a Backup Before Deleting

Always keep a backup before destructive operations:

CREATE TABLE persons_backup AS SELECT * FROM persons;