1. What is Database Normalization?
Database normalization is the process of structuring a relational database to reduce redundancy and improve data integrity. This involves dividing large tables into smaller ones and defining relationships between them using foreign keys.
2. Types of Anomalies Solved by Normalization
Database normalization is essential to eliminate common types of anomalies that can arise in unstructured or poorly designed databases. These anomalies include:
Update Anomaly: Occurs when redundant data is updated inconsistently. For example, if an employee’s department name is stored in multiple records and one of them is updated without updating the others, it causes inconsistency.
Insert Anomaly: Happens when inserting data is problematic due to missing information. For example, you cannot insert a new course unless you also have a student enrolled in it because both are stored in the same table.
Delete Anomaly: Occurs when deleting one piece of data unintentionally removes other valuable data. For instance, if a student who is the only one enrolled in a course is deleted, and the course information is stored with the student, then the course data is also lost.
By applying normalization rules, such as splitting data into related tables and using keys to establish relationships, we prevent these anomalies and maintain data integrity.
3. Normal Forms Explained with Oracle Table Representations
3.1 First Normal Form (1NF) – Atomicity
Definition: A relation is in First Normal Form if all attributes (columns) contain only atomic (indivisible) values, and each record (row) is unique.
Explanation: In unnormalized tables, a single column may contain multiple values (e.g., a list of subjects). 1NF enforces that each column holds exactly one value, making querying and updates more straightforward and reliable. This is the foundational step in the normalization process, ensuring data is stored in a structured and retrievable way.
Example – Unnormalized Table Representation:
Table: Students +------------+----------+------------------+ | student_id | name | subjects | +------------+----------+------------------+ | 1 | Alice | Math, Physics | | 2 | Bob | Chemistry, Biology| +------------+----------+------------------+
Problems in Unnormalized Table:
- Difficult to query students enrolled in a specific subject
- Hard to maintain or update a single subject without affecting others
Normalized to 1NF:
Table: Students +------------+--------+ | student_id | name | +------------+--------+ | 1 | Alice | | 2 | Bob | +------------+--------+ Table: Student_Subjects +------------+------------+ | student_id | subject | +------------+------------+ | 1 | Math | | 1 | Physics | | 2 | Chemistry | | 2 | Biology | +------------+------------+
Benefits:
- Enables precise queries and reporting
- Avoids issues with partial updates
- Forms the basis for achieving higher normal forms
3.2 Second Normal Form (2NF) – Remove Partial Dependencies
Definition: A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the entire primary key (not just part of it).
Explanation: This form applies mainly to tables with composite primary keys. If a non-key column depends only on one part of the primary key, it violates 2NF. The goal is to eliminate this partial dependency by creating new related tables.
Not in 2NF:
Table: Course_Enrollments +------------+-----------+--------------+--------------+ | student_id | course_id | student_name | course_name | +------------+-----------+--------------+--------------+ | 1 | 101 | Alice | Mathematics | | 1 | 102 | Alice | Physics | +------------+-----------+--------------+--------------+
Problem Explanation: In the above table, the primary key is a combination of student_id
and course_id
. However, student_name
depends only on student_id
, and course_name
depends only on course_id
. This means there are partial dependencies, violating 2NF.
Issues Caused:
- Redundant data: The student’s name is repeated for each course enrollment.
- Update anomaly: If the student changes their name, multiple rows must be updated.
- Insert anomaly: You cannot store a student’s name without associating it with a course.
Normalized to 2NF:
Table: Students +------------+--------------+ | student_id | student_name | +------------+--------------+ | 1 | Alice | +------------+--------------+ Table: Courses +-----------+--------------+ | course_id | course_name | +-----------+--------------+ | 101 | Mathematics | | 102 | Physics | +-----------+--------------+ Table: Course_Enrollments +------------+-----------+ | student_id | course_id | +------------+-----------+ | 1 | 101 | | 1 | 102 | +------------+-----------+
3.3 Third Normal Form (3NF) – Remove Transitive Dependencies
Definition: A table is in 3NF if it is in 2NF and all the attributes are only dependent on the primary key (i.e., there are no transitive dependencies).
Explanation: A transitive dependency occurs when a non-key column depends on another non-key column. This can lead to redundancy and inconsistency. 3NF eliminates these issues by moving dependent attributes into a separate table.
Not in 3NF:
Table: Employees +--------+----------+---------------+------------------+ | emp_id | emp_name | department_id | department_name | +--------+----------+---------------+------------------+ | 1 | John | 10 | HR | | 2 | Jane | 20 | Finance | +--------+----------+---------------+------------------+
Problem Explanation: The primary key is emp_id
, and all other columns must depend only on it. However, department_name
is functionally dependent on department_id
, not on emp_id
directly. Since department_id
is not a primary key, this forms a transitive dependency.
Issues Caused:
- Redundant storage of department names
- Inconsistency if department name is updated in one row but not others
- Violation of normalization principles
Normalized to 3NF:
Table: Departments +---------------+------------------+ | department_id | department_name | +---------------+------------------+ | 10 | HR | | 20 | Finance | +---------------+------------------+ Table: Employees +--------+----------+---------------+ | emp_id | emp_name | department_id | +--------+----------+---------------+ | 1 | John | 10 | | 2 | Jane | 20 | +--------+----------+---------------+
3.4 Boyce-Codd Normal Form (BCNF) – Handling Overlapping Candidate Keys
Definition: A relation is in Boyce-Codd Normal Form (BCNF) if it is already in 3NF and every non-trivial functional dependency has a left-hand side that is a super key.
Key Concept:
BCNF removes situations where a non-candidate attribute (or a part of a composite key) determines a candidate key. In simpler terms, the column(s) determining others must be a candidate key themselves.
Example Scenario: College Course Scheduling
Table: Course_Schedule +----------+------------+----------+ | course | professor | room | +----------+------------+----------+ | DB101 | Dr. Rao | A1 | | OS201 | Dr. Meena | A2 | | DB101 | Dr. Rao | A1 | +----------+------------+----------+
Let’s assume the following functional dependencies:
- Each course is taught by one professor:
course → professor
- Each professor always uses the same room:
professor → room
The primary key is (course)
— since course uniquely identifies both professor and room.
But there’s a problem:
- Since
professor → room
, and professor is not a candidate key, this violates BCNF.
Problems with This Design
- If Dr. Rao changes rooms, you’d have to update it in many rows.
- You might end up with conflicting room numbers for the same professor.
BCNF Decomposition
We split the table into two:
Table: Course_Professor +----------+------------+ | course | professor | +----------+------------+ | DB101 | Dr. Rao | | OS201 | Dr. Meena | +----------+------------+ Table: Professor_Room +------------+----------+ | professor | room | +------------+----------+ | Dr. Rao | A1 | | Dr. Meena | A2 | +------------+----------+
Now:
- In
Course_Professor
, the course is the key. - In
Professor_Room
, professor is the key. - All functional dependencies are preserved and every determinant is a candidate key — satisfying BCNF.