๐๏ธ Database Normalization: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF | ๐ Exam Prep
Summary
Master database normalization, a key concept for data processing exams. This guide easily explains everything from 1NF to 5NF, functional dependencies, and denormalization with examples and practice questions.
๐ก Normalization is a frequent topic on data processing certification exams! Master it with this single post. A thorough understanding of each normal form, examples, and the final practice questions will bring you one step closer to passing.
๐ What is Database Normalization?
Normalization is the process of systematically decomposing tables to minimize data redundancy and ensure data integrity in a relational database. Normalization helps prevent anomalies and improves the stability and maintainability of the data structure.
- Anomalies: Problems that can occur due to data redundancy.
- Insertion Anomaly: The inability to insert desired data without some other unnecessary data.
- Deletion Anomaly: The loss of unintended data when certain information is deleted.
- Update Anomaly: Data inconsistency that occurs when only some of the redundant data is updated.
๐ Functional Dependency
To understand normalization, you must first grasp the concept of Functional Dependency.
In a relation, if the value of attribute A uniquely determines the value of attribute B, we say "attribute B is functionally dependent on attribute A," denoted as A โ B
. Here, A is the Determinant, and B is the Dependent.
- Full Functional Dependency: The dependent is determined by the entire primary key, not by any part of it. (Key to 2NF)
- Partial Functional Dependency: The dependent is determined by a part of the primary key.
- Transitive Functional Dependency: If
A โ B
andB โ C
, thenA โ C
holds. (Key to 3NF)
๐ข Summary of Normalization Stages
Normal Form | Core Condition | Target for Elimination |
---|---|---|
1NF (First Normal Form) | All attribute values must be atomic. | Multi-valued attributes |
2NF (Second Normal Form) | Must be in 1NF and eliminate partial functional dependencies. (Full functional dependency) | Partial functional dependencies |
3NF (Third Normal Form) | Must be in 2NF and eliminate transitive functional dependencies. | Transitive functional dependencies |
BCNF (Boyce-Codd NF) | Must be in 3NF and every determinant must be a candidate key. | Determinants that are not candidate keys |
4NF (Fourth Normal Form) | Must be in BCNF and eliminate multi-valued dependencies. | Multi-valued dependencies |
5NF (Fifth Normal Form) | Must be in 4NF and uses join dependencies. | Join dependencies |
1๏ธโฃ First Normal Form (1NF)
First Normal Form requires that all attribute values in a relation be atomic. This means a single attribute cannot hold multiple values.
- Example: If you store the courses a student is taking, putting multiple values like 'Data Structures, Algorithms' in a
Courses
attribute violates 1NF.
[Before Normalization]
StudentID | Name | Courses |
---|---|---|
1001 | John Doe | Data Structures, Algorithms |
1002 | Jane Smith | Operating Systems |
[After 1NF Normalization]
StudentID | Name | Course |
---|---|---|
1001 | John Doe | Data Structures |
1001 | John Doe | Algorithms |
1002 | Jane Smith | Operating Systems |
2๏ธโฃ Second Normal Form (2NF)
Second Normal Form requires that the relation is in 1NF and all partial functional dependencies are removed. This means all non-key attributes must be fully functionally dependent on the entire primary key. (If the primary key is a single attribute, the relation is always in 2NF).
- Example: In a course registration relation with
(StudentID, CourseID)
as the primary key,Grade
depends on both(StudentID, CourseID)
, butProfessorName
depends only onCourseID
(a partial functional dependency).
[Before Normalization] (Primary Key: StudentID, CourseID)
StudentID | CourseID | ProfessorName | Grade |
---|---|---|---|
1001 | CS101 | Prof. Kim | A+ |
1001 | CS202 | Prof. Lee | A0 |
1002 | CS101 | Prof. Kim | B+ |
[After 2NF Normalization]
[Registration Table]
StudentID | CourseID | Grade |
---|---|---|
1001 | CS101 | A+ |
1001 | CS202 | A0 |
1002 | CS101 | B+ |
[Course Table]
CourseID | ProfessorName |
---|---|
CS101 | Prof. Kim |
CS202 | Prof. Lee |
3๏ธโฃ Third Normal Form (3NF)
Third Normal Form requires that the relation is in 2NF and all transitive functional dependencies are removed. A transitive dependency exists when A โ B
and B โ C
, which leads to A โ C
.
- Example: If
StudentID โ DepartmentID
andDepartmentID โ DepartmentName
, then there is a transitive dependencyStudentID โ DepartmentName
.
[Before Normalization] (Primary Key: StudentID)
StudentID | Name | DepartmentID | DepartmentName |
---|---|---|---|
1001 | John Doe | CS | Computer Science |
1002 | Jane Smith | EE | Electrical Engineering |
1003 | Peter Pan | CS | Computer Science |
[After 3NF Normalization]
[Student Table]
StudentID | Name | DepartmentID |
---|---|---|
1001 | John Doe | CS |
1002 | Jane Smith | EE |
1003 | Peter Pan | CS |
[Department Table]
DepartmentID | DepartmentName |
---|---|
CS | Computer Science |
EE | Electrical Engineering |
BCNF (Boyce-Codd Normal Form)
BCNF is a stricter version of 3NF. It requires that for every functional dependency, the determinant must be a candidate key. It addresses certain rare anomalies not handled by 3NF.
- Example: In a relation where
(StudentID, Course)
is a candidate key, if there's a constraint that a professor can only teach one course (Professor โ Course
), thenProfessor
is a determinant but not a candidate key. This violates BCNF.
[Before Normalization] (Candidate Key: (StudentID, Course))
StudentID | Course | Professor |
---|---|---|
1001 | Data Structures | Prof. Kim |
1002 | Data Structures | Prof. Kim |
1001 | Algorithms | Prof. Lee |
[After BCNF Normalization]
[Enrollment Table]
StudentID | Professor |
---|---|
1001 | Prof. Kim |
1002 | Prof. Kim |
1001 | Prof. Lee |
[ProfessorCourse Table]
Professor | Course |
---|---|
Prof. Kim | Data Structures |
Prof. Lee | Algorithms |
4๏ธโฃ Fourth Normal Form (4NF) & 5๏ธโฃ Fifth Normal Form (5NF)
4NF and 5NF are rarely encountered in practice, but it's good to know the concepts.
- Fourth Normal Form (4NF): Eliminates multi-valued dependencies (MVD). This dependency arises in 1:N relationships.
- Fifth Normal Form (5NF): Deals with join dependencies (JD). It requires that every join dependency is implied by the candidate keys.
๐ Denormalization
Denormalization is the process of intentionally violating normalization rules by combining tables or allowing redundancy to improve database performance. It is used to avoid costly joins, but it must be applied cautiously as it can risk data inconsistency.
- Common Techniques: Table merging, table splitting (horizontal/vertical), adding redundant columns.
๐ Exam Prep Practice Questions
Problem | Which normal form ensures that all attribute values in a relation are atomic? |
Your Answer | |
Correct Answer | Reveal Answer |
Problem | Which normal form aims to eliminate partial functional dependencies, ensuring all attributes are fully dependent on the primary key? |
Your Answer | |
Correct Answer | Reveal Answer |
Problem | What is the normal form that focuses on removing transitive functional dependencies? |
Your Answer | |
Correct Answer | Reveal Answer |
Problem | Which normal form strengthens 3NF by requiring that every determinant in the relation must be a candidate key? |
Your Answer | |
Correct Answer | Reveal Answer |
Problem | What is the process of decomposing table structures to minimize data redundancy and ensure integrity called? |
Your Answer | |
Correct Answer | Reveal Answer |
Problem | What is the practice of intentionally adding redundancy or merging tables in a normalized database to improve system performance called? |
Your Answer | |
Correct Answer | Reveal Answer |