Normalization in DBMS:
Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like insertion, update, and deletion anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables. Normalization helps improve the structure of the database by ensuring that dependencies are properly enforced and data anomalies are minimized.
The process of normalization involves applying various normal forms (NF), each with specific criteria. The most commonly used normal forms are 1NF (First Normal Form), 2NF (Second Normal Form), 3NF (Third Normal Form), and BCNF (Boyce-Codd Normal Form).
Why is Normalization Important?
- Reduces Redundancy: Helps in eliminating duplicate data by organizing data into separate related tables.
- Ensures Data Integrity: Minimizes data anomalies (insertion, deletion, and update anomalies).
- Efficient Storage: Saves storage space by ensuring data is not duplicated unnecessarily.
Let's dive into the different normal forms and understand how they improve database structure with examples.
1. First Normal Form (1NF)
A relation is in 1NF if it contains only atomic (indivisible) values. It disallows multi-valued attributes, meaning each column should hold only one value per record.
Example of a Table NOT in 1NF:
EMP_ID
|
EMP_NAME
|
EMP_PHONE
|
EMP_STATE
|
14
|
John
|
7272826385, 9064738238
|
UP
|
20
|
Harry
|
8574783832
|
Bihar
|
12
|
Sam
|
7390372389, 8589830302
|
Punjab
|
|
The above table contains a multi-valued attribute EMP_PHONE
, which violates 1NF rules.
Table in 1NF:
EMP_ID
|
EMP_NAME
|
EMP_PHONE
|
EMP_STATE
|
14
|
John
|
7272826385
|
UP
|
14
|
John
|
9064738238
|
UP
|
20
|
Harry
|
8574783832
|
Bihar
|
12
|
Sam
|
7390372389
|
Punjab
|
12
|
Sam
|
8589830302
|
Punjab
|
|
By decomposing the multi-valued EMP_PHONE
field into atomic values, the table now adheres to 1NF.
2. Second Normal Form (2NF)
A relation is in 2NF if:
- It is in 1NF.
- All non-key attributes are fully functionally dependent on the primary key. This means there should be no partial dependency (i.e., a situation where a non-key attribute is dependent on only part of a composite primary key).
Example of a Table NOT in 2NF:
TEACHER_ID
|
SUBJECT
|
TEACHER_AGE
|
25
|
Chemistry
|
30
|
25
|
Biology
|
30
|
47
|
English
|
35
|
83
|
Math
|
38
|
83
|
Computer
|
38
|
|
In the above table, TEACHER_AGE
is partially dependent on TEACHER_ID
, which violates 2NF.
Decomposed Tables in 2NF:
- TEACHER_DETAIL Table:
TEACHER_ID
|
TEACHER_AGE
|
25
|
30
|
47
|
35
|
83
|
38
|
|
- TEACHER_SUBJECT Table:
TEACHER_ID
|
SUBJECT
|
25
|
Chemistry
|
25
|
Biology
|
47
|
English
|
83
|
Math
|
83
|
Computer
|
|
Now, TEACHER_AGE
is not dependent on the subject, and the subject and teacher ID have been separated into their own table.
3. Third Normal Form (3NF)
A relation is in 3NF if:
- It is in 2NF.
- It does not have any transitive dependency (i.e., non-prime attributes are not dependent on other non-prime attributes).
Example of a Table NOT in 3NF:
EMP_ID
|
EMP_NAME
|
EMP_ZIP
|
EMP_STATE
|
EMP_CITY
|
222
|
Harry
|
201010
|
UP
|
Noida
|
333
|
Stephan
|
02228
|
US
|
Boston
|
444
|
Lan
|
60007
|
US
|
Chicago
|
555
|
Katharine
|
06389
|
UK
|
Norwich
|
666
|
John
|
462007
|
MP
|
Bhopal
|
|
In this table, the EMP_CITY
and EMP_STATE
are transitively dependent on EMP_ZIP
, which violates 3NF.
Decomposed Tables in 3NF:
- EMPLOYEE Table:
EMP_ID
|
EMP_NAME
|
EMP_ZIP
|
222
|
Harry
|
201010
|
333
|
Stephan
|
02228
|
444
|
Lan
|
60007
|
555
|
Katharine
|
06389
|
666
|
John
|
462007
|
|
- EMPLOYEE_ZIP Table:
EMP_ZIP
|
EMP_STATE
|
EMP_CITY
|
201010
|
UP
|
Noida
|
02228
|
US
|
Boston
|
60007
|
US
|
Chicago
|
06389
|
UK
|
Norwich
|
462007
|
MP
|
Bhopal
|
|
The transitive dependency has been removed, ensuring the table adheres to 3NF.
4. Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if:
- It is in 3NF.
- For every functional dependency, the left-hand side must be a superkey.
Example of a Table NOT in BCNF:
EMP_ID
|
EMP_COUNTRY
|
EMP_DEPT
|
DEPT_TYPE
|
EMP_DEPT_NO
|
264
|
India
|
Designing
|
D394
|
283
|
264
|
India
|
Testing
|
D394
|
300
|
364
|
UK
|
Stores
|
D283
|
232
|
364
|
UK
|
Developing
|
D283
|
549
|
|
In the above table, EMP_DEPT
and DEPT_TYPE
form a partial key, which violates BCNF.
Decomposed Tables in BCNF:
- EMP_COUNTRY Table:
EMP_ID
|
EMP_COUNTRY
|
264
|
India
|
364
|
UK
|
|
- EMP_DEPT Table:
EMP_DEPT
|
DEPT_TYPE
|
EMP_DEPT_NO
|
Designing
|
D394
|
283
|
Testing
|
D394
|
300
|
Stores
|
D283
|
232
|
Developing
|
D283
|
549
|
|
- EMP_DEPT_MAPPING Table:
EMP_ID
|
EMP_DEPT
|
264
|
Designing
|
264
|
Testing
|
364
|
Stores
|
364
|
Developing
|
|
Now, the partial dependency is removed, making the tables comply with BCNF.
Conclusion
Normalization is a vital aspect of database design that ensures the efficient organization of data, reducing redundancy and ensuring data integrity. Understanding and applying the different normal forms—1NF, 2NF, 3NF, and BCNF—helps to optimize database structure, making the system easier to maintain and more efficient to query.
By progressively applying normalization techniques, we can achieve a well-structured, consistent, and reliable database system.