Data Modeling & ERDs

Normalization: 1NF, 2NF, 3NF

18 min Lesson 7 of 10

Normalization: 1NF, 2NF, 3NF

Redundancy is the silent enemy of a well-designed database. When the same fact is stored in multiple places, updates become risky, deletions destroy information unintentionally, and inserts fail until unrelated data is available. Normalization is the disciplined, step-by-step process of eliminating that redundancy by restructuring tables so that each fact lives in exactly one place.

There are many normal forms, but in practice the vast majority of business systems need only the first three: 1NF, 2NF, and 3NF. Together they remove three distinct categories of redundancy. We will walk through all three using a single running example drawn from an online bookstore.

The Starting Point: An Unnormalized Table

Imagine the bookstore captures each order in a spreadsheet-style table called OrderData:

OrderData --------------------------------------------------------------------------- OrderID | CustomerName | CustomerCity | BookISBN | BookTitle | AuthorNames | Qty | UnitPrice 1001 | Sara Khalid | Riyadh | 978-0-13-468599-1 | Clean Code | Robert Martin | 2 | 39.99 1001 | Sara Khalid | Riyadh | 978-0-13-235088-4 | The Clean Coder | Robert Martin | 1 | 34.99 1002 | Ali Hassan | Dubai | 978-0-13-468599-1 | Clean Code | Robert Martin | 3 | 39.99 1003 | Nour Salem | Cairo | 978-0-596-51774-8 | JavaScript | David Flanagan, O'Reilly| 1 | 49.99

Notice several problems: Sara Khalid / Riyadh appears twice (redundancy), the AuthorNames column holds multiple values in one cell (a violation we will fix first), and the book title depends only on the ISBN — not on which order it appears in. These are exactly the anomalies each normal form targets.

First Normal Form (1NF): Atomic Values, No Repeating Groups

Rule: Every cell must hold a single, indivisible (atomic) value. No arrays, no comma-separated lists, no repeating column groups (Author1, Author2, Author3 …). Each row must be uniquely identifiable by a primary key.

The AuthorNames column in row 4 holds two values — that violates 1NF. Fix it by splitting multi-valued data into separate rows, or into a related table. We also make the composite key explicit: (OrderID, BookISBN) uniquely identifies each line item.

OrderLine (after 1NF) --------------------------------------------------------------------------- OrderID | CustomerName | CustomerCity | BookISBN | BookTitle | AuthorID | AuthorName | Qty | UnitPrice 1001 | Sara Khalid | Riyadh | 978-0-13-468599-1 | Clean Code | A01 | Robert Martin | 2 | 39.99 1001 | Sara Khalid | Riyadh | 978-0-13-235088-4 | The Clean .. | A01 | Robert Martin | 1 | 34.99 1002 | Ali Hassan | Dubai | 978-0-13-468599-1 | Clean Code | A01 | Robert Martin | 3 | 39.99 1003 | Nour Salem | Cairo | 978-0-596-51774-8 | JavaScript | A02 | David Flanagan | 1 | 49.99 1003 | Nour Salem | Cairo | 978-0-596-51774-8 | JavaScript | A03 | O\'Reilly Press | 1 | 49.99
1NF Checklist: (1) No multi-valued cells. (2) No repeating column groups. (3) A primary key exists. That is all — 1NF does not care about redundancy between columns yet.

Second Normal Form (2NF): No Partial Dependencies

Rule: The table must already be in 1NF, and every non-key attribute must depend on the whole primary key — not just part of it. This rule only applies when the primary key is composite (made up of more than one column).

In our 1NF table the composite key is (OrderID, BookISBN, AuthorID). Look at BookTitle: it depends only on BookISBN, not on OrderID or AuthorID. That is a partial dependency. Similarly, CustomerName and CustomerCity depend only on OrderID. Move each partially dependent group into its own table:

Orders PK: OrderID OrderID | CustomerName | CustomerCity Books PK: BookISBN BookISBN | BookTitle | UnitPrice BookAuthors PK: (BookISBN, AuthorID) BookISBN | AuthorID | AuthorName OrderLines PK: (OrderID, BookISBN) OrderID | BookISBN | Qty

Now no non-key column "sneaks out" of its full key. The redundant repetition of Clean Code / 39.99 across rows 1001 and 1002 is gone — that fact lives once in Books.

Third Normal Form (3NF): No Transitive Dependencies

Rule: The table must already be in 2NF, and every non-key attribute must depend directly on the primary key — not on another non-key attribute. When attribute B determines attribute C, but B is itself a non-key column, that is a transitive dependency.

Look at the Orders table: if we add a CityRegion (e.g., "Central" for Riyadh, "Gulf" for Dubai), that column depends on CustomerCity, not directly on OrderID. The chain is: OrderID → CustomerCity → CityRegion. Move it out:

Cities PK: CityName CityName | CityRegion Orders (revised) PK: OrderID OrderID | CustomerName | CustomerCity ← CustomerCity is now a FK to Cities
Common mistake: Analysts sometimes stop at 2NF thinking the job is done. 3NF is equally important — transitive dependencies cause the same update anomalies as partial dependencies. If you change a city\'s region in one order row but forget another, you have inconsistent data.

The Full Normalization Journey — Diagram

The diagram below shows the three stages as separate tables, with arrows indicating which foreign keys link them. Read it as a summary of where each fact ends up after normalization.

Normalized tables after 1NF, 2NF, 3NF applied to the bookstore example Orders PK OrderID CustomerName FK CustomerCity OrderDate Cities PK CityName CityRegion Country OrderLines PK,FK OrderID PK,FK BookISBN Quantity LineTotal Books PK BookISBN BookTitle UnitPrice Genre BookAuthors PK,FK BookISBN PK AuthorID AuthorName AuthorRole Legend Foreign key reference PK = Primary Key FK = Foreign Key Italic attributes = optional/derived fields (shown for completeness) Each color = a different entity group (customers, line items, books, authorship)
The bookstore data model after full normalization to 3NF: five tables, each fact stored once.

Spotting Each Violation Quickly

Analysts use a simple mental checklist when reviewing a table design:

  • 1NF violation: Can you see a comma-separated list, a set of numbered columns (Phone1, Phone2 …), or a cell that clearly holds more than one piece of data? Fix it first.
  • 2NF violation: Is the primary key composite? If yes, check every non-key column: does it need all parts of the key to be meaningful, or only some? Move partial-dependent columns out.
  • 3NF violation: Can you draw a chain like Key → ColumnA → ColumnB? If ColumnB depends on ColumnA (a non-key), not directly on the key, move ColumnB out with ColumnA as its new key.
Boyce-Codd Normal Form (BCNF) is a stricter variant of 3NF that handles rare edge cases involving overlapping candidate keys. For the vast majority of business systems, 3NF is sufficient. Focus on 3NF in analysis; revisit BCNF only if an unusual key structure arises.

Update, Insertion, and Deletion Anomalies

Normalization is not an academic exercise — it prevents three real operational failures:

  1. Update anomaly: A book\'s price appears in 50 order rows. A price change must be applied 50 times; miss one and the data is inconsistent. After normalization, the price lives in Books once.
  2. Insertion anomaly: In the unnormalized table you cannot record a new book unless an order exists for it (because OrderID is part of the key). After normalization, you insert into Books independently.
  3. Deletion anomaly: If you delete the only order for a book, you lose the book record entirely. After normalization, deleting an order touches OrderLines only; Books is untouched.
Rule of thumb: "Every non-key attribute must describe the key, the whole key, and nothing but the key." This informal saying neatly summarizes 2NF (whole key) and 3NF (nothing but the key) together.

Summary

Normalization moves a table through three levels of quality. 1NF enforces atomic values and a clear primary key. 2NF removes partial dependencies so that every non-key column truly needs the entire composite key. 3NF removes transitive dependencies so that non-key columns depend directly on the key, not on each other. After these three steps, each business fact lives in exactly one place — making your data model robust, consistent, and easy to maintain.