Programming Intermediate 12 min

How to Design a Clean Database Schema for a New Feature

Most schema mistakes happen before a single line of code is written — when someone skips the design step and goes straight to migrations. A poorly designed schema creates technical debt that compounds with every feature added on top of it.

This guide walks through the full thought process: starting from the feature description, extracting entities, choosing the right key and column types, normalizing to 3NF, wiring foreign keys with the right deletion rules, and knowing when to stop normalizing and denormalize deliberately.

Step-by-step

  1. 1

    Extract Entities from the Feature Description

    Read the feature description and underline the nouns — those are your entities. Each entity becomes a table. For a "users can leave reviews on products" feature the nouns are users, reviews, and products. Ignore verbs and adjectives at this stage; focus only on the things that need to exist.

    A good sanity check: if you can count individual instances of it ("one review", "three products"), it is probably an entity.

    sql
    -- Feature: "Users can leave reviews on products"
    -- Entities found:
    --   users      → already exists
    --   products   → new table
    --   reviews    → new table (joins the two)
    
    -- Avoid: storing reviews as a JSON column on products.
    -- Each row in a table should represent exactly one thing.
  2. 2

    Choose Primary Keys Deliberately

    Three options, each with trade-offs:

    • Auto-increment integer — compact, fast joins, predictable. Exposes record count and allows enumeration attacks. Fine for internal data.
    • UUID (v4) — globally unique, safe to expose. Larger (16 bytes), random — causes index fragmentation on InnoDB. Use UUID_TO_BIN() to store as binary.
    • ULID — like UUID but lexicographically sortable. Best of both worlds for most new tables.

    Rule of thumb: if the ID will ever appear in a public URL, use ULID or UUID. If it is purely internal (junction tables, log entries), auto-increment is fine.

    sql
    -- Auto-increment (internal tables)
    CREATE TABLE order_items (
        id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        order_id   BIGINT UNSIGNED NOT NULL,
        product_id BIGINT UNSIGNED NOT NULL
    );
    
    -- ULID as CHAR(26) (public-facing tables)
    CREATE TABLE products (
        id         CHAR(26)     NOT NULL PRIMARY KEY,  -- ULID
        name       VARCHAR(255) NOT NULL
    );
  3. 3

    Normalize to Third Normal Form

    Normalization removes redundancy. You need to reach 3NF before shipping any schema. Three rules to apply in order:

    • 1NF: Every column holds a single atomic value — no comma-separated lists, no JSON arrays of the same concept.
    • 2NF: Every non-key column depends on the whole primary key (only relevant for composite keys).
    • 3NF: Every non-key column depends only on the primary key — not on another non-key column.

    Classic 3NF violation: storing category_name next to category_id in the products table. If the category name changes you now have inconsistent data in every row.

    sql
    -- Bad: category_name duplicated across rows
    CREATE TABLE products (
        id            BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        category_id   INT NOT NULL,
        category_name VARCHAR(100) NOT NULL  -- 3NF violation
    );
    
    -- Good: category lives in its own table
    CREATE TABLE categories (
        id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );
    
    CREATE TABLE products (
        id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        category_id INT UNSIGNED NOT NULL,
        FOREIGN KEY (category_id) REFERENCES categories(id)
    );
  4. 4

    Set Foreign Keys and ON DELETE Rules

    Every foreign key needs an explicit ON DELETE rule. Picking the wrong one causes orphaned rows or accidental data loss:

    • CASCADE — delete the parent, delete the children too. Use when children cannot exist without the parent (e.g., order items belong to an order).
    • SET NULL — delete the parent, set the FK column to NULL. Use when the child can exist independently (e.g., a post whose author account was deleted).
    • RESTRICT (default) — block parent deletion if children exist. Use when you want to force the application to clean up first (safest default).
    sql
    CREATE TABLE reviews (
        id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        user_id    BIGINT UNSIGNED NOT NULL,
        product_id BIGINT UNSIGNED NOT NULL,
        rating     TINYINT UNSIGNED NOT NULL CHECK (rating BETWEEN 1 AND 5),
        body       TEXT,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    
        -- User deleted → delete their reviews
        FOREIGN KEY (user_id)    REFERENCES users(id)    ON DELETE CASCADE,
    
        -- Product deleted → delete its reviews
        FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE CASCADE
    );
  5. 5

    Pick Column Types Carefully

    Column types have real performance and correctness implications:

    • VARCHAR(n) — always set a realistic n. VARCHAR(255) is a lazy default; a slug is rarely more than 100 chars. InnoDB uses the declared length for sort buffers.
    • DECIMAL(p, s) vs FLOAT — never use FLOAT for money. FLOAT cannot represent 0.1 exactly. Use DECIMAL(10, 2) for currency.
    • TIMESTAMP vs DATETIMETIMESTAMP stores in UTC and converts to the session timezone automatically (good for created_at/updated_at). DATETIME stores as-is, timezone-naïve — use it when the value represents a calendar concept, not an instant.
    • TINYINT(1) for booleans — MySQL has no native BOOL type; TINYINT(1) is the convention.
    sql
    CREATE TABLE orders (
        id           BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        user_id      BIGINT UNSIGNED NOT NULL,
        total_amount DECIMAL(10, 2)  NOT NULL,          -- not FLOAT
        currency     CHAR(3)         NOT NULL DEFAULT 'USD',
        status       ENUM('pending','paid','shipped','cancelled') NOT NULL DEFAULT 'pending',
        is_gift      TINYINT(1)      NOT NULL DEFAULT 0,
        scheduled_at DATETIME        NULL,              -- calendar date, no timezone conversion
        created_at   TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at   TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    );
  6. 6

    Establish Naming Conventions

    Inconsistent naming is the silent killer of schema readability. Pick a convention and enforce it across every table:

    • Table names: plural, snake_case — order_items, product_reviews
    • Column names: snake_case — first_name, created_at
    • Foreign keys: singular table name + _id suffix — user_id, product_id
    • Boolean columns: is_ or has_ prefix — is_active, has_verified_email
    • Timestamps: always created_at and updated_at — every table gets them
    sql
    -- Good naming
    CREATE TABLE blog_posts (
        id             BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        author_id      BIGINT UNSIGNED NOT NULL,   -- FK: singular + _id
        category_id    INT UNSIGNED    NOT NULL,
        title          VARCHAR(255)    NOT NULL,
        slug           VARCHAR(100)    NOT NULL UNIQUE,
        is_published   TINYINT(1)      NOT NULL DEFAULT 0,
        published_at   TIMESTAMP       NULL,
        created_at     TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at     TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    );
  7. 7

    Denormalize Only When You Have Proof

    Denormalization is a deliberate trade: you introduce redundancy to avoid expensive JOINs. Do it only after you have a query that is actually slow with real data. Never pre-optimize.

    Common valid cases: storing a cached comment_count on the posts table (updated by a trigger or application event), or storing a serialized snapshot of an address on an orders row so it does not change if the user later updates their address.

    sql
    -- Caching a count to avoid COUNT(*) on large tables
    ALTER TABLE posts ADD COLUMN comment_count INT UNSIGNED NOT NULL DEFAULT 0;
    
    -- Update it when a comment is inserted (application layer)
    -- UPDATE posts SET comment_count = comment_count + 1 WHERE id = ?;
    
    -- Snapshotting an address so order history stays accurate
    ALTER TABLE orders ADD COLUMN shipping_address JSON NOT NULL;
    -- Stored at order creation time; never changes even if user edits their profile.
  8. 8

    Draw the Schema Before Writing Migrations

    Before writing a single migration, put the schema on paper or in dbdiagram.io. Drawing forces you to see relationships, spot missing tables, and catch naming conflicts before they are committed to version control.

    dbdiagram.io uses a simple DSL and exports to SQL. Share the diagram URL with teammates for review — it is far easier to refactor a diagram than to write corrective migrations after the fact.

    javascript
    // dbdiagram.io DSL example
    Table users {
      id         bigint [pk, increment]
      email      varchar(255) [unique, not null]
      created_at timestamp
    }
    
    Table products {
      id          bigint [pk, increment]
      category_id int    [ref: > categories.id]
      name        varchar(255)
      price       decimal(10,2)
    }
    
    Table reviews {
      id         bigint [pk, increment]
      user_id    bigint [ref: > users.id]
      product_id bigint [ref: > products.id]
      rating     tinyint
      body       text
    }

Tips & gotchas

  • Every table should have <code>created_at</code> and <code>updated_at</code> — no exceptions. You will always need to know when something happened.
  • Soft-delete patterns (<code>deleted_at</code> timestamp) are useful but add query complexity everywhere. Only use them if you genuinely need to recover deleted data.
  • Never store computed values in the database unless you have a proven performance reason. Compute them in the application layer.
  • If a column will only ever hold a fixed set of values, use <code>ENUM</code> or a lookup table — not a free-text VARCHAR. Constraints at the DB level are your last line of defense.
  • Index your foreign keys. MySQL/InnoDB does not create them automatically, and an un-indexed FK makes the parent's deletes and updates scan the entire child table.

Wrapping up

A clean schema is not about following rules for their own sake — it is about making the database do what it is good at: enforcing consistency, enabling fast queries, and surviving years of feature additions without a rewrite. Design on paper first, normalize until it hurts, then denormalize only where benchmarks justify it.

#Database #SQL #Design
Back to all guides

Need Help With Your Project?

Book a free 30-minute consultation to discuss your technical challenges and explore solutions together.