Step-by-step

1
Extract Entities from the Feature Description

Read the feature description and underline the nouns — those are your entities. Each entity becomes a table. For a "users can leave reviews on products" feature the nouns are users, reviews, and products. Ignore verbs and adjectives at this stage; focus only on the things that need to exist.
A good sanity check: if you can count individual instances of it ("one review", "three products"), it is probably an entity.
sql
```
-- Feature: "Users can leave reviews on products"
-- Entities found:
--   users      → already exists
--   products   → new table
--   reviews    → new table (joins the two)

-- Avoid: storing reviews as a JSON column on products.
-- Each row in a table should represent exactly one thing.
```
2
Choose Primary Keys Deliberately
Three options, each with trade-offs:
- Auto-increment integer — compact, fast joins, predictable. Exposes record count and allows enumeration attacks. Fine for internal data.
- UUID (v4) — globally unique, safe to expose. Larger (16 bytes), random — causes index fragmentation on InnoDB. Use UUID_TO_BIN() to store as binary.
- ULID — like UUID but lexicographically sortable. Best of both worlds for most new tables.
Rule of thumb: if the ID will ever appear in a public URL, use ULID or UUID. If it is purely internal (junction tables, log entries), auto-increment is fine.
sql
```
-- Auto-increment (internal tables)
CREATE TABLE order_items (
    id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    order_id   BIGINT UNSIGNED NOT NULL,
    product_id BIGINT UNSIGNED NOT NULL
);

-- ULID as CHAR(26) (public-facing tables)
CREATE TABLE products (
    id         CHAR(26)     NOT NULL PRIMARY KEY,  -- ULID
    name       VARCHAR(255) NOT NULL
);
```
3
Normalize to Third Normal Form
Normalization removes redundancy. You need to reach 3NF before shipping any schema. Three rules to apply in order:
- 1NF: Every column holds a single atomic value — no comma-separated lists, no JSON arrays of the same concept.
- 2NF: Every non-key column depends on the whole primary key (only relevant for composite keys).
- 3NF: Every non-key column depends only on the primary key — not on another non-key column.
Classic 3NF violation: storing category_name next to category_id in the products table. If the category name changes you now have inconsistent data in every row.
sql
```
-- Bad: category_name duplicated across rows
CREATE TABLE products (
    id            BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    category_id   INT NOT NULL,
    category_name VARCHAR(100) NOT NULL  -- 3NF violation
);

-- Good: category lives in its own table
CREATE TABLE categories (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE products (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    category_id INT UNSIGNED NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(id)
);
```
4
Set Foreign Keys and ON DELETE Rules
Every foreign key needs an explicit ON DELETE rule. Picking the wrong one causes orphaned rows or accidental data loss:
- CASCADE — delete the parent, delete the children too. Use when children cannot exist without the parent (e.g., order items belong to an order).
- SET NULL — delete the parent, set the FK column to NULL. Use when the child can exist independently (e.g., a post whose author account was deleted).
- RESTRICT (default) — block parent deletion if children exist. Use when you want to force the application to clean up first (safest default).
sql
```
CREATE TABLE reviews (
    id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id    BIGINT UNSIGNED NOT NULL,
    product_id BIGINT UNSIGNED NOT NULL,
    rating     TINYINT UNSIGNED NOT NULL CHECK (rating BETWEEN 1 AND 5),
    body       TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- User deleted → delete their reviews
    FOREIGN KEY (user_id)    REFERENCES users(id)    ON DELETE CASCADE,

    -- Product deleted → delete its reviews
    FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE CASCADE
);
```
5
Pick Column Types Carefully
Column types have real performance and correctness implications:
- VARCHAR(n) — always set a realistic n. VARCHAR(255) is a lazy default; a slug is rarely more than 100 chars. InnoDB uses the declared length for sort buffers.
- DECIMAL(p, s) vs FLOAT — never use FLOAT for money. FLOAT cannot represent 0.1 exactly. Use DECIMAL(10, 2) for currency.
- TIMESTAMP vs DATETIME — TIMESTAMP stores in UTC and converts to the session timezone automatically (good for created_at/updated_at). DATETIME stores as-is, timezone-naïve — use it when the value represents a calendar concept, not an instant.
- TINYINT(1) for booleans — MySQL has no native BOOL type; TINYINT(1) is the convention.
sql
```
CREATE TABLE orders (
    id           BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id      BIGINT UNSIGNED NOT NULL,
    total_amount DECIMAL(10, 2)  NOT NULL,          -- not FLOAT
    currency     CHAR(3)         NOT NULL DEFAULT 'USD',
    status       ENUM('pending','paid','shipped','cancelled') NOT NULL DEFAULT 'pending',
    is_gift      TINYINT(1)      NOT NULL DEFAULT 0,
    scheduled_at DATETIME        NULL,              -- calendar date, no timezone conversion
    created_at   TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at   TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
```

Establish Naming Conventions

Inconsistent naming is the silent killer of schema readability. Pick a convention and enforce it across every table:

Table names: plural, snake_case — order_items, product_reviews
Column names: snake_case — first_name, created_at
Foreign keys: singular table name + _id suffix — user_id, product_id
Boolean columns: is_ or has_ prefix — is_active, has_verified_email
Timestamps: always created_at and updated_at — every table gets them

sql

-- Good naming
CREATE TABLE blog_posts (
    id             BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    author_id      BIGINT UNSIGNED NOT NULL,   -- FK: singular + _id
    category_id    INT UNSIGNED    NOT NULL,
    title          VARCHAR(255)    NOT NULL,
    slug           VARCHAR(100)    NOT NULL UNIQUE,
    is_published   TINYINT(1)      NOT NULL DEFAULT 0,
    published_at   TIMESTAMP       NULL,
    created_at     TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at     TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

Denormalize Only When You Have Proof

Denormalization is a deliberate trade: you introduce redundancy to avoid expensive JOINs. Do it only after you have a query that is actually slow with real data. Never pre-optimize.

Common valid cases: storing a cached comment_count on the posts table (updated by a trigger or application event), or storing a serialized snapshot of an address on an orders row so it does not change if the user later updates their address.

sql

-- Caching a count to avoid COUNT(*) on large tables
ALTER TABLE posts ADD COLUMN comment_count INT UNSIGNED NOT NULL DEFAULT 0;

-- Update it when a comment is inserted (application layer)
-- UPDATE posts SET comment_count = comment_count + 1 WHERE id = ?;

-- Snapshotting an address so order history stays accurate
ALTER TABLE orders ADD COLUMN shipping_address JSON NOT NULL;
-- Stored at order creation time; never changes even if user edits their profile.

Draw the Schema Before Writing Migrations

Before writing a single migration, put the schema on paper or in dbdiagram.io. Drawing forces you to see relationships, spot missing tables, and catch naming conflicts before they are committed to version control.

dbdiagram.io uses a simple DSL and exports to SQL. Share the diagram URL with teammates for review — it is far easier to refactor a diagram than to write corrective migrations after the fact.

javascript

// dbdiagram.io DSL example
Table users {
  id         bigint [pk, increment]
  email      varchar(255) [unique, not null]
  created_at timestamp
}

Table products {
  id          bigint [pk, increment]
  category_id int    [ref: > categories.id]
  name        varchar(255)
  price       decimal(10,2)
}

Table reviews {
  id         bigint [pk, increment]
  user_id    bigint [ref: > users.id]
  product_id bigint [ref: > products.id]
  rating     tinyint
  body       text
}

Tips & gotchas

Every table should have <code>created_at</code> and <code>updated_at</code> — no exceptions. You will always need to know when something happened.
Soft-delete patterns (<code>deleted_at</code> timestamp) are useful but add query complexity everywhere. Only use them if you genuinely need to recover deleted data.
Never store computed values in the database unless you have a proven performance reason. Compute them in the application layer.
If a column will only ever hold a fixed set of values, use <code>ENUM</code> or a lookup table — not a free-text VARCHAR. Constraints at the DB level are your last line of defense.
Index your foreign keys. MySQL/InnoDB does not create them automatically, and an un-indexed FK makes the parent's deletes and updates scan the entire child table.

Wrapping up

A clean schema is not about following rules for their own sake — it is about making the database do what it is good at: enforcing consistency, enabling fast queries, and surviving years of feature additions without a rewrite. Design on paper first, normalize until it hurts, then denormalize only where benchmarks justify it.

#Database #SQL #Design

Back to all guides

How to Design a Clean Database Schema for a New Feature

Step-by-step

Extract Entities from the Feature Description

Choose Primary Keys Deliberately

Normalize to Third Normal Form

Set Foreign Keys and ON DELETE Rules

Pick Column Types Carefully

Establish Naming Conventions

Denormalize Only When You Have Proof

Draw the Schema Before Writing Migrations

Tips & gotchas

Wrapping up

Need the fast version?

Related guides

How to Deploy a Laravel App to a VPS (Apache + MySQL)

How to Add Multilingual Support to a Laravel App

How to Send Transactional Emails in Laravel

Need Help With Your Project?