Step-by-step
-
1
Extract Entities from the Feature Description
Read the feature description and underline the nouns — those are your entities. Each entity becomes a table. For a "users can leave reviews on products" feature the nouns are users, reviews, and products. Ignore verbs and adjectives at this stage; focus only on the things that need to exist.
A good sanity check: if you can count individual instances of it ("one review", "three products"), it is probably an entity.
sql-- Feature: "Users can leave reviews on products" -- Entities found: -- users → already exists -- products → new table -- reviews → new table (joins the two) -- Avoid: storing reviews as a JSON column on products. -- Each row in a table should represent exactly one thing. -
2
Choose Primary Keys Deliberately
Three options, each with trade-offs:
- Auto-increment integer — compact, fast joins, predictable. Exposes record count and allows enumeration attacks. Fine for internal data.
- UUID (v4) — globally unique, safe to expose. Larger (16 bytes), random — causes index fragmentation on InnoDB. Use
UUID_TO_BIN()to store as binary. - ULID — like UUID but lexicographically sortable. Best of both worlds for most new tables.
Rule of thumb: if the ID will ever appear in a public URL, use ULID or UUID. If it is purely internal (junction tables, log entries), auto-increment is fine.
sql-- Auto-increment (internal tables) CREATE TABLE order_items ( id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, order_id BIGINT UNSIGNED NOT NULL, product_id BIGINT UNSIGNED NOT NULL ); -- ULID as CHAR(26) (public-facing tables) CREATE TABLE products ( id CHAR(26) NOT NULL PRIMARY KEY, -- ULID name VARCHAR(255) NOT NULL ); -
3
Normalize to Third Normal Form
Normalization removes redundancy. You need to reach 3NF before shipping any schema. Three rules to apply in order:
- 1NF: Every column holds a single atomic value — no comma-separated lists, no JSON arrays of the same concept.
- 2NF: Every non-key column depends on the whole primary key (only relevant for composite keys).
- 3NF: Every non-key column depends only on the primary key — not on another non-key column.
Classic 3NF violation: storing
category_namenext tocategory_idin theproductstable. If the category name changes you now have inconsistent data in every row.sql-- Bad: category_name duplicated across rows CREATE TABLE products ( id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, category_id INT NOT NULL, category_name VARCHAR(100) NOT NULL -- 3NF violation ); -- Good: category lives in its own table CREATE TABLE categories ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100) NOT NULL ); CREATE TABLE products ( id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, category_id INT UNSIGNED NOT NULL, FOREIGN KEY (category_id) REFERENCES categories(id) ); -
4
Set Foreign Keys and ON DELETE Rules
Every foreign key needs an explicit
ON DELETErule. Picking the wrong one causes orphaned rows or accidental data loss:- CASCADE — delete the parent, delete the children too. Use when children cannot exist without the parent (e.g., order items belong to an order).
- SET NULL — delete the parent, set the FK column to NULL. Use when the child can exist independently (e.g., a post whose author account was deleted).
- RESTRICT (default) — block parent deletion if children exist. Use when you want to force the application to clean up first (safest default).
sqlCREATE TABLE reviews ( id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, user_id BIGINT UNSIGNED NOT NULL, product_id BIGINT UNSIGNED NOT NULL, rating TINYINT UNSIGNED NOT NULL CHECK (rating BETWEEN 1 AND 5), body TEXT, created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, -- User deleted → delete their reviews FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE, -- Product deleted → delete its reviews FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE CASCADE ); -
5
Pick Column Types Carefully
Column types have real performance and correctness implications:
- VARCHAR(n) — always set a realistic n.
VARCHAR(255)is a lazy default; a slug is rarely more than 100 chars. InnoDB uses the declared length for sort buffers. - DECIMAL(p, s) vs FLOAT — never use FLOAT for money.
FLOATcannot represent 0.1 exactly. UseDECIMAL(10, 2)for currency. - TIMESTAMP vs DATETIME —
TIMESTAMPstores in UTC and converts to the session timezone automatically (good forcreated_at/updated_at).DATETIMEstores as-is, timezone-naïve — use it when the value represents a calendar concept, not an instant. - TINYINT(1) for booleans — MySQL has no native BOOL type;
TINYINT(1)is the convention.
sqlCREATE TABLE orders ( id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, user_id BIGINT UNSIGNED NOT NULL, total_amount DECIMAL(10, 2) NOT NULL, -- not FLOAT currency CHAR(3) NOT NULL DEFAULT 'USD', status ENUM('pending','paid','shipped','cancelled') NOT NULL DEFAULT 'pending', is_gift TINYINT(1) NOT NULL DEFAULT 0, scheduled_at DATETIME NULL, -- calendar date, no timezone conversion created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ); - VARCHAR(n) — always set a realistic n.
-
6
Establish Naming Conventions
Inconsistent naming is the silent killer of schema readability. Pick a convention and enforce it across every table:
- Table names: plural, snake_case —
order_items,product_reviews - Column names: snake_case —
first_name,created_at - Foreign keys: singular table name +
_idsuffix —user_id,product_id - Boolean columns:
is_orhas_prefix —is_active,has_verified_email - Timestamps: always
created_atandupdated_at— every table gets them
sql-- Good naming CREATE TABLE blog_posts ( id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, author_id BIGINT UNSIGNED NOT NULL, -- FK: singular + _id category_id INT UNSIGNED NOT NULL, title VARCHAR(255) NOT NULL, slug VARCHAR(100) NOT NULL UNIQUE, is_published TINYINT(1) NOT NULL DEFAULT 0, published_at TIMESTAMP NULL, created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ); - Table names: plural, snake_case —
-
7
Denormalize Only When You Have Proof
Denormalization is a deliberate trade: you introduce redundancy to avoid expensive JOINs. Do it only after you have a query that is actually slow with real data. Never pre-optimize.
Common valid cases: storing a cached
comment_counton thepoststable (updated by a trigger or application event), or storing a serialized snapshot of an address on anordersrow so it does not change if the user later updates their address.sql-- Caching a count to avoid COUNT(*) on large tables ALTER TABLE posts ADD COLUMN comment_count INT UNSIGNED NOT NULL DEFAULT 0; -- Update it when a comment is inserted (application layer) -- UPDATE posts SET comment_count = comment_count + 1 WHERE id = ?; -- Snapshotting an address so order history stays accurate ALTER TABLE orders ADD COLUMN shipping_address JSON NOT NULL; -- Stored at order creation time; never changes even if user edits their profile. -
8
Draw the Schema Before Writing Migrations
Before writing a single migration, put the schema on paper or in dbdiagram.io. Drawing forces you to see relationships, spot missing tables, and catch naming conflicts before they are committed to version control.
dbdiagram.io uses a simple DSL and exports to SQL. Share the diagram URL with teammates for review — it is far easier to refactor a diagram than to write corrective migrations after the fact.
javascript// dbdiagram.io DSL example Table users { id bigint [pk, increment] email varchar(255) [unique, not null] created_at timestamp } Table products { id bigint [pk, increment] category_id int [ref: > categories.id] name varchar(255) price decimal(10,2) } Table reviews { id bigint [pk, increment] user_id bigint [ref: > users.id] product_id bigint [ref: > products.id] rating tinyint body text }
Tips & gotchas
- Every table should have <code>created_at</code> and <code>updated_at</code> — no exceptions. You will always need to know when something happened.
- Soft-delete patterns (<code>deleted_at</code> timestamp) are useful but add query complexity everywhere. Only use them if you genuinely need to recover deleted data.
- Never store computed values in the database unless you have a proven performance reason. Compute them in the application layer.
- If a column will only ever hold a fixed set of values, use <code>ENUM</code> or a lookup table — not a free-text VARCHAR. Constraints at the DB level are your last line of defense.
- Index your foreign keys. MySQL/InnoDB does not create them automatically, and an un-indexed FK makes the parent's deletes and updates scan the entire child table.
Wrapping up
A clean schema is not about following rules for their own sake — it is about making the database do what it is good at: enforcing consistency, enabling fast queries, and surviving years of feature additions without a rewrite. Design on paper first, normalize until it hurts, then denormalize only where benchmarks justify it.