Data Modeling & ERDs

Keys: Primary, Foreign & Candidate

18 min Lesson 5 of 10

Keys: Primary, Foreign & Candidate

A database table full of rows is useless unless you can reliably identify each row, trace it back to related rows in other tables, and enforce data integrity automatically. That is exactly what keys do. Understanding the three main types — primary, foreign, and candidate — is the foundation for turning a conceptual entity–relationship diagram into a real, working database schema.

Why Keys Matter in Analysis

During requirements gathering you identify entities (Customer, Order, Product). During data modeling you must decide how each entity is uniquely identified and how the relationships between entities are physically implemented. Keys answer both questions. A missing or poorly chosen key leads to duplicate records, broken references, and queries that return the wrong data — problems that surface late and are expensive to fix.

Candidate Keys

A candidate key is any attribute (or minimal combination of attributes) that could serve as a unique identifier for every row in an entity. "Candidate" means it is eligible — not yet chosen. Rules for a candidate key:

Uniqueness: no two rows may share the same value.
Irreducibility (minimality): removing any attribute from a composite candidate key would break uniqueness.
Non-null: a candidate key must never be null — a missing identifier is meaningless.

Example — the Patient entity in a clinic booking system might have these candidate keys:

patient_id — a system-generated number, always unique.
national_id — the government-issued identity number, also unique per person.
email — if the clinic enforces one account per email address.

All three qualify as candidate keys. You must pick one to promote to the primary key; the rest become alternate keys (sometimes enforced with a UNIQUE constraint in the database).

Surrogate vs. Natural Candidate Keys: patient_id is a surrogate key — a meaningless number created purely for identification. national_id is a natural key — it exists in the real world. Both can be candidate keys. Surrogates are usually preferred as primary keys because they never change and have no business meaning that could become stale.

The Primary Key

The primary key (PK) is the single candidate key the analyst (and later the database designer) formally designates as the official identifier of the entity. Every row in the table must have a value for the PK, it must be unique, and it must never change over the lifetime of the record.

In an ERD using crow-foot notation, the primary key attribute is written at the top of the entity box and is marked with PK (or underlined, depending on the notation variant).

Practical rules for choosing a primary key:

Prefer stable values — phone numbers and email addresses change; a surrogate integer does not.
Prefer small values — a 4-byte integer is faster to index and join than a 200-character string.
Prefer meaningless values — encoding business logic (e.g. "P2024-001") into a PK creates fragility when the business logic changes.

The Foreign Key

A foreign key (FK) is an attribute in one entity (the child) whose values must match the primary key values in another entity (the parent). It is the database-level implementation of a relationship line on your ERD.

Consider an online store: an Order belongs to a Customer. The Order table contains a customer_id column whose values must always exist in the Customer table. That customer_id in Order is the foreign key; the customer_id in Customer is the primary key being referenced.

Foreign keys enforce referential integrity: the database engine rejects any attempt to insert an order for a customer that does not exist, or to delete a customer who still has orders (unless a cascade rule allows it).

FK ≠ unique: Unlike a primary key, a foreign key column is almost never unique on its own. Many orders may share the same customer_id. That is the whole point — it captures the one-to-many (or many-to-many) relationship.

Reading Keys on a Crow-Foot ERD

The diagram below shows a simplified clinic booking schema with three entities: Patient, Appointment, and Doctor. Study the PK and FK markers and the crow-foot line ends that show cardinality.

Crow-foot ERD showing PK and FK markers for the clinic booking schema: a Patient has many Appointments; a Doctor has many Appointments.

Notice that Appointment carries two foreign keys — patient_id and doctor_id. This is how the database physically implements both the Patient–Appointment and the Doctor–Appointment relationships shown on the conceptual ERD. Without these FKs the rows in Appointment would have no way to refer to the correct patient or doctor.

Composite Primary Keys

Sometimes no single attribute is enough to identify a row uniquely, and the PK must be formed from a combination of attributes. This is common in associative entities (the tables that resolve many-to-many relationships). For example, a library system might have a BookLoan entity where the combination of member_id + book_id + loan_date forms the PK — because the same member may borrow the same book on different dates, but no two loans share all three values at once.

When to use a composite PK vs. a surrogate PK: Many teams prefer adding a surrogate loan_id and making all three columns a UNIQUE constraint instead. Both approaches are valid. Surrogate PKs keep FK columns small; composite PKs avoid an extra column. Discuss the trade-off with the DBA and document the decision in the data dictionary (covered in Lesson 9).

How Keys Are Shown in an ERD — A Second Example

The diagram below extends the online store scenario. It shows an OrderLine associative entity with a composite PK, illustrating how both its FK columns together identify each row.

Online store ERD: OrderLine carries a composite PK formed by two FK columns (order_id + product_id), implementing the many-to-many relationship between Order and Product.

Summary: Choosing and Documenting Keys

As a systems analyst your key-related deliverables are:

List candidate keys for every entity during the data modeling session with stakeholders.
Designate the primary key — prefer a stable, small, meaningless surrogate unless a natural key clearly wins.
Mark FKs on the ERD and confirm they reference the correct parent PK, with documented cascade rules (restrict, set null, cascade delete).
Record alternate keys in the data dictionary with their uniqueness constraints.

Keys connect the diagram to the database: Every relationship line on your ERD becomes a FK column in the child table. Drawing and labeling them correctly ensures that the development team can translate your model directly into CREATE TABLE statements with no ambiguity.