Entities & Attributes
Before you can draw a single line of a data model, you need to answer a deceptively simple question: what things does this system need to remember? The answer comes in two layers. First you identify the things — the noun-like objects your system tracks. Then, for each thing, you identify the facts about it that the system must store. In entity-relationship modeling, those things are called entities and the facts are called attributes.
Getting these two steps right is the foundation of data modeling. A well-chosen set of entities maps cleanly onto database tables that are easy to query and maintain. A poorly chosen set leads to duplication, confusion, and expensive refactoring later.
What Is an Entity?
An entity is a category of real-world object or concept about which the system needs to store data. Three hallmarks identify a genuine entity:
- It has identity. You can tell one instance apart from another. One patient is distinguishable from another patient. One product is distinguishable from another product.
- It has multiple instances. An entity type represents a whole class of things, not a single specific object. Patient is an entity type; Ahmed Al-Rashidi (Patient ID 4821) is an instance (also called a record or row).
- It has attributes worth storing. The system needs to record facts about it. If you cannot think of at least two meaningful attributes, the candidate probably is not a real entity — it might be just an attribute of something else.
Doctor describes the category. The instance Dr. Sara Hamid, ID 017, Cardiology is one row in the Doctors table. ERDs model the type, not individual rows.
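The type/instance distinction can be sketched in a few lines of Python. This is only an illustration: the field names (doctor_id, name, specialty) and the second doctor are assumptions made for the example, not part of the clinic scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doctor:
    """Entity type: describes what is stored about every doctor."""
    doctor_id: str   # identifier that tells one instance apart from another
    name: str
    specialty: str


# One instance (one row in the Doctors table), taken from the example above.
sara = Doctor(doctor_id="017", name="Dr. Sara Hamid", specialty="Cardiology")

# A second, made-up instance: the same type covers many rows.
other = Doctor(doctor_id="018", name="Dr. Example", specialty="Cardiology")

# Same specialty, but the identifier keeps the two instances distinct.
print(sara.doctor_id != other.doctor_id)
```

The class plays the role of the box in an ERD; the two objects are what would end up as rows.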
Finding Entities in a Business Scenario
A reliable technique is to read the requirements and underline every significant noun. Consider this brief description of a clinic booking system:
Scanning for nouns gives candidates: Patient, Appointment, Doctor, Clinic Branch, Medical Specialty, Diagnosis, Medication. Now apply the three hallmarks to filter and confirm:
- Patient — has identity (patient ID), multiple instances, plenty of attributes (name, date of birth, phone). Confirmed entity.
- Appointment — has identity (appointment number), multiple instances, key attributes (date, time, status). Confirmed entity.
- Doctor — has identity (employee ID), multiple instances, attributes (name, specialty, license number). Confirmed entity.
- Clinic Branch — has identity (branch code), multiple instances, attributes (address, phone, opening hours). Confirmed entity.
- Medication — has identity (drug code), multiple instances, attributes (name, dosage form, strength). Confirmed entity.
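The three-hallmark filter can be written down as a small check. The sketch below is a hypothetical helper, not a standard tool: the Candidate class, the is_entity function, and the "Specialty" candidate with a single attribute are assumptions added to show a candidate that fails the test.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    name: str
    identifier: Optional[str]      # what distinguishes one instance from another
    has_many_instances: bool
    attributes: list[str] = field(default_factory=list)


def is_entity(candidate: Candidate) -> bool:
    """Apply the three hallmarks: identity, multiple instances, 2+ attributes."""
    return (
        candidate.identifier is not None
        and candidate.has_many_instances
        and len(candidate.attributes) >= 2
    )


candidates = [
    Candidate("Patient", "patient ID", True, ["name", "date_of_birth", "phone"]),
    Candidate("Appointment", "appointment number", True, ["date", "time", "status"]),
    Candidate("Doctor", "employee ID", True, ["name", "specialty", "license_number"]),
    Candidate("Clinic Branch", "branch code", True, ["address", "phone", "opening_hours"]),
    Candidate("Medication", "drug code", True, ["name", "dosage_form", "strength"]),
    # Hypothetical: a candidate treated as a bare label with one attribute.
    Candidate("Specialty", None, True, ["name"]),
]

confirmed = [c.name for c in candidates if is_entity(c)]
print(confirmed)
```

A candidate that fails the check is not necessarily discarded; it may simply become an attribute of another entity.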
Types of Attributes
Once you have an entity, you must carefully define what facts to store. Attributes are not all the same — they fall into distinct categories, and understanding those categories prevents design mistakes.
Simple (Atomic) Attributes
A simple attribute cannot be broken down further without losing meaning. It holds a single atomic value. Examples: date_of_birth, license_number, appointment_time, price. Simple attributes become a single column in a database table.
Composite Attributes
A composite attribute is logically made up of smaller components, each meaningful on its own. full_name, composed of first_name and last_name, is the classic example. address, composed of street, city, postal_code, and country, is another. The design decision is whether to store the composite as a whole or to split it into its parts. Split it when you need to search or sort on individual components. Searching patients by city or sorting orders by postal code are common requirements, so address is usually worth decomposing.
Derived Attributes
A derived attribute can be calculated from other stored data, so it does not need its own column. age can be derived from date_of_birth and today's date. total_price can be derived by multiplying quantity by unit_price. In ER notation, derived attributes are shown with a dashed oval. In practice, you must decide whether to store them anyway (for query performance) or always compute them on the fly.
Storing age alongside date_of_birth creates a maintenance burden: the stored age becomes wrong the moment the birthday passes unless you update it continuously.
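Computing derived values on the fly might look like the following sketch. The function names age_on and total_price are illustrative assumptions; the point is only that neither value needs its own stored column.

```python
from datetime import date
from decimal import Decimal


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in whole years from date_of_birth and a reference date."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def total_price(quantity: int, unit_price: Decimal) -> Decimal:
    """Derive a line total from the two stored attributes."""
    return quantity * unit_price


dob = date(1990, 6, 15)
print(age_on(dob, date(2024, 6, 14)))   # day before the birthday: 33
print(age_on(dob, date(2024, 6, 15)))   # on the birthday: 34
print(total_price(3, Decimal("2.50")))  # 7.50
```

Because the age is recomputed from date_of_birth every time, it cannot go stale the way a stored age column can.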
ERD Notation: Drawing Entity Boxes
In standard ER notation (and in the crow's foot style used in modern tools), an entity is drawn as a rectangle. The entity name appears in the header in singular title case: Patient, not Patients. Attributes are listed inside the box, one per line. Primary key attributes are marked PK and typically underlined or placed at the top. Foreign keys are marked FK.
The diagram below shows three confirmed entities from the clinic scenario with their attributes typed and classified. Notice how composite address is decomposed into separate columns, and age is shown as a derived attribute (dashed).
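As a rough textual counterpart to entity boxes, the PK and FK markers translate into table definitions like the ones sketched here with Python's built-in sqlite3 module. The choice of Patient, Doctor, and Appointment, the column types, and the lowercase table names are assumptions for illustration. The derived attribute age is deliberately not given a column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE patient (
    patient_id          INTEGER PRIMARY KEY,   -- PK
    first_name          TEXT NOT NULL,
    last_name           TEXT NOT NULL,
    date_of_birth       TEXT NOT NULL,         -- age is derived, not stored
    address_street      TEXT,
    address_city        TEXT,
    address_postal_code TEXT,
    address_country     TEXT
);
CREATE TABLE doctor (
    doctor_id      INTEGER PRIMARY KEY,        -- PK
    name           TEXT NOT NULL,
    specialty      TEXT,
    license_number TEXT
);
CREATE TABLE appointment (
    appointment_id   INTEGER PRIMARY KEY,      -- PK
    patient_id       INTEGER NOT NULL REFERENCES patient(patient_id),  -- FK
    doctor_id        INTEGER NOT NULL REFERENCES doctor(doctor_id),    -- FK
    appointment_date TEXT NOT NULL,
    appointment_time TEXT NOT NULL,
    status           TEXT NOT NULL
);
""")

# An appointment pointing at a non-existent patient should be rejected.
try:
    conn.execute(
        "INSERT INTO appointment VALUES (1, 999, 999, '2025-01-10', '09:30', 'booked')"
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(fk_enforced)
```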
Choosing the Right Level of Granularity
One of the most common early mistakes is storing a composite value as a single text column. Consider a Patient entity where address is stored as the single string "12 Elm Street, London, SW1A 1AA". This looks fine until you need to:
- Find all patients in a specific city.
- Sort patients by postal code for a mailing campaign.
- Display just the city name on a dashboard map.
All three operations become expensive string-parsing problems instead of simple column lookups. The fix — decomposing address into address_street, address_city, address_postal_code, address_country — should happen at design time, not after the database is live with millions of rows.
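With the address decomposed, the operations listed above reduce to ordinary column filters and sorts. The sketch below uses sqlite3 with made-up sample rows (only the first address comes from the text); table and column names follow the decomposition just described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE patient (
    patient_id          INTEGER PRIMARY KEY,
    name                TEXT NOT NULL,
    address_street      TEXT,
    address_city        TEXT,
    address_postal_code TEXT,
    address_country     TEXT
)
""")

# Sample rows are invented for illustration.
rows = [
    (1, "Patient A", "12 Elm Street", "London", "SW1A 1AA", "UK"),
    (2, "Patient B", "5 Hypothetical Road", "Leeds", "LS1 4AP", "UK"),
    (3, "Patient C", "8 Sample Lane", "London", "E1 6AN", "UK"),
]
conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?, ?, ?)", rows)

# Find all patients in one city, sorted by postal code: no string parsing needed.
london = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM patient WHERE address_city = ? ORDER BY address_postal_code",
        ("London",),
    )
]
print(london)
```

Had the address been a single string, the same query would need pattern matching or application-side parsing, and an index on the column would be of little help.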
If you find yourself packing a value such as phone_numbers = "0501234567, 0559876543" into one column, that is a signal that you need a separate PatientPhone entity (or at least a separate phone_secondary column if there is a fixed maximum of two numbers). A column that holds a comma-separated list violates the fundamental rule that each cell should hold exactly one atomic value.
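One way to model the separate PatientPhone entity is sketched below with sqlite3. The table name patient_phone and its columns are assumptions for illustration; the packed string is split once and each number gets its own row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE patient_phone (
    patient_phone_id INTEGER PRIMARY KEY,
    patient_id       INTEGER NOT NULL REFERENCES patient(patient_id),
    phone_number     TEXT NOT NULL            -- exactly one number per row
);
""")

conn.execute("INSERT INTO patient VALUES (4821, 'Ahmed Al-Rashidi')")

# Split the comma-separated value into atomic values, one row each.
packed = "0501234567, 0559876543"
numbers = [part.strip() for part in packed.split(",")]
conn.executemany(
    "INSERT INTO patient_phone (patient_id, phone_number) VALUES (?, ?)",
    [(4821, number) for number in numbers],
)

# Looking up a patient by any of their numbers is now a plain equality match.
owner = conn.execute(
    """
    SELECT p.name
    FROM patient AS p
    JOIN patient_phone AS ph ON ph.patient_id = p.patient_id
    WHERE ph.phone_number = ?
    """,
    ("0559876543",),
).fetchone()[0]
print(owner)
```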
A Second Example: Online Store
To reinforce the technique, consider a simplified online store. Reading the requirements — "Customers place orders for products. Each order has one or more line items. Products belong to categories." — yields these confirmed entities:
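A possible reading of that requirements sentence is sketched below: the nouns Customer, Order, Line Item, Product, and Category each become an entity type. The attribute lists are hypothetical and kept minimal; only the entity names come from the requirements text.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Customer:
    customer_id: int      # PK
    first_name: str
    last_name: str


@dataclass
class Category:
    category_id: int      # PK
    name: str


@dataclass
class Product:
    product_id: int       # PK
    category_id: int      # FK to Category
    name: str
    unit_price: Decimal


@dataclass
class Order:
    order_id: int         # PK
    customer_id: int      # FK to Customer
    order_date: str


@dataclass
class LineItem:
    order_id: int         # FK to Order
    product_id: int       # FK to Product
    quantity: int
    unit_price: Decimal   # price at the time of ordering


def order_total(items: list[LineItem]) -> Decimal:
    """Derived attribute: computed from line items rather than stored."""
    return sum((item.quantity * item.unit_price for item in items), Decimal("0"))


items = [
    LineItem(order_id=1, product_id=10, quantity=2, unit_price=Decimal("4.50")),
    LineItem(order_id=1, product_id=11, quantity=1, unit_price=Decimal("12.00")),
]
print(order_total(items))  # 21.00
```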
Naming Conventions That Matter
Consistent naming prevents confusion across the whole project team:
- Entity names are singular nouns in PascalCase: Customer, Appointment, not Customers or tbl_appointment.
- Attribute names are in snake_case for database columns: date_of_birth, unit_price.
- Primary key convention: use entity_id (e.g., patient_id, order_id). Some teams use just id; either is fine as long as you are consistent.
- Boolean attributes use is_ or has_ prefix: is_active, has_insurance.
- Avoid abbreviations in entity or attribute names. pt_dob is cryptic; patient_date_of_birth is unambiguous.
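Some of these conventions are mechanical enough to check automatically. The helper functions below are a hypothetical sketch using regular expressions; they test only letter case and prefixes, so they cannot tell singular from plural or spot abbreviations.

```python
import re

# PascalCase: one or more capitalised words, no underscores.
PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]+)+$")
# snake_case: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


def is_pascal_case(name: str) -> bool:
    return bool(PASCAL_CASE.match(name))


def is_snake_case(name: str) -> bool:
    return bool(SNAKE_CASE.match(name))


def has_boolean_prefix(name: str) -> bool:
    return name.startswith(("is_", "has_"))


for entity in ["Customer", "Appointment", "tbl_appointment"]:
    print(entity, is_pascal_case(entity))

for column in ["date_of_birth", "unit_price", "DateOfBirth"]:
    print(column, is_snake_case(column))

for flag in ["is_active", "has_insurance", "active"]:
    print(flag, has_boolean_prefix(flag))
```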
Summary
- An entity is a category of real-world thing the system must remember. It has identity, multiple instances, and meaningful attributes.
- Find entities by scanning requirements for significant nouns, then apply the three-hallmark test.
- Simple attributes are atomic single-value facts. Composite attributes have sub-components and should be decomposed into separate columns. Derived attributes are computed from stored data and are usually not stored separately.
- In ERD notation, entities are rectangles with a named header. Attributes are listed inside; PK and FK markers identify key columns.
- Get granularity right at design time: splitting a composite attribute later is expensive; storing a list in one column creates permanent query pain.