Data Modeling & ERDs

Why Model Data?

18 min Lesson 1 of 10

Why Model Data?

Applications come and go. The clinic booking system built in 2010 was rewritten in 2018 and will be replaced again. But the underlying data — patient records, appointment histories, diagnoses — must survive every rewrite, every platform change, every vendor switch. Data is the long-lived asset. The application is merely a temporary window onto that data.

This insight is the first and most important reason to model data before you build anything: a well-designed data model outlasts every application that uses it. A poorly designed one cripples every application built on top of it, often for decades.

What Is a Data Model?

A data model is a precise description of the data an organization needs to store, the rules that govern it, and the relationships between different pieces of information. It answers three questions:

  • What things does the business need to remember? (entities and attributes)
  • How are those things related to each other? (relationships)
  • What rules constrain the data? (constraints and integrity rules)

A data model is not a database schema. It is not a class diagram. It is not a spreadsheet. It is an abstraction — a clean representation of reality that strips away implementation details and reveals the essential structure of information. The goal is clarity of understanding, not a set of SQL statements.

Three Levels of Data Models

Data modeling is not a single act — it is a progressive refinement across three levels of abstraction. Each level serves a different audience and answers different questions. Understanding which level you are working at prevents enormous confusion during analysis and design.

The three levels are: Conceptual, Logical, and Physical.

The Three Levels of Data Models 1. Conceptual Model Audience: Business stakeholders, non-technical users Content: Key entities and major relationships only — no attributes, no keys, no types High Abstraction Add attributes & cardinality 2. Logical Model Audience: Analysts, data architects, senior developers Content: Entities, all attributes, primary keys, foreign keys, cardinality — database-independent Medium Abstraction Add platform-specific details 3. Physical Model Audience: Database administrators, developers Content: Tables, columns, data types, indexes, constraints, partitioning — specific to one DBMS Low Abstraction
The three levels of data modeling — each level adds more detail and moves closer to implementation.

The Conceptual Model: What Things Exist?

The conceptual model is a high-level map of the business domain. It is created in collaboration with business stakeholders — clinic managers, librarians, store owners — who understand the problem but may not know what a foreign key is. The conceptual model uses plain language and simple diagrams to capture agreement on what the system must remember.

For a library system, a conceptual model might identify: Member, Book, Loan, Author — and show that Members borrow Books, Books are written by Authors, and each borrowing is recorded as a Loan. No attributes, no data types, no keys. Just the big picture.

Rule of thumb: If you have to explain what a primary key is before your stakeholder can understand your diagram, you are showing them the wrong level of model. The conceptual model must be readable by anyone who understands the business.

The Logical Model: What Data Do We Need?

The logical model adds the full set of attributes, identifies primary and foreign keys, and defines the cardinality of every relationship. It is still independent of any specific database technology — it does not matter whether you will use MySQL, PostgreSQL, or Oracle. The logical model answers the question: what data must the system store and what rules must it enforce?

For the library system, the logical model would show that a Member entity has attributes member_id (PK), full_name, email, date_joined. A Loan entity has loan_id (PK), member_id (FK), book_id (FK), loan_date, due_date, return_date. The relationship says one Member can have many Loans, but each Loan belongs to exactly one Member.

The logical model is the core deliverable of data modeling. It is what analysts hand to database designers and what developers reference when writing queries.

The Physical Model: How Will It Be Stored?

The physical model translates the logical model into the syntax of a specific database engine. A logical attribute email : String becomes a MySQL column email VARCHAR(255) NOT NULL UNIQUE. Indexes are added to columns that will be searched frequently. Tables are partitioned for large datasets. Storage parameters are tuned.

The physical model is the province of database administrators and senior developers. Analysts rarely build physical models, but they need to understand what the physical model is so they can have informed conversations about performance, scalability, and platform constraints.

Three Levels Applied to a Library System Library System — Same Domain, Three Models Conceptual Member Loan Book borrows covers Logical Member PK member_id full_name email Loan PK loan_id FK member_id loan_date, due_date Book PK book_id, title, isbn Physical (MySQL) TABLE members member_id INT PK AI full_name VARCHAR(120) email VARCHAR(255) UQ INDEX (email) TABLE loans loan_id INT PK AI member_id INT FK loan_date DATE NN due_date DATE NN return_date DATE NULL TABLE books book_id INT PK AI title VARCHAR(300) NN isbn CHAR(13) UQ
The same library domain expressed at all three levels — from simple named ovals, through structured entity boxes with keys, to concrete SQL table definitions.

Why Data Outlives Applications

Consider what happens when an online store migrates from a custom PHP application to a modern microservices architecture. Every route, every view, every controller is rewritten. The product catalog, order history, customer records, and shipping data, however, must be migrated intact. If those were stored thoughtlessly — mixed business logic baked into the schema, missing constraints, duplicate columns — the migration becomes a months-long crisis of data cleansing.

Contrast this with a well-modeled data store: entities are clear, relationships are explicit, integrity constraints prevent corrupt data from ever being saved. The same data is migrated cleanly in a few days because anyone can read the model and understand exactly what every table stores and why.

This is not hypothetical. Organizations routinely operate on the same core data for 20–30 years across four or five different application generations. A hospital that opened in 1995 still needs the patient record created on day one. The applications have changed beyond recognition. The data has not.

Career implication: As a systems analyst, your data models will outlast the requirements document, the project, and possibly the team that built the system. Write them for the next analyst — or yourself in ten years — not just for the current sprint.

What Data Modeling Is Not

Clarifying common misconceptions saves time in team discussions:

  • Not a class diagram — class diagrams model behavior as well as data. Data models focus exclusively on persistent information, not on methods or operations.
  • Not a database schema — a schema is a physical implementation. A data model is an analysis artifact. Many projects maintain a data model that predates and outlives any particular schema version.
  • Not optional on small projects — small projects have a habit of growing. A five-table database with no model becomes a twenty-table database with no model in six months, at which point nobody understands why half the columns exist.
Common mistake: Jumping straight to creating tables in a database tool without first drawing a conceptual or logical model. The result is a schema that reflects how the first developer thought about data on day one — not how the business actually works. Every subsequent developer inherits those misunderstandings.

The Analyst's Role in Data Modeling

Systems analysts are not database administrators, but they are the bridge between the business and the database design. Your job in data modeling is to:

  1. Discover what data the business needs to remember by interviewing stakeholders and examining existing documents, spreadsheets, and forms.
  2. Model that data at the conceptual and logical levels — drawing the entities, attributes, and relationships correctly before any implementation decisions are made.
  3. Communicate the model to developers, data architects, and business stakeholders, using the appropriate level of abstraction for each audience.
  4. Validate the model against business rules — every business rule that refers to stored data should be traceable to a constraint or relationship in the data model.

The coming lessons in this tutorial will teach you every element of data modeling in depth: identifying entities, specifying attributes, drawing cardinality, resolving many-to-many relationships, applying normalization rules, and writing a data dictionary. By the end, you will be able to take any real-world business scenario and produce a complete, professional data model from scratch.

Summary

  • Data outlives applications — the core reason to model data carefully and rigorously before any code is written.
  • A data model describes what data exists, how it is related, and what rules govern it — independent of implementation.
  • The conceptual model is for business stakeholders: entities and major relationships only, no implementation details.
  • The logical model is for analysts and architects: full attributes, keys, and cardinality, but database-independent.
  • The physical model is for DBAs and developers: platform-specific tables, columns, types, and indexes.
  • The analyst owns the conceptual and logical levels; the DBA owns the physical level.