Why Model Data?
Why Model Data?
Applications come and go. The clinic booking system built in 2010 was rewritten in 2018 and will be replaced again. But the underlying data — patient records, appointment histories, diagnoses — must survive every rewrite, every platform change, every vendor switch. Data is the long-lived asset. The application is merely a temporary window onto that data.
This insight is the first and most important reason to model data before you build anything: a well-designed data model outlasts every application that uses it. A poorly designed one cripples every application built on top of it, often for decades.
What Is a Data Model?
A data model is a precise description of the data an organization needs to store, the rules that govern it, and the relationships between different pieces of information. It answers three questions:
- What things does the business need to remember? (entities and attributes)
- How are those things related to each other? (relationships)
- What rules constrain the data? (constraints and integrity rules)
A data model is not a database schema. It is not a class diagram. It is not a spreadsheet. It is an abstraction — a clean representation of reality that strips away implementation details and reveals the essential structure of information. The goal is clarity of understanding, not a set of SQL statements.
Three Levels of Data Models
Data modeling is not a single act — it is a progressive refinement across three levels of abstraction. Each level serves a different audience and answers different questions. Understanding which level you are working at prevents enormous confusion during analysis and design.
The three levels are: Conceptual, Logical, and Physical.
The Conceptual Model: What Things Exist?
The conceptual model is a high-level map of the business domain. It is created in collaboration with business stakeholders — clinic managers, librarians, store owners — who understand the problem but may not know what a foreign key is. The conceptual model uses plain language and simple diagrams to capture agreement on what the system must remember.
For a library system, a conceptual model might identify: Member, Book, Loan, Author — and show that Members borrow Books, Books are written by Authors, and each borrowing is recorded as a Loan. No attributes, no data types, no keys. Just the big picture.
The Logical Model: What Data Do We Need?
The logical model adds the full set of attributes, identifies primary and foreign keys, and defines the cardinality of every relationship. It is still independent of any specific database technology — it does not matter whether you will use MySQL, PostgreSQL, or Oracle. The logical model answers the question: what data must the system store and what rules must it enforce?
For the library system, the logical model would show that a Member entity has attributes member_id (PK), full_name, email, date_joined. A Loan entity has loan_id (PK), member_id (FK), book_id (FK), loan_date, due_date, return_date. The relationship says one Member can have many Loans, but each Loan belongs to exactly one Member.
The logical model is the core deliverable of data modeling. It is what analysts hand to database designers and what developers reference when writing queries.
The Physical Model: How Will It Be Stored?
The physical model translates the logical model into the syntax of a specific database engine. A logical attribute email : String becomes a MySQL column email VARCHAR(255) NOT NULL UNIQUE. Indexes are added to columns that will be searched frequently. Tables are partitioned for large datasets. Storage parameters are tuned.
The physical model is the province of database administrators and senior developers. Analysts rarely build physical models, but they need to understand what the physical model is so they can have informed conversations about performance, scalability, and platform constraints.
Why Data Outlives Applications
Consider what happens when an online store migrates from a custom PHP application to a modern microservices architecture. Every route, every view, every controller is rewritten. The product catalog, order history, customer records, and shipping data, however, must be migrated intact. If those were stored thoughtlessly — mixed business logic baked into the schema, missing constraints, duplicate columns — the migration becomes a months-long crisis of data cleansing.
Contrast this with a well-modeled data store: entities are clear, relationships are explicit, integrity constraints prevent corrupt data from ever being saved. The same data is migrated cleanly in a few days because anyone can read the model and understand exactly what every table stores and why.
This is not hypothetical. Organizations routinely operate on the same core data for 20–30 years across four or five different application generations. A hospital that opened in 1995 still needs the patient record created on day one. The applications have changed beyond recognition. The data has not.
What Data Modeling Is Not
Clarifying common misconceptions saves time in team discussions:
- Not a class diagram — class diagrams model behavior as well as data. Data models focus exclusively on persistent information, not on methods or operations.
- Not a database schema — a schema is a physical implementation. A data model is an analysis artifact. Many projects maintain a data model that predates and outlives any particular schema version.
- Not optional on small projects — small projects have a habit of growing. A five-table database with no model becomes a twenty-table database with no model in six months, at which point nobody understands why half the columns exist.
The Analyst's Role in Data Modeling
Systems analysts are not database administrators, but they are the bridge between the business and the database design. Your job in data modeling is to:
- Discover what data the business needs to remember by interviewing stakeholders and examining existing documents, spreadsheets, and forms.
- Model that data at the conceptual and logical levels — drawing the entities, attributes, and relationships correctly before any implementation decisions are made.
- Communicate the model to developers, data architects, and business stakeholders, using the appropriate level of abstraction for each audience.
- Validate the model against business rules — every business rule that refers to stored data should be traceable to a constraint or relationship in the data model.
The coming lessons in this tutorial will teach you every element of data modeling in depth: identifying entities, specifying attributes, drawing cardinality, resolving many-to-many relationships, applying normalization rules, and writing a data dictionary. By the end, you will be able to take any real-world business scenario and produce a complete, professional data model from scratch.
Summary
- Data outlives applications — the core reason to model data carefully and rigorously before any code is written.
- A data model describes what data exists, how it is related, and what rules govern it — independent of implementation.
- The conceptual model is for business stakeholders: entities and major relationships only, no implementation details.
- The logical model is for analysts and architects: full attributes, keys, and cardinality, but database-independent.
- The physical model is for DBAs and developers: platform-specific tables, columns, types, and indexes.
- The analyst owns the conceptual and logical levels; the DBA owns the physical level.