Introduction to Data Flow Diagrams
Flowcharts answer the question "How does a process execute, step by step?" But analysts also need a complementary view: what data moves through the system, where it comes from, where it goes, and what transforms it. That is the question Data Flow Diagrams (DFDs) are designed to answer.
A DFD is a graphical model that shows the movement of data through an information system. It ignores control flow, timing, and internal logic — it focuses purely on data in motion. A well-drawn DFD tells a business stakeholder exactly what information the system consumes, produces, and stores, without burying them in technical implementation details.
Why Use DFDs?
Consider a clinic booking system. A flowchart shows the decision logic: is the doctor available? Yes → confirm; No → offer alternatives. A DFD, by contrast, shows: the patient sends a Booking Request into the system; the Schedule Appointment process reads from the Doctor Availability store and writes to the Appointment Register; the patient and doctor both receive a Confirmation Notice.
Both views are valuable. Use flowcharts to model logic and decisions; use DFDs to model data and its transformations. Experienced analysts switch between the two as naturally as switching between a map and a street-level view.
The Four DFD Elements
Every DFD, regardless of notation family or level of detail, is built from exactly four types of element. Mastering these four gives you the vocabulary to read and draw any DFD.
1. External Entity (Source / Sink)
An external entity is a person, organization, or other system that exists outside the system boundary. It is the origin of data flowing in (a source) or the destination of data flowing out (a sink). The system has no control over what happens inside an external entity — it only knows what data is exchanged.
- In Gane-Sarson notation: drawn as a plain rectangle.
- In Yourdon-DeMarco notation: drawn as a plain rectangle (same shape, same meaning).
- Labeled with a noun: Customer, Supplier, Payment Gateway, Library Member.
The same external entity may appear more than once on a large diagram to avoid crossing lines — a duplicate is marked with a small diagonal line or shadow in the corner.
2. Process
A process transforms, routes, or generates data. It receives data flows as inputs and produces data flows as outputs. Processes represent work performed by the system — whether by software, a person, or a combination of both.
- In Gane-Sarson notation: drawn as a rounded rectangle divided into two compartments: an upper compartment holding a unique identifying number and a body holding a verb-phrase label (Validate Login, Process Payment, Register Member).
- In Yourdon-DeMarco notation: drawn as a circle with the label inside.
- Label rule: always use a verb + noun phrase — Check Availability, not just Availability.
3. Data Store
A data store is a repository where data rests between processes — a file, a database table, a cabinet of paper records, or any persistent storage. Data flows into a store when a process writes data; data flows out of a store when a process reads it.
- In Gane-Sarson notation: drawn as an open-ended rectangle — two horizontal parallel lines closed on the left by a short vertical line, open on the right — with a number prefix (D1, D2 …) and a noun label (Appointment Register, Product Catalog, Member Records).
- In Yourdon-DeMarco notation: drawn as two parallel horizontal lines with no end caps, and the label between them.
- A data store does not transform data — it only holds it.
4. Data Flow
A data flow is a named, directed arrow that represents data in motion between two DFD elements. It is the connective tissue of the diagram.
- Drawn as a labeled arrow (a straight or slightly curved line with an arrowhead).
- The label must name the specific data being carried: Booking Request, Invoice, Search Query. Generic labels such as "data" or "information" are forbidden because they add no analytical value.
- Arrowheads show direction. A bidirectional arrow means data moves in both directions under the same name — use this sparingly and only when the pairing is truly inseparable.
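The four elements map naturally onto a small data model. The sketch below is illustrative only: the class names (Kind, Element, DataFlow) are hypothetical, and the flow labels Available Slots and Appointment Record are assumptions added to name the store read and write in the clinic booking example. A data flow is modeled as a connection between two elements rather than as an element kind of its own.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    EXTERNAL_ENTITY = "external entity"
    PROCESS = "process"
    DATA_STORE = "data store"


@dataclass(frozen=True)
class Element:
    name: str        # noun for entities and stores, verb + noun for processes
    kind: Kind
    ident: str = ""  # e.g. "1" for a process, "D1" for a data store


@dataclass(frozen=True)
class DataFlow:
    label: str       # names the specific data being carried
    source: Element
    target: Element


# Clinic booking example from earlier in this lesson.
patient = Element("Patient", Kind.EXTERNAL_ENTITY)
doctor = Element("Doctor", Kind.EXTERNAL_ENTITY)
schedule = Element("Schedule Appointment", Kind.PROCESS, "1")
availability = Element("Doctor Availability", Kind.DATA_STORE, "D1")
register = Element("Appointment Register", Kind.DATA_STORE, "D2")

flows = [
    DataFlow("Booking Request", patient, schedule),
    DataFlow("Available Slots", availability, schedule),  # assumed label
    DataFlow("Appointment Record", schedule, register),   # assumed label
    DataFlow("Confirmation Notice", schedule, patient),
    DataFlow("Confirmation Notice", schedule, doctor),
]

# Print each flow as "source --[label]--> target".
for flow in flows:
    print(f"{flow.source.name} --[{flow.label}]--> {flow.target.name}")
```

Note that every flow in this list has the Schedule Appointment process at one end, which is what the structural rules later in this lesson require.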
Notation Legend (Gane-Sarson)
The diagram below shows all four elements in Gane-Sarson notation, the notation this course follows and one widely used in business analysis. Study it before you read any other DFD.
Reading a DFD: Online Store Order Example
Before drawing your first DFD, practice reading one. The diagram below shows a fragment of an online store order-processing system. Walk through it systematically:
- Identify the external entities (who is outside the system).
- Identify each process (what work does the system do).
- Identify data stores (what does the system remember).
- Trace each data flow (what data travels, in which direction).
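The same four-step reading can be sketched in code. The fragment below is a hypothetical order-processing example, not a transcription of the diagram: every element and flow name in it is an assumption chosen for illustration, and the helper names names_of and read_dfd are made up for this sketch.

```python
# Hypothetical order-processing fragment: element name -> element type.
elements = {
    "Customer": "external entity",
    "Payment Gateway": "external entity",
    "Process Order": "process",
    "Process Payment": "process",
    "Product Catalog": "data store",
    "Orders": "data store",
}

# Each flow is (source, data flow label, target).
flows = [
    ("Customer", "Order Details", "Process Order"),
    ("Product Catalog", "Product Price", "Process Order"),
    ("Process Order", "Order Record", "Orders"),
    ("Process Order", "Payment Request", "Process Payment"),
    ("Process Payment", "Card Charge", "Payment Gateway"),
    ("Payment Gateway", "Payment Result", "Process Payment"),
    ("Process Payment", "Invoice", "Customer"),
]


def names_of(kind):
    """Return the names of all elements of one type, in definition order."""
    return [name for name, k in elements.items() if k == kind]


def read_dfd():
    """Return the four-step walkthrough as a list of text lines."""
    lines = [
        "External entities: " + ", ".join(names_of("external entity")),
        "Processes: " + ", ".join(names_of("process")),
        "Data stores: " + ", ".join(names_of("data store")),
        "Data flows:",
    ]
    lines += [f"  {src} -> {dst}: {label}" for src, label, dst in flows]
    return lines


print("\n".join(read_dfd()))
```

Reading in this fixed order (entities, processes, stores, then flows) establishes the system boundary before you trace any individual piece of data.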
Notation Variants: Gane-Sarson vs. Yourdon-DeMarco
Two notation families dominate professional practice:
- Gane-Sarson — rounded-rectangle processes, open-ended rectangle stores. Preferred in business analysis, government, and enterprise IT projects. The visual distinction between elements is strong, making diagrams easy to read for non-technical stakeholders.
- Yourdon-DeMarco — circle processes, double parallel-line stores. Common in academic computer science and software engineering textbooks. Circles are called "bubbles," which is why Yourdon DFDs are sometimes called "bubble diagrams."
Both notations express identical semantics. Choose one and apply it consistently within a project. This course uses Gane-Sarson throughout.
Rules Every DFD Must Satisfy
Beyond notation, a valid DFD must obey a set of structural rules:
- Every process must have at least one input flow and one output flow — a process with no input is a miracle; one with no output is a black hole.
- Data flows must connect compatible element types: at least one end of every flow must be a process. Flows between two external entities, directly between two data stores, or directly between an external entity and a data store are forbidden. All data movement must pass through at least one process.
- Data stores must be read or written by at least one process — a store nobody accesses serves no purpose.
- Every element must be labeled. Unnamed processes, flows, or stores indicate incomplete analysis.
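These rules are mechanical enough to check automatically. The following is a minimal sketch of such a check, assuming a diagram is given as a dictionary of element names and a list of (source, label, target) flows. The function name validate_dfd, the input format, and the sample diagram are all hypothetical, and the set of generic labels is limited to the two examples named earlier in this lesson.

```python
ENTITY, PROCESS, STORE = "external entity", "process", "data store"
GENERIC_LABELS = {"", "data", "information"}


def validate_dfd(elements, flows):
    """Return a list of rule violations; an empty list means none were found.

    elements: dict mapping element name -> ENTITY, PROCESS, or STORE
    flows: list of (source name, flow label, target name) tuples
    """
    problems = []
    for name, kind in elements.items():
        if not name.strip():
            problems.append(f"unlabeled {kind}")
            continue
        has_input = any(dst == name for _, _, dst in flows)
        has_output = any(src == name for src, _, _ in flows)
        if kind == PROCESS and not has_input:
            problems.append(f"miracle: process '{name}' has no input flow")
        if kind == PROCESS and not has_output:
            problems.append(f"black hole: process '{name}' has no output flow")
        if kind == STORE and not (has_input or has_output):
            problems.append(f"unused store: '{name}' is never read or written")
    for src, label, dst in flows:
        if src not in elements or dst not in elements:
            problems.append(f"flow '{label}' references an unknown element")
            continue
        if label.strip().lower() in GENERIC_LABELS:
            problems.append(f"flow {src} -> {dst} needs a specific label")
        # At least one end of every flow must be a process.
        if PROCESS not in (elements[src], elements[dst]):
            problems.append(f"flow '{label}' from {src} to {dst} bypasses every process")
    return problems


# A deliberately flawed sample diagram.
elements = {
    "Customer": ENTITY,
    "Supplier": ENTITY,
    "Process Payment": PROCESS,
    "Product Catalog": STORE,
}
flows = [
    ("Customer", "Invoice", "Supplier"),      # entity to entity: forbidden
    ("Customer", "data", "Process Payment"),  # generic label
]
for problem in validate_dfd(elements, flows):
    print(problem)
```

A check like this catches structural mistakes only. It cannot tell whether a label is meaningful to the business or whether the diagram matches the real system, so it supports a manual review but does not replace one.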
What DFDs Do Not Show
Understanding the limits of DFDs is as important as understanding their strengths. A DFD does not show:
- The sequence or timing of processes — that is a flowchart or sequence diagram concern.
- Decision logic — conditions and branches are invisible on a DFD.
- The internal structure of data — for that you need an entity-relationship diagram (covered later in this course).
- The technology implementing the processes — a DFD is technology-neutral by design.
Summary
- A Data Flow Diagram models what data moves through a system, not how processes execute internally.
- The four elements are: External Entity (plain rectangle), Process (numbered rounded rectangle), Data Store (open-ended rectangle), Data Flow (labeled arrow) — in Gane-Sarson notation.
- Label every element with meaningful business terms; number every process.
- No two external entities can exchange data directly; all flow must pass through a process.
- DFDs complement flowcharts: together they show both what data a system handles and how its logic executes.