Process Modeling: Flowcharts & DFDs

Introduction to Data Flow Diagrams

18 min Lesson 4 of 10

Flowcharts answer the question "How does a process execute step by step?" But analysts also need a complementary view: what data moves through the system, where does it come from, where does it go, and what transforms it? That is the question Data Flow Diagrams (DFDs) are designed to answer.

A DFD is a graphical model that shows the movement of data through an information system. It ignores control flow, timing, and internal logic — it focuses purely on data in motion. A well-drawn DFD tells a business stakeholder exactly what information the system consumes, produces, and stores, without burying them in technical implementation details.

Why Use DFDs?

Consider a clinic booking system. A flowchart shows the decision logic: is the doctor available? Yes → confirm; No → offer alternatives. A DFD, by contrast, shows: the patient sends a Booking Request into the system; the Schedule Appointment process reads from the Doctor Availability store and writes to the Appointment Register; the patient and doctor both receive a Confirmation Notice.

Both views are valuable. Use flowcharts to model logic and decisions; use DFDs to model data and its transformations. Experienced analysts switch between the two as naturally as switching between a map and a street-level view.

The Four DFD Elements

Every DFD, regardless of notation family or level of detail, is built from exactly four types of element. Mastering these four gives you the vocabulary to read and draw any DFD.

1. External Entity (Source / Sink)

An external entity is a person, organization, or other system that exists outside the system boundary. It is the origin of data flowing in (a source) or the destination of data flowing out (a sink). The system has no control over what happens inside an external entity — it only knows what data is exchanged.

  • In Gane-Sarson notation: drawn as a plain rectangle.
  • In Yourdon-DeMarco notation: drawn as a plain rectangle (same shape, same meaning).
  • Labeled with a noun: Customer, Supplier, Payment Gateway, Library Member.

The same external entity may appear more than once on a large diagram to avoid crossing lines — a duplicate is marked with a small diagonal line or shadow in the corner.

2. Process

A process transforms, routes, or generates data. It receives data flows as inputs and produces data flows as outputs. Processes represent work performed by the system — whether by software, a person, or a combination of both.

  • In Gane-Sarson notation: drawn as a rounded rectangle (pill-shaped) divided into two compartments: a number in the top-left (for unique identification) and a verb-phrase label in the body (Validate Login, Process Payment, Register Member).
  • In Yourdon-DeMarco notation: drawn as a circle with the label inside.
  • Label rule: always use a verb + noun phrase: Check Availability, not just Availability.

Number your processes. Each process gets a unique identifier (1.0, 1.1, 2.0 …) that links the DFD to written process descriptions (mini-specs or structured English). When a stakeholder questions process 2.3, everyone knows exactly which bubble they are discussing.

3. Data Store

A data store is a repository where data rests between processes — a file, a database table, a cabinet of paper records, or any persistent storage. Data flows into a store when a process writes data; data flows out of a store when a process reads it.

  • In Gane-Sarson notation: drawn as an open-ended rectangle — two horizontal parallel lines closed on the left by a short vertical line, open on the right — with a number prefix (D1, D2 …) and a noun label (Appointment Register, Product Catalog, Member Records).
  • In Yourdon-DeMarco notation: drawn as two parallel horizontal lines with no end caps, and the label between them.
  • A data store does not transform data — it only holds it.

4. Data Flow

A data flow is a named, directed arrow that represents data in motion between two DFD elements. It is the connective tissue of the diagram.

  • Drawn as a labeled arrow (solid or slightly curved line with an arrowhead).
  • The label must name the specific data being carried: Booking Request, Invoice, Search Query. Generic labels like data or information are forbidden — they add no analytical value.
  • Arrowheads show direction. A bidirectional arrow means data moves in both directions under the same name; use this sparingly and only when the pairing is truly inseparable.

Common mistake: connecting processes directly without asking where the data waits. A data flow drawn straight from one process to another implies that the receiving process acts on the data the moment it arrives. If the data has to wait, or if another process needs it later, it must reside somewhere; make that explicit with a data store.
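The four elements map naturally onto a small data model. The following is a minimal sketch, not a standard library or tool: the class names, the identifiers E1 and E2, and the flow labels Available Slots and Appointment Record are hypothetical, chosen only to encode the clinic booking example described earlier.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    EXTERNAL_ENTITY = "external entity"
    PROCESS = "process"
    DATA_STORE = "data store"


@dataclass(frozen=True)
class Element:
    ident: str  # e.g. "1.0" for a process, "D1" for a data store
    name: str   # noun for entities and stores, verb phrase for processes
    kind: Kind


@dataclass(frozen=True)
class DataFlow:
    label: str       # the specific data carried, never just "data"
    source: Element  # where the arrow starts
    target: Element  # where the arrowhead points


# Clinic booking example. Identifiers are assumed for illustration.
patient = Element("E1", "Patient", Kind.EXTERNAL_ENTITY)
doctor = Element("E2", "Doctor", Kind.EXTERNAL_ENTITY)
schedule = Element("1.0", "Schedule Appointment", Kind.PROCESS)
availability = Element("D1", "Doctor Availability", Kind.DATA_STORE)
register = Element("D2", "Appointment Register", Kind.DATA_STORE)

flows = [
    DataFlow("Booking Request", patient, schedule),
    DataFlow("Available Slots", availability, schedule),   # assumed label
    DataFlow("Appointment Record", schedule, register),    # assumed label
    DataFlow("Confirmation Notice", schedule, patient),
    DataFlow("Confirmation Notice", schedule, doctor),
]

for flow in flows:
    print(f"{flow.source.name} --[{flow.label}]--> {flow.target.name}")
```

Note that the fourth element, the data flow, is the only one that refers to other elements; entities, processes, and stores know nothing about each other until a flow connects them.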

Notation Legend (Gane-Sarson)

The diagram below shows all four elements in Gane-Sarson notation, a notation widely used in business analysis and the one this course follows. Study it before reading the DFDs later in this lesson.

[Figure: DFD Notation Legend (Gane-Sarson). Examples shown: external entity Customer (plain rectangle, no rounding); process 1 Validate Login (rounded rectangle, number + verb phrase); data store D2 Appointment Register (open-ended rectangle); data flow Booking Request (labeled arrow). Yourdon variant: circles for processes; double parallel lines for stores.]
The four Gane-Sarson DFD elements: External Entity (plain rectangle), Process (rounded rectangle with number), Data Store (open-ended rectangle), and Data Flow (labeled arrow).

Reading a DFD: Online Store Order Example

Before drawing your first DFD, practice reading one. The diagram below shows a fragment of an online store order-processing system. Walk through it systematically:

  1. Identify the external entities (who is outside the system).
  2. Identify each process (what work does the system do).
  3. Identify data stores (what does the system remember).
  4. Trace each data flow (what data travels, in which direction).

[Figure: DFD Fragment, Online Store Order Processing. External entities: Customer, Payment Gateway. Processes: 1 Process Order (validate & record), 2 Charge Customer (call payment API). Data stores: D1 Orders, D2 Inventory, D3 Payment Records. Data flows: Order Request, Order Confirmation, Validated Order, Charge Request, Payment Result, New Order, Stock Check, Availability, Payment Record.]
A DFD fragment for online store order processing. Customer and Payment Gateway are external entities. Processes 1 and 2 transform data. Data stores D1–D3 persist records between operations. Named arrows carry specific data between elements.
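The four reading steps can be rehearsed on a plain-data version of the same fragment. This is only a sketch: the element names and flow labels come from the diagram, but the source and target of each flow are assumptions inferred from the labels, and the `names_of` helper is hypothetical.

```python
# Element name -> kind. Names are taken from the diagram.
elements = {
    "Customer": "entity",
    "Payment Gateway": "entity",
    "1 Process Order": "process",
    "2 Charge Customer": "process",
    "D1 Orders": "store",
    "D2 Inventory": "store",
    "D3 Payment Records": "store",
}

# (source, label, target). Endpoints are ASSUMED, not read off the figure.
flows = [
    ("Customer", "Order Request", "1 Process Order"),
    ("1 Process Order", "Stock Check", "D2 Inventory"),
    ("D2 Inventory", "Availability", "1 Process Order"),
    ("1 Process Order", "New Order", "D1 Orders"),
    ("1 Process Order", "Order Confirmation", "Customer"),
    ("1 Process Order", "Validated Order", "2 Charge Customer"),
    ("2 Charge Customer", "Charge Request", "Payment Gateway"),
    ("Payment Gateway", "Payment Result", "2 Charge Customer"),
    ("2 Charge Customer", "Payment Record", "D3 Payment Records"),
]


def names_of(kind):
    """Return the names of all elements of the given kind, in diagram order."""
    return [name for name, k in elements.items() if k == kind]


print("1. External entities:", names_of("entity"))
print("2. Processes:", names_of("process"))
print("3. Data stores:", names_of("store"))
print("4. Data flows:")
for source, label, target in flows:
    print(f"   {label}: {source} -> {target}")
```

Listing the flows last mirrors the reading order: once the entities, processes, and stores are known, every arrow can be checked against names already identified.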

Notation Variants: Gane-Sarson vs. Yourdon-DeMarco

Two notation families dominate professional practice:

  • Gane-Sarson — rounded-rectangle processes, open-ended rectangle stores. Preferred in business analysis, government, and enterprise IT projects. The visual distinction between elements is strong, making diagrams easy to read for non-technical stakeholders.
  • Yourdon-DeMarco — circle processes, double parallel-line stores. Common in academic computer science and software engineering textbooks. Circles are called "bubbles," which is why Yourdon DFDs are sometimes called "bubble diagrams."

Both notations express identical semantics. Choose one and apply it consistently within a project. This course uses Gane-Sarson throughout.

Analyst habit — name data flows precisely. Before finalizing any DFD, review every arrow label. Replace vague names like data, request, or information with specific business terms: Prescription Request, Cancellation Notice, Stock Deduction. Precise labels double as a data dictionary and reveal gaps: if you cannot name a flow, you do not fully understand it yet.
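The label review can be partly automated. The sketch below assumes flow labels are available as plain strings; the function name `vague_flow_labels` is hypothetical, and the list of vague words contains only the three examples named above, so a real project would likely extend it.

```python
# Words that say nothing about which data is carried.
VAGUE_LABELS = {"data", "request", "information"}


def vague_flow_labels(labels):
    """Return the labels that are blank or consist solely of a vague word."""
    flagged = []
    for label in labels:
        normalized = label.strip().lower()
        if not normalized or normalized in VAGUE_LABELS:
            flagged.append(label)
    return flagged


labels = ["Prescription Request", "data", "Request", "Stock Deduction", ""]
print(vague_flow_labels(labels))
```

A label such as Prescription Request passes because the check compares the whole label, not individual words; only a bare "request" is flagged.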

Rules Every DFD Must Satisfy

Beyond notation, a valid DFD must obey a set of structural rules:

  1. Every process must have at least one input flow and one output flow — a process with no input is a miracle; one with no output is a black hole.
  2. Data flows must connect compatible element types: at least one end of every flow must be a process. Flows between two external entities are forbidden, as are flows directly between two data stores or between an external entity and a data store. All data movement must pass through at least one process.
  3. Data stores must be read or written by at least one process — a store nobody accesses serves no purpose.
  4. Every element must be labeled. Unnamed processes, flows, or stores indicate incomplete analysis.
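These rules are mechanical enough to check in code. The following is a minimal sketch under simple assumptions: elements are given as a name-to-kind dictionary, flows as (label, source, target) tuples, and the names `Flow` and `validate_dfd` are hypothetical. The Supplier entity, the Archive store, and the Price List flow are invented solely to trigger violations.

```python
from collections import namedtuple

Flow = namedtuple("Flow", ["label", "source", "target"])


def validate_dfd(elements, flows):
    """Check the four structural rules; return a list of problem descriptions.

    elements: dict mapping element name -> "entity", "process", or "store"
    flows:    list of Flow tuples
    """
    problems = []
    inputs = {name: 0 for name in elements}
    outputs = {name: 0 for name in elements}

    for flow in flows:
        if flow.source not in elements or flow.target not in elements:
            problems.append(f"flow '{flow.label}' has an unknown endpoint")
            continue
        # Rule 4 (flows): every flow must be labeled.
        if not flow.label.strip():
            problems.append(f"unlabeled flow: {flow.source} -> {flow.target}")
        # Rule 2: at least one end of every flow must be a process.
        if "process" not in (elements[flow.source], elements[flow.target]):
            problems.append(
                f"flow '{flow.label}' bypasses every process: "
                f"{flow.source} -> {flow.target}"
            )
        outputs[flow.source] += 1
        inputs[flow.target] += 1

    for name, kind in elements.items():
        # Rule 4 (elements): every element must be labeled.
        if not name.strip():
            problems.append(f"unlabeled {kind}")
        # Rule 1: no miracles, no black holes.
        if kind == "process" and inputs[name] == 0:
            problems.append(f"process '{name}' has no input (miracle)")
        if kind == "process" and outputs[name] == 0:
            problems.append(f"process '{name}' has no output (black hole)")
        # Rule 3: every store must be read or written by some process.
        if kind == "store" and inputs[name] + outputs[name] == 0:
            problems.append(f"data store '{name}' is never read or written")

    return problems


elements = {
    "Customer": "entity",
    "Supplier": "entity",        # hypothetical
    "Process Order": "process",
    "Orders": "store",
    "Archive": "store",          # hypothetical, deliberately unused
}
flows = [
    Flow("Order Request", "Customer", "Process Order"),
    Flow("New Order", "Process Order", "Orders"),
    Flow("Price List", "Supplier", "Customer"),  # entity to entity: invalid
]

for problem in validate_dfd(elements, flows):
    print(problem)
```

For this input the check reports two problems: the Price List flow that bypasses every process, and the Archive store that nothing reads or writes. A check like this catches structural slips only; it cannot tell whether a label is meaningful to the business.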

What DFDs Do Not Show

Understanding the limits of DFDs is as important as understanding their strengths. A DFD does not show:

  • The sequence or timing of processes — that is a flowchart or sequence diagram concern.
  • Decision logic — conditions and branches are invisible on a DFD.
  • The internal structure of data — for that you need an entity-relationship diagram (covered later in this course).
  • The technology implementing the processes — a DFD is technology-neutral by design.

Summary

  • A Data Flow Diagram models what data moves through a system, not how processes execute internally.
  • The four elements are: External Entity (plain rectangle), Process (numbered rounded rectangle), Data Store (open-ended rectangle), Data Flow (labeled arrow) — in Gane-Sarson notation.
  • Label every element with meaningful business terms; number every process.
  • No two external entities can exchange data directly; all flow must pass through a process.
  • DFDs complement flowcharts — together they give a complete picture of what a system does and how it does it.