Platform Engineering & Developer Experience

Backstage & Service Catalogs

Spotify open-sourced Backstage in 2020 after building it internally to give 1,600+ engineers a single pane of glass over 2,000+ microservices and hundreds of infrastructure components. It has since become a de facto standard for the portal layer of Internal Developer Platforms (IDPs). Backstage joined the CNCF as a Sandbox project in 2020 and was promoted to Incubating in 2022; it has not reached the CNCF "Graduated" maturity level. At its core, Backstage is three loosely coupled pillars: the Software Catalog, Software Templates, and TechDocs. This lesson covers those three pillars at the depth required to operate them in production.

The Software Catalog

The catalog is a living registry of every entity your organization owns: services, libraries, websites, pipelines, APIs, resources (S3 buckets, RDS clusters), systems, and domains. Each entity is described by a YAML file — called a catalog descriptor — that lives alongside the code it describes.

Every descriptor follows a common schema with apiVersion, kind, metadata, and spec. The metadata.annotations block is where Backstage plugins read their configuration — PagerDuty service IDs, Datadog dashboard links, GitHub Actions workflow paths, ArgoCD app names, and so on.

# catalog-info.yaml (place at the root of any repo)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-service
  description: Handles all payment processing and ledger reconciliation
  annotations:
    github.com/project-slug: acme-corp/payments-service
    pagerduty.com/service-id: P1A2B3C
    # Annotation keys are defined by each plugin; this is the key read by the
    # Roadie Datadog plugin. Check the docs of the plugin you actually install.
    datadoghq.com/dashboard-url: https://app.datadoghq.com/dashboard/abc-xyz
    argocd/app-name: payments-service-prod
    backstage.io/techdocs-ref: dir:.
  tags:
    - payments
    - pci-dss
    - go
  links:
    - url: https://payments.internal.acme.com/metrics
      title: Metrics Dashboard
      icon: dashboard
spec:
  type: service
  lifecycle: production
  # Entity refs are kind:name. Teams are Group entities, so the prefix is
  # "group", not "team" (a bare "payments" also resolves to a Group here).
  owner: group:payments
  system: checkout-platform
  dependsOn:
    - component:fraud-detection-service
    - resource:payments-postgres-prod
  providesApis:
    - payments-api-v2

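Backstage discovers descriptors through catalog providers and registered locations. The GitHub discovery provider can ingest every catalog-info.yaml across all repositories in an org in minutes. Static url locations (or the "Register existing component" page) handle one-off registrations. At scale, most teams configure auto-discovery so that creating a new repo with a catalog-info.yaml registers the entity on the provider's next scheduled run. That frequency is whatever you set in the provider's schedule; it is separate from the catalog's own processing loop, which by default revisits already-registered entities roughly every 100 to 150 seconds.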
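
As a rough illustration of the two registration paths described above, an app-config.yaml fragment might look like the following. It assumes the GitHub catalog backend module (@backstage/plugin-catalog-backend-module-github) is installed; the provider ID acmeCorp, the GITHUB_TOKEN variable, and the 30-minute frequency are placeholders, not recommendations from this lesson.

```yaml
# app-config.yaml (sketch; names marked "assumed" are illustrative)
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}            # assumed env var holding a GitHub token

catalog:
  providers:
    github:
      acmeCorp:                         # assumed provider ID, any unique key works
        organization: 'acme-corp'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'
          repository: '.*'              # regex; narrow it to limit API calls
        schedule:
          frequency: { minutes: 30 }    # how often the org is re-scanned
          timeout: { minutes: 3 }

  # One-off registration of a single descriptor
  locations:
    - type: url
      target: https://github.com/acme-corp/payments-service/blob/main/catalog-info.yaml
```

A shorter frequency makes new repos appear sooner at the cost of more GitHub API calls per hour.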

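The spec.owner field maps to a Group or User entity. If the referenced owner does not exist in the catalog, the component is still ingested, but the ownership relation points at nothing: the entity page shows a missing-relation warning and the component disappears from team-scoped views. (This is different from an orphaned entity, which is one whose source location or parent has been removed.) Source your Group and User entities from your IdP (Okta, Azure AD, Google Workspace) via the relevant catalog provider where one exists, or keep them as descriptors in a dedicated org/ repository that is generated from the IdP. Either way, the IdP stays the canonical source of truth for org structure.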
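
For reference, the owner that payments-service points at could be declared roughly as follows. This is a minimal sketch: the user jdoe, the display names, and the email address are invented for illustration, and in practice an IdP-backed provider would usually generate these entities for you.

```yaml
# org/payments.yaml (hypothetical file in the org repository)
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: payments                # matches "owner: group:payments" or "owner: payments"
  description: Payments engineering team
spec:
  type: team                    # free-form group type; the entity kind is still Group
  profile:
    displayName: Payments
  children: []                  # required field, empty for a leaf team
  members: [jdoe]
---
apiVersion: backstage.io/v1alpha1
kind: User
metadata:
  name: jdoe                    # assumed user, illustration only
spec:
  profile:
    displayName: Jane Doe
    email: jane.doe@acme.example
  memberOf: [payments]
```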

The catalog's power multiplies through relations. When a component declares dependsOn, Backstage records the relation in both directions (dependsOn on one side, dependencyOf on the other) and builds a navigable graph. On the component's page, engineers see upstream and downstream dependencies and, depending on which plugins are installed, the owning team's on-call schedule, recent incidents, recent deployments, and open pull requests, all aggregated from plugins reading those annotations. This is the full-context view that eliminates the "where is this thing documented?" question.
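
Relations only render fully when both ends exist as entities. The sketch below shows what the resource and API referenced by payments-service might look like; the type values and the ./openapi.yaml path are assumptions for illustration.

```yaml
# Entities referenced by payments-service (sketch)
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: payments-postgres-prod
  description: Primary PostgreSQL cluster for the payments ledger
spec:
  type: database
  owner: group:payments
  system: checkout-platform
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: payments-api-v2
spec:
  type: openapi
  lifecycle: production
  owner: group:payments
  system: checkout-platform
  definition:
    $text: ./openapi.yaml       # assumed path, resolved relative to this descriptor
```

With these registered, the resource page lists payments-service under its dependents without any extra declaration on the resource side.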

Software Templates (Scaffolder)

Templates are the mechanism behind golden paths. An engineer selects a template, fills in a short form, and Backstage creates a repository pre-configured with your company's CI pipeline, Dockerfile, Helm chart, Datadog monitors, PagerDuty service, GitHub branch protection rules, and catalog-info.yaml — all wired up and ready for the first commit. This is sometimes called Day-0 automation.

Templates are themselves YAML descriptors with kind: Template. They declare an input schema (the form fields), a set of steps that the Backstage scaffolder executes server-side, and an output block linking to the created resources.

# template.yaml (Go microservice golden path)
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-microservice
  title: Go Microservice (Golden Path)
  description: Spins up a production-ready Go service with CI, Helm, and observability
  tags: [go, microservice, golden-path]
spec:
  owner: group:platform          # Group entity ref ("team:" is not a catalog kind)
  type: service
  parameters:
    - title: Service Details
      required: [name, description, team]
      properties:
        name:
          type: string
          title: Service Name
          pattern: '^[a-z][a-z0-9-]{2,39}$'
        description:
          type: string
          title: Short description
        team:
          type: string
          title: Owning team
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
        pagerdutyServiceId:
          type: string
          title: PagerDuty Service ID (optional)
  steps:
    - id: fetch-template
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          team: ${{ parameters.team }}
    - id: create-repo
      name: Create GitHub repo
      action: github:repo:create
      input:
        repoUrl: github.com?owner=acme-corp&repo=${{ parameters.name }}
        repoVisibility: private
    - id: push-content
      name: Push skeleton to repo
      action: github:repo:push
      input:
        repoUrl: github.com?owner=acme-corp&repo=${{ parameters.name }}
        defaultBranch: main
        # Branch protection is applied once the default branch exists, so the
        # required checks are declared on the push step. Verify the input names
        # against the version of the GitHub scaffolder module you run.
        protectDefaultBranch: true
        requiredStatusCheckContexts:
          - ci/lint
          - ci/test
          - ci/build
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
  output:
    links:
      - title: Repository
        url: ${{ steps['create-repo'].output.remoteUrl }}
      - title: Open in catalog
        entityRef: ${{ steps.register.output.entityRef }}

Keep template skeletons in a separate scaffolder-templates repository, not inside the Backstage app repo. Reference them with absolute URLs (url: https://github.com/acme-corp/scaffolder-templates/tree/main/go-microservice/skeleton). This lets platform teams iterate on templates without redeploying Backstage, and teams can pin to a specific tag for stability.

[Figure: Backstage Software Template execution flow. The developer fills in the form; the Scaffolder runs fetch:template, github:repo:create, and catalog:register; the results are a GitHub repo (with branch rules and CI), a PagerDuty service (with escalation), and a registered catalog entity. Output links: repo URL, catalog entry, PagerDuty service link.]
Scaffolder execution: one form submission triggers repo creation, PagerDuty provisioning, and catalog registration in sequence.
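
The fetch:template step copies the skeleton directory and renders any ${{ values.* }} expressions it finds, using the values map passed in the step input. A skeleton catalog-info.yaml could therefore look like the sketch below. The file contents are an assumption about what such a skeleton might contain; the dump filter is used so that free-text input is emitted as a safely quoted YAML string.

```yaml
# skeleton/catalog-info.yaml (sketch; rendered by fetch:template)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name | dump }}
  description: ${{ values.description | dump }}
  annotations:
    github.com/project-slug: acme-corp/${{ values.name }}
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: experimental              # assumed starting lifecycle for new services
  owner: ${{ values.team | dump }}     # OwnerPicker yields a full ref such as group:default/payments
```

Because the rendered file lands at the repo root, the catalog:register step can find it at /catalog-info.yaml.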

TechDocs

TechDocs solves the "docs rot in Confluence" problem by treating documentation as code. Engineers write Markdown in the repo (using MkDocs as the generator), CI publishes the rendered HTML to object storage (S3 or GCS), and Backstage serves it inline on the component's Docs tab. The annotation backstage.io/techdocs-ref: dir:. in the catalog descriptor tells Backstage where to find the mkdocs.yml.

# mkdocs.yml (at the repo root, alongside catalog-info.yaml)
site_name: Payments Service
site_description: Architecture, runbooks, and on-call guide for payments-service
docs_dir: docs/
nav:
  - Home: index.md
  - Architecture: architecture.md
  - Runbooks:
      - Incident Response: runbooks/incident-response.md
      - Database Failover: runbooks/db-failover.md
  - ADRs: adrs/index.md
  - API Reference: api-reference.md
plugins:
  - techdocs-core   # preinstalled in the Backstage TechDocs builder image

# In CI (GitHub Actions), publish on every merge to main. Use the TechDocs CLI
# rather than a plain "mkdocs build" plus "aws s3 sync": Backstage looks docs up
# under <namespace>/<kind>/<name>/ in the bucket and expects the
# techdocs_metadata.json file that the CLI generates.
#   - run: pip install mkdocs-techdocs-core
#   - run: npx @techdocs/cli generate --no-docker
#   - run: npx @techdocs/cli publish --publisher-type awsS3 --storage-name acme-techdocs --entity default/Component/payments-service

The TechDocs builder can run in two modes: local (the Backstage backend builds docs on demand) or external (CI builds and publishes to storage). Production deployments should use external mode: local builds run alongside the Backstage Node process, compete with request handling for CPU and memory, and do not scale beyond a handful of doc sites.
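
For external mode, the CI side might be sketched as the GitHub Actions workflow below. It follows the general shape of the TechDocs CLI workflow (generate, then publish); the secret names, the AWS region, and the trigger paths are assumptions to adapt to your setup.

```yaml
# .github/workflows/techdocs.yml (sketch)
name: Publish TechDocs
on:
  push:
    branches: [main]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
jobs:
  publish:
    runs-on: ubuntu-latest
    env:
      TECHDOCS_S3_BUCKET_NAME: acme-techdocs
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}          # assumed secret names
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      AWS_REGION: us-east-1                                        # assumed region
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: npm install -g @techdocs/cli
      - run: python -m pip install mkdocs-techdocs-core
      # Build the site and techdocs_metadata.json into ./site without Docker
      - run: techdocs-cli generate --no-docker --verbose
      # Upload under default/component/payments-service in the bucket
      - run: >
          techdocs-cli publish
          --publisher-type awsS3
          --storage-name $TECHDOCS_S3_BUCKET_NAME
          --entity default/Component/payments-service
```

The --entity value must match the namespace, kind, and name of the catalog entity, otherwise the Docs tab will not find the published site.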

TechDocs in local build mode is a common production foot-gun. For large orgs with hundreds of services, on-demand builds cause Backstage pod OOM kills and 30-second first-load latencies. Always configure external build mode with CI publishing to GCS or S3, and set techdocs.builder: 'external' in app-config.yaml. Pair this with a CDN in front of the storage bucket to cut doc load times from ~800 ms to under 100 ms.
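
On the Backstage side, the matching configuration could look like this sketch. The bucket name follows the lesson's example; the AWS_REGION variable is an assumption, and credentials are expected to come from the environment or an attached IAM role.

```yaml
# app-config.production.yaml (sketch)
techdocs:
  builder: 'external'          # Backstage only reads docs; CI builds and publishes them
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: 'acme-techdocs'
      region: ${AWS_REGION}    # assumed env var
```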

Production Deployment Considerations

At scale, Backstage is a Node.js application that can be resource-hungry. Key production settings to tune:

  • Catalog refresh interval: by default the processing loop revisits each entity roughly every 100 to 150 seconds; raise catalog.processingInterval to 15–30 minutes for orgs with 5,000+ entities to reduce GitHub API rate-limit pressure.
  • Database: replace the default in-memory SQLite store with PostgreSQL (required for any deployment that must survive a restart or run more than a single pod).
  • Authentication: integrate with your IdP (Okta, GitHub OAuth, Azure AD) using Backstage's auth backend. Keep guest access disabled in production.
  • Plugin isolation: each Backstage plugin runs in the same Node process. A misbehaving plugin (e.g., one with an unhandled promise rejection) can crash the entire app. Pin plugin versions and use liveness/readiness probes to catch and restart quickly.
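
Taken together, the first three settings above might translate into an app-config fragment like the following. It is a sketch under stated assumptions: the environment variable names are placeholders, GitHub OAuth stands in for whichever IdP you use, and the 30-minute interval is simply one point in the range suggested above.

```yaml
# app-config.production.yaml (sketch; env var names are assumed)
backend:
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

catalog:
  processingInterval: { minutes: 30 }   # slower refresh, fewer GitHub API calls

auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${AUTH_GITHUB_CLIENT_ID}
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}
        signIn:
          resolvers:
            # Maps the GitHub username to a User entity with the same name
            - resolver: usernameMatchingUserEntityName
```

No guest provider is configured here, so unauthenticated access is simply not offered.
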
Treat Backstage itself as a product with SLOs. Track catalog completeness (what percentage of prod services have a catalog-info.yaml), template adoption (what percentage of new repos were created via a golden-path template), and TechDocs coverage (services with published docs). These three metrics are the most actionable indicators of IDP health and developer experience ROI.

The Software Catalog, Software Templates, and TechDocs are the foundation of everything else in Backstage. Once entities are registered and docs are live, every other plugin — cost visibility, security posture, deployment frequency, on-call load — simply reads entity annotations and enriches the same single pane of glass. The compounding effect is what justifies the infrastructure investment.