Platform Engineering & Developer Experience

Backstage & Service Catalogs

Spotify open-sourced Backstage in 2020 after building it internally to tame 2,000+ microservices, 1,600+ engineers, and hundreds of infrastructure components that had no single pane of glass. It joined the CNCF as a Sandbox project in 2020, moved up to the Incubating level in 2022, and has become the de facto standard for Internal Developer Platforms (IDPs). At its core, Backstage is three loosely coupled pillars: the Software Catalog, Software Templates, and TechDocs. This lesson covers those three pillars at the depth required to operate them in production.

The Software Catalog

The catalog is a living registry of every entity your organization owns: services, websites, libraries, and pipelines (all Components, distinguished by spec.type), APIs, Resources (S3 buckets, RDS clusters), Systems, and Domains. Each entity is described by a YAML file, called a catalog descriptor and conventionally named catalog-info.yaml, that lives alongside the code it describes.

Every descriptor follows a common schema with apiVersion, kind, metadata, and spec. The metadata.annotations block is where Backstage plugins read their configuration — PagerDuty service IDs, Datadog dashboard links, GitHub Actions workflow paths, ArgoCD app names, and so on.

# catalog-info.yaml — place at the root of any repo
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-service
  description: Handles all payment processing and ledger reconciliation
  annotations:
    github.com/project-slug: acme-corp/payments-service
    pagerduty.com/service-id: P1A2B3C
    datadoghq.com/dashboard-url: https://app.datadoghq.com/dashboard/abc-xyz
    argocd/app-name: payments-service-prod
    backstage.io/techdocs-ref: dir:.
  tags:
    - payments
    - pci-dss
    - go
  links:
    - url: https://payments.internal.acme.com/metrics
      title: Metrics Dashboard
      icon: dashboard
spec:
  type: service
  lifecycle: production
  owner: group:payments
  system: checkout-platform
  dependsOn:
    - component:fraud-detection-service
    - resource:payments-postgres-prod
  providesApis:
    - payments-api-v2

Backstage discovers descriptors through entity providers. The GitHub provider scans every repository in an organization for a catalog-info.yaml on a configurable schedule, while static url locations (listed under catalog.locations in app-config.yaml, or added through the Register Existing Component page) handle one-off registrations. At scale, most teams rely on auto-discovery, so that creating a new repo with a catalog-info.yaml registers the entity on the provider's next scheduled run.
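A minimal app-config.yaml sketch of both discovery styles, assuming the new backend system with the GitHub catalog module (@backstage/plugin-catalog-backend-module-github) installed and a token exposed as GITHUB_TOKEN. The provider id acmeOrg, the 30-minute schedule, and the legacy-billing repository are illustrative placeholders, not values from a real deployment.

```yaml
# app-config.yaml (sketch)
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

catalog:
  # By default the catalog only accepts Component, API, and Location entities,
  # so widen the allow-list for the other kinds used in this lesson.
  rules:
    - allow: [Component, System, API, Resource, Location, Group, User]
  providers:
    github:
      acmeOrg:                        # provider id: any unique name
        organization: 'acme-corp'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'              # only read descriptors from main
          repository: '.*'            # regex over repository names
        schedule:                     # how often the whole org is re-scanned
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
  locations:
    # One-off registration of a single descriptor by URL
    - type: url
      target: https://github.com/acme-corp/legacy-billing/blob/main/catalog-info.yaml
```

The schedule is the knob that trades freshness against GitHub API usage; the filters keep the scan from touching archived or irrelevant repositories.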

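The spec.owner field is an entity reference to a Group or User, written as group:payments (or simply payments, because owner references default to the Group kind). If the referenced owner does not exist in the catalog, the default entity page shows a warning that the entity has relations to entities that cannot be found. Ingest Group and User entities from your IdP (Okta, Azure AD, Google Workspace) through the matching org catalog provider, or keep them in a dedicated org/ repository generated from the IdP, so that a single place is the canonical source of truth for org structure.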
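For reference, a sketch of the org entities that an owner reference like group:payments resolves to. The team name, user, and email addresses are hypothetical; in practice an org provider would emit equivalent entities from the IdP rather than anyone editing them by hand.

```yaml
# org/payments.yaml (sketch): Group and User entities for a hypothetical payments team
apiVersion: backstage.io/v1alpha1
kind: Group
metadata:
  name: payments                # referenced as group:payments (or just payments)
  description: Payments and ledger engineering
spec:
  type: team                    # free-form grouping type, e.g. team or department
  profile:
    displayName: Payments Team
    email: payments@acme.com
  children: []                  # required, even when the group has no subgroups
---
apiVersion: backstage.io/v1alpha1
kind: User
metadata:
  name: jdoe
spec:
  profile:
    displayName: Jane Doe
    email: jdoe@acme.com
  memberOf: [payments]          # yields memberOf / hasMember relations
```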

The catalog's power multiplies through relations. When a component declares dependsOn, Backstage records the relation in both directions (dependsOn on the component, dependencyOf on the target), so the graph can be walked upstream and downstream. With the relevant plugins installed, the component's page shows those dependencies alongside the owning team's on-call schedule, recent incidents, recent deployments, and open pull requests, all aggregated by plugins reading the annotations above. This full-context view is what eliminates the "where is this thing documented?" question.
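To make those relations resolve, the targets referenced by payments-service must exist as entities too. A sketch of three of them as one multi-document file follows; the owner, the database type, and the ./openapi.yaml path are assumptions, and fraud-detection-service would be a Component registered from its own repository. Once these are ingested, Backstage adds the inverse relations (hasPart on the System, apiProvidedBy on the API, dependencyOf on the Resource) automatically.

```yaml
# checkout-platform.yaml (sketch): entities referenced by payments-service
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: checkout-platform
spec:
  owner: group:payments
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: payments-api-v2
spec:
  type: openapi
  lifecycle: production
  owner: group:payments
  system: checkout-platform
  definition:
    $text: ./openapi.yaml       # placeholder: inline the spec file from this repo
---
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: payments-postgres-prod
spec:
  type: database
  owner: group:payments
  system: checkout-platform
```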

Software Templates (Scaffolder)

Templates are the mechanism behind golden paths. An engineer selects a template, fills in a short form, and Backstage creates a repository pre-configured with your company's CI pipeline, Dockerfile, Helm chart, Datadog monitors, PagerDuty service, GitHub branch protection rules, and catalog-info.yaml — all wired up and ready for the first commit. This is sometimes called Day-0 automation.

Templates are themselves YAML descriptors with kind: Template. They declare an input schema (the form fields), a set of steps that the Backstage scaffolder executes server-side, and an output block linking to the created resources.

# template.yaml — Go microservice golden path
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-microservice
  title: Go Microservice (Golden Path)
  description: Spins up a production-ready Go service with CI, Helm, and observability
  tags: [go, microservice, golden-path]
spec:
  owner: group:platform
  type: service

  parameters:
    - title: Service Details
      required: [name, description, team]
      properties:
        name:
          type: string
          title: Service Name
          pattern: '^[a-z][a-z0-9-]{2,39}$'
        description:
          type: string
          title: Short description
        team:
          type: string
          title: Owning team
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
        pagerdutyServiceId:
          type: string
          title: PagerDuty Service ID (optional)

  steps:
    - id: fetch-template
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          team: ${{ parameters.team }}
          pagerdutyServiceId: ${{ parameters.pagerdutyServiceId }}

    - id: create-repo
      name: Create GitHub repo
      action: github:repo:create
      input:
        repoUrl: github.com?owner=acme-corp&repo=${{ parameters.name }}
        defaultBranch: main
        repoVisibility: private

    - id: push-content
      name: Push skeleton to repo
      action: github:repo:push
      input:
        repoUrl: github.com?owner=acme-corp&repo=${{ parameters.name }}
        defaultBranch: main
        # Branch protection is applied to the default branch as it is pushed
        protectDefaultBranch: true
        requiredStatusCheckContexts:
          - ci/lint
          - ci/test
          - ci/build

    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml

  output:
    links:
      - title: Repository
        url: ${{ steps['create-repo'].output.remoteUrl }}
      - title: Open in catalog
        entityRef: ${{ steps.register.output.entityRef }}

Keep template skeletons in a separate scaffolder-templates repository, not inside the Backstage app repo. Reference them with absolute URLs (url: https://github.com/acme-corp/scaffolder-templates/tree/main/go-microservice/skeleton). This lets platform teams iterate on templates without redeploying Backstage, and teams can pin to a specific tag for stability.
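One way to wire this up is to register the external template as a catalog location. A sketch, assuming the repository layout implied by the URL above (go-microservice/template.yaml sitting next to the skeleton directory):

```yaml
# app-config.yaml (sketch): load templates from the separate scaffolder-templates repo
catalog:
  locations:
    - type: url
      target: https://github.com/acme-corp/scaffolder-templates/blob/main/go-microservice/template.yaml
      rules:
        - allow: [Template]   # default catalog rules do not admit kind: Template
```

Because fetch:template resolves a relative url such as ./skeleton against the location the template was loaded from, pointing target at a tag instead of main (for example a hypothetical blob/v1.4.0/...) pins the template definition and its skeleton together.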

Scaffolder execution: one form submission fetches the skeleton, creates the repository, pushes the rendered code, and registers the new component in the catalog, in sequence.
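The skeleton's own catalog-info.yaml is what makes the new service appear fully wired in the catalog. A sketch follows: fetch:template renders every skeleton file with Nunjucks, where ${{ values.x }} inserts an entry from the step's values map and {% %} tags handle conditionals. The experimental lifecycle is an illustrative default, and the PagerDuty annotation is only emitted if the fetch:template step also passes pagerdutyServiceId in its values. The file is not valid YAML until it has been rendered.

```yaml
# skeleton/catalog-info.yaml (sketch): rendered by fetch:template before the push
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  description: ${{ values.description }}
  annotations:
    github.com/project-slug: acme-corp/${{ values.name }}
    backstage.io/techdocs-ref: dir:.
{%- if values.pagerdutyServiceId %}
    pagerduty.com/service-id: ${{ values.pagerdutyServiceId }}
{%- endif %}
spec:
  type: service
  lifecycle: experimental
  owner: ${{ values.team }}   # OwnerPicker yields a full ref such as group:default/payments
```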

TechDocs

TechDocs solves the "docs rot in Confluence" problem by treating documentation as code. Engineers write Markdown in the repo (using MkDocs as the generator), CI publishes the rendered HTML to object storage (S3 or GCS), and Backstage serves it inline on the component's Docs tab. The annotation backstage.io/techdocs-ref: dir:. in the catalog descriptor tells Backstage where to find the mkdocs.yml.

# mkdocs.yml — at the repo root alongside catalog-info.yaml
site_name: Payments Service
site_description: Architecture, runbooks, and on-call guide for payments-service
docs_dir: docs/

nav:
  - Home: index.md
  - Architecture: architecture.md
  - Runbooks:
      - Incident Response: runbooks/incident-response.md
      - Database Failover: runbooks/db-failover.md
  - ADRs: adrs/index.md
  - API Reference: api-reference.md

plugins:
  - techdocs-core    # provided by the mkdocs-techdocs-core Python package

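# In CI (GitHub Actions), publish on every merge to main with the TechDocs CLI, which
# adds techdocs_metadata.json and uploads to the <namespace>/<kind>/<name> path Backstage reads:
# - run: pip install mkdocs-techdocs-core
# - run: npx @techdocs/cli generate --no-docker --source-dir . --output-dir ./site
# - run: npx @techdocs/cli publish --publisher-type awsS3 --storage-name acme-techdocs --entity default/Component/payments-service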

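The TechDocs builder can run in two modes: local (the Backstage backend generates a site on demand the first time its docs are requested) or external (CI builds the site and publishes it to storage, and Backstage only reads it). Production deployments should use external mode: local mode runs MkDocs from inside the backend, competing with it for CPU and memory and requiring Docker or a Python toolchain in the backend's runtime environment.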
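Expanded into a complete CI job, external mode might look like the following GitHub Actions workflow. It is a sketch, assuming OIDC-based AWS credentials; the TECHDOCS_PUBLISH_ROLE_ARN secret, the region, and the path filter are placeholders to adapt.

```yaml
# .github/workflows/techdocs.yml (sketch)
name: Publish TechDocs
on:
  push:
    branches: [main]
    paths: ['docs/**', 'mkdocs.yml']

permissions:
  id-token: write   # lets the job request an OIDC token for AWS
  contents: read

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install mkdocs-techdocs-core
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.TECHDOCS_PUBLISH_ROLE_ARN }}
          aws-region: us-east-1
      # Runs MkDocs locally (no Docker) and writes techdocs_metadata.json into ./site
      - run: npx @techdocs/cli generate --no-docker --source-dir . --output-dir ./site
      # Uploads ./site under the entity's namespace/kind/name path in the bucket
      - run: npx @techdocs/cli publish --publisher-type awsS3 --storage-name acme-techdocs --entity default/Component/payments-service --directory ./site
```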

TechDocs in local build mode is a common production foot-gun. For large orgs with hundreds of services, on-demand builds can cause Backstage pod OOM kills and 30-second first-load latencies. Always configure external build mode with CI publishing to GCS or S3, and set techdocs.builder: 'external' in app-config.yaml. Pairing this with a CDN in front of the storage bucket can cut doc load times from ~800 ms to under 100 ms.
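On the Backstage side, the matching configuration is short. A sketch, assuming the acme-techdocs bucket from the CI example and AWS credentials supplied to the backend through the SDK's default credential chain:

```yaml
# app-config.production.yaml (sketch)
techdocs:
  builder: 'external'          # Backstage only reads prebuilt sites; CI builds them
  publisher:
    type: 'awsS3'              # use 'googleGcs' with a googleGcs block for GCS
    awsS3:
      bucketName: 'acme-techdocs'
      region: 'us-east-1'
```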

Production Deployment Considerations

At scale, Backstage is a Node.js application that can be resource-hungry. Key production settings to tune:

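Catalog refresh interval: the processing loop re-reads every entity every couple of minutes by default, and each entity provider runs on its own schedule. For orgs with 5,000+ entities, stretch provider schedules to 15–30 minutes to reduce GitHub API rate-limit pressure.
Database: replace the default in-memory SQLite database with PostgreSQL. In-memory SQLite loses all catalog state on restart and cannot be shared between replicas, so PostgreSQL is required for any deployment beyond a single pod.
Authentication: integrate with your IdP (Okta, GitHub OAuth, Azure AD) through Backstage's auth backend, and keep the guest provider, which is intended for local development, disabled in production.
Plugin isolation: backend plugins run in the same Node process by default, so a misbehaving plugin (e.g., one with an unhandled promise rejection) can crash the entire app. Pin plugin versions and configure liveness/readiness probes so that failures are detected and the pod is restarted quickly.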
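A sketch of the database and authentication settings above, assuming the GitHub auth provider module is installed in the backend and the frontend sign-in page offers GitHub. The environment variable names are placeholders populated by your deployment.

```yaml
# app-config.production.yaml (sketch)
backend:
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${AUTH_GITHUB_CLIENT_ID}
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}
        signIn:
          resolvers:
            # Sign-in succeeds only if a User entity named after the GitHub username exists
            - resolver: usernameMatchingUserEntityName
    # No guest provider is configured, so anonymous guest sign-in is unavailable
```

Tying sign-in to User entities is another reason to keep org data in the catalog complete: an engineer without a matching User entity cannot sign in.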

Treat Backstage itself as a product with SLOs. Track catalog completeness (what percentage of prod services have a catalog-info.yaml), template adoption (what percentage of new repos were created via a golden-path template), and TechDocs coverage (services with published docs). These three metrics are the most actionable indicators of IDP health and developer experience ROI.

The Software Catalog, Software Templates, and TechDocs are the foundation of everything else in Backstage. Once entities are registered and docs are live, every other plugin — cost visibility, security posture, deployment frequency, on-call load — simply reads entity annotations and enriches the same single pane of glass. The compounding effect is what justifies the infrastructure investment.