Secrets Scanning & Leak Response
A hardcoded AWS access key, a Stripe secret, a database password committed to a public repository: these are among the most common, most preventable, and most damaging causes of security incidents in modern software delivery. Uber's 2016 breach was traced back to credentials stored in a private GitHub repository, and bots scrape public GitHub repositories for credentials within seconds of a push. The question is not whether to scan for secrets; it is how many layers of defense you run.
This lesson covers the full lifecycle: blocking secrets before they ever enter version control, catching anything that slips through in CI, alerting on leaks in existing history, and executing the incident response runbook when a key does escape into the wild.
Layer 1: Pre-Commit Hooks with detect-secrets and Gitleaks
The cheapest time to catch a secret is before the commit is created. Gitleaks and detect-secrets are the two dominant tools, and most mature teams run both — Gitleaks for its speed and broad regex coverage, detect-secrets for its entropy-based detection and baseline audit workflow.
The canonical setup uses pre-commit (the Python framework) to orchestrate hooks from a single .pre-commit-config.yaml in the repository root, with one hook entry each for Gitleaks and detect-secrets.
Install and activate with pip install pre-commit && pre-commit install. On macOS developer machines, pre-commit can also be installed via Homebrew (brew install pre-commit). In repositories where you cannot guarantee that every contributor runs pre-commit install, the hook is a speed bump, not a guarantee, which is why CI scanning is mandatory.
Gitleaks reads a .gitleaks.toml config for allowlists and custom rules. The most important tuning is a project-specific allowlist so that test fixtures with fake credentials (e.g., EXAMPLE_KEY=AKIAIOSFODNN7EXAMPLE) do not generate noise.
Generate a detect-secrets baseline first (detect-secrets scan > .secrets.baseline) so that pre-existing findings recorded in the baseline do not block all commits; only net-new secrets trigger failures. Schedule a separate audit job to work down the baseline backlog.
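The three ideas in this layer (pattern matching, entropy-based detection, and allowlist or baseline suppression) can be illustrated with a minimal Python sketch. This is not how Gitleaks or detect-secrets are implemented; the regular expressions, the 4.0 entropy threshold, and the function names scan_line and net_new are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Illustrative pattern: AWS access key IDs are 20 characters starting with "AKIA".
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Candidate tokens for the entropy check: long runs of base64-like characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/]{20,}")

ALLOWLIST = {"AKIAIOSFODNN7EXAMPLE"}  # known fake fixture values
ENTROPY_THRESHOLD = 4.0               # bits per character; an assumed cutoff


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(line: str) -> list[tuple[str, str]]:
    """Return (rule, match) pairs for one line, skipping allowlisted values."""
    findings = []
    for match in AWS_KEY_RE.findall(line):
        if match not in ALLOWLIST:
            findings.append(("aws-access-key-id", match))
    for token in TOKEN_RE.findall(line):
        # Skip allowlisted values and tokens the regex rule already handled.
        if token in ALLOWLIST or AWS_KEY_RE.fullmatch(token):
            continue
        if shannon_entropy(token) >= ENTROPY_THRESHOLD:
            findings.append(("high-entropy-string", token))
    return findings


def net_new(findings, baseline):
    """Keep only findings that are not already recorded in the baseline."""
    return [finding for finding in findings if finding not in baseline]
```

With these assumptions, the fixture line EXAMPLE_KEY=AKIAIOSFODNN7EXAMPLE produces no findings, while a non-allowlisted key ID or a long random-looking token does. Passing the existing findings as the baseline to net_new leaves only what a new commit introduced.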
Layer 2: CI Pipeline Scanning
CI is the mandatory enforcement gate. Even if every developer dutifully runs pre-commit, CI scanning catches force-pushes, web editor commits, and GitHub Actions bot commits that bypass local hooks. Run Gitleaks on every pull request and on every push to protected branches. The --source flag only selects the repository path to scan; to limit the scan to the commits introduced by the PR, pass a commit range through --log-opts (for example --log-opts="BASE..HEAD"), which keeps scan time short for most repositories.
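As a rough sketch of the pull-request gate, the Python helper below builds and runs a Gitleaks command restricted to a commit range. It assumes the gitleaks binary is on the PATH and that the CI system supplies the base and head commit SHAs; the function names are hypothetical, and the exact flags should be checked against the Gitleaks version you have installed.

```python
import subprocess


def gitleaks_pr_command(base_sha: str, head_sha: str, repo_path: str = ".") -> list[str]:
    """Build a gitleaks invocation that scans only the commits in base..head."""
    return [
        "gitleaks", "detect",
        "--source", repo_path,                 # path to the repository
        f"--log-opts={base_sha}..{head_sha}",  # commit range forwarded to git log
        "--redact",                            # keep secret values out of CI logs
    ]


def scan_pull_request(base_sha: str, head_sha: str, repo_path: str = ".") -> bool:
    """Return True only when gitleaks exits with status 0 (no leaks reported)."""
    result = subprocess.run(gitleaks_pr_command(base_sha, head_sha, repo_path), check=False)
    return result.returncode == 0
```

A CI step would call scan_pull_request with the SHAs it receives from the platform and fail the job when the function returns False.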
GitHub's own Secret Scanning feature (available on all public repositories and, for private repositories, through GitHub Advanced Security) runs continuously in the background, and its push protection blocks a push that contains a recognized secret before it reaches the repository. Enable it under Settings → Code security → Secret scanning → Push protection. It covers 200+ partner token patterns (AWS, GCP, Stripe, Twilio, Slack, etc.) with low false-positive rates because the patterns are defined together with the issuing providers, and for some token types GitHub can additionally check whether the token is still active.
Layer 3: Scanning Existing History
When you onboard an existing repository, you must scan its entire commit history, not just the current HEAD. Secrets deleted in a later commit are still in the git object store, visible to anyone who clones the repository. Use gitleaks detect --source . --log-opts="--all" or TruffleHog for deep history scans. The full-history scan is a one-time onboarding job; its results feed a remediation backlog tracked in your security system (Jira, GitHub Issues, or a dedicated secrets management dashboard).
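To turn a history scan into a remediation backlog, the findings can be grouped by rule. The sketch below assumes a Gitleaks JSON report, which is a list of objects carrying RuleID, File, and Commit fields; verify those names against the report your version produces. The function name backlog_from_report is hypothetical.

```python
import json
from collections import defaultdict


def backlog_from_report(report_json: str) -> dict[str, list[dict[str, str]]]:
    """Group scanner findings by rule so each group can become one backlog ticket.

    The secret value itself is deliberately left out of the backlog entries.
    """
    findings = json.loads(report_json)
    backlog: dict[str, list[dict[str, str]]] = defaultdict(list)
    for finding in findings:
        backlog[finding["RuleID"]].append(
            {"file": finding["File"], "commit": finding["Commit"]}
        )
    return dict(backlog)
```

Each key in the result maps to the files and commits that need a rotated credential, which is usually enough detail to open a ticket without copying the leaked value anywhere else.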
Incident Response: The Five-Step Runbook
When a secret leaks — detected by a scanner, reported by a bug bounty researcher, or discovered in a breach notification — speed is everything. Bots scan GitHub in real time and have been measured consuming newly pushed AWS keys within four seconds of exposure. Every minute of delay expands the blast radius.
- Revoke immediately. Do not wait for confirmation of exploitation. Go to the credential provider's console (AWS IAM, GCP IAM, Stripe Dashboard, GitHub Settings) and deactivate or delete the key. If the key controls infrastructure, accept the brief outage; it is less damaging than a breach. For AWS keys: aws iam delete-access-key --access-key-id AKIA.... (add --user-name when the key belongs to a different IAM user than the caller). For GitHub tokens: revoke in Settings → Developer settings → Personal access tokens.
- Rotate and issue a replacement. Create a new credential immediately and inject it into your secrets manager (HashiCorp Vault, AWS Secrets Manager, or GCP Secret Manager). Update all systems consuming the old key. Never reuse the compromised key material.
- Rewrite git history (or accept the exposure). Removing a secret from the current commit does NOT remove it from git history, and anyone who cloned or cached the repo already has it. Use BFG Repo-Cleaner or git filter-repo to rewrite history, then force-push and notify all collaborators to re-clone. This is disruptive on active repositories; weigh it against the risk profile of the leaked credential.
- Audit access logs. Pull CloudTrail logs (AWS), Cloud Audit Logs (GCP, formerly Stackdriver audit logs), or equivalent. Look for API calls made with the compromised key from unexpected IPs, regions, or at unusual times. This determines whether exploitation occurred and what resources were accessed.
- Root cause and systemic fix. Write a blameless post-mortem: how did the secret get committed? Was a pre-commit hook missing? Was a developer bypassing hooks with --no-verify? Was a CI secret accidentally echoed to logs? Address the systemic gap, not just this instance.
Do not rely on git commit --amend or a soft reset to "remove" a secret. The original commit object remains in the git object store and stays reachable through existing clones and GitHub's cached views; only a complete history rewrite removes it from the repository, and copies already fetched are beyond your control. Assume the secret was fully public from the moment it was pushed.
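The audit step in the runbook above can be sketched as a filter over exported log records. The example assumes CloudTrail-style records (dictionaries with userIdentity.accessKeyId and sourceIPAddress fields) and a hypothetical list of trusted network ranges; the function name suspicious_events is an illustration, not part of any AWS tooling.

```python
from ipaddress import ip_address, ip_network


def suspicious_events(records, access_key_id, trusted_networks):
    """Return records made with the leaked key from outside the trusted networks."""
    networks = [ip_network(cidr) for cidr in trusted_networks]
    hits = []
    for record in records:
        if record.get("userIdentity", {}).get("accessKeyId") != access_key_id:
            continue  # a different credential made this call
        source = record.get("sourceIPAddress", "")
        try:
            address = ip_address(source)
        except ValueError:
            # The source may be a service hostname rather than an IP; flag it for review.
            hits.append(record)
            continue
        if not any(address in network for network in networks):
            hits.append(record)
    return hits
```

Anything this returns is a lead for manual review, not proof of exploitation: the event names and regions on the flagged records indicate which resources to inspect first.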
Preventing Secrets at the Source: Vault Integration
Scanning catches secrets that slip through. The long-term fix is making it impossible to hardcode secrets because they never exist as plaintext in the developer's environment. This means every application reads credentials from a secrets manager at runtime — not from .env files, not from environment variables baked into container images, and certainly not from source code.
In a Kubernetes environment, the canonical pattern is External Secrets Operator (ESO), which syncs secrets from Vault or AWS Secrets Manager into Kubernetes Secret objects that the pod consumes as environment variables or mounted files. The secret material never passes through the CI pipeline or the container image build.
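On the application side, the runtime-lookup pattern might look like the following sketch. It assumes secrets are mounted as one file per secret under a directory such as /var/run/secrets/app, with an injected environment variable as a fallback; the directory, the APP_SECRETS_DIR variable, and the load_secret helper are all hypothetical names for illustration.

```python
import os
from pathlib import Path

# Hypothetical mount point for secrets synced into the pod; override via APP_SECRETS_DIR.
SECRETS_DIR = Path(os.environ.get("APP_SECRETS_DIR", "/var/run/secrets/app"))


def load_secret(name: str, secrets_dir=None) -> str:
    """Read a secret at runtime from a mounted file, then from the environment."""
    base = Path(secrets_dir) if secrets_dir else SECRETS_DIR
    path = base / name
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    # Fallback: an environment variable injected at deploy time, e.g. db-password -> DB_PASSWORD.
    value = os.environ.get(name.upper().replace("-", "_"))
    if value:
        return value
    raise RuntimeError(f"secret {name!r} is not provisioned")
```

Failing loudly when a secret is missing is a deliberate choice here: it discourages the shortcut of pasting a default value into the source to get a service started.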
Secrets scanning is not a one-time audit — it is a continuously running control. The organizations that handle secrets well treat every credential as ephemeral: short-lived, automatically rotated, and never human-readable in production. The scanning layer exists not because the vault pattern failed, but because humans inevitably take shortcuts during incidents, debugging sessions, and local development. Your job is to make those shortcuts visible and correctable before they become breaches.