Logging at Scale: ELK & Loki

Grafana Loki

Elasticsearch indexes every token in every log line. That brute-force approach is powerful but expensive: a 1 TB/day log estate typically needs 3–5x that in Elasticsearch storage (indexes, replicas, segment merges) and a cluster that costs thousands of dollars a month just to keep warm. Grafana Loki was designed by the Grafana Labs team to answer a different question: what if you stored logs the way Prometheus stores metrics? Index only the labels, compress the raw text, and query lazily at read time. The result is dramatically lower cost with a trade-off in raw query speed that is almost never the bottleneck for real incident investigation.

This lesson takes you deep into Loki's architecture, its query language LogQL, and the operational situations where Loki wins — so you can choose the right tool and run it correctly in production.

Label-Based Indexing vs Full-Text Indexing

In Elasticsearch, every log line is parsed at ingest time: every word, every number, every field becomes an entry in an inverted index. This makes arbitrary full-text search instant, but it also means the index can easily be larger than the raw data. CPU burns at ingest to tokenize and segment-merge; heap burns at query time to load posting lists.

Loki takes the opposite approach. At ingest time it indexes only the labels you declare: fixed key-value pairs attached by the log shipper, such as app="api-gateway", env="production", namespace="payments". Each unique set of label values defines a stream. Within each stream, log lines are bundled into compressed chunks (gzip by default; Snappy and LZ4 are among the configurable alternatives) and written to object storage (S3, GCS, Azure Blob). The label index lives in a small store: an external database such as Cassandra in early versions, later BoltDB, and now the TSDB index, which has been the recommended option since Loki 2.8. At query time, Loki first identifies matching streams by label, then decompresses only the relevant chunks and applies any filter expressions against the raw text in memory. Think of it as a label-gated grep over compressed logs.

The key mental model: Loki labels are for selecting which streams to read; LogQL filter expressions (|=, |~, parser pipelines) are for finding what inside those streams. If your query touches too many streams, it is slow. If a stream has too many lines per second, chunks grow large and decompression becomes the bottleneck — both are label design problems.
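The "label-gated grep" idea can be sketched in a few lines of Python. This is a toy model, not Loki's implementation: the in-memory `index` and `chunks` dictionaries are hypothetical stand-ins for the index store and object storage, and `zlib` stands in for whichever chunk encoding is configured.

```python
import zlib

# Hypothetical in-memory stand-ins: in Loki the index lives in the index
# store and the chunks live in object storage.
index = {}   # frozenset of (label, value) pairs -> list of chunk ids
chunks = {}  # chunk id -> compressed bytes


def write_chunk(labels, lines):
    """Compress a batch of lines and register it under its stream's label set."""
    chunk_id = len(chunks)
    chunks[chunk_id] = zlib.compress("\n".join(lines).encode("utf-8"))
    index.setdefault(frozenset(labels.items()), []).append(chunk_id)


def query(selector, needle):
    """Select streams by label, then grep only those streams' chunks."""
    wanted = set(selector.items())
    hits = []
    for stream_labels, chunk_ids in index.items():
        if not wanted <= stream_labels:
            continue  # stream skipped without decompressing any of its chunks
        for chunk_id in chunk_ids:
            text = zlib.decompress(chunks[chunk_id]).decode("utf-8")
            hits.extend(line for line in text.split("\n") if needle in line)
    return hits


write_chunk({"app": "api-gateway", "env": "production"},
            ["GET /health 200", "GET /pay 500 error=timeout"])
write_chunk({"app": "checkout", "env": "production"},
            ["error: card declined", "order placed"])

print(query({"app": "checkout"}, "error"))
```

Selecting on app="checkout" decompresses one chunk; selecting only on env="production" decompresses both. That difference is the whole cost model: the selector decides how much data gets read, and the filter only decides what is returned.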

Loki Architecture

A production Loki deployment runs in microservices mode with independently scalable components:

Distributor — receives push requests from Promtail/Alloy/Fluent Bit via the /loki/api/v1/push HTTP endpoint. Validates labels, enforces rate limits, and fans out to Ingesters via a consistent-hash ring.
Ingester — holds open chunks in memory and records every incoming write in a write-ahead log (WAL) on local disk, so unflushed data survives a restart. Flushes sealed chunks to object storage and writes the index to the TSDB store. Scale horizontally, and put the WAL on local SSD for durability.
Querier — handles /loki/api/v1/query_range requests. It fetches data from both the in-memory Ingesters (recent data) and object storage (older data), merges results, and applies LogQL pipelines.
Query Frontend — shards long time-range queries, caches results, and queues requests for fairness. Always deploy this in front of Queriers at scale.
Ruler — evaluates metric-type LogQL rules (alerting and recording) on a schedule, identical in concept to Prometheus rules.
Compactor — periodically merges small TSDB index shards and enforces retention via delete markers on object storage.

Loki write path (left) and read path (right): labels index in TSDB, raw chunks in object storage.
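To make the write path concrete, here is a minimal sketch of the JSON body a shipper sends to the Distributor's /loki/api/v1/push endpoint. The helper name `build_push_payload` and the sample labels are illustrative assumptions; the body shape (a list of streams, each with a label map and [timestamp, line] pairs whose timestamps are nanosecond strings) follows Loki's HTTP API.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build the JSON body for POST /loki/api/v1/push (one stream)."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Timestamps are strings of nanoseconds since the epoch. Adding i is an
    # illustrative choice that keeps entries in the batch strictly ordered.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


payload = build_push_payload(
    {"app": "checkout", "env": "production"},
    ["order placed", "error: card declined"],
    now_ns=1_700_000_000_000_000_000,
)
print(json.dumps(payload, indent=2))
```

In a real shipper this body would be sent with Content-Type: application/json and, when multi-tenancy is enabled, an X-Scope-OrgID header naming the tenant. Everything in the "stream" map becomes part of the label index, which is why the label set should stay small.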

LogQL — Loki's Query Language

LogQL is deliberately modeled after PromQL. Every query starts with a log stream selector — a set of label matchers in curly braces — followed by optional pipeline stages that filter and transform log lines.

# --- Log stream selector + filter ---
# Errors from the payments checkout service (the time range comes from the client, not the query)
{namespace="payments", app="checkout"} |= "error" | logfmt | level="error"

# --- Regex filter ---
{app="nginx"} |~ "5[0-9]{2}" | json | status >= 500

# --- Parser pipeline: parse JSON, then filter on a field ---
{app="api-gateway", env="production"}
  | json
  | duration_ms > 1000
  | line_format "{{.method}} {{.path}} {{.status}} {{.duration_ms}}ms"

# --- Metric query: request rate by status code ---
sum by (status) (
  rate({app="api-gateway", env="production"} | json [5m])
)

# --- Count errors per minute for an alert rule ---
sum(count_over_time({namespace="payments"} |= "Exception" [1m])) > 10

The key stages in a pipeline are:

| json / | logfmt / | pattern / | regexp — parse the raw line into fields
|= "string" / != "string" / |~ "regex" / !~ "regex" — line filters (fast; they run against the raw line after the chunk is decompressed, so place them before any parser stage to discard lines early)
| field_name = "value" — label filters on extracted fields
| line_format "template" — reshape the output line using Go template syntax
| unwrap field_name — extract a numeric field for metric aggregation (rate, avg_over_time, quantile_over_time)
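The order of these stages matters, and a small Python emulation shows why. This is only a sketch of the semantics, assuming JSON log lines with method, path, status and duration_ms fields; `run_pipeline` is a hypothetical helper, not part of any Loki client.

```python
import json


def run_pipeline(lines, needle, min_duration_ms):
    """Emulate: |= needle | json | duration_ms > min | line_format "..."."""
    out = []
    for line in lines:
        if needle not in line:            # line filter: cheap substring test
            continue
        try:
            fields = json.loads(line)     # parser stage: extract fields
        except json.JSONDecodeError:
            # Simplification: Loki keeps such lines and marks them with an
            # __error__ label; this sketch just drops them.
            continue
        duration = fields.get("duration_ms") if isinstance(fields, dict) else None
        if not isinstance(duration, (int, float)) or duration <= min_duration_ms:
            continue                      # label filter on an extracted field
        out.append("{} {} {} {}ms".format(   # line_format: reshape the output
            fields.get("method", "-"), fields.get("path", "-"),
            fields.get("status", "-"), duration))
    return out


sample = [
    '{"method":"GET","path":"/pay","status":200,"duration_ms":1530}',
    '{"method":"GET","path":"/health","status":200,"duration_ms":3}',
    "plain text line mentioning /pay",
]
print(run_pipeline(sample, "/pay", 1000))
```

The second line never reaches the JSON parser because the substring test rejects it first. On real traffic, where most lines do not match, that early rejection is what keeps a pipeline cheap.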

Always include the most selective label matchers you can. Loki resolves the stream selector before it runs any parser, so {app="checkout", env="production"} scans far fewer chunks than {env="production"} alone; the order of matchers inside the braces does not matter. As a rough rule of thumb, a query that fans out across more than a few hundred streams will be slow regardless of how efficient the filter pipeline is.

Label Design: The Most Important Operational Decision

Labels in Loki behave identically to labels in Prometheus: every unique combination of label values creates a new stream. Too many streams — high cardinality — destroys Loki's cost advantage because the TSDB index explodes and chunk files become tiny (low compression ratio). The canonical antipatterns are:

Using a user_id, request_id, or trace_id as a label. These belong inside the log line, extracted at query time with a parser pipeline.
Using the pod name or deployment hash as a label in Kubernetes — use pod sparingly and prefer app / component.
Encoding severity as a label (e.g., level="error") when the field is already inside the JSON payload — parse it with | json | level="error" at query time instead.

A safe starting set for a Kubernetes environment is: cluster, namespace, app, env. Everything else — hostname, pod name, log level, trace ID — lives in the log line and is extracted via parser pipelines.
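The arithmetic behind the cardinality warning is easy to check. The sketch below computes the worst-case number of streams as the product of distinct values per label; the label values and the 50,000-user figure are made-up numbers for illustration.

```python
from math import prod


def max_streams(label_values):
    """Upper bound on streams: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


safe = {
    "cluster": ["us-east", "eu-west"],
    "namespace": ["payments", "search", "auth"],
    "app": ["api", "worker", "cron", "gateway"],
    "env": ["production", "staging"],
}
print(max_streams(safe))       # 2 * 3 * 4 * 2 = 48

# Adding one high-cardinality label multiplies everything that came before.
with_user = dict(safe, user_id=[f"u{i}" for i in range(50_000)])
print(max_streams(with_user))  # 48 * 50_000 = 2_400_000
```

The real stream count is usually below this bound because not every combination occurs, but the multiplication explains why a single unbounded label is enough to blow through a per-tenant stream limit.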

Deploying Loki with Promtail on Kubernetes

# Install Loki stack via Helm (single-binary mode for dev; use microservices for prod)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm upgrade --install loki grafana/loki \
  --namespace monitoring --create-namespace \
  --set loki.commonConfig.replication_factor=1 \
  --set loki.storage.type=s3 \
  --set loki.storage.bucketNames.chunks=my-loki-chunks \
  --set loki.storage.s3.region=us-east-1 \
  --set singleBinary.replicas=1

# Install Promtail as a DaemonSet — ships node and pod logs to Loki
helm upgrade --install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push"

# loki-values.yaml snippet for production (microservices mode)
loki:
  auth_enabled: true            # multi-tenancy via X-Scope-OrgID header
  limits_config:
    ingestion_rate_mb: 32       # per-tenant MB/s write limit
    ingestion_burst_size_mb: 64
    max_streams_per_user: 10000 # cardinality guard
    max_query_parallelism: 32
    retention_period: 30d       # enforced by Compactor
  storage_config:
    tsdb_shipper:
      active_index_directory: /loki/tsdb-index
      cache_location: /loki/tsdb-cache
    aws:
      s3: s3://us-east-1/my-loki-chunks
      s3forcepathstyle: false
  compactor:
    working_directory: /loki/compactor
    retention_enabled: true
    delete_request_store: s3

When Loki Wins (and When It Does Not)

Loki wins when:

You already run Grafana and Prometheus and want a unified observability stack without a separate Elasticsearch team.
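Log volume is high (hundreds of GB/day) and cost is a constraint. Object storage such as S3 is several times cheaper per GB than the EBS volumes behind an Elasticsearch cluster, and the gap widens once index overhead and replicas are counted.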
Queries follow a known pattern: "show me logs from this service in this namespace during this incident window." Label-first retrieval is fast for this workload.
You need to correlate logs with metrics and traces in a single Grafana dashboard. Grafana's Explore view can link a log line straight to its Tempo trace when a derived field is configured to extract the trace ID.

Loki struggles when:

You need sub-second arbitrary full-text search across all logs simultaneously (compliance tooling, forensic investigation of unknown patterns). Elasticsearch is faster here.
Your team's workflow is built around Kibana's UI — LogQL has a learning curve and Grafana Explore is not a drop-in replacement.
You have unstructured, inconsistently formatted logs from legacy systems where parsing is unreliable — the full-text index of Elasticsearch is more forgiving.

Production pitfall — retention not enforced: Loki does not delete old chunks automatically unless retention_enabled: true is set in the Compactor config AND the Compactor component is actually running. Many teams discover after 90 days that their S3 bucket has grown without bound. Always verify with aws s3 ls s3://my-loki-chunks --recursive --summarize | tail -2 after a week of operation.

Alerting with Loki Ruler

The Loki Ruler evaluates LogQL metric expressions on a schedule and fires Prometheus-compatible alerts. Define rules in the same YAML format as Prometheus:

# loki-rules.yaml — alert when error rate exceeds threshold
groups:
  - name: application_errors
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (namespace, app) (
            rate({env="production"} | json | level="error" [5m])
          ) > 1
        for: 2m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "High error rate in {{ $labels.namespace }}/{{ $labels.app }}"
          description: "Error rate is {{ $value | humanize }} errors/s over the last 5 minutes."

      - alert: PaymentServiceDown
        expr: |
          absent_over_time({app="checkout", env="production"}[1m])
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No logs received from checkout service"

Rules are loaded via the Ruler API or stored in object storage (S3 prefix). Alerts route through Alertmanager, identical to Prometheus. This lets you maintain a single alert routing tree for both metric and log-based alerts.
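It helps to see how a range expression and the for: clause interact. The following is a simplified model of the evaluation loop, assuming a fixed evaluation interval and Prometheus-style pending semantics; `rate` and `fires` are hypothetical helpers, not the Ruler's code.

```python
def rate(timestamps, now, window_s):
    """Per-second rate: entries in (now - window_s, now] divided by window_s."""
    count = sum(1 for t in timestamps if now - window_s < t <= now)
    return count / window_s


def fires(timestamps, eval_times, window_s, threshold, for_s):
    """Return the first evaluation time at which the alert fires, else None."""
    pending_since = None
    for now in eval_times:
        if rate(timestamps, now, window_s) > threshold:
            if pending_since is None:
                pending_since = now      # condition just became true: pending
            if now - pending_since >= for_s:
                return now               # held for the full 'for' duration
        else:
            pending_since = None         # condition broke: reset the timer
    return None


# Two error lines per second for ten minutes, evaluated once a minute,
# against rate(...[5m]) > 1 with for: 2m.
errors = [i / 2 for i in range(1200)]
print(fires(errors, range(60, 601, 60), window_s=300, threshold=1, for_s=120))
```

With these inputs the 5-minute rate first exceeds 1/s at the 180-second evaluation, and the alert fires at 300 seconds once the condition has held for two minutes. A long window combined with a long for: delays detection twice, which is worth remembering when tuning paging alerts.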