FinOps & Cloud Cost Optimization

Architecting for Cost

Every architectural decision carries a price tag that compounds over the lifetime of the system. A senior engineer who chooses a synchronous cross-region call over an async queue, or stores warm analytics data in S3 Standard instead of Intelligent-Tiering, is making a cost decision — usually without realising it. At $5M/month of cloud spend, architecture-driven waste routinely accounts for 20–40% of the bill, dwarfing the gains from instance right-sizing and reservation coverage. This lesson covers the three highest-leverage architectural levers: egress-aware design, storage tiering, and serverless economics.

Egress-Aware Design

Data transfer fees are the most widely underestimated line on a cloud bill. AWS charges nothing for ingress, but at list prices it charges about $0.09/GB for data leaving a region to the internet, $0.02/GB for cross-region transfer, and $0.01/GB in each direction for cross-AZ transfer. GCP and Azure follow similar structures. These numbers look small until you run the math: a microservices architecture producing 50 TB/day of inter-service traffic that crosses AZ boundaries moves about 1,500 TB/month, which costs roughly $15,000/month per direction, or about $30,000/month once both the sending and receiving sides are billed.
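
The arithmetic behind that estimate can be sketched in a few lines. The per-GB rates are the list prices quoted above and should be treated as assumptions, since they vary by region and change over time; the name `monthly_transfer_cost` is illustrative only.

```python
# Rough monthly data-transfer estimate using the per-GB rates quoted in this lesson.
# The rates are assumptions: list prices vary by region and change over time.
RATE_PER_GB = {
    "internet_egress": 0.09,
    "cross_region": 0.02,
    "cross_az_one_direction": 0.01,
}


def monthly_transfer_cost(tb_per_day: float, rate_per_gb: float, days: int = 30) -> float:
    """Return the monthly charge in dollars for a steady daily transfer volume."""
    gb_per_month = tb_per_day * 1000 * days  # decimal TB -> GB
    return gb_per_month * rate_per_gb


one_way = monthly_transfer_cost(50, RATE_PER_GB["cross_az_one_direction"])
both_ways = 2 * one_way  # cross-AZ traffic is billed on the sending and the receiving side
print(f"50 TB/day cross-AZ: ${one_way:,.0f}/month per direction, ${both_ways:,.0f}/month in total")
```

Swapping in the `cross_region` or `internet_egress` rate gives the equivalent figure for those paths; volume discounts and free tiers are deliberately ignored.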

Co-locate data and compute in the same AZ. An EC2 instance reading from an RDS replica in the same AZ pays nothing for the transfer. The same read crossing an AZ boundary costs $0.01/GB in each direction. Use zone-aware routing (in Kubernetes, Topology Aware Routing keyed on the topology.kubernetes.io/zone node label) to keep hot paths local.
Replace NAT gateway with VPC endpoints. S3 and DynamoDB have free Gateway Endpoints that route traffic privately inside AWS — no NAT gateway charge ($0.045/GB processed). A single NAT gateway handling 100 TB/month of S3 traffic costs $4,500/month; the same traffic over a Gateway Endpoint costs $0.
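Push data to the edge, not the origin. CloudFront cache hit rates of 80–95% mean the origin never serves that data. A service streaming 1 PB/month directly from S3 to the internet pays on the order of $90,000 at the $0.09/GB list rate (less once volume tiers apply). Serving it through CloudFront does not eliminate per-GB charges to viewers, but transfer from S3 to CloudFront is free, CloudFront's volume and committed-use rates are usually lower, and at an 85% hit rate the origin handles only about 150 TB of that petabyte.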
Shrink inter-service payloads. Fanout patterns — one event triggering 10 downstream calls each returning 200 KB of JSON — multiply egress silently. Use Protobuf or Avro for internal communication (5–10x smaller than JSON) and return projected fields rather than full resource documents.
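
To make the fan-out effect concrete, the sketch below estimates the monthly cross-AZ charge for the pattern described in the last item. The event rate of 500 events/second, the assumption that every response crosses an AZ boundary, the 5x size reduction, and the function name are all hypothetical inputs chosen for illustration.

```python
CROSS_AZ_RATE_PER_GB = 0.01  # per direction, the rate quoted in this lesson


def fanout_monthly_cost(events_per_sec: float, calls_per_event: int,
                        kb_per_response: float, days: int = 30) -> float:
    """Estimate the monthly per-direction transfer charge for a fan-out pattern.

    Assumes every downstream response crosses an AZ boundary.
    """
    seconds = days * 24 * 3600
    total_kb = events_per_sec * calls_per_event * kb_per_response * seconds
    total_gb = total_kb / 1_000_000  # decimal KB -> GB
    return total_gb * CROSS_AZ_RATE_PER_GB


# One event triggers 10 downstream calls, each returning 200 KB of JSON.
json_cost = fanout_monthly_cost(500, 10, 200)
# Same traffic with payloads assumed to be 5x smaller after switching to a binary format.
compact_cost = fanout_monthly_cost(500, 10, 200 / 5)
print(f"JSON: ${json_cost:,.0f}/month, compact encoding: ${compact_cost:,.0f}/month")
```

The point of the exercise is the multiplier: payload size, fan-out factor, and event rate all scale the bill linearly, so trimming any one of them pays off directly.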

Cross-region replication is an egress trap. S3 Cross-Region Replication, DynamoDB Global Tables, and database read replicas in a second region all incur continuous replication egress charges. Use replication only when RTO/RPO requirements genuinely demand it; a daily snapshot to a cold bucket in a second region is often sufficient and typically costs a small fraction as much.

Same-AZ reads and VPC Gateway Endpoints are free; cross-AZ calls and NAT Gateway paths carry per-GB fees that accumulate rapidly at scale.

# Terraform: S3 + DynamoDB Gateway Endpoints — eliminate NAT processing charges
# vpc_endpoints.tf

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = concat(
    aws_route_table.private[*].id,
    aws_route_table.public[*].id
  )
  tags = { Name = "s3-gateway-endpoint" }
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.private[*].id
  tags = { Name = "dynamodb-gateway-endpoint" }
}

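# Kubernetes: spread pods evenly across AZs so every zone has a local endpoint.
# Spreading alone does not keep traffic in-zone; pair it with zone-aware routing
# on the Service (for example the service.kubernetes.io/topology-mode: Auto annotation).
# topology-spread.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server
      containers:
        - name: api-server
          image: example.com/api-server:latest   # placeholder image, replace with your own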

Storage Tiering

Object storage is priced by tier, and most teams use exactly one tier, Standard, for everything. The result is significant overpayment for data that is accessed rarely or never. AWS S3 storage pricing ranges from $0.023/GB/month (Standard) down to $0.00099/GB/month (Glacier Deep Archive), a spread of roughly 23x. A 1 PB dataset that has not been accessed in 180 days but sits in Standard costs about $23,000/month; the same data in Glacier Deep Archive would cost about $1,000/month.

The main tiers and their access patterns:

S3 Standard: $0.023/GB. Active data accessed multiple times per month. Never expire from here automatically — the data is being used.
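S3 Intelligent-Tiering: $0.023/GB + $0.0025/1,000 objects monitoring fee. Automatically moves objects between frequent-access and infrequent-access tiers based on 30-day access patterns. Use this as the default for any data whose access pattern is uncertain. Objects smaller than 128 KB are not monitored or auto-tiered and always stay at the frequent-access rate. For larger objects the monitoring fee matters less as object size grows: it roughly equals the infrequent-access saving at about 240 KB per object and is negligible for multi-megabyte objects.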
S3 Standard-IA (Infrequent Access): $0.0125/GB storage but $0.01/GB retrieval. Right for backups and disaster-recovery data accessed a handful of times per year. Wrong for data accessed daily — the retrieval fee exceeds the storage savings.
S3 Glacier Instant Retrieval: $0.004/GB. Millisecond retrieval latency. Right for compliance archives accessed once a quarter.
S3 Glacier Deep Archive: $0.00099/GB. 12-hour retrieval. Right for regulatory archives that must be retained for 7 years and will almost never be accessed.
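
The trade-off between a lower storage rate and a retrieval fee can be checked with a small calculation. The sketch below uses only the rates listed above (assumed list prices); request fees, minimum storage durations, and the Glacier retrieval fees, which this lesson does not quote, are left out. The function names are illustrative.

```python
# Per-GB monthly storage rates from the list above (assumed list prices).
STORAGE_RATE = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}
STANDARD_IA_RETRIEVAL_PER_GB = 0.01


def monthly_storage_cost(storage_class: str, stored_gb: float) -> float:
    """Storage charge only, for one month."""
    return stored_gb * STORAGE_RATE[storage_class]


def standard_ia_monthly_cost(stored_gb: float, retrieved_gb: float) -> float:
    """Standard-IA storage charge plus its per-GB retrieval fee."""
    return monthly_storage_cost("STANDARD_IA", stored_gb) + retrieved_gb * STANDARD_IA_RETRIEVAL_PER_GB


def ia_break_even_reads_per_month() -> float:
    """Full reads of the dataset per month at which Standard-IA costs the same as Standard."""
    saving_per_gb = STORAGE_RATE["STANDARD"] - STORAGE_RATE["STANDARD_IA"]
    return saving_per_gb / STANDARD_IA_RETRIEVAL_PER_GB


# 1 TB that is read in full once a day for 30 days.
daily_read_ia = standard_ia_monthly_cost(1000, 30 * 1000)
daily_read_standard = monthly_storage_cost("STANDARD", 1000)
print(f"Standard-IA: ${daily_read_ia:,.2f}/month, Standard: ${daily_read_standard:,.2f}/month")
print(f"Break-even: about {ia_break_even_reads_per_month():.2f} full reads per month")
```

Under these rates the break-even sits at roughly one full read of the data per month, which is why Standard-IA suits backups and not daily-access data.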

S3 Lifecycle rules are the highest-ROI storage action. A single lifecycle policy that moves objects to Intelligent-Tiering after 30 days and to Glacier Deep Archive after 365 days can cut storage costs by 60–70% on log and telemetry buckets with no application changes. At 100 TB of log data, which costs about $2,300/month in Standard, that is roughly $1,400–1,600/month saved for one Terraform resource block.

# Terraform: S3 Lifecycle policy — tiered storage for logs/telemetry
# s3_lifecycle.tf

resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "logs-tiering"
    status = "Enabled"

    filter { prefix = "logs/" }

    transition {
      days          = 30
      storage_class = "INTELLIGENT_TIERING"
    }

    transition {
      days          = 90
      storage_class = "GLACIER_IR"
    }

    transition {
      days          = 365
      storage_class = "DEEP_ARCHIVE"
    }

    expiration {
      days = 2555   # 7-year retention; adjust per compliance policy
    }
  }
}

# S3 Intelligent-Tiering Archive configuration (optional deeper tier at 90/180 days)
resource "aws_s3_bucket_intelligent_tiering_configuration" "archive" {
  bucket = aws_s3_bucket.app_data.id
  name   = "EntireS3Bucket"

  tiering {
    access_tier = "ARCHIVE_ACCESS"
    days        = 90
  }

  tiering {
    access_tier = "DEEP_ARCHIVE_ACCESS"
    days        = 180
  }
}

# Audit current storage class distribution in a bucket
aws s3api list-objects-v2 \
  --bucket my-logs-bucket \
  --query 'Contents[].{Key:Key,StorageClass:StorageClass,Size:Size}' \
  --output json | \
  jq 'group_by(.StorageClass) | map({class: .[0].StorageClass, count: length, total_gb: (map(.Size) | add / 1073741824 | round)})'

Beyond object storage, apply the same tiering mindset to block and database storage. EBS gp3 is about 20% cheaper per GB than gp2 and includes a 3,000 IOPS baseline regardless of volume size, so there is rarely a reason to use gp2 for new volumes. RDS and Aurora offer storage auto-scaling that reduces the need to over-provision. ElastiCache Redis clusters should hold only the hot working set: a cache.r7g.large in front of S3 or DynamoDB for the cold tail is almost always cheaper than a cluster sized for the full dataset.

Serverless Economics

Serverless (Lambda, Cloud Functions, Azure Functions) and managed containers (Fargate, Cloud Run) turn the cost model from capacity to consumption. This is a fundamentally different economic structure, and it cuts both ways: at low and spiky utilisation serverless is dramatically cheaper than reserved instances; at high sustained utilisation it becomes dramatically more expensive.

The break-even point for Lambda vs. a reserved EC2 instance is roughly 20–30% utilisation. Below that threshold, Lambda wins on cost. Above it, a Reserved Instance or Savings Plan wins. The critical analysis questions are:

What are the p50 and p99 invocation rates over a 24-hour period? Workloads with a 10x day/night ratio are classic Lambda candidates; workloads with flat 24/7 traffic are not.
What is the function duration? Lambda charges per GB-second of execution. A function using 512 MB for 200 ms costs $0.000001667 per invocation in compute. At 1 billion invocations/month (a large-scale API), that is $1,667/month, plus about $200/month in per-request fees at $0.20 per million requests, versus a 10-node ECS Fargate cluster for the same workload at ~$800/month. Duration discipline matters: trim unused memory allocations, and measure actual memory usage with Lambda Power Tuning rather than guessing.
What does cold-start latency cost you in user experience? Lambda cold starts range from 100 ms (Python/Node) to 1–3 seconds (JVM with large classpaths). Provisioned Concurrency eliminates cold starts for the instances it keeps warm, but costs ~$0.015/hour per GB of configured memory for each provisioned instance, whether or not it serves traffic; use it only for latency-critical paths.
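
The duration question above reduces to a GB-second calculation. The following is a minimal sketch, assuming the x86 list price of $0.0000166667 per GB-second implied by the per-invocation figure in the text and a per-request fee of $0.20 per million; the free tier is ignored and the function name is illustrative.

```python
GB_SECOND_PRICE = 0.0000166667      # assumed x86 list price per GB-second
REQUEST_PRICE = 0.20 / 1_000_000    # assumed list price per request


def lambda_monthly_cost(invocations: int, memory_mb: int, duration_ms: float):
    """Return (compute_dollars, request_dollars) for one month, ignoring the free tier."""
    gb_seconds = invocations * (memory_mb / 1024) * (duration_ms / 1000)
    compute = gb_seconds * GB_SECOND_PRICE
    requests = invocations * REQUEST_PRICE
    return compute, requests


# The example from the text: 512 MB, 200 ms, 1 billion invocations per month.
compute, requests = lambda_monthly_cost(1_000_000_000, 512, 200)
print(f"compute ${compute:,.0f} + requests ${requests:,.0f} = ${compute + requests:,.0f}/month")
```

Running the same function with a flat, always-busy traffic profile next to the price of equivalent reserved compute is a quick way to see on which side of the break-even a workload falls.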

Lambda Power Tuning is mandatory before production. The AWS Lambda Power Tuning Step Functions state machine tests your function across memory configurations from 128 MB to 10 GB and plots cost vs. duration. In practice the optimal memory setting is almost never the default 128 MB: a function that runs in 1,400 ms at 128 MB often runs in 300 ms at 512 MB, at about 14% lower compute cost, because execution time dropped roughly 4.7x while the per-millisecond price rose only 4x.
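
The memory-versus-duration trade-off can be worked through by hand from the two measurements quoted above. This is only a sketch of the arithmetic, not the Power Tuning tool's own logic; the GB-second price is an assumed list price and the names are illustrative.

```python
def compute_cost_per_invocation(memory_mb: int, duration_ms: float,
                                gb_second_price: float = 0.0000166667) -> float:
    """Compute charge for a single invocation (per-request fee excluded)."""
    return (memory_mb / 1024) * (duration_ms / 1000) * gb_second_price


# memory (MB) -> measured duration (ms), the two data points from the text
measurements = {128: 1400, 512: 300}

costs = {mem: compute_cost_per_invocation(mem, ms) for mem, ms in measurements.items()}
cheapest = min(costs, key=costs.get)
saving = 1 - costs[512] / costs[128]

print(f"cheapest memory setting: {cheapest} MB")
print(f"512 MB is {saving:.1%} cheaper than 128 MB per invocation")
```

Because cost is proportional to memory multiplied by duration, a larger setting only wins when duration falls by more than the memory increase; here duration fell by about 4.7x against a 4x memory increase.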

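# Deploy Lambda Power Tuning from the AWS Serverless Application Repository.
# Step 1 creates a CloudFormation change set; nothing is deployed until it is executed.
aws serverlessrepo create-cloud-formation-change-set \
  --application-id arn:aws:serverlessrepo:us-east-1:451282441545:applications/aws-lambda-power-tuning \
  --semantic-version 4.3.3 \
  --stack-name lambda-power-tuning \
  --parameter-overrides '[{"Name":"lambdaResource","Value":"*"}]' \
  --capabilities CAPABILITY_IAM

# Step 2: execute the change set, using the ChangeSetId returned by step 1
aws cloudformation execute-change-set --change-set-name CHANGE_SET_ID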

# Run the state machine to tune a Lambda function
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:ACCOUNT_ID:stateMachine:powerTuningStateMachine \
  --input '{
    "lambdaARN": "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:my-api-handler",
    "powerValues": [128, 256, 512, 1024, 2048, 3008],
    "num": 50,
    "payload": {},
    "parallelInvocation": true,
    "strategy": "cost"
  }'

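# Terraform: Lambda with right-sized memory + Provisioned Concurrency on a published alias
# Note: this config is static; limiting Provisioned Concurrency to peak hours would need
# Application Auto Scaling scheduled actions, which are not shown here.
resource "aws_lambda_function" "api" {
  function_name = "api-handler"
  role          = aws_iam_role.lambda_exec.arn   # assumed: execution role defined elsewhere
  handler       = "app.handler"                  # assumed entry point
  filename      = "build/api-handler.zip"        # assumed deployment package path
  runtime       = "python3.12"
  memory_size   = 512    # tuned from Power Tuning result
  timeout       = 15
  publish       = true   # Provisioned Concurrency needs a published version behind the alias

  environment {
    variables = {
      LOG_LEVEL = "WARNING"   # verbose logging is a hidden cost: CloudWatch Logs charges $0.50/GB ingested
    }
  }
}

resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.api.function_name
  function_version = aws_lambda_function.api.version
}

resource "aws_lambda_provisioned_concurrency_config" "peak" {
  function_name                     = aws_lambda_function.api.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 10   # cover p99 cold-start-sensitive traffic only
}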

# CloudWatch Logs cost reduction — 1-day retention for Lambda dev, 14-day for prod
resource "aws_cloudwatch_log_group" "lambda" {
  name              = "/aws/lambda/${aws_lambda_function.api.function_name}"
  retention_in_days = 14   # default is NEVER expire — a frequent runaway cost
}

For sustained workloads migrating off Lambda, Fargate Spot offers up to 70% savings over standard Fargate while remaining fully serverless (no instance management). Combine it with AWS Graviton2/3 task definitions (ARM64): Graviton tasks on Fargate are 20% cheaper than x86 at identical vCPU and memory, and typically 15–30% faster for CPU-bound workloads. The two discounts stack: at the full 70% Spot discount, a task running 1 vCPU / 2 GB on Graviton Fargate Spot costs roughly a quarter of the same task on standard Fargate x86 (0.30 × 0.80 = 0.24), on the order of $0.01/hour, though Spot pricing varies.

Synthesis: Architectural Cost Review Checklist

Before any significant architecture review or pre-production readiness check, run through these questions:

Does any hot-path cross-service call cross an AZ or region boundary? If so, is that justified by resilience requirements, or is it accidental topology?
Is there a NAT gateway in the path for traffic that could use a VPC endpoint instead?
Is any data stored in S3 Standard that has not been accessed in 30 days? Is there a lifecycle rule?
Are Lambda functions memory-profiled? Are log groups set to expire?
Is serverless being used for sustained-high-throughput workloads where reserved compute would be cheaper?
Are CDN cache hit rates being measured? Is the cache TTL set deliberately or left at framework default?

Put cost in the design doc template. A one-paragraph cost estimate ("this design will generate approximately X GB/day of cross-AZ traffic at $Y/month and store Z TB in S3 Standard costing $W/month") takes five minutes to write at design time and can prevent months of overspend. Make it a required section in your RFC or design doc template.
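
One way to lower the friction further is to generate that paragraph from two inputs. The helper below is a hypothetical sketch: the function name, the default rates (the list prices quoted earlier in this lesson), and the choice to bill cross-AZ traffic in both directions are assumptions to adapt to your own pricing.

```python
def design_doc_cost_paragraph(cross_az_gb_per_day: float, s3_standard_tb: float,
                              cross_az_rate: float = 0.01,
                              s3_standard_rate: float = 0.023,
                              days: int = 30) -> str:
    """Fill in the X/Y/Z/W placeholders of the design-doc cost paragraph."""
    # Cross-AZ traffic is billed per GB on both the sending and the receiving side.
    transfer = cross_az_gb_per_day * days * cross_az_rate * 2
    storage = s3_standard_tb * 1000 * s3_standard_rate  # decimal TB -> GB
    return (
        f"This design will generate approximately {cross_az_gb_per_day:,.0f} GB/day of "
        f"cross-AZ traffic at ${transfer:,.0f}/month and store {s3_standard_tb:,.0f} TB "
        f"in S3 Standard costing ${storage:,.0f}/month."
    )


paragraph = design_doc_cost_paragraph(cross_az_gb_per_day=2000, s3_standard_tb=100)
print(paragraph)
```

The numbers will be rough, but a rough figure in the design doc is what prompts the question of whether the traffic or the storage class is actually justified.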