Serverless & Event-Driven Operations

Containers Meet Serverless

Pure function-as-a-service works brilliantly for stateless, short-lived compute — but it breaks down when you need a custom OS layer, a binary too large for a Lambda deployment package, a runtime that Lambda does not support, or workloads that need hundreds of megabytes of ML model weights. Serverless containers solve this by combining the packaging model of containers with the operational model of serverless: you never provision a virtual machine or manage a cluster node, but you run arbitrary container images. AWS Fargate and Google Cloud Run are the two dominant platforms, and understanding their architecture at the control-plane level is what separates engineers who can make principled capacity and cost decisions from those who just follow defaults.

Fargate: Serverless Containers on AWS

Fargate is not a standalone service. It is a compute engine behind both Amazon ECS (as a launch type or capacity provider) and Amazon EKS (through Fargate profiles). When you run a task on Fargate, the ECS or EKS control plane provisions an isolated microVM per task (per pod on EKS), built on the Firecracker hypervisor that Lambda also uses internally, runs your containers inside it, and attaches a network interface to your VPC. You get VM-level isolation between tasks and the complete ECS or EKS API surface without managing any node.

Fargate task sizing is coarser than Lambda. CPU is chosen from fixed sizes of 256, 512, 1024, 2048, 4096, 8192, or 16384 CPU units (0.25 to 16 vCPU), and each CPU size permits only a specific memory range: 512 MiB to 2 GB at 0.25 vCPU, up to 30 GB at 4 vCPU, and 32 to 120 GB at 16 vCPU. You pay per vCPU-second and GB-second from the moment the task starts pulling its image until it stops, rounded up to the second with a one-minute minimum for Linux tasks. Nothing is billed while no tasks are running, but a running task is billed for its full configured size whether or not it is busy. Because the billing unit is so fine-grained, Fargate is often cheaper than EC2 for bursty, unpredictable workloads and more expensive for tasks that run continuously at high utilization.
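The sizing and billing rules above can be checked with a small calculator. This is a minimal sketch, assuming the Linux task-size table and the one-minute minimum as AWS documents them today; the per-hour rates are caller-supplied inputs (the values in the usage note are made up), and the names `VALID_SIZES`, `billed_seconds`, and `task_cost` are hypothetical.

```python
import math

# Valid Fargate Linux task sizes: CPU units -> allowed memory in MiB.
# Transcribed from the ECS task-size table; re-check the AWS docs before use.
def _gib_range(start_gib: int, stop_gib: int, step_gib: int) -> set:
    """Memory values from start to stop GiB inclusive, in step_gib increments."""
    return set(range(start_gib * 1024, stop_gib * 1024 + 1, step_gib * 1024))


VALID_SIZES = {
    256: {512, 1024, 2048},
    512: _gib_range(1, 4, 1),
    1024: _gib_range(2, 8, 1),
    2048: _gib_range(4, 16, 1),
    4096: _gib_range(8, 30, 1),
    8192: _gib_range(16, 60, 4),
    16384: _gib_range(32, 120, 8),
}


def billed_seconds(runtime_s: float, minimum_s: int = 60) -> int:
    """Per-second billing, rounded up, with a one-minute minimum (Linux)."""
    return max(minimum_s, math.ceil(runtime_s))


def task_cost(cpu_units: int, memory_mib: int, runtime_s: float,
              vcpu_hour_rate: float, gb_hour_rate: float) -> float:
    """Estimated cost of one task run; rates come from the Fargate pricing page."""
    if memory_mib not in VALID_SIZES.get(cpu_units, set()):
        raise ValueError(f"invalid Fargate size: cpu={cpu_units} memory={memory_mib}")
    hours = billed_seconds(runtime_s) / 3600
    return hours * ((cpu_units / 1024) * vcpu_hour_rate
                    + (memory_mib / 1024) * gb_hour_rate)
```

With illustrative rates, `task_cost(512, 1024, 30, 0.04, 0.004)` returns the same amount as a 60-second run because of the one-minute minimum, while `task_cost(256, 4096, 60, 0.04, 0.004)` raises because 4 GB is not allowed at 0.25 vCPU.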

Fargate Spot (ECS only) provides up to 70% cost reduction in exchange for interruption risk. When AWS reclaims capacity, ECS gives a two-minute warning (a task state change event in EventBridge and a SIGTERM to the task's containers) before stopping the task: the same model as EC2 Spot, but at task granularity. Use it for batch jobs, CI runners, and stateless workers that can be retried. Never use Fargate Spot for stateful tasks that cannot survive abrupt termination.
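On Fargate Spot, the interruption notice arrives in the container as SIGTERM, so an interruptible worker only needs to stop taking new work when that signal lands and let the queue redeliver anything it never started. A minimal sketch, assuming a queue with redelivery semantics (such as SQS visibility timeouts) behind the hypothetical `next_job` callable; `SpotAwareWorker` and `handle` are illustrative names, not an AWS API.

```python
import signal
import threading
from typing import Callable, Optional


class SpotAwareWorker:
    """Pulls jobs until SIGTERM arrives, then stops taking new ones.

    Jobs never pulled stay in the queue and are assumed to be redelivered
    to another worker (e.g. SQS-style visibility timeouts).
    """

    def __init__(self) -> None:
        self._stop = threading.Event()

    def install(self) -> None:
        # Must be called from the main thread (a Python signal-module rule).
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        self._stop.set()  # let the current job finish, take no new ones

    def run(self, next_job: Callable[[], Optional[str]],
            handle: Callable[[str], None]) -> int:
        """Process jobs until the queue is empty or SIGTERM was received."""
        done = 0
        while not self._stop.is_set():
            job = next_job()
            if job is None:
                break
            handle(job)
            done += 1
        return done
```

Keep each job short relative to the warning window (bounded by the container's `stopTimeout`, at most 120 seconds on Fargate), or checkpoint long jobs so a retry can resume where it left off.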

Fargate Networking and IAM Model

Every Fargate task gets its own elastic network interface (ENI) in your VPC, with its own private IP and security groups, and its traffic is visible in VPC flow logs. There is no shared network namespace between tasks, so you get genuine multi-tenant isolation. The trade-off is ENI density: AWS accounts have per-region ENI quotas, and a sudden burst of hundreds of Fargate tasks can exhaust that quota, at which point new tasks fail to launch. Request a quota increase proactively if you plan bursts above a few hundred concurrent tasks per region.
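ENI density has a second, per-subnet limit worth checking before a burst: each task's ENI takes one private IPv4 address from its subnet, and AWS reserves five addresses in every subnet. The sketch below estimates whether a planned burst fits; it assumes IPv4-only subnets and does not query the account's ENI quota, which you still check in Service Quotas. The function names are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def usable_task_ips(cidr: str) -> int:
    """IPv4 addresses left for task ENIs in one subnet (one per task)."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def burst_fits(subnet_cidrs: list, planned_tasks: int, ips_in_use: int = 0) -> bool:
    """Whether a burst of new tasks fits in the subnets' free address space."""
    capacity = sum(usable_task_ips(c) for c in subnet_cidrs) - ips_in_use
    return planned_tasks <= capacity
```

Two /24 subnets hold at most 502 task ENIs, so a 500-task burst only fits if almost nothing else is using those subnets.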

The IAM model splits into two roles: the task execution role (used by the Fargate data plane to pull the image from ECR, fetch secrets referenced in the task definition, and write logs to CloudWatch; your application code never uses this role) and the task role (assumed by your application code via the ECS credential endpoint at 169.254.170.2). Confusing the two is the most common source of AccessDeniedException errors when containers fail to pull from private ECR repositories. If you see image pull failures, check the execution role first, not the task role.
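To see which credentials your code actually receives, look at the task-role endpoint. The ECS agent typically exposes `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` to containers when the task has a task role, and the AWS SDKs resolve it against `169.254.170.2` on their own. The hypothetical helper below only reconstructs that URL for debugging; the relative path used in the test is made up.

```python
import os

CREDENTIAL_ENDPOINT = "http://169.254.170.2"


def task_role_credentials_url(env=None) -> str:
    """Reconstruct the URL the AWS SDKs query for task-role credentials."""
    env = os.environ if env is None else env
    relative = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative:
        # Usually means the task definition has no taskRoleArn.
        raise RuntimeError("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set")
    return CREDENTIAL_ENDPOINT + relative
```

If the variable is missing inside a running container, the task most likely has no `taskRoleArn`: a task-role problem, not an execution-role one.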

Fargate launch architecture: ECS/EKS control plane schedules tasks onto Firecracker microVMs; each task gets its own ENI in your VPC and IAM credentials via the internal metadata endpoint.

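# ECS Fargate task definition: production pattern
# Register with: aws ecs register-task-definition --cli-input-json file://task-def.json
# JSON does not allow comments, so save only the object below as task-def.json.
# The healthCheck assumes curl exists in the image, and the collector image
# should be pinned to a release tag or digest rather than :latest.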
{
  "family": "api-worker",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn":      "arn:aws:iam::123456789012:role/api-worker-task-role",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/api-worker:sha-abc1234",
      "portMappings": [{ "containerPort": 8080 }],
      "environment": [
        { "name": "NODE_ENV", "value": "production" }
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group":         "/ecs/api-worker",
          "awslogs-region":        "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command":     ["CMD-SHELL", "curl -f http://localhost:8080/healthz || exit 1"],
        "interval":    15,
        "timeout":     5,
        "retries":     3,
        "startPeriod": 30
      },
      "stopTimeout": 30
    },
    {
      "name": "otel-collector",
      "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
      "essential": false,
      "command": [
        "--config",
        "/etc/ecs/container-insights/otel-task-metrics-config.yaml"
      ]
    }
  ]
}

Google Cloud Run: Request-Scoped Serverless Containers

Cloud Run takes a different philosophical stance: it is designed from the ground up around HTTP request handling, not generic container execution. A Cloud Run service maps to a URL; when traffic arrives, Cloud Run starts container instances behind that URL and routes requests to them. When traffic drops to zero, the platform scales instances down to zero. Fargate has no built-in equivalent: ECS Service Auto Scaling can set a service's desired count to zero, but scaling back out must be driven by a metric that still reports while no tasks run (such as queue depth), and the first requests then wait for a full task launch. Cloud Run also offers jobs alongside services, giving you batch-style, non-HTTP execution with the same zero-infrastructure model.

The unit of scale is the container instance. Each instance handles multiple concurrent requests (configured via --concurrency, defaulting to 80). Concurrency multiplied by the number of instances gives the maximum number of requests in flight, not requests per second; throughput follows from Little's law as in-flight capacity divided by average request latency. With the default request-based billing, Cloud Run charges for an instance's allocated CPU and memory only while it is processing at least one request, so idle time between requests on a live instance is not billed. The exception is when you disable CPU throttling (--no-cpu-throttling) for background work such as Pub/Sub pull subscribers, in which case the instance is billed for its whole lifetime. This makes Cloud Run extremely cost-effective for APIs with uneven traffic, including latency-sensitive ones once cold starts are under control.
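The concurrency arithmetic above can be turned into a rough sizing rule with Little's law: requests in flight equal arrival rate times average latency. A minimal sketch, assuming a steady request rate and uniform latency; Cloud Run's autoscaler may also react to CPU utilization, so treat the result as a lower bound rather than a prediction. The function names are hypothetical.

```python
import math


def instances_needed(rps: float, avg_latency_s: float, concurrency: int = 80) -> int:
    """Minimum instances so in-flight requests fit under the per-instance limit.

    Little's law: requests in flight = arrival rate x average time in system.
    """
    in_flight = rps * avg_latency_s
    return math.ceil(in_flight / concurrency)


def max_rps(instances: int, avg_latency_s: float, concurrency: int = 80) -> float:
    """Throughput ceiling of a fleet: in-flight capacity divided by latency."""
    return instances * concurrency / avg_latency_s
```

At 1,000 RPS and 250 ms average latency, 250 requests are in flight, so at least four instances are needed at the default concurrency of 80.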

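Set --min-instances 1 for latency-sensitive production services. A warm instance removes cold starts for the traffic it can absorb on its own: with the default concurrency of 80 and 100 ms average latency, that is roughly 800 RPS, though bursts beyond one instance's capacity still trigger cold starts on new instances. The always-on cost of one 1-vCPU, 512-MiB instance is about $12/month under request-based billing (check current regional pricing), typically far cheaper than the p99 latency penalty of cold starts eating into your SLO budget.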
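A figure like the monthly cost above is easy to recompute as prices change. The sketch below multiplies per-second rates by a 730-hour month; the rates are inputs you take from the Cloud Run pricing page (assumed here: the idle rate for a request-billed minimum instance, the full rate if CPU stays allocated), and the numbers in the test are placeholders, not real prices. The function name is hypothetical.

```python
SECONDS_PER_MONTH = 730 * 3600  # 730-hour month, a common pricing convention


def min_instance_monthly_cost(min_instances: int, vcpu: float, gib: float,
                              vcpu_second_rate: float,
                              gib_second_rate: float) -> float:
    """Cost of keeping min_instances warm all month at the given per-second rates."""
    per_second = vcpu * vcpu_second_rate + gib * gib_second_rate
    return min_instances * per_second * SECONDS_PER_MONTH
```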

# Cloud Run production deployment — Terraform (google_cloud_run_v2_service)
resource "google_cloud_run_v2_service" "api" {
  name     = "api-service"
  location = "us-central1"
  ingress  = "INGRESS_TRAFFIC_ALL"

  template {
    scaling {
      min_instance_count = 1
      max_instance_count = 200
    }

    containers {
      image = "us-central1-docker.pkg.dev/my-project/api/server:${var.image_tag}"

      resources {
        limits = {
          cpu    = "2"
          memory = "1Gi"
        }
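        # cpu_idle = false keeps CPU allocated between requests (instance-based
        # billing). It is only needed for background work such as Pub/Sub pull consumers.
        cpu_idle          = false
        startup_cpu_boost = true    # extra CPU during startup to shrink cold-start time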
      }

      ports {
        container_port = 8080
      }

      env {
        name  = "NODE_ENV"
        value = "production"
      }

      env {
        name = "DB_PASSWORD"
        value_source {
          secret_key_ref {
            secret  = google_secret_manager_secret.db_password.secret_id
            version = "latest"
          }
        }
      }

      liveness_probe {
        http_get {
          path = "/healthz"
          port = 8080
        }
        initial_delay_seconds = 5
        period_seconds        = 10
      }

      startup_probe {
        http_get {
          path = "/readyz"
          port = 8080
        }
        failure_threshold     = 3
        period_seconds        = 5
      }
    }

    vpc_access {
      connector = google_vpc_access_connector.main.id
      egress    = "PRIVATE_RANGES_ONLY"
    }

    service_account = google_service_account.api_runner.email
    timeout         = "60s"

    # Max concurrent requests per instance (Cloud Run's default is 80).
    # Graceful shutdown: Cloud Run sends SIGTERM and allows 10 seconds before SIGKILL.
    max_instance_request_concurrency = 80
  }

  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}

# IAM: allow unauthenticated invocations (public URL or an external load balancer in front); omit for private services
resource "google_cloud_run_v2_service_iam_member" "public" {
  project  = google_cloud_run_v2_service.api.project
  location = google_cloud_run_v2_service.api.location
  name     = google_cloud_run_v2_service.api.name
  role     = "roles/run.invoker"
  member   = "allUsers"
}

Cold Start Comparison: Fargate vs Cloud Run

Both platforms have cold start latency, but the sources differ. On Fargate, the dominant cost is image pull: a 500 MB ECR image can take 15–45 seconds to pull, dwarfing Firecracker boot time (~125ms). The fix is aggressive image optimization (multi-stage builds, distroless base images) plus lazy loading: Fargate supports Seekable OCI (SOCI) indexes stored in Amazon ECR, which let a container start running before all of its layers have been pulled and can cut effective startup time substantially for large images. On Cloud Run, cold start is dominated by application initialization time, not image pull, because Cloud Run caches image layers across instances. A Go binary in a 50 MiB image typically starts in under 200ms; a Node.js app importing 300 MiB of node_modules can take 2–4 seconds.
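For the Cloud Run case, where startup is dominated by application initialization, one common mitigation is to defer heavy setup that only some routes need until first use, so health checks and startup probes pass quickly. This is a minimal Python sketch of the idea (a Node.js app would use a lazy `require` or dynamic `import()`); `get_model` and its `time.sleep` are hypothetical stand-ins for loading model weights or building clients, and the trade-off is that the first request on those routes pays the cost instead.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def get_model() -> dict:
    """Load heavy state once, on first use, instead of at import time."""
    time.sleep(0.01)  # placeholder for loading weights or building clients
    return {"loaded_at": time.monotonic()}


def handle_request(path: str) -> str:
    if path == "/healthz":
        return "ok"  # probes stay fast: they never trigger the load
    model = get_model()  # first call pays the cost, later calls hit the cache
    return f"predicted with model loaded at {model['loaded_at']:.3f}"
```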

Never use :latest tags in production deployments on either platform. ECS resolves a tag to an image digest when a task or service deployment starts, so pushing a new image under the same tag does not update running tasks; but the task definition no longer tells you what is actually running, and the next deployment or standalone task may launch a different image. Cloud Run similarly pins each revision to a digest at deploy time. Always build immutable image tags (git SHA or semantic version) and update the task definition or Cloud Run revision explicitly. This also makes rollbacks deterministic.
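A CI step can enforce the no-`:latest` rule before a task definition is registered. This sketch assumes ECS-style task-definition JSON loaded into a dict; `is_pinned` and `unpinned_images` are hypothetical helpers, and they accept any explicit non-`latest` tag, which is only truly immutable if the registry enforces tag immutability (an ECR repository setting) or the reference uses a digest.

```python
def is_pinned(image_ref: str) -> bool:
    """True for a digest reference or an explicit tag other than 'latest'."""
    if "@sha256:" in image_ref:
        return True
    name = image_ref.rsplit("/", 1)[-1]  # drop registry host[:port] and path
    if ":" not in name:
        return False  # no tag: the runtime defaults to :latest
    return name.rsplit(":", 1)[1] != "latest"


def unpinned_images(task_def: dict) -> list:
    """Images in an ECS task definition that would violate the pinning rule."""
    return [c["image"] for c in task_def.get("containerDefinitions", [])
            if not is_pinned(c["image"])]
```

Run against the task definition in this lesson, `unpinned_images` would flag the `aws-otel-collector:latest` sidecar and pass the `api-worker:sha-abc1234` image.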

Choosing Between Fargate and Cloud Run

This is not primarily a technology question; it is a workload profile question. Choose Fargate when: you are deep in the AWS ecosystem and need native VPC integration or fine-grained IAM, or you run workloads that outlast Cloud Run's 60-minute request timeout; you need EKS compatibility to run Kubernetes-native tooling (Helm, Argo CD, KEDA); or your batch jobs require multi-container task definitions with sidecar patterns (Cloud Run services now support sidecar containers too, so this gap has narrowed). Choose Cloud Run when: you are on GCP or building greenfield services where scale-to-zero economics matter; you want the simplest possible HTTP service deployment path (a single gcloud run deploy); or you are running event-driven consumers off Pub/Sub, where Cloud Run's native Eventarc integration pushes events to your service over HTTP and removes the need for a pull worker process entirely.

At scale, both platforms support traffic splitting for canary deployments: on Fargate through ECS blue/green deployments via CodeDeploy or weighted target groups behind an ALB, and on Cloud Run through first-class revision traffic splitting in the service spec. Both integrate with your existing observability stack; the OpenTelemetry sidecar pattern on Fargate and the built-in Cloud Trace/Monitoring integration on Cloud Run serve the same purpose. The operationally mature path on either platform is: immutable image tags, health probes with realistic startPeriod or startup probe grace periods, graceful shutdown handling (catch SIGTERM and drain in-flight requests within the platform's termination window: the container's stopTimeout, at most 120 seconds on Fargate, or 10 seconds on Cloud Run), secrets from a secrets manager rather than environment variables baked into images, and structured JSON logging that the platform's log aggregation can parse and route to your observability backend.
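The structured-logging item in that list can be as small as a formatter that writes one JSON object per line to stdout. On Cloud Run, Cloud Logging recognizes the `severity` and `message` fields in JSON log lines; with the `awslogs` driver on Fargate, each line lands in CloudWatch Logs, where Logs Insights can query the JSON fields. This is a minimal sketch using only the Python standard library; the `fields` attribute for extra key-value pairs is a convention of this example, not of either platform.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,  # DEBUG/INFO/WARNING/ERROR/CRITICAL
            "message": record.getMessage(),
            "logger": record.name,
        }
        extra_fields = getattr(record, "fields", None)
        if isinstance(extra_fields, dict):
            entry.update(extra_fields)  # e.g. path, status, latency_ms
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send all root-logger output to stdout as JSON lines."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
    return root
```

Usage: after `configure_logging()`, a call such as `logging.getLogger("api").info("request done", extra={"fields": {"path": "/healthz", "status": 200}})` emits one parseable line.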