Cloud Fundamentals: AWS Core Services

The AWS CLI & APIs

Every action you take in the AWS Console — launching an EC2 instance, creating an S3 bucket, attaching an IAM policy — is nothing more than an authenticated HTTPS call to an AWS API endpoint. The AWS CLI is a thin wrapper around those same APIs. Understanding this equivalence is the mental shift that turns you from a console operator into an automation engineer who can script, test, reproduce, and version-control every infrastructure change.

At big-tech scale, the CLI is not a convenience shortcut — it is the foundation of every deployment pipeline, runbook, and incident-response script. This lesson covers everything you need to use it professionally: how credentials are resolved, how to configure named profiles for multiple accounts and roles, the key CLI patterns every SRE uses daily, and how to compose powerful jq pipelines to extract exactly the signal you need from JSON API responses.

How the CLI Resolves Credentials — The Credential Chain

The single most important concept to understand about the AWS CLI is its credential provider chain. When you run any aws command, the SDK works through a fixed priority list and uses the first credential source it finds:

  1. Command-line options: --profile selects a named profile explicitly and takes precedence over keys exported in environment variables. (--region and --output override configuration as well, but they are not credential sources.)
  2. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN. These override every file-based source, including a profile selected through AWS_PROFILE. They are useful in CI but dangerous if you forget they are set in a terminal session.
  3. Role assumption: if the selected profile has role_arn + source_profile, the CLI calls sts:AssumeRole automatically and caches the resulting session credentials.
  4. AWS IAM Identity Center (SSO): if the profile carries sso_* settings, the CLI uses the short-lived token obtained via aws sso login.
  5. ~/.aws/credentials and ~/.aws/config: static keys stored under a named profile. The [default] profile is used when no profile is selected.
  6. ECS task role / EC2 instance profile: on AWS-managed compute, credentials are injected automatically via the Instance Metadata Service or the ECS credential provider endpoint. These are the last fallback in the chain, yet this is how production workloads should authenticate: never bake long-lived keys into EC2 instances or containers.
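The first-match-wins behaviour can be modelled in a few lines of Python. This is a simplified, hypothetical model (env_provider, fixed_provider, build_chain and resolve are illustrative names, not botocore's API): every profile-based source is collapsed into one stand-in provider, and the model mirrors one real nuance, namely that an explicit --profile removes the environment-variable provider from the chain.

```python
from typing import Callable, Mapping, Optional

# A provider returns a credentials dict, or None if it has nothing to offer.
Provider = Callable[[], Optional[dict]]


def env_provider(env: Mapping[str, str]) -> Provider:
    """Keys exported as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (plus optional session token)."""
    def provide() -> Optional[dict]:
        if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
            return {"source": "environment",
                    "access_key": env["AWS_ACCESS_KEY_ID"],
                    "session_token": env.get("AWS_SESSION_TOKEN")}
        return None
    return provide


def fixed_provider(source: str, creds: Optional[dict]) -> Provider:
    """Hypothetical stand-in for a source that either has credentials or does not."""
    return lambda: None if creds is None else {"source": source, **creds}


def build_chain(env, profile_creds, instance_creds, explicit_profile=False) -> list:
    """Environment first, then the selected profile, then container/instance credentials.

    An explicit --profile drops the environment provider, so the profile wins.
    """
    chain = [] if explicit_profile else [env_provider(env)]
    chain.append(fixed_provider("profile", profile_creds))              # role, SSO or static keys
    chain.append(fixed_provider("instance-metadata", instance_creds))   # ECS / EC2, last resort
    return chain


def resolve(chain: list) -> dict:
    """First match wins: later providers are never consulted once one succeeds."""
    for provide in chain:
        creds = provide()
        if creds is not None:
            return creds
    raise LookupError("Unable to locate credentials")
```

Running resolve(build_chain(os.environ, ...)) in a shell where stale keys are still exported shows why forgotten environment variables are dangerous: they shadow every profile you did not select explicitly with --profile.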
Never put long-lived IAM access keys in environment variables in production. Leaked AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY in a Docker image, a CI log, or a shipped binary is one of the most common root causes of cloud account takeovers. In production, always use instance profiles, task roles, or short-lived SSO sessions. Rotate any long-lived key that exists — treat them like passwords.

Configuring Named Profiles

Professional AWS users manage multiple accounts — development, staging, production, plus separate accounts per service team in large organisations. Named profiles in ~/.aws/config let you switch contexts cleanly without re-exporting environment variables.

```ini
# ~/.aws/config: the primary config file (credentials go in ~/.aws/credentials)
[default]
region = us-east-1
output = json

# Static credentials profile (for personal dev AWS account)
[profile dev]
region = us-west-2
output = json

# Role-assumption profile: the CLI calls sts:AssumeRole automatically
[profile prod-readonly]
role_arn = arn:aws:iam::123456789012:role/ReadOnlyOpsRole
source_profile = dev
region = us-east-1
role_session_name = edrees-ops-session
duration_seconds = 3600

# SSO profile (Identity Center: preferred for human access at big-tech orgs)
[profile acme-prod]
sso_start_url = https://acme.awsapps.com/start
sso_account_id = 987654321098
sso_role_name = SREReadOnly
sso_region = us-east-1
region = us-east-1
output = json
```
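To see how section headers map to profile names, here is a small sketch that parses a trimmed copy of the example file with Python's standard configparser. The helpers load_profiles and credential_source are hypothetical; the CLI's own parser handles more cases (for example [sso-session] sections), so treat this as an illustration of the file's structure, not a replacement for it.

```python
import configparser

# Trimmed copy of the example ~/.aws/config above.
CONFIG = """
[default]
region = us-east-1

[profile dev]
region = us-west-2

[profile prod-readonly]
role_arn = arn:aws:iam::123456789012:role/ReadOnlyOpsRole
source_profile = dev

[profile acme-prod]
sso_start_url = https://acme.awsapps.com/start
sso_role_name = SREReadOnly
"""


def load_profiles(text: str) -> dict[str, dict[str, str]]:
    """Map profile name -> settings, stripping the "profile " prefix used in ~/.aws/config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section.removeprefix("profile "): dict(parser[section])
            for section in parser.sections()}


def credential_source(profile: dict) -> str:
    """Classify how a profile obtains credentials from the keys it sets."""
    if "role_arn" in profile:
        return f"sts:AssumeRole into {profile['role_arn']} via '{profile.get('source_profile')}'"
    if "sso_start_url" in profile or "sso_session" in profile:
        return "IAM Identity Center (SSO)"
    return "static keys from ~/.aws/credentials (or a later source in the chain)"
```

Note the asymmetry this exposes: in ~/.aws/config every non-default section is written [profile NAME], while ~/.aws/credentials uses bare [NAME] sections.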

Once profiles are defined, specify them per-command with --profile or set AWS_PROFILE for the lifetime of a shell session:

```bash
# Explicit profile flag: safest for scripts, always unambiguous
aws s3 ls --profile prod-readonly

# Log in to SSO and cache a short-lived token (expires in 8 hours by default)
aws sso login --profile acme-prod

# Verify which identity you are acting as: always run this first in a new shell
aws sts get-caller-identity --profile prod-readonly
# Output:
# {
#     "UserId": "AROA...:edrees-ops-session",
#     "Account": "123456789012",
#     "Arn": "arn:aws:sts::123456789012:assumed-role/ReadOnlyOpsRole/edrees-ops-session"
# }

# Switch environment for the rest of the session
export AWS_PROFILE=prod-readonly
export AWS_DEFAULT_REGION=us-east-1

# Check the current effective configuration
aws configure list
```
Always run aws sts get-caller-identity before any destructive operation (terminate-instances, delete-stack, delete-bucket). It takes a fraction of a second and confirms that you are authenticated as the role you intended, in the account you intended. Teams that skip this step end up deleting production resources from the wrong profile.
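In scripts, the same check can be automated as a guard. The sketch below assumes you already have the parsed JSON from aws sts get-caller-identity --output json (or the dict returned by boto3's sts.get_caller_identity()); the function name check_identity and its expected_account / expected_role parameters are hypothetical.

```python
def check_identity(identity: dict, expected_account: str, expected_role: str) -> str:
    """Refuse to continue unless we are the expected assumed role in the expected account.

    `identity` has the GetCallerIdentity shape: keys UserId, Account, Arn.
    Returns the role name, or raises RuntimeError on any mismatch.
    """
    if identity["Account"] != expected_account:
        raise RuntimeError(f"wrong account {identity['Account']}, expected {expected_account}")
    # Assumed-role ARNs look like arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    resource = identity["Arn"].split(":", 5)[5]
    parts = resource.split("/")
    if parts[0] != "assumed-role" or parts[1] != expected_role:
        raise RuntimeError(f"unexpected principal {identity['Arn']}")
    return parts[1]
```

Calling it at the top of a cleanup script turns "remember to check" into a hard stop: an IAM user ARN, a different role, or a different account all abort before any delete call is made.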

Common CLI Patterns Every SRE Uses

The --query flag (JMESPath expression) and --output flag (json, text, table, yaml) are your primary tools for shaping API responses without external tools. Combine them with --filters to push filtering server-side and avoid transferring large response payloads.
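--filters changes what AWS sends back; --query only reshapes the response on your machine after it arrives. To make the client-side part concrete, here is a rough Python equivalent of the JMESPath projection used for the running-instances table in this lesson. The function name project_instances is hypothetical, and unlike the JMESPath version (which returns one list per reservation) it flattens the rows into a single list.

```python
def project_instances(response: dict) -> list[dict]:
    """Client-side projection roughly equivalent to
    Reservations[*].Instances[*].{ID:InstanceId, Type:InstanceType,
      AZ:Placement.AvailabilityZone, IP:PrivateIpAddress, Name:Tags[?Key==`Name`]|[0].Value}
    """
    rows = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            name_tags = [t["Value"] for t in inst.get("Tags", []) if t["Key"] == "Name"]
            rows.append({
                "ID": inst.get("InstanceId"),
                "Type": inst.get("InstanceType"),
                "AZ": inst.get("Placement", {}).get("AvailabilityZone"),
                "IP": inst.get("PrivateIpAddress"),
                # JMESPath yields null when no tag matches; mirror that with None
                "Name": name_tags[0] if name_tags else None,
            })
    return rows
```

Every instance in the response is still transferred and parsed before this runs, which is why narrowing the result set with --filters first matters in large accounts.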

[Figure: AWS CLI Request and Response Flow]
AWS CLI request flow: credentials are resolved locally, the request is signed with SigV4, sent to the AWS API, and the JSON response is shaped by --query or piped through jq.
```bash
# --- EC2 patterns ---

# List running instances: ID, type, AZ, private IP, Name tag
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].{
      ID:InstanceId,
      Type:InstanceType,
      AZ:Placement.AvailabilityZone,
      IP:PrivateIpAddress,
      Name:Tags[?Key==`Name`]|[0].Value
    }' \
  --output table

# Stop a specific instance (protect production: always pass --profile explicitly)
aws ec2 stop-instances \
  --instance-ids i-0abc123def456 \
  --profile dev

# Terminate all stopped instances in a region: DRY RUN first!
# A DryRunOperation error means the real call would be authorised.
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=stopped" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text | xargs aws ec2 terminate-instances --dry-run --instance-ids

# --- S3 patterns ---

# Sync a local directory to S3: uploads only new or changed files and, because of
# --delete, removes remote objects that no longer exist locally.
# *.html is excluded; every other file (not only .js/.css, which are already
# included by default) gets the one-year Cache-Control header.
aws s3 sync ./dist s3://my-frontend-bucket/ \
  --delete \
  --cache-control "max-age=31536000" \
  --exclude "*.html" \
  --include "*.js" --include "*.css"

# Empty a versioned bucket before deleting it: every object version AND every
# delete marker must be removed. Assumes keys contain no tabs or newlines.
for kind in Versions DeleteMarkers; do
  aws s3api list-object-versions \
    --bucket my-old-bucket \
    --query "${kind}[].[Key,VersionId]" \
    --output text |
  while IFS=$'\t' read -r key version; do
    [ "$key" = "None" ] && continue   # skip the literal None printed for an empty list
    aws s3api delete-object --bucket my-old-bucket --key "$key" --version-id "$version"
  done
done

# --- CloudFormation patterns ---

# Deploy a stack (create or update). --no-fail-on-empty-changeset makes a re-run
# with no changes exit 0, so the command is safe to repeat.
# prod-deploy is a hypothetical profile with write permissions; the read-only
# profiles configured earlier would be denied.
aws cloudformation deploy \
  --template-file infra/vpc.yaml \
  --stack-name acme-vpc \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides Env=prod CIDR=10.0.0.0/16 \
  --no-fail-on-empty-changeset \
  --profile prod-deploy

# Show the 20 most recent stack events (newest first), skipping UPDATE_COMPLETE
aws cloudformation describe-stack-events \
  --stack-name acme-vpc \
  --query 'StackEvents[?ResourceStatus!=`UPDATE_COMPLETE`].[Timestamp,ResourceStatus,ResourceStatusReason]' \
  --output text | head -20
```

jq Pipelines for Production Queries

jq is a command-line JSON processor that every SRE should be fluent in. While the CLI's built-in --query (JMESPath) handles simple projections, jq gives you Turing-complete transformations: conditionals, arithmetic, grouping, custom keys, and cross-resource joins. Pipe any aws ... --output json call into jq and you have a full query engine over your infrastructure state.

```bash
# Find running EC2 instances NOT tagged with "Environment": a common compliance audit
aws ec2 describe-instances --output json | \
  jq '
    [.Reservations[].Instances[]
     | select(.State.Name == "running")
     | select(.Tags == null or ((.Tags | map(.Key) | index("Environment")) == null))
     | {id: .InstanceId, type: .InstanceType, az: .Placement.AvailabilityZone}
    ]
  '

# Cost analysis helper: count running instances per type
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --output json | \
  jq '
    [.Reservations[].Instances[].InstanceType]
    | group_by(.)
    | map({type: .[0], count: length})
    | sort_by(.count)
    | reverse
  '

# Find S3 buckets whose ACL grants access to AllUsers (quick security sweep).
# Bucket policies can also make a bucket public; this loop checks ACLs only.
aws s3api list-buckets --output json | \
  jq -r '.Buckets[].Name' | \
  while read -r bucket; do
    # (.URI // "") avoids a jq error on grantees without a URI (e.g. CanonicalUser)
    acl=$(aws s3api get-bucket-acl --bucket "$bucket" --output json 2>/dev/null | \
      jq -r 'if any(.Grants[].Grantee; (.URI // "") | contains("AllUsers")) then "PUBLIC" else empty end' 2>/dev/null)
    [ -n "$acl" ] && echo "$bucket: $acl"
  done

# Extract ELB target health: which targets are unhealthy?
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123 \
  --output json | \
  jq '
    [.TargetHealthDescriptions[]
     | select(.TargetHealth.State != "healthy")
     | {id: .Target.Id, port: .Target.Port, state: .TargetHealth.State, reason: .TargetHealth.Reason}
    ]
  '

# Chain: find the first security group of one instance, then list its inbound rules
INSTANCE_ID="i-0abc123def456"
SG_ID=$(aws ec2 describe-instances \
  --instance-ids "$INSTANCE_ID" \
  --output json | \
  jq -r '.Reservations[0].Instances[0].SecurityGroups[0].GroupId')
aws ec2 describe-security-groups \
  --group-ids "$SG_ID" \
  --output json | \
  jq '
    .SecurityGroups[0].IpPermissions[]
    | {proto: .IpProtocol, from: .FromPort, to: .ToPort, cidrs: [.IpRanges[].CidrIp]}
  '
```
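Once a jq audit proves useful, it often graduates into SDK code. Here is a minimal Python sketch of the missing-tag audit and the per-type count, operating on the same describe-instances response shape (for example, the dict returned by boto3's ec2.describe_instances()). The function names are hypothetical, and ties in the count are ordered by Counter.most_common rather than by jq's sort_by/reverse.

```python
from collections import Counter
from typing import Iterator


def running(response: dict) -> Iterator[dict]:
    """Yield every running instance across all reservations."""
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst.get("State", {}).get("Name") == "running":
                yield inst


def missing_tag(response: dict, key: str = "Environment") -> list[dict]:
    """Running instances that carry no tag named `key`."""
    return [
        {"id": i["InstanceId"], "type": i["InstanceType"], "az": i["Placement"]["AvailabilityZone"]}
        for i in running(response)
        if key not in {t["Key"] for t in i.get("Tags", [])}
    ]


def count_by_type(response: dict) -> list[dict]:
    """Running instances per type, most common first (the group_by/length pipeline)."""
    counts = Counter(i["InstanceType"] for i in running(response))
    return [{"type": t, "count": n} for t, n in counts.most_common()]
```

In a real script, feed these functions every page of describe_instances rather than a single response, or the audit silently misses instances in large accounts.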
Use --dry-run before destructive operations. Most EC2 write API calls (for example run-instances, stop-instances, terminate-instances) accept a --dry-run flag; IAM has no equivalent, so use the IAM policy simulator (aws iam simulate-principal-policy) there instead. A dry run sends the full, signed request to AWS and checks whether you have the required permissions, but does not execute the mutation. You get a DryRunOperation error (which means "you have permission and this would have worked") or an UnauthorizedOperation error (which means "fix your IAM before running for real"). Make dry-run part of your runbook standard for any EC2 operation touching production.
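In SDK code, a dry run never "succeeds": botocore raises a ClientError whose response["Error"]["Code"] carries the verdict. The sketch below assumes `call` is a boto3 EC2 client method such as ec2.terminate_instances; the helper name dry_run_permitted is hypothetical, and it catches a generic Exception so it can run without botocore installed.

```python
def dry_run_permitted(call, **params) -> bool:
    """Invoke an EC2 write operation with DryRun=True and interpret the error code.

    True  -> DryRunOperation: the real call would have been authorised.
    False -> UnauthorizedOperation: fix IAM before running for real.
    Any other error (bad parameters, throttling) is re-raised unchanged.
    """
    try:
        call(DryRun=True, **params)
    except Exception as exc:  # botocore.exceptions.ClientError in real use
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "DryRunOperation":
            return True
        if code == "UnauthorizedOperation":
            return False
        raise
    raise RuntimeError("DryRun call unexpectedly returned without an error")
```

A runbook step can then read: if not dry_run_permitted(ec2.terminate_instances, InstanceIds=ids), stop and escalate before touching anything.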

AWS APIs Beyond the CLI — SDKs and Direct HTTP

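The AWS CLI is written in Python on top of botocore, the same low-level library that underpins the boto3 SDK (CLI v2 ships with its own embedded Python runtime). Each API operation maps 1:1 to a CLI subcommand and an SDK method: aws ec2 describe-instances invokes the same DescribeInstances operation as boto3.client('ec2').describe_instances(). The main exceptions are the high-level aws s3 commands (cp, sync, ls), which the CLI builds from several S3 API calls. Understanding this lets you prototype in the CLI and then promote to SDK code in your Lambda functions, Terraform providers, and internal tooling without re-learning the API shape.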
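The naming correspondence is mechanical: both the CLI subcommand and the SDK method are derived from the API operation name. botocore does this with its xform_name helper; the sketch below is a simplified, hypothetical re-implementation of that idea (xform, cli_command and sdk_call are illustrative names), not botocore's exact code, which handles more special cases.

```python
import re


def xform(operation: str, sep: str) -> str:
    """Split a PascalCase API operation name into words, keeping acronyms together."""
    # "DescribeDBInstances" -> "DescribeDB_Instances" (split before Capitalised words)
    s = re.sub(r"(.)([A-Z][a-z]+)", rf"\1{sep}\2", operation)
    # "DescribeDB_Instances" -> "Describe_DB_Instances" (split lower/digit -> upper)
    return re.sub(r"([a-z0-9])([A-Z])", rf"\1{sep}\2", s).lower()


def cli_command(service: str, operation: str) -> str:
    return f"aws {service} {xform(operation, '-')}"


def sdk_call(service: str, operation: str) -> str:
    return f"boto3.client('{service}').{xform(operation, '_')}()"
```

For example, the RDS operation DescribeDBInstances becomes aws rds describe-db-instances on the CLI and describe_db_instances in boto3, so knowing one form lets you predict the other.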

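Every AWS API request is authenticated using Signature Version 4 (SigV4), an HMAC-SHA256 scheme that signs a canonical form of the request (method, path, query string, selected headers, and a SHA-256 hash of the body) with a key derived from your secret access key, the date, the region, and the service. Any change to the request in transit invalidates the signature. You will never implement SigV4 by hand (every SDK does it), but knowing how it works explains two things: your secret key never travels over the wire, only the signature does; and deactivating or deleting a compromised key cuts it off, because once the change propagates AWS rejects any new request signed with that key.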
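To make the signing steps concrete, here is a minimal Python sketch of the SigV4 computation using only the standard library. It is a simplified model for illustration, not a drop-in signer: it assumes the path is already URI-encoded and signs every header passed in, and it omits assembling the final Authorization header, session tokens, and service-specific cases (such as S3 payload handling) that real SDKs implement.

```python
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """SigV4 signing key: an HMAC chain over date (YYYYMMDD), region, service, "aws4_request"."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_request(secret_key: str, amz_date: str, region: str, service: str,
                 method: str, path: str, query: dict, headers: dict, body: bytes) -> str:
    """Return the hex SigV4 signature for one request (simplified sketch).

    amz_date is the X-Amz-Date timestamp, e.g. "20150830T123600Z".
    """
    # 1. Canonical request: method, path, sorted query, sorted headers, payload hash.
    encoded = sorted((quote(k, safe="-_.~"), quote(v, safe="-_.~")) for k, v in query.items())
    canonical_query = "&".join(f"{k}={v}" for k, v in encoded)
    canon = {k.strip().lower(): " ".join(v.split()) for k, v in headers.items()}
    canonical_headers = "".join(f"{k}:{canon[k]}\n" for k in sorted(canon))
    signed_headers = ";".join(sorted(canon))
    canonical_request = "\n".join([
        method, path, canonical_query, canonical_headers, signed_headers,
        hashlib.sha256(body).hexdigest(),
    ])
    # 2. String to sign: algorithm, timestamp, credential scope, hash of the canonical request.
    scope = f"{amz_date[:8]}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # 3. Signature: HMAC of the string to sign under the derived signing key.
    key = derive_signing_key(secret_key, amz_date[:8], region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Changing a single byte of the body changes the payload hash, hence the canonical request, hence the signature; that chain is what makes tampering in transit detectable without ever sending the secret.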

Paginate everything in scripts. Most AWS list APIs return a limited number of results per call (list-objects-v2, for example, returns at most 1,000 keys) plus a continuation token whose name varies by API (NextToken, Marker, NextContinuationToken). The CLI follows these tokens for you by default; --no-paginate turns that off and returns only the first page, while --max-items and --starting-token let you page deliberately. The SDKs expose paginator objects for the same purpose (in boto3, client.get_paginator(...)). A script that reads only the first page, typically SDK code that makes a single call, silently misses resources in large accounts: a subtle bug that has caused genuine production incidents when compliance scripts miss untagged or misconfigured resources.
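The token-following loop that the CLI and SDK paginators run for you looks roughly like this. It is a minimal sketch assuming a NextToken-style API; `call` would be something like boto3.client("ec2").describe_instances, and the helper name paginate is hypothetical. APIs that use Marker or ContinuationToken need a different token name.

```python
from typing import Any, Callable, Iterator


def paginate(call: Callable[..., dict], result_key: str, **params: Any) -> Iterator[Any]:
    """Yield every item across all pages of a NextToken-style list call.

    result_key names the list in each page, e.g. "Reservations" for describe_instances.
    """
    token = None
    while True:
        page = call(**params, **({"NextToken": token} if token else {}))
        yield from page.get(result_key, [])
        token = page.get("NextToken")
        if not token:  # no token means this was the last page
            return
```

In production code, prefer the built-in paginator (ec2.get_paginator("describe_instances").paginate()), which already knows each operation's token and result names; the hand-written loop is mainly useful for understanding what goes wrong when the loop is missing.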

Key Takeaways

  • The credential provider chain resolves credentials in strict priority order: command-line options, environment variables, profile-based sources (role assumption, SSO, static keys in the shared files), and finally container or instance-profile credentials. In production, always use instance profiles or task roles, never long-lived keys.
  • Named profiles in ~/.aws/config with role_arn + source_profile enable automatic role assumption — essential for multi-account environments.
  • Run aws sts get-caller-identity before any destructive operation to confirm the active identity and account.
  • Use --filters for server-side filtering (reduces API response size) and --query (JMESPath) for client-side projection.
  • jq unlocks cross-resource queries, conditional logic, grouping, and audit pipelines that JMESPath cannot express.
  • All CLI calls are authenticated HTTP requests signed with SigV4. The CLI is thin sugar over the same public APIs used by the Console, SDKs, and Terraform.
  • Always paginate in scripts — missing the second page of results is a real, silent failure mode in large-scale accounts.