IAM Roles & Policies in Depth
AWS Identity and Access Management is the authorization backbone of every production system on AWS. Most engineers understand the basics — users, groups, policies — but production-grade security depends on a deeper model: role assumption, trust policies, permission boundaries, and policy conditions. Getting these wrong is how privilege escalation, data exfiltration, and compliance failures happen. This lesson closes that gap.
How Role Assumption Works
An IAM Role is not an identity you log in as — it is a set of permissions that any trusted principal can assume temporarily. When an EC2 instance, a Lambda function, a CI/CD pipeline, or a human assumes a role, AWS STS (Security Token Service) issues short-lived credentials: an AccessKeyId, SecretAccessKey, and a SessionToken that expire (default 1 hour, maximum configurable per role up to 12 hours).
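The duration rules can be made concrete with a small validator. This is a minimal sketch, assuming the STS `AssumeRole` limits as commonly documented: `DurationSeconds` ranges from 900 seconds up to the role's `MaxSessionDuration`, and that maximum can be set from 1 to 12 hours. The function name `effective_duration` is hypothetical, and the sketch ignores role chaining, which AWS caps at one hour.

```python
# Hypothetical helper: check a requested AssumeRole session length.
# Assumed limits: DurationSeconds >= 900 s and <= the role's MaxSessionDuration,
# which itself may be configured between 3600 s (1 h) and 43200 s (12 h).
MIN_SESSION_SECONDS = 900
DEFAULT_SESSION_SECONDS = 3600
MAX_ROLE_SESSION_SECONDS = 43200


def effective_duration(requested=None, role_max=DEFAULT_SESSION_SECONDS):
    """Return the session length in seconds, or raise ValueError if invalid."""
    if not DEFAULT_SESSION_SECONDS <= role_max <= MAX_ROLE_SESSION_SECONDS:
        raise ValueError("role MaxSessionDuration must be between 1 and 12 hours")
    if requested is None:
        # No duration requested: the 1-hour default applies.
        return DEFAULT_SESSION_SECONDS
    if not MIN_SESSION_SECONDS <= requested <= role_max:
        raise ValueError(
            f"duration must be between {MIN_SESSION_SECONDS} and {role_max} seconds"
        )
    return requested


print(effective_duration())                        # default session
print(effective_duration(7200, role_max=43200))    # 2 h on a role allowing 12 h
```

Requesting 7200 seconds against a role that still has the default 1-hour maximum raises `ValueError`, which mirrors the failure you would see from STS.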
The assumption flow has two gates. Gate 1 is the trust policy — who is allowed to call sts:AssumeRole. Gate 2 is the permission policy attached to the role — what the resulting session can do. Both gates must pass. This dual-gate model is what makes roles fundamentally safer than long-lived access keys.
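The dual-gate idea fits in a few lines. This is a deliberately simplified sketch: the "trust policy" and "permission policy" here are hypothetical stand-ins (plain sets of principal ARNs and action names), not real IAM policy documents, and the account ID is the AWS documentation placeholder.

```python
# Toy model of the two gates. Real policies have Effects, wildcards and
# Conditions; here each gate is reduced to a set-membership check.

def can_perform(caller_arn, action, trusted_principals, role_allowed_actions):
    # Gate 1: the trust policy must list the caller as allowed to assume the role.
    if caller_arn not in trusted_principals:
        return False
    # Gate 2: the role's permission policy must allow the action for the session.
    return action in role_allowed_actions


TRUSTED = {"arn:aws:iam::111122223333:role/ci-pipeline"}   # hypothetical principal
ALLOWED = {"s3:GetObject", "s3:PutObject"}                  # hypothetical permissions

print(can_perform("arn:aws:iam::111122223333:role/ci-pipeline", "s3:GetObject", TRUSTED, ALLOWED))
```

Both checks must return true. A trusted caller still cannot perform `iam:CreateUser`, and an untrusted caller gets nothing even for actions the role allows.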
Trust Policies — The First Gate
A trust policy is a JSON resource-based policy attached to the role itself. It answers: which principals are allowed to call sts:AssumeRole on this role? The Principal element can reference AWS accounts, specific IAM users/roles, AWS services (like ec2.amazonaws.com for instance profiles), or OIDC/SAML federated identities.
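Two common trust policy shapes follow, written as Python dicts and serialized to the JSON IAM expects. The first trusts the EC2 service, which is what an instance profile needs. The second trusts one specific role in another account. The account ID and role name are hypothetical placeholders. Federated principals follow the same structure but use `sts:AssumeRoleWithWebIdentity` (OIDC) or `sts:AssumeRoleWithSAML` (SAML) as the action.

```python
import json

# Lets EC2 instances (through an instance profile) assume the role.
ec2_trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Lets exactly one role in another (hypothetical) account assume the role.
cross_account_trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/deploy-runner"},
        "Action": "sts:AssumeRole",
    }],
}

print(json.dumps(ec2_trust, indent=2))
print(json.dumps(cross_account_trust, indent=2))
```

Naming a specific role ARN is tighter than trusting the whole account, because it delegates nothing to the other account's IAM administrators.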
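Always require a Condition on sts:ExternalId in the trust policy when granting access to external services; this guards against the confused deputy problem. The external ID should be unique per customer and generated by the vendor rather than chosen by the customer, so one customer cannot reuse another customer's ID.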
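A trust policy for a third-party vendor might look like the following sketch. The vendor account ID and the external ID value are hypothetical. The vendor must pass the same value as `ExternalId` when it calls `AssumeRole`, or the condition fails.

```python
import json

VENDOR_ACCOUNT = "111122223333"   # hypothetical vendor account
EXTERNAL_ID = "customer-7f3a9c"   # hypothetical ID issued by the vendor for this customer

vendor_trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{VENDOR_ACCOUNT}:root"},
        "Action": "sts:AssumeRole",
        # AssumeRole succeeds only if the caller supplies this exact ExternalId.
        "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
    }],
}

print(json.dumps(vendor_trust, indent=2))
```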
Permission Policies — The Second Gate
Permission policies define what the session can do. AWS evaluates them with an explicit-deny-first model: any matching Deny statement in any applicable policy (identity policy, resource policy, SCP, permission boundary, or session policy) overrides every Allow, and any request that no policy explicitly allows is implicitly denied. Know the evaluation order: SCPs → Resource-based policies → Identity-based policies → Permission boundaries → Session policies.
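The order and the deny-wins rule can be modeled in a toy evaluator. This is a simplified sketch, not AWS's full algorithm. It assumes a same-account request by an IAM role session and represents each policy layer as a hypothetical list of `(effect, action)` tuples. Wildcards, conditions, and the edge cases where resource-based policies bypass boundaries are ignored. A layer passed as `None` is treated as absent.

```python
# Toy policy evaluator. A layer is a list of (effect, action) tuples; None = absent.

def layer_allows(statements, action):
    return any(effect == "Allow" and act == action for effect, act in statements)


def evaluate(action, identity, scp=None, boundary=None, session=None, resource=None):
    layers = [p for p in (scp, resource, identity, boundary, session) if p is not None]
    # 1. An explicit Deny anywhere overrides every Allow.
    for statements in layers:
        if any(effect == "Deny" and act == action for effect, act in statements):
            return "ExplicitDeny"
    # 2. SCPs, boundaries and session policies are ceilings: if present, they must allow.
    for ceiling in (scp, boundary, session):
        if ceiling is not None and not layer_allows(ceiling, action):
            return "ImplicitDeny"
    # 3. A resource-based or identity-based policy must actually grant the action.
    if (resource is not None and layer_allows(resource, action)) or layer_allows(identity, action):
        return "Allow"
    return "ImplicitDeny"  # default: nothing allowed it


identity = [("Allow", "s3:GetObject")]
print(evaluate("s3:GetObject", identity))
print(evaluate("s3:GetObject", identity, boundary=[("Allow", "s3:PutObject")]))
```

The second call shows a boundary acting as a ceiling. The identity policy allows the action, but the boundary does not, so the result is an implicit deny.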
Production roles should follow least-privilege religiously. Use Resource ARNs instead of *, scope conditions to specific VPCs or tags, and never grant iam:* or sts:AssumeRole on * to workload roles.
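A simple pre-deployment check can catch the wildcard grants warned about above. The linter below is a hypothetical sketch, not an AWS tool and not exhaustive; for example, it ignores `NotAction` and conditions. It lowercases action names because IAM matches them case-insensitively. The bucket name is a placeholder.

```python
def _as_list(value):
    # IAM accepts either a single string or a list for Action and Resource.
    return [value] if isinstance(value, str) else list(value)


def risky_statements(policy):
    """Return (statement_index, finding) pairs for overly broad Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is also valid
        statements = [statements]
    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = [a.lower() for a in _as_list(stmt.get("Action", []))]
        resources = _as_list(stmt.get("Resource", []))
        for action in actions:
            if action in ("*", "iam:*"):
                findings.append((i, f"broad action {action}"))
            elif action == "sts:assumerole" and "*" in resources:
                findings.append((i, "sts:AssumeRole on *"))
        if "*" in resources:
            findings.append((i, "Resource is *"))
    return findings


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::orders-bucket/*"},  # hypothetical, scoped ARN
        {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": "*"},
    ],
}
print(risky_statements(policy))  # -> [(1, 'sts:AssumeRole on *'), (1, 'Resource is *')]
```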
Permission Boundaries — Capping Delegation
A permission boundary is a managed policy you attach to an IAM role (or user) that acts as a ceiling on what that identity can ever do — even if more permissive policies are attached later. The effective permissions are the intersection of the identity's permission policies and the boundary.
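The intersection can be illustrated with wildcard matching over action names. This sketch assumes both policies are reduced to hypothetical lists of Allow action patterns and uses `fnmatchcase` as an approximation of IAM's `*` and `?` wildcards. Real evaluation also considers resources, conditions, and Deny statements.

```python
from fnmatch import fnmatchcase


def allowed_by(patterns, action):
    # Action names are compared case-insensitively; "*" and "?" act as wildcards.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effective_actions(candidates, identity_allows, boundary_allows):
    # Effective permissions: actions allowed by BOTH the identity policy and the boundary.
    return [a for a in candidates
            if allowed_by(identity_allows, a) and allowed_by(boundary_allows, a)]


identity = ["s3:*", "dynamodb:*", "iam:CreateRole"]   # hypothetical attached policy
boundary = ["s3:Get*", "s3:List*", "dynamodb:*"]      # hypothetical boundary
candidates = ["s3:GetObject", "s3:PutObject", "dynamodb:Query", "iam:CreateRole"]
print(effective_actions(candidates, identity, boundary))  # -> ['s3:GetObject', 'dynamodb:Query']
```

`iam:CreateRole` is attached to the identity but falls outside the boundary, so the role can never use it. That is the ceiling behavior described above.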
The canonical production use case: you want a CI/CD pipeline to be able to create IAM roles for microservices, but you never want those pipeline-created roles to exceed the permissions the pipeline itself has. You enforce this by requiring that any role the pipeline creates must have the same boundary applied.
Without a Condition check on iam:PermissionsBoundary, a pipeline with iam:CreateRole and iam:AttachRolePolicy can create a role with no boundary and attach AdministratorAccess to it, a classic privilege escalation path catalogued in AWS security research. Always pair iam:CreateRole, and the actions that attach or put policies on roles, with this condition.
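A pipeline identity policy that enforces the boundary might look like the following sketch. The account ID, the boundary policy name, and the `app-*` role name prefix are hypothetical. The sketch assumes `iam:PermissionsBoundary` is evaluated for `CreateRole`, `AttachRolePolicy`, and `PutRolePolicy`, and adds a Deny so the pipeline cannot strip the boundary afterwards.

```python
import json

ACCOUNT = "111122223333"                                            # hypothetical
BOUNDARY_ARN = f"arn:aws:iam::{ACCOUNT}:policy/workload-boundary"  # hypothetical

pipeline_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Roles may be created or given policies only if they carry the boundary.
            "Sid": "CreateRolesOnlyWithBoundary",
            "Effect": "Allow",
            "Action": ["iam:CreateRole", "iam:AttachRolePolicy", "iam:PutRolePolicy"],
            "Resource": f"arn:aws:iam::{ACCOUNT}:role/app-*",
            "Condition": {"StringEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}},
        },
        {
            # Prevent the pipeline from removing the boundary later.
            "Sid": "NeverRemoveBoundary",
            "Effect": "Deny",
            "Action": "iam:DeleteRolePermissionsBoundary",
            "Resource": "*",
        },
    ],
}

print(json.dumps(pipeline_policy, indent=2))
```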
Policy Conditions — Precision Control
Conditions are the most underused IAM feature. They let you make permissions context-sensitive: enforce MFA, restrict to specific source IPs or VPCs, require encryption, or gate on resource tags. Condition operators are typed: StringEquals, ArnLike, IpAddress, Bool, NumericLessThan, DateGreaterThan, etc.
Key condition keys for production hardening:
- aws:MultiFactorAuthPresent: require MFA for sensitive actions
- aws:SourceVpc / aws:SourceVpce: restrict S3 access to traffic from your VPC
- aws:RequestedRegion: deny actions outside approved regions (often paired with SCPs)
- aws:PrincipalTag / aws:ResourceTag: attribute-based access control (ABAC)
- s3:x-amz-server-side-encryption: deny S3 puts without encryption
- ec2:Region, ec2:InstanceType: cap instance types developers can launch
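Two of these keys in use, as a sketch. The action list and bucket name are hypothetical. `BoolIfExists` makes the MFA Deny also apply when the key is absent, for example when a request is signed with long-term access keys. `StringNotEquals` likewise matches when the encryption header is missing entirely.

```python
import json

# Deny sensitive IAM actions unless the session was MFA-authenticated.
require_mfa = {
    "Sid": "DenySensitiveWithoutMFA",
    "Effect": "Deny",
    "Action": ["iam:DeleteRole", "iam:PutRolePolicy"],  # illustrative action list
    "Resource": "*",
    "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
}

# Deny uploads that do not request SSE-KMS (including uploads with no header at all).
require_kms = {
    "Sid": "DenyPutWithoutKMS",
    "Effect": "Deny",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::example-bucket/*",  # hypothetical bucket
    "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}},
}

policy = {"Version": "2012-10-17", "Statement": [require_mfa, require_kms]}
print(json.dumps(policy, indent=2))
```

Both statements are Deny rules. They narrow whatever Allow statements grant elsewhere and never grant anything themselves.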
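Classic role-based access control tends to need a separate role for every team and resource set. Attribute-based access control with aws:PrincipalTag and aws:ResourceTag compresses that into one role for many principals, with access scoped dynamically by tags like team=payments. Tag every resource consistently and use a Service Control Policy to enforce tagging at creation time.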
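An ABAC statement compares a resource tag with the caller's own tag through a policy variable. This is a sketch using EC2 start and stop actions as the example. The `team` tag key is an assumption. The helper `abac_allows` is a hypothetical local re-creation of the comparison for illustration, not an IAM API.

```python
# One statement serves every team: IAM substitutes the caller's "team" tag
# value for the ${aws:PrincipalTag/team} variable at request time.
abac_statement = {
    "Effect": "Allow",
    "Action": ["ec2:StartInstances", "ec2:StopInstances"],
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {"StringEquals": {"aws:ResourceTag/team": "${aws:PrincipalTag/team}"}},
}


def abac_allows(principal_tags, resource_tags, key="team"):
    """Toy local version of the tag comparison in the statement above."""
    # If either side is missing the tag, the StringEquals comparison cannot match.
    if key not in principal_tags or key not in resource_tags:
        return False
    return principal_tags[key] == resource_tags[key]


print(abac_allows({"team": "payments"}, {"team": "payments"}))
print(abac_allows({"team": "payments"}, {"team": "search"}))
```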
Operational Patterns
Instance profiles are how EC2 gets a role: you add a role to an instance profile and attach the profile to the instance, and the EC2 instance metadata service (http://169.254.169.254/latest/meta-data/iam/security-credentials/) vends automatically rotating credentials. Under IMDSv2 the same path is served only to requests that present a session token, obtained with a PUT to /latest/api/token. Always enable IMDSv2 (token-required mode) to block SSRF-based credential theft.
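The IMDSv2 token flow takes two HTTP requests. This sketch uses only the standard library and assumes the documented IMDSv2 headers (`X-aws-ec2-metadata-token-ttl-seconds` on the PUT, `X-aws-ec2-metadata-token` on later GETs) and a 6-hour maximum token TTL. `fetch_role_credentials` works only on an EC2 instance. The request builders can be inspected anywhere. In practice the AWS SDKs do this for you.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254"


def imds_token_request(ttl_seconds=21600):
    # Step 1: PUT for a session token (21600 s = 6 h, the assumed maximum TTL).
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def credentials_request(token, role_name):
    # Step 2: GET the role's credentials, presenting the token.
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/iam/security-credentials/{role_name}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(role_name, timeout=2):
    """Only works from inside an EC2 instance with IMDS reachable."""
    with urllib.request.urlopen(imds_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(credentials_request(token, role_name), timeout=timeout) as resp:
        # Returned JSON includes AccessKeyId, SecretAccessKey, Token and Expiration.
        return json.loads(resp.read())
```

An SSRF bug that can only issue simple GETs cannot complete step 1, which is why token-required mode blunts that attack.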
Service-linked roles are created and managed by AWS services (e.g., AWSServiceRoleForECS), often automatically on first use, with trust policies locked to the service. You cannot modify their trust policies, and their permissions are defined and maintained by the owning service, so you generally cannot edit their permission policies through IAM either.
Use aws sts get-caller-identity to verify which identity your current credentials represent. Use aws iam simulate-principal-policy to test policy logic before deploying — critical for debugging "Access Denied" in complex multi-policy environments.
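The output of `aws sts get-caller-identity` is JSON with `UserId`, `Account`, and `Arn` fields. For an assumed role, the ARN has the form `arn:aws:sts::<account>:assumed-role/<role>/<session>`. This hypothetical helper splits that apart. The sample values below are placeholders, not real output.

```python
import json


def describe_identity(caller_identity_json):
    """Summarize get-caller-identity output (hypothetical helper)."""
    doc = json.loads(caller_identity_json)
    # ARN layout: arn:partition:service:region:account:resource
    resource = doc["Arn"].split(":", 5)[5]
    if resource.startswith("assumed-role/"):
        _, role_name, session_name = resource.split("/", 2)
        return {"account": doc["Account"], "role": role_name, "session": session_name}
    return {"account": doc["Account"], "principal": resource}


sample = json.dumps({
    "UserId": "AROAEXAMPLEID:ci-pipeline-run-12345",
    "Account": "111122223333",
    "Arn": "arn:aws:sts::111122223333:assumed-role/deploy-role/ci-pipeline-run-12345",
})
print(describe_identity(sample))
```

The role name recovered here is what you would use to build the role ARN (`arn:aws:iam::<account>:role/<role>`) passed to `aws iam simulate-principal-policy --policy-source-arn`. Note that the assumed-role session ARN is a different form from the IAM role ARN.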
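Every AssumeRole call is logged in CloudTrail (event source sts.amazonaws.com, event name AssumeRole) together with the role session name, and API calls made with the resulting credentials carry that name in their assumed-role ARN. Always set a meaningful --role-session-name (e.g. ci-pipeline-run-12345) so security teams can trace which pipeline run generated which API calls. Never use generic names like session1.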
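Session names can be generated consistently from pipeline metadata. This sketch assumes the STS `RoleSessionName` constraints of 2 to 64 characters drawn from letters, digits, and `_+=,.@-`. The function `session_name` and its naming scheme are hypothetical.

```python
import re

# Assumed STS RoleSessionName rule: 2-64 chars from [A-Za-z0-9_+=,.@-].
_VALID = re.compile(r"[\w+=,.@-]{2,64}", re.ASCII)


def session_name(pipeline, run_id):
    name = f"{pipeline}-run-{run_id}"
    # Replace characters STS would reject, then enforce the length cap.
    # Note: truncation can cut off the run ID, so keep pipeline names short.
    name = re.sub(r"[^\w+=,.@-]", "-", name, flags=re.ASCII)[:64]
    if not _VALID.fullmatch(name):
        raise ValueError(f"invalid role session name: {name!r}")
    return name


print(session_name("ci-pipeline", 12345))
print(session_name("deploy/prod", 7))
```

The result would be passed as `--role-session-name` to `aws sts assume-role`, so every CloudTrail entry for the run carries the same traceable name.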