RDS & Managed Databases
RDS & Managed Databases
Amazon RDS (Relational Database Service) offloads the undifferentiated heavy lifting of database operations — OS patching, storage provisioning, replication setup, backups, and failover — so your team can concentrate on schema design and query performance. But "managed" does not mean "magic." To run RDS safely at production scale you need to understand its internals: which engine to pick, how Multi-AZ failover actually works, when read replicas help (and when they do not), and how parameter groups let you tune the engine without SSH access.
Supported Engines
RDS supports six database engines. Each engine on RDS is the genuine upstream open-source or commercial binary — not a fork — so your existing expertise, drivers, and ORMs transfer without modification.
- MySQL — the default choice for web applications. RDS MySQL supports versions 5.7 and 8.0. Choose 8.0 for window functions, CTEs, and better InnoDB defaults.
- PostgreSQL — preferred at big-tech for its strict SQL compliance, JSONB indexing, logical replication, and extensions. RDS supports pg 13–16. Use
pg_stat_statementsandauto_explainvia parameter groups to diagnose slow queries without shell access. - MariaDB — a MySQL-compatible fork. Its thread-pool connection model makes it attractive for high-concurrency workloads that overwhelm MySQL single-thread-per-connection design.
- Oracle and SQL Server — available for enterprise license-portability scenarios (Bring Your Own License or License Included). Avoid for greenfield work; the managed feature set lags behind the open-source engines.
- Amazon Aurora — AWS cloud-native engine with MySQL- and PostgreSQL-compatible endpoints. It replaces the on-disk storage layer with a distributed, self-healing 6-copy storage volume spread across three AZs. Aurora lives under the RDS console but is a distinct product. At scale, Aurora storage I/O eliminates replication lag for reads and reduces recovery time from minutes to under 30 seconds.
TimescaleDB.
Multi-AZ vs. Read Replicas
These two features are frequently confused because both involve a second database instance. They solve entirely different problems.
Multi-AZ is a high-availability mechanism. AWS provisions a synchronous standby replica in a different Availability Zone. Every write to the primary is synchronously committed to the standby before the write returns success to the application. If the primary fails — hardware fault, AZ outage, OS crash — RDS updates the DNS CNAME for your endpoint to point to the standby within 60–120 seconds. Your application reconnects to the same hostname and resumes. The standby is invisible to your application during normal operation; it is never used for reads. You pay for it in full to have it idle and ready. This is the correct trade-off for production: reliability over cost.
Read Replicas are a read-scaling mechanism. RDS uses asynchronous replication (binary log for MySQL/MariaDB, streaming replication for PostgreSQL) to ship writes from the primary to one or more replicas. Each replica gets its own DNS endpoint and can serve SELECT queries, offloading the primary. Replication is asynchronous, so replicas can lag behind the primary by milliseconds to seconds under heavy write load. Reads from a replica are eventually consistent, not strongly consistent. Read replicas are not automatic failover targets by default — you must manually promote one and update your connection strings. You can create up to 5 read replicas per RDS instance (15 for Aurora).
Automated Backups and Snapshots
RDS provides two distinct backup mechanisms. Understanding the difference is critical for defining your RTO and RPO.
Automated Backups are enabled by default (retention 1–35 days). RDS takes a daily storage-volume snapshot during the configured backup window and continuously archives transaction logs to S3. This combination enables point-in-time recovery (PITR): you can restore your database to any second within the retention window. Restoration creates a new DB instance; it does not restore in place.
Manual Snapshots are created explicitly via console, CLI, or automation. They persist until you delete them, surviving even instance deletion. Use them before schema migrations, major version upgrades, or any destructive operation. They do not enable PITR — they are a full volume snapshot at a single point in time.
Parameter Groups: Engine Tuning Without Shell Access
A parameter group is a named collection of engine configuration variables (the equivalent of my.cnf or postgresql.conf) that you attach to an RDS instance or cluster. Because you have no OS access on RDS, parameter groups are the only way to tune the database engine.
Parameters fall into two categories. Static parameters require an instance reboot to take effect (for example, max_connections, shared_buffers). Dynamic parameters apply immediately without a reboot (for example, log_min_duration_statement, work_mem). RDS ships a default parameter group for every engine version; the default group is immutable — you must create a custom group to change anything.
Key parameters worth tuning for PostgreSQL production workloads:
shared_buffers— RDS formula{DBInstanceClassMemory/4}sets this to 25% of RAM, the standard starting point. Static; requires reboot.work_mem— memory per sort/hash operation. Default 4 MB is usually too low for analytical queries. Raise cautiously: this multiplies per query per connection. Dynamic.log_min_duration_statement— log any query exceeding N milliseconds. Set to 1000 initially, lower to 200 once you have solved the obvious slow queries. Dynamic.random_page_cost— set to 1.1 on SSD-backed gp3/io1 storage (the default 4.0 was designed for spinning disks and causes the planner to over-prefer sequential scans on SSDs). Dynamic.max_connections— static. On large instances, use PgBouncer or RDS Proxy in front of RDS instead of raising this value; connection overhead is significant in PostgreSQL.
SHOW shared_buffers; after the reboot — do not assume the change applied.
Storage: gp3 vs. io1 vs. io2
RDS storage performance directly affects database latency. Choose based on your IOPS and throughput requirements, not instance size alone.
- gp3 — general-purpose SSD. 3,000 IOPS and 125 MB/s baseline, upgradeable to 16,000 IOPS and 1,000 MB/s independently of storage size. The right default for most production workloads.
- io2 — provisioned IOPS SSD. Up to 256,000 IOPS with 99.999% durability SLA. Use for demanding OLTP workloads where latency must be sub-millisecond under sustained load.
- Aurora — storage is managed separately; you do not provision IOPS. Aurora's distributed storage auto-scales to 128 TiB and handles its own I/O. This is one of Aurora's strongest advantages over RDS.
aws rds describe-db-instances --query 'DBInstances[*].{ID:DBInstanceIdentifier,Storage:StorageType}' to audit your fleet.
Provisioning with the CLI: End-to-End Example
The following creates a production-grade Multi-AZ PostgreSQL instance with a custom parameter group and automated backups already configured.
Important flags to note: --storage-encrypted enables AES-256 encryption at rest (mandatory for compliance); --deletion-protection prevents accidental instance deletion via console or CLI; --no-publicly-accessible ensures the instance has no public IP and is reachable only within your VPC; --enable-performance-insights gives you a free 7-day query performance dashboard that surfaces wait events and top SQL.