REST API Development

API Monitoring & Logging

15 min Lesson 30 of 35

Understanding API Observability

API monitoring and logging are essential for maintaining system health, debugging issues, understanding user behavior, and ensuring SLAs are met. Observability encompasses logs, metrics, and traces that provide visibility into your API's behavior in production environments.

The Three Pillars of Observability:
  • Logs: Detailed records of discrete events (requests, errors, warnings)
  • Metrics: Numerical measurements over time (response times, request rates, error rates)
  • Traces: End-to-end request flow across distributed services
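The three pillars can be seen in a single request: one structured log entry, one metric sample, and a trace that ties them together via a shared trace ID. A minimal sketch (the field names here are illustrative, not a standard schema):

```javascript
// One request viewed through all three pillars (illustrative field names)
const traceId = 'a1b2c3d4'; // in practice this comes from a tracing library

// Pillar 1 - a log: a discrete event, tagged with the trace ID
const logEntry = {
  level: 'info',
  msg: 'GET /api/posts completed',
  status: 200,
  traceId,
  timestamp: new Date().toISOString()
};

// Pillar 2 - a metric: a numeric sample, aggregated over time
const metricSample = {
  name: 'http_request_duration_ms',
  value: 42,
  labels: { route: '/api/posts' }
};

// Pillar 3 - a trace: spans showing where the time went
const trace = {
  traceId,
  spans: [
    { name: 'handler', durationMs: 42 },
    { name: 'db.query', durationMs: 31 } // child span: most time was in the database
  ]
};

console.log(JSON.stringify(logEntry));
```

Because the log entry and the trace share `traceId`, a log aggregation tool can jump from a suspicious log line straight to the full distributed trace for that request.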

Request Logging

Comprehensive request logging helps you understand API usage patterns, identify errors, and debug production issues.

Structured Logging with Winston

// winston-logger.js
const winston = require('winston');

// Create logger instance
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp({ format: 'YYYY-MM-DD HH:mm:ss' }),
    winston.format.errors({ stack: true }),
    winston.format.splat(),
    winston.format.json()
  ),
  defaultMeta: { service: 'api-service' },
  transports: [
    // Write errors to error.log
    new winston.transports.File({
      filename: 'logs/error.log',
      level: 'error',
      maxsize: 5242880, // 5MB
      maxFiles: 5
    }),
    // Write all logs to combined.log
    new winston.transports.File({
      filename: 'logs/combined.log',
      maxsize: 5242880,
      maxFiles: 5
    })
  ]
});

// Add console output in development
if (process.env.NODE_ENV !== 'production') {
  logger.add(new winston.transports.Console({
    format: winston.format.combine(
      winston.format.colorize(),
      winston.format.simple()
    )
  }));
}

module.exports = logger;

Request Logging Middleware

const express = require('express');
const logger = require('./winston-logger');
const { v4: uuidv4 } = require('uuid');

const app = express();

// Request logging middleware
app.use((req, res, next) => {
  // Generate unique request ID
  req.id = uuidv4();
  req.startTime = Date.now();

  // Log incoming request
  logger.info('Incoming request', {
    requestId: req.id,
    method: req.method,
    path: req.path,
    query: req.query,
    ip: req.ip,
    userAgent: req.get('user-agent')
  });

  // Log response by wrapping res.send
  const originalSend = res.send;
  res.send = function(data) {
    const duration = Date.now() - req.startTime;

    logger.info('Request completed', {
      requestId: req.id,
      method: req.method,
      path: req.path,
      statusCode: res.statusCode,
      duration: `${duration}ms`,
      contentLength: res.get('content-length')
    });

    // Log slow requests
    if (duration > 1000) {
      logger.warn('Slow request detected', {
        requestId: req.id,
        method: req.method,
        path: req.path,
        duration: `${duration}ms`
      });
    }

    return originalSend.call(this, data);
  };

  next();
});

// Error logging middleware
app.use((err, req, res, next) => {
  logger.error('Request error', {
    requestId: req.id,
    method: req.method,
    path: req.path,
    error: {
      message: err.message,
      stack: err.stack,
      code: err.code
    },
    statusCode: res.statusCode
  });

  res.status(err.statusCode || 500).json({
    error: {
      message: err.message,
      requestId: req.id
    }
  });
});
Structured Logging Benefits: JSON-formatted logs are easily parseable by log aggregation tools (ELK Stack, Splunk, Datadog). Include request IDs to trace requests across services. Always log timestamps for time-based analysis.

Advanced Logging with Morgan

const morgan = require('morgan');
const logger = require('./winston-logger');

// Custom token for request ID
morgan.token('id', (req) => req.id);

// Custom token for response time in milliseconds
morgan.token('response-time-ms', (req, res) => {
  const duration = Date.now() - req.startTime;
  return `${duration}ms`;
});

// Create stream to send Morgan logs to Winston
const stream = {
  write: (message) => logger.http(message.trim())
};

// Morgan format with custom tokens
const format = ':id :method :url :status :response-time-ms - :res[content-length]';

// Use Morgan middleware
app.use(morgan(format, { stream }));

// Custom Morgan format function for detailed JSON logging
const detailedFormat = (tokens, req, res) => {
  return JSON.stringify({
    requestId: tokens.id(req, res),
    method: tokens.method(req, res),
    url: tokens.url(req, res),
    status: tokens.status(req, res),
    responseTime: tokens['response-time'](req, res),
    contentLength: tokens.res(req, res, 'content-length'),
    userAgent: tokens['user-agent'](req, res),
    ip: tokens['remote-addr'](req, res)
  });
};

app.use(morgan(detailedFormat, { stream }));

Performance Metrics

Collect and monitor key performance metrics to identify bottlenecks and track system health over time.

Prometheus Metrics Integration

const express = require('express');
const prometheus = require('prom-client');

const app = express();

// Create a Registry to register metrics
const register = new prometheus.Registry();

// Enable default metrics (CPU, memory, event loop lag)
prometheus.collectDefaultMetrics({ register });

// Custom metrics
const httpRequestDuration = new prometheus.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.5, 1, 2, 5, 10] // Seconds
});

const httpRequestTotal = new prometheus.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new prometheus.Gauge({
  name: 'http_active_connections',
  help: 'Number of active HTTP connections'
});

const databaseQueryDuration = new prometheus.Histogram({
  name: 'database_query_duration_seconds',
  help: 'Duration of database queries in seconds',
  labelNames: ['query_type', 'table'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2]
});

// Register custom metrics
register.registerMetric(httpRequestDuration);
register.registerMetric(httpRequestTotal);
register.registerMetric(activeConnections);
register.registerMetric(databaseQueryDuration);

// Metrics middleware
app.use((req, res, next) => {
  activeConnections.inc();
  const start = Date.now();

  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000; // Convert to seconds
    const route = req.route ? req.route.path : req.path;

    httpRequestDuration
      .labels(req.method, route, res.statusCode)
      .observe(duration);

    httpRequestTotal
      .labels(req.method, route, res.statusCode)
      .inc();

    activeConnections.dec();
  });

  next();
});

// Expose metrics endpoint for Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Example: Track database query metrics
async function executeQuery(queryType, table, query, params) {
  const start = Date.now();
  try {
    const result = await db.query(query, params);
    const duration = (Date.now() - start) / 1000;
    databaseQueryDuration
      .labels(queryType, table)
      .observe(duration);
    return result;
  } catch (error) {
    const duration = (Date.now() - start) / 1000;
    databaseQueryDuration
      .labels(queryType, table)
      .observe(duration);
    throw error;
  }
}

// Usage
app.get('/api/posts', async (req, res) => {
  const posts = await executeQuery('SELECT', 'posts', 'SELECT * FROM posts LIMIT 10');
  res.json(posts);
});
Key Metrics to Track:
  • Request rate: Requests per second (RPS)
  • Error rate: Percentage of failed requests (4xx, 5xx)
  • Response time: P50, P95, P99 percentiles
  • Throughput: Data transferred per second
  • Active connections: Number of concurrent connections
  • Database performance: Query duration, connection pool usage
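Two of these, error rate and latency percentiles, can be computed directly from raw counters and samples. A hedged in-process sketch (in production, Prometheus derives these from the exported counter and histogram buckets, so this is for illustration only):

```javascript
// Illustrative in-process computation of error rate and latency percentiles.
// In production, Prometheus/Grafana compute these from the exported metrics.
function errorRate(totalRequests, failedRequests) {
  // Percentage of failed (4xx/5xx) responses
  return totalRequests === 0 ? 0 : (failedRequests / totalRequests) * 100;
}

function percentile(samples, p) {
  // Nearest-rank percentile over raw duration samples (in ms)
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const durations = [12, 15, 18, 20, 25, 40, 55, 80, 120, 900];
console.log(errorRate(200, 6));          // → 3 (percent)
console.log(percentile(durations, 50));  // → 25 (P50)
console.log(percentile(durations, 95));  // → 900 (P95)
```

Note how one slow outlier (900ms) dominates the P95 while barely moving the P50; this is why percentiles, not averages, are the standard way to track response time.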

Health Checks

Implement health check endpoints to monitor API availability and dependencies.

Basic Health Check

// Simple health check
app.get('/health', (req, res) => {
  res.status(200).json({
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime()
  });
});

Comprehensive Health Check

const Redis = require('ioredis');
const mysql = require('mysql2/promise');

const redis = new Redis();
const dbPool = mysql.createPool({ /* config */ });

// Detailed health check with dependency checks
app.get('/health', async (req, res) => {
  const health = {
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks: {}
  };

  // Check database
  try {
    const start = Date.now();
    await dbPool.query('SELECT 1');
    health.checks.database = { status: 'healthy', latency: `${Date.now() - start}ms` };
  } catch (error) {
    health.status = 'degraded';
    health.checks.database = { status: 'unhealthy', error: error.message };
  }

  // Check Redis
  try {
    const start = Date.now();
    await redis.ping();
    const latency = Date.now() - start;
    health.checks.redis = { status: 'healthy', latency: `${latency}ms` };
  } catch (error) {
    health.status = 'degraded';
    health.checks.redis = { status: 'unhealthy', error: error.message };
  }

  // Check memory usage
  const memUsage = process.memoryUsage();
  health.checks.memory = {
    rss: `${Math.round(memUsage.rss / 1024 / 1024)}MB`,
    heapUsed: `${Math.round(memUsage.heapUsed / 1024 / 1024)}MB`,
    heapTotal: `${Math.round(memUsage.heapTotal / 1024 / 1024)}MB`
  };

  // Return 503 if unhealthy
  const statusCode = health.status === 'ok' ? 200 : 503;
  res.status(statusCode).json(health);
});

// Readiness check (can the service accept traffic?)
app.get('/ready', async (req, res) => {
  try {
    // Check critical dependencies
    await dbPool.query('SELECT 1');
    await redis.ping();
    res.status(200).json({ ready: true });
  } catch (error) {
    res.status(503).json({ ready: false, error: error.message });
  }
});

// Liveness check (is the service running?)
app.get('/live', (req, res) => {
  res.status(200).json({ alive: true });
});
Health Check Best Practices: Implement separate /health (detailed status), /ready (can accept traffic), and /live (is process alive) endpoints. Health checks should be lightweight (<100ms). Don't expose sensitive information in health check responses.

Error Tracking and Alerting

Integrate error tracking services to capture, aggregate, and alert on production errors.

Sentry Integration

const Sentry = require('@sentry/node');
const express = require('express');

const app = express();

// Initialize Sentry
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 0.1, // Sample 10% of transactions
  integrations: [
    // Enable HTTP tracking
    new Sentry.Integrations.Http({ tracing: true }),
    // Enable Express.js middleware tracking
    new Sentry.Integrations.Express({ app })
  ]
});

// Request handler must be the first middleware
app.use(Sentry.Handlers.requestHandler());

// Tracing handler for performance monitoring
app.use(Sentry.Handlers.tracingHandler());

// Your routes
app.get('/api/posts', async (req, res) => {
  try {
    const posts = await fetchPosts();
    res.json(posts);
  } catch (error) {
    // Manually capture error with context
    Sentry.captureException(error, {
      tags: {
        endpoint: '/api/posts',
        method: 'GET'
      },
      extra: {
        userId: req.user?.id,
        query: req.query
      }
    });
    res.status(500).json({ error: 'Failed to fetch posts' });
  }
});

// Error handler must come after routes
app.use(Sentry.Handlers.errorHandler());

// Custom error handler
app.use((err, req, res, next) => {
  res.status(err.statusCode || 500).json({
    error: err.message,
    requestId: req.id
  });
});

Custom Alerting System

const nodemailer = require('nodemailer');
const logger = require('./winston-logger');

class AlertingService {
  constructor() {
    this.errorCounts = new Map();
    this.alertThreshold = 10; // Alert after 10 errors in 5 minutes
    this.timeWindow = 5 * 60 * 1000; // 5 minutes

    this.transporter = nodemailer.createTransport({
      host: process.env.SMTP_HOST,
      port: process.env.SMTP_PORT,
      auth: {
        user: process.env.SMTP_USER,
        pass: process.env.SMTP_PASS
      }
    });
  }

  async trackError(errorType, error, context = {}) {
    const key = `${errorType}:${error.message}`;
    const now = Date.now();

    if (!this.errorCounts.has(key)) {
      this.errorCounts.set(key, []);
    }

    const timestamps = this.errorCounts.get(key);
    timestamps.push(now);

    // Remove old timestamps outside the window
    const cutoff = now - this.timeWindow;
    const recentTimestamps = timestamps.filter(ts => ts > cutoff);
    this.errorCounts.set(key, recentTimestamps);

    // Check if threshold exceeded
    if (recentTimestamps.length >= this.alertThreshold) {
      await this.sendAlert(errorType, error, recentTimestamps.length, context);
      this.errorCounts.delete(key); // Reset after alert
    }
  }

  async sendAlert(errorType, error, count, context) {
    const subject = `[ALERT] ${errorType}: ${error.message}`;
    const body = `
Error Type: ${errorType}
Error Message: ${error.message}
Occurrences: ${count} in last 5 minutes

Stack Trace:
${error.stack}

Context:
${JSON.stringify(context, null, 2)}

Time: ${new Date().toISOString()}
    `;

    try {
      await this.transporter.sendMail({
        from: process.env.ALERT_FROM_EMAIL,
        to: process.env.ALERT_TO_EMAIL,
        subject,
        text: body
      });
      logger.info('Alert sent', { errorType, count });
    } catch (err) {
      logger.error('Failed to send alert', { error: err.message });
    }
  }

  // Clean up old error counts periodically
  cleanup() {
    const now = Date.now();
    const cutoff = now - this.timeWindow;

    for (const [key, timestamps] of this.errorCounts.entries()) {
      const recentTimestamps = timestamps.filter(ts => ts > cutoff);
      if (recentTimestamps.length === 0) {
        this.errorCounts.delete(key);
      } else {
        this.errorCounts.set(key, recentTimestamps);
      }
    }
  }
}

const alerting = new AlertingService();

// Cleanup every minute
setInterval(() => alerting.cleanup(), 60000);

// Usage in error handling
app.use(async (err, req, res, next) => {
  logger.error('API Error', {
    error: err.message,
    stack: err.stack,
    path: req.path
  });

  // Track error for alerting
  await alerting.trackError('API_ERROR', err, {
    method: req.method,
    path: req.path,
    statusCode: res.statusCode
  });

  res.status(err.statusCode || 500).json({ error: err.message });
});

Application Performance Monitoring (APM)

APM tools provide deep insights into application performance, database queries, external API calls, and distributed tracing.

New Relic Integration

// newrelic.js (must be the first file required)
'use strict';

exports.config = {
  app_name: ['My API'],
  license_key: process.env.NEW_RELIC_LICENSE_KEY,
  logging: {
    level: 'info'
  },
  transaction_tracer: {
    enabled: true,
    transaction_threshold: 'apdex_f',
    record_sql: 'obfuscated'
  },
  error_collector: {
    enabled: true,
    ignore_status_codes: [404]
  },
  distributed_tracing: {
    enabled: true
  }
};
// server.js
const newrelic = require('newrelic'); // Must be the first require!
const express = require('express');

const app = express();

// Custom metrics
app.get('/api/posts', async (req, res) => {
  // Record custom metric
  newrelic.recordMetric('Custom/Posts/Fetched', 1);

  // Accept distributed trace headers on the current transaction
  const transaction = newrelic.getTransaction();
  transaction.acceptDistributedTraceHeaders('HTTP', req.headers);

  try {
    const posts = await fetchPosts();

    // Add custom attributes
    newrelic.addCustomAttributes({
      userId: req.user?.id,
      postCount: posts.length,
      cached: posts.cached || false
    });

    res.json(posts);
  } catch (error) {
    // Notice error (recorded automatically, but can add context)
    newrelic.noticeError(error, {
      userId: req.user?.id,
      endpoint: '/api/posts'
    });
    res.status(500).json({ error: 'Failed to fetch posts' });
  }
});

// Background job monitoring
async function processJob(job) {
  // Create background transaction
  return newrelic.startBackgroundTransaction('processEmailJob', async () => {
    const transaction = newrelic.getTransaction();
    transaction.acceptDistributedTraceHeaders('Other', job.headers);

    try {
      await sendEmail(job.data);
      newrelic.recordMetric('Custom/Email/Sent', 1);
    } catch (error) {
      newrelic.noticeError(error);
      throw error;
    }
  });
}
Popular APM Tools:
  • New Relic: Comprehensive APM with distributed tracing
  • Datadog: Full-stack observability platform
  • Elastic APM: Open-source APM with ELK Stack integration
  • AppDynamics: Enterprise-grade APM with business metrics

Log Aggregation and Analysis

Centralize logs from multiple services for efficient searching, analysis, and troubleshooting.

ELK Stack (Elasticsearch, Logstash, Kibana) Integration

// winston-elasticsearch.js
const winston = require('winston');
const { ElasticsearchTransport } = require('winston-elasticsearch');

const esTransportOpts = {
  level: 'info',
  clientOpts: {
    node: process.env.ELASTICSEARCH_URL || 'http://localhost:9200',
    auth: {
      username: process.env.ES_USERNAME,
      password: process.env.ES_PASSWORD
    }
  },
  index: 'api-logs',
  indexPrefix: 'api',
  indexSuffixPattern: 'YYYY-MM-DD' // Daily indices
};

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  transports: [
    new ElasticsearchTransport(esTransportOpts),
    new winston.transports.Console()
  ]
});

module.exports = logger;

Structured Logging for Better Analysis

const logger = require('./winston-elasticsearch');

// Log with consistent structure
app.get('/api/posts/:id', async (req, res) => {
  const context = {
    requestId: req.id,
    userId: req.user?.id,
    postId: req.params.id,
    ip: req.ip,
    userAgent: req.get('user-agent')
  };

  logger.info('Fetching post', context);

  try {
    const post = await fetchPost(req.params.id);

    logger.info('Post fetched successfully', {
      ...context,
      cached: post.cached || false,
      responseTime: Date.now() - req.startTime
    });

    res.json(post);
  } catch (error) {
    logger.error('Failed to fetch post', {
      ...context,
      error: {
        message: error.message,
        stack: error.stack,
        code: error.code
      }
    });
    res.status(500).json({ error: 'Failed to fetch post' });
  }
});
Log Levels Best Practices:
  • ERROR: Application errors that need immediate attention
  • WARN: Warning conditions that might lead to errors
  • INFO: Important business events (user login, order placed)
  • DEBUG: Detailed diagnostic information for debugging
  • TRACE: Very detailed information (usually disabled in production)
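Under the hood, level filtering is a numeric priority comparison: a logger emits a message only when it is at least as severe as the configured threshold. A minimal sketch of that logic (the priority numbers below are illustrative; Winston's actual npm levels differ slightly):

```javascript
// Numeric priorities: lower number = higher severity (illustrative mapping)
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3, trace: 4 };

function shouldLog(configuredLevel, messageLevel) {
  // Emit only if the message is at least as severe as the threshold
  return LEVELS[messageLevel] <= LEVELS[configuredLevel];
}

// With LOG_LEVEL=info: errors and warnings pass, debug/trace are suppressed
console.log(shouldLog('info', 'error')); // → true
console.log(shouldLog('info', 'debug')); // → false
console.log(shouldLog('debug', 'trace')); // → false
```

This is why setting `LOG_LEVEL=debug` in development "turns on" the more verbose levels without any code changes, and why TRACE output disappears in production when the threshold is raised.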

Real-time Monitoring Dashboard

// Real-time metrics with Socket.IO
const http = require('http');
const socketIO = require('socket.io');

const server = http.createServer(app); // wrap the Express app in an HTTP server
const io = socketIO(server);

// Metrics collection
const metrics = {
  requestsPerSecond: 0,
  activeUsers: 0,
  averageResponseTime: 0,
  errorRate: 0
};

let requestCount = 0;
let responseTimes = [];

// Update and broadcast metrics every second
setInterval(() => {
  metrics.requestsPerSecond = requestCount;
  metrics.averageResponseTime = responseTimes.length > 0
    ? responseTimes.reduce((a, b) => a + b, 0) / responseTimes.length
    : 0;

  // Broadcast to all connected clients
  io.emit('metrics', metrics);

  // Reset counters
  requestCount = 0;
  responseTimes = [];
}, 1000);

// Track requests
app.use((req, res, next) => {
  requestCount++;
  const start = Date.now();
  res.on('finish', () => {
    responseTimes.push(Date.now() - start);
  });
  next();
});
Exercise: Implement a complete monitoring solution for a REST API with the following requirements:

Part 1: Logging
  1. Set up Winston logger with JSON format and daily rotating file transport
  2. Create request logging middleware that logs:
    • Request ID (UUID)
    • Method, path, query parameters
    • Response status code and duration
    • IP address and user agent
  3. Log slow requests (>1 second) as warnings
  4. Log errors with full stack traces
Part 2: Metrics
  1. Integrate Prometheus client
  2. Track the following metrics:
    • HTTP request duration (histogram with P50, P95, P99)
    • Total HTTP requests (counter by method, route, status)
    • Active connections (gauge)
    • Database query duration (histogram by query type)
  3. Expose /metrics endpoint for Prometheus scraping
Part 3: Health Checks
  1. Implement /health endpoint with database and Redis checks
  2. Implement /ready endpoint (for Kubernetes readiness probes)
  3. Implement /live endpoint (for Kubernetes liveness probes)
  4. Return proper status codes (200 healthy, 503 unhealthy)
Part 4: Alerting
  1. Create alerting service that tracks error rates
  2. Send email alert when error rate exceeds 10 errors in 5 minutes
  3. Include error details, stack trace, and context in alert
Bonus: Create a simple dashboard that displays real-time metrics (requests/second, average response time, error rate) using Socket.IO.