Monitor & Debug Lambda: CloudWatch & X-Ray

Observability—the ability to understand what's happening inside running systems—is critical for production Lambda functions. CloudWatch Logs capture function output and errors; CloudWatch Metrics track invocation counts, duration, and errors; X-Ray provides distributed tracing for requests across services. Together, they enable you to diagnose issues, optimize performance, and ensure reliability.

CloudWatch Logs Basics

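Lambda automatically sends everything the function writes to stdout and stderr, including print statements and unhandled exceptions, to CloudWatch Logs, as long as the execution role has permission to write logs. Access logs in the Lambda Console: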

Select a function
Click Monitor → Logs
Click a log stream to view invocation details

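Each execution environment (not each invocation) writes to its own log stream, named with the date, the function version, and a unique ID. Successive invocations handled by the same environment appear in the same stream, each tagged with its request ID. Example log output: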

START RequestId: abc-123-def-456 Version: $LATEST
2026-06-02T10:30:00.123Z    abc-123-def-456    Processing event: {'userId': 42, 'action': 'login'}
2026-06-02T10:30:00.456Z    abc-123-def-456    Database query took 0.213s
END RequestId: abc-123-def-456
REPORT RequestId: abc-123-def-456    Duration: 666.14 ms    Billed Duration: 667 ms    Max Memory Used: 85 MB    Init Duration: 45.23 ms    XRAY TraceId: 1-abc-def

The REPORT line shows:

Duration: How long the handler ran for this invocation
Billed Duration: Duration rounded up to the next 1 ms (there is no 1,000 ms minimum)
Max Memory Used: Peak memory consumed by the execution environment
Init Duration: Cold-start initialization time (present only when the invocation started a new execution environment)
XRAY TraceId: Link to X-Ray trace (if enabled)
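If you export logs and want these numbers in a script, the REPORT line can be split into fields. The following is a minimal sketch, assuming the fields are separated by tabs (as Lambda emits them) or by runs of spaces (as in the sample above); `parse_report_line` is a hypothetical helper, not part of any AWS SDK.

```python
import re


def parse_report_line(line):
    """Split a Lambda REPORT line into a dict; ms and MB values become floats."""
    parts = re.split(r"\t+|\s{2,}", line.strip())
    if not parts[0].startswith("REPORT "):
        raise ValueError("not a REPORT line")
    parts[0] = parts[0][len("REPORT "):]

    fields = {}
    for part in parts:
        key, sep, value = part.partition(": ")
        if not sep:
            continue  # skip fragments that are not "Key: value"
        number, _, unit = value.partition(" ")
        fields[key] = float(number) if unit in ("ms", "MB") else value
    return fields


sample = (
    "REPORT RequestId: abc-123-def-456\tDuration: 666.14 ms\t"
    "Billed Duration: 667 ms\tMax Memory Used: 85 MB\tInit Duration: 45.23 ms"
)
report = parse_report_line(sample)
# A cold start is any invocation whose REPORT line carries Init Duration.
print(report["Duration"], "cold start" if "Init Duration" in report else "warm")
```

In practice CloudWatch Logs Insights already exposes these values as fields, so a parser like this is mainly useful for logs processed outside CloudWatch.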

Structured Logging for Production

Use structured logging (JSON format) for easier parsing and querying:

import json
import logging
import traceback  # needed for traceback.format_exc() in the error handler
from datetime import datetime

# Configure JSON logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    request_id = context.aws_request_id
    
    # Log structured event
    logger.info(json.dumps({
        'timestamp': datetime.utcnow().isoformat(),
        'requestId': request_id,
        'event': event,
        'source': 'lambda_handler'
    }))
    
    try:
        user_id = event['userId']
        action = event['action']
        
        # Simulate work
        result = process_action(user_id, action)
        
        # Log success
        logger.info(json.dumps({
            'timestamp': datetime.utcnow().isoformat(),
            'requestId': request_id,
            'status': 'success',
            'result': result
        }))
        
        return {'statusCode': 200, 'body': json.dumps(result)}
    
    except Exception as e:
        # Log error with exception details
        logger.error(json.dumps({
            'timestamp': datetime.utcnow().isoformat(),
            'requestId': request_id,
            'error': str(e),
            'errorType': type(e).__name__,
            'traceback': traceback.format_exc()
        }))
        
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}

def process_action(user_id, action):
    return {'userId': user_id, 'action': action, 'processed': True}

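CloudWatch Logs Insights can query these logs. It discovers fields automatically only when a log event is valid JSON, so check that the runtime is not adding a text prefix to each line (Lambda's JSON log format setting avoids this). The error records above carry errorType rather than a status field, so filter on that:

fields @timestamp, requestId, errorType, error
| filter ispresent(errorType)
| stats count() by errorType

This returns counts of errors grouped by type.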
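Serializing every record by hand with json.dumps gets repetitive. One alternative, sketched here under stated assumptions, is a `logging.Formatter` subclass that emits each record as one JSON object. `JsonFormatter`, `build_logger`, and the `fields` key passed through `extra` are illustrative names, not an AWS or standard-library convention.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge caller-supplied fields: logger.info("msg", extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        if record.exc_info and record.exc_info[0] is not None:
            payload["errorType"] = record.exc_info[0].__name__
            payload["traceback"] = self.formatException(record.exc_info)
        # default=str keeps logging from failing on non-serializable values
        return json.dumps(payload, default=str)


def build_logger(name="app"):
    """Return a logger that writes JSON lines to stderr exactly once per record."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False  # avoid a duplicate line from the root handler
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("order processed", extra={"fields": {"requestId": "r1", "status": "success"}})
    try:
        {}["userId"]
    except KeyError:
        log.exception("request failed", extra={"fields": {"requestId": "r1"}})
```

Calling `log.exception` inside an except block fills in errorType and traceback automatically, so the handler code only supplies the request-specific fields.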

CloudWatch Metrics and Alarms

Lambda automatically publishes metrics to CloudWatch:

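Invocations: Number of times the function code was invoked, including invocations that ended in an error
Duration: Time the function code spent processing an event, in milliseconds
Errors: Number of invocations that ended in a function error
Throttles: Number of invocation requests rejected because no concurrency was available
ConcurrentExecutions: Number of function instances processing events at the same time
UnreservedConcurrentExecutions: Number of concurrent executions by functions that have no reserved concurrency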

Create custom metrics in your function:

import json
import boto3
import time
from datetime import datetime  # used for the metric Timestamp below

cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    start = time.time()
    
    try:
        # Your business logic
        result = process_request(event)
        duration = (time.time() - start) * 1000  # Convert to ms
        
        # Publish custom metric
        cloudwatch.put_metric_data(
            Namespace='MyServerlessApp',
            MetricData=[
                {
                    'MetricName': 'ProcessingTime',
                    'Value': duration,
                    'Unit': 'Milliseconds',
                    'Timestamp': datetime.utcnow()
                },
                {
                    'MetricName': 'SuccessfulRequests',
                    'Value': 1,
                    'Unit': 'Count'
                }
            ]
        )
        
        return {'statusCode': 200, 'body': json.dumps(result)}
    
    except Exception as e:
        cloudwatch.put_metric_data(
            Namespace='MyServerlessApp',
            MetricData=[
                {
                    'MetricName': 'FailedRequests',
                    'Value': 1,
                    'Unit': 'Count'
                }
            ]
        )
        raise

def process_request(event):
    return {'status': 'processed'}

Create an alarm to notify on errors:

aws cloudwatch put-metric-alarm \
  --alarm-name lambda-high-error-count \
  --alarm-description "Alert if the function reports more than 10 errors in 5 minutes" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --treat-missing-data notBreaching \
  --dimensions Name=FunctionName,Value=my-function \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
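An alarm on the Errors sum fires on an absolute count, not on a percentage. If you want an error-rate alarm, CloudWatch metric math can divide Errors by Invocations. The sketch below only builds the arguments for boto3's `put_metric_alarm`; the helper name `error_rate_alarm`, the alarm name, and the 5% threshold are assumptions for illustration.

```python
def error_rate_alarm(function_name, threshold_percent, topic_arn, period=300):
    """Build put_metric_alarm arguments for an error-rate (percent) alarm."""
    dimensions = [{"Name": "FunctionName", "Value": function_name}]

    def lambda_sum(metric_id, metric_name):
        # Input series for the expression; not evaluated by the alarm directly.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "AlarmDescription": f"Error rate above {threshold_percent}%",
        "Metrics": [
            lambda_sum("errors", "Errors"),
            lambda_sum("invocations", "Invocations"),
            {
                "Id": "rate",
                "Expression": "100 * errors / invocations",
                "Label": "ErrorRatePercent",
                "ReturnData": True,  # the alarm evaluates this series
            },
        ],
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic should not raise the alarm
        "AlarmActions": [topic_arn],
    }


# To create the alarm (requires AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **error_rate_alarm("my-function", 5,
#                          "arn:aws:sns:us-east-1:123456789012:alerts"))
```

Keeping the argument construction separate from the API call makes the alarm definition easy to inspect or unit test without touching AWS.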

X-Ray Tracing for Distributed Requests

X-Ray traces requests across AWS services, showing performance bottlenecks. Enable X-Ray in your function:

# SAM template
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Tracing: Active  # Enable X-Ray tracing
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - xray:PutTraceSegments
                - xray:PutTelemetryRecords
              Resource: '*'

Or via CLI:

aws lambda update-function-configuration \
  --function-name my-function \
  --tracing-config Mode=Active

Instrument your code:

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all
import boto3

# Patch AWS SDK for X-Ray
patch_all()

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Add custom subsegment
    with xray_recorder.capture('process_file'):
        bucket = event['bucket']
        key = event['key']
        
        # This S3 call is automatically traced
        obj = s3.get_object(Bucket=bucket, Key=key)
        data = obj['Body'].read()
        
        # Add annotations (indexable metadata)
        xray_recorder.put_annotation('bucket', bucket)
        xray_recorder.put_annotation('key', key)
        
        # Add metadata (non-indexed)
        xray_recorder.put_metadata('file_size', len(data))
        
        result = process_data(data)
    
    return {'statusCode': 200, 'result': result}

@xray_recorder.capture('process_data')
def process_data(data):
    # Automatically captured as subsegment
    return len(data)

View traces in the X-Ray Console:

Go to X-Ray → Service Map
See all services involved in request
Click a service to view individual traces
See timing breakdown: how long in Lambda, S3, DynamoDB, etc.

Lambda Insights CloudWatch Agent

Lambda Insights provides enhanced monitoring with pre-built dashboards. Install the Lambda Insights extension:

# SAM template
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Layers:
        - !Sub 'arn:aws:lambda:${AWS::Region}:580254703988:layer:LambdaInsightsExtension:${LambdaInsightsVersion}'
      Policies:
        - CloudWatchLambdaInsightsExecutionRolePolicy  # AWS managed policy that lets the extension write its log data

Or via CLI:

aws lambda update-function-configuration \
  --function-name my-function \
  --layers "arn:aws:lambda:us-east-1:580254703988:layer:LambdaInsightsExtension:21"

The Lambda Insights layer automatically collects:

Memory utilization
CPU usage
Disk I/O
Network I/O
Duration
Cold starts

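View the dashboards in the Lambda Insights section of the CloudWatch console. The underlying records land in the /aws/lambda-insights log group, which you can also query with Logs Insights.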

Debugging Lambda Errors

Common error patterns and solutions:

Timeout Error:

2026-06-02T10:30:05.000Z END RequestId: ...
Task timed out after 30.00 seconds

Solution: Increase timeout or optimize code:

aws lambda update-function-configuration \
  --function-name my-function \
  --timeout 60
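Raising the timeout does not help if the workload itself is unbounded. For batch-style handlers, one option is to check the time left on the context object and stop early. This is a minimal sketch: `context.get_remaining_time_in_millis()` is the Lambda Python context method, while `process_items`, `handle`, and the 2-second safety margin are hypothetical choices.

```python
def process_items(items, context, handle, safety_margin_ms=2000):
    """Process items until done or until the invocation is close to timing out."""
    processed = []
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            # Stop cleanly and report what is left instead of being killed mid-item.
            return {"processed": processed, "remaining": items[index:]}
        processed.append(handle(item))
    return {"processed": processed, "remaining": []}


def lambda_handler(event, context):
    # 'items' is an assumed event key for this sketch.
    outcome = process_items(event.get("items", []), context, lambda item: item)
    if outcome["remaining"]:
        print(f"Stopped early with {len(outcome['remaining'])} items left")
    return outcome
```

The caller can then re-queue the remaining items, which turns a hard timeout into a partial result that shows up clearly in the logs.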

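OutOfMemory Error (the message depends on the runtime; this one is from Go, while other runtimes commonly log "Runtime exited with error: signal: killed"):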

fatal error: runtime: out of memory

Solution: Increase memory allocation:

aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 512

Missing Dependency:

ModuleNotFoundError: No module named 'requests'

Solution: Add dependency to Lambda Layer or function ZIP file (see Lambda Layers article).

Permission Denied:

An error occurred (AccessDenied) when calling the GetObject operation

Solution: Add IAM permission to execution role:

Policies:
  - Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action:
          - s3:GetObject
        Resource: 'arn:aws:s3:::my-bucket/*'

Key Takeaways

CloudWatch Logs automatically capture all function output and errors; view in the Lambda Console or Logs Insights.
Structured logging (JSON) enables querying and aggregation across invocations.
CloudWatch Metrics track invocations, duration, errors; create alarms to notify on anomalies.
X-Ray tracing shows request flow across AWS services, identifying performance bottlenecks.
Lambda Insights provides enhanced dashboards with memory, CPU, and I/O metrics.
Common errors (timeout, OutOfMemory, missing dependencies, missing permissions) leave recognizable messages in the logs, and each has a direct fix.

Frequently Asked Questions

How long are CloudWatch Logs retained?

By default, indefinitely. You can set retention policies: 1 day to 10 years. Longer retention increases storage costs. Typical production functions use 30–90 day retention.
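Retention can be set from code as well as from the console. The sketch below uses the CloudWatch Logs `put_retention_policy` call and assumes the function logs to the default /aws/lambda/<function-name> log group; `set_log_retention` is a hypothetical helper, and the client is passed in so the function can be exercised without AWS access.

```python
def set_log_retention(logs_client, function_name, days=30):
    """Apply a retention policy to a function's default log group."""
    log_group = f"/aws/lambda/{function_name}"
    # CloudWatch Logs accepts only specific values for retentionInDays
    # (for example 1, 7, 30, 90, 365); other values are rejected by the API.
    logs_client.put_retention_policy(logGroupName=log_group, retentionInDays=days)
    return log_group


# Usage (requires AWS credentials):
#   import boto3
#   set_log_retention(boto3.client("logs"), "my-function", days=30)
```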

How do I track a request across multiple Lambda functions?

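With active tracing enabled on each function, X-Ray propagates the trace context for you: Lambda exposes it to the function in the _X_AMZN_TRACE_ID environment variable, and the patched AWS SDK forwards it as the X-Amzn-Trace-Id header on downstream calls, so segments from every function share one trace ID. For hops that do not carry the header, or for correlating plain log lines, pass a custom request ID in the event payload and log it in every function.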
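To include the trace ID in your own log lines, you can read the tracing header value yourself. This sketch assumes the value follows the usual `Root=...;Parent=...;Sampled=...` layout and that Lambda exposes it in the `_X_AMZN_TRACE_ID` environment variable; `parse_trace_header` and `current_trace_id` are hypothetical helper names.

```python
import os


def parse_trace_header(header):
    """Split an X-Ray tracing header into its key=value parts."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def current_trace_id():
    """Return the trace ID for the current invocation, or None if tracing is off."""
    header = os.environ.get("_X_AMZN_TRACE_ID", "")
    return parse_trace_header(header).get("Root")


example = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
print(parse_trace_header(example)["Root"])
```

Logging the Root value alongside your request ID lets you jump from a log line to the matching trace in the X-Ray console.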

Can I search CloudWatch Logs efficiently with millions of entries?

Yes, use CloudWatch Logs Insights. It's optimized for querying large log volumes:

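fields @duration, @maxMemoryUsed
| filter @type = "REPORT" and @duration > 1000
| stats avg(@duration), max(@duration) by @log

This returns the average and maximum duration, in milliseconds, of invocations that took longer than one second, grouped by log group (one per function).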
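Logs Insights queries can also be run from code, which is useful for scheduled reports. The sketch below uses the CloudWatch Logs `start_query` and `get_query_results` calls and polls until the query finishes; `run_insights_query` and its polling parameters are hypothetical, and the client is passed in so the logic can be tested with a stub.

```python
import time


def run_insights_query(logs_client, log_group, query, start_time, end_time,
                       poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query and return rows as {field: value} dicts.

    start_time and end_time are Unix timestamps in seconds.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)  # still Scheduled or Running
    raise TimeoutError("query did not complete in time")


# Usage (requires AWS credentials):
#   import boto3
#   now = int(time.time())
#   rows = run_insights_query(
#       boto3.client("logs"), "/aws/lambda/my-function",
#       'filter @type = "REPORT" | stats max(@duration)', now - 3600, now)
```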

What's the cost of CloudWatch and X-Ray?

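CloudWatch Logs: $0.50 per GB ingested and $0.03 per GB stored per month. X-Ray: $5.00 per million traces recorded, plus $0.50 per million traces retrieved or scanned. These are us-east-1 list prices and can change, so check the current pricing pages. For low-traffic applications, the total is often under $10/month.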
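A quick back-of-the-envelope calculation shows how log volume drives the bill. This sketch covers the CloudWatch Logs portion only and takes the per-GB prices quoted above as default arguments; treat them as assumptions and substitute current prices for your region.

```python
def monthly_logs_cost(gb_ingested, gb_stored, ingest_price=0.50, storage_price=0.03):
    """Estimate monthly CloudWatch Logs cost in USD from ingested and stored volume."""
    return round(gb_ingested * ingest_price + gb_stored * storage_price, 2)


# 20 GB ingested and 20 GB kept in storage: 20 * 0.50 + 20 * 0.03
print(monthly_logs_cost(20, 20))
```

Ingestion dominates the total here, which is why reducing log verbosity usually saves more than shortening retention.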

Can I disable CloudWatch Logs to reduce costs?

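Lambda sends logs to CloudWatch whenever the execution role permits it, and there is no per-function off switch. You can stop ingestion by removing the logs:CreateLogStream and logs:PutLogEvents permissions from the execution role, but you then lose the data you need for debugging. In most cases it is better to set a log retention policy so old logs are deleted automatically and to reduce logging verbosity for non-critical functions.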
