
Monitor & Debug Lambda: CloudWatch & X-Ray

Observability—the ability to understand what's happening inside running systems—is critical for production Lambda functions. CloudWatch Logs capture function output and errors; CloudWatch Metrics track invocation counts, duration, and errors; X-Ray provides distributed tracing for requests across services. Together, they enable you to diagnose issues, optimize performance, and ensure reliability.

CloudWatch Logs Basics

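Lambda automatically sends everything your function writes to stdout and stderr (print statements, logger output, and uncaught exceptions) to CloudWatch Logs, in a log group named /aws/lambda/<function-name> by default. Access logs in the Lambda Console: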

  1. Select a function
  2. Click Monitor → Logs
  3. Click a log stream to view invocation details

Each execution environment writes to its own log stream, so one stream can hold many invocations; every line from an invocation carries its request ID. Example log output:

START RequestId: abc-123-def-456 Version: $LATEST
2026-06-02T10:30:00.123Z abc-123-def-456 Processing event: {'userId': 42, 'action': 'login'}
2026-06-02T10:30:00.456Z abc-123-def-456 Database query took 0.213s
END RequestId: abc-123-def-456
REPORT RequestId: abc-123-def-456 Duration: 666.14 ms Billed Duration: 667 ms Memory Size: 128 MB Max Memory Used: 85 MB Init Duration: 45.23 ms XRAY TraceId: 1-abc-def

The REPORT line shows:

  • Duration: How long the handler ran
  • Billed Duration: Duration rounded up to the nearest 1 ms
  • Memory Size: Memory configured for the function
  • Max Memory Used: Peak memory consumed during the invocation
  • Init Duration: Initialization time, reported only on a cold start (the first invocation in a new execution environment)
  • XRAY TraceId: ID of the X-Ray trace (present only when tracing is enabled)
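
When you export logs or process them in a subscription, it can help to turn REPORT lines into numbers. The sketch below is a minimal example that uses a regular expression over the field names listed above. It assumes the "Name: value unit" layout shown in the example. Real REPORT lines separate fields with tabs, and the pattern does not depend on the separator. The parse_report_line name is hypothetical.

```python
import re

# Matches "Name: value unit" pairs such as "Duration: 666.14 ms" or "Max Memory Used: 85 MB".
_FIELD = re.compile(
    r"(Duration|Billed Duration|Memory Size|Max Memory Used|Init Duration): ([\d.]+) (ms|MB)"
)

def parse_report_line(line):
    """Return the numeric fields of a Lambda REPORT line as a dict (values in ms or MB)."""
    if not line.startswith("REPORT"):
        raise ValueError("not a REPORT line")
    fields = {name: float(value) for name, value, _unit in _FIELD.findall(line)}
    # Init Duration only appears when the invocation was a cold start.
    fields["cold_start"] = "Init Duration" in fields
    return fields
```

For example, feeding it the REPORT line above gives Duration 666.14, Billed Duration 667.0, and cold_start True.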

Structured Logging for Production

Use structured logging (JSON format) for easier parsing and querying:

import json
import logging
import traceback
from datetime import datetime, timezone

# Use the root logger; the Lambda Python runtime attaches a handler to it
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    request_id = context.aws_request_id

    # Log structured event
    logger.info(json.dumps({
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'requestId': request_id,
        'event': event,
        'source': 'lambda_handler'
    }))

    try:
        user_id = event['userId']
        action = event['action']

        # Simulate work
        result = process_action(user_id, action)

        # Log success
        logger.info(json.dumps({
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'requestId': request_id,
            'status': 'success',
            'result': result
        }))

        return {'statusCode': 200, 'body': json.dumps(result)}

    except Exception as e:
        # Log error with exception details; 'status': 'error' lets Logs Insights filter on failures
        logger.error(json.dumps({
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'requestId': request_id,
            'status': 'error',
            'error': str(e),
            'errorType': type(e).__name__,
            'traceback': traceback.format_exc()
        }))

        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}

def process_action(user_id, action):
    return {'userId': user_id, 'action': action, 'processed': True}
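
Repeating the timestamp and request ID in every json.dumps call is easy to get wrong. One option is a custom logging.Formatter that renders every record as one JSON line. It also merges structured fields passed through the standard extra argument. This is a sketch, not a library API: JsonFormatter, get_json_logger, and the "fields" key are illustrative names. Lambda's built-in JSON log format setting is another route.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["traceback"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)

def get_json_logger(name="app"):
    """Return a logger whose only handler writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.propagate = False  # avoid a second, plain-text copy via the root logger
    logger.setLevel(logging.INFO)
    return logger
```

Inside a handler, a call such as log.info("processed", extra={"fields": {"requestId": context.aws_request_id, "status": "success"}}) produces one JSON line. The line carries top-level status and requestId keys.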

CloudWatch Logs Insights can query these logs:

fields @timestamp, requestId, status, error
| filter status = "error"
| stats count() by errorType

This returns counts of errors grouped by type.
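
The same query can run from a script or a scheduled job. The sketch below assumes you pass in a boto3 CloudWatch Logs client, for example boto3.client("logs"). start_query and get_query_results are the real API calls. The run_insights_query helper, the polling interval, and the timeout are illustrative choices.

```python
import time

def run_insights_query(logs_client, log_group, query, start_ts, end_ts,
                       poll_seconds=1.0, timeout_seconds=60):
    """Start a Logs Insights query, poll until it finishes, and return rows as dicts."""
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=int(start_ts),  # epoch seconds
        endTime=int(end_ts),
        queryString=query,
    )["queryId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {response['status']}")
        time.sleep(poll_seconds)

    if response["status"] != "Complete":
        raise RuntimeError(f"query {query_id} ended with status {response['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

For example, run_insights_query(boto3.client("logs"), "/aws/lambda/my-function", query, time.time() - 3600, time.time()) covers the last hour.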

CloudWatch Metrics and Alarms

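Lambda automatically publishes metrics to CloudWatch in the AWS/Lambda namespace:

  • Invocations: Number of times the function was invoked
  • Duration: Execution time in milliseconds
  • Errors: Number of invocations that ended in a function error
  • Throttles: Invocation requests rejected because a concurrency limit was reached
  • ConcurrentExecutions: Number of function instances processing events at the same time
  • UnreservedConcurrentExecutions: Concurrent executions across functions that have no reserved concurrency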
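
These built-in metrics can be combined, for example into an error rate (Errors divided by Invocations). The sketch below assumes you pass in a boto3 CloudWatch client, for example boto3.client("cloudwatch"). get_metric_statistics is the real CloudWatch call. The lambda_error_rate helper is hypothetical, and its one-window approach is only approximate.

```python
from datetime import datetime, timedelta, timezone

def lambda_error_rate(cloudwatch, function_name, hours=1):
    """Return the approximate error rate (%) for a function over the last `hours` hours."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)

    def total(metric_name):
        response = cloudwatch.get_metric_statistics(
            Namespace="AWS/Lambda",
            MetricName=metric_name,
            Dimensions=[{"Name": "FunctionName", "Value": function_name}],
            StartTime=start,
            EndTime=end,
            Period=3600 * hours,  # aim for one datapoint spanning the window
            Statistics=["Sum"],
        )
        # Sum whatever datapoints come back, in case the window straddles a boundary.
        return sum(point["Sum"] for point in response["Datapoints"])

    invocations = total("Invocations")
    if invocations == 0:
        return 0.0
    return 100.0 * total("Errors") / invocations
```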

Publish custom metrics from your function (the execution role needs the cloudwatch:PutMetricData permission):

import json
import time
from datetime import datetime, timezone

import boto3

# Create the client once per execution environment, outside the handler
cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    start = time.time()

    try:
        # Your business logic
        result = process_request(event)
        duration = (time.time() - start) * 1000  # Convert to ms

        # Publish custom metrics (a synchronous API call on every invocation)
        cloudwatch.put_metric_data(
            Namespace='MyServerlessApp',
            MetricData=[
                {
                    'MetricName': 'ProcessingTime',
                    'Value': duration,
                    'Unit': 'Milliseconds',
                    'Timestamp': datetime.now(timezone.utc)
                },
                {
                    'MetricName': 'SuccessfulRequests',
                    'Value': 1,
                    'Unit': 'Count'
                }
            ]
        )

        return {'statusCode': 200, 'body': json.dumps(result)}

    except Exception:
        cloudwatch.put_metric_data(
            Namespace='MyServerlessApp',
            MetricData=[
                {
                    'MetricName': 'FailedRequests',
                    'Value': 1,
                    'Unit': 'Count'
                }
            ]
        )
        raise

def process_request(event):
    return {'status': 'processed'}
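
Calling put_metric_data adds an API round trip to every invocation. An alternative is CloudWatch Embedded Metric Format (EMF). With EMF, the function prints a JSON log line with an "_aws" metadata block, and CloudWatch extracts the metrics from the logs asynchronously. The sketch below builds such a line by hand. The emit_emf_metrics helper and the FunctionName dimension are illustrative choices, not a required schema.

```python
import json
import time

def emit_emf_metrics(namespace, dimensions, metrics):
    """Print one log line in CloudWatch Embedded Metric Format and return it.

    dimensions: e.g. {"FunctionName": "my-function"}
    metrics:    e.g. {"ProcessingTime": (12.3, "Milliseconds")}
    """
    payload = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set
                "Metrics": [{"Name": name, "Unit": unit}
                            for name, (_value, unit) in metrics.items()],
            }],
        },
        **dimensions,  # dimension values as top-level keys
        **{name: value for name, (value, _unit) in metrics.items()},  # metric values
    }
    line = json.dumps(payload)
    print(line)  # stdout goes to CloudWatch Logs, which extracts the metrics
    return line
```

The function must print the line as bare JSON, which print does. A logger that adds a text prefix would keep CloudWatch from recognizing the EMF payload.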

Create an alarm to notify on errors:

aws cloudwatch put-metric-alarm \
  --alarm-name lambda-high-error-count \
  --alarm-description "Alert if the function reports more than 10 errors in 5 minutes" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=FunctionName,Value=my-function \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
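
The alarm above fires on an absolute count of 10 errors in 5 minutes. That count means very different things at 100 and at 100,000 invocations. To alarm on a percentage instead, CloudWatch metric math can divide Errors by Invocations. The sketch below uses boto3's put_metric_alarm with its Metrics parameter and assumes you pass in a CloudWatch client. The helper name, alarm name pattern, and 5% default are illustrative.

```python
def put_error_rate_alarm(cloudwatch, function_name, sns_topic_arn, threshold_percent=5.0):
    """Alarm when 100 * Errors / Invocations exceeds threshold_percent over 5 minutes."""

    def lambda_metric(query_id, metric_name):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,  # input only; the alarm evaluates the expression
        }

    cloudwatch.put_metric_alarm(
        AlarmName=f"{function_name}-error-rate",
        AlarmDescription=f"Error rate above {threshold_percent}% over 5 minutes",
        Metrics=[
            lambda_metric("errors", "Errors"),
            lambda_metric("invocations", "Invocations"),
            {"Id": "error_rate", "Expression": "100 * errors / invocations",
             "Label": "Error rate (%)", "ReturnData": True},
        ],
        Threshold=threshold_percent,
        ComparisonOperator="GreaterThanThreshold",
        EvaluationPeriods=1,
        TreatMissingData="notBreaching",  # periods with no datapoints do not trigger the alarm
        AlarmActions=[sns_topic_arn],
    )
```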

X-Ray Tracing for Distributed Requests

X-Ray traces requests across AWS services, showing performance bottlenecks. Enable X-Ray in your function:

# SAM template
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Tracing: Active  # Enable X-Ray tracing
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - xray:PutTraceSegments
                - xray:PutTelemetryRecords
              Resource: '*'

Or via CLI (the execution role still needs the X-Ray write permissions shown above):

aws lambda update-function-configuration \
--function-name my-function \
--tracing-config Mode=Active

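Instrument your code with the AWS X-Ray SDK for Python (the aws-xray-sdk package, which you must include in the deployment package or a layer):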

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all
import boto3

# Patch supported libraries (including boto3) so their calls are traced
patch_all()

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Add custom subsegment
    with xray_recorder.capture('process_file'):
        bucket = event['bucket']
        key = event['key']

        # This S3 call is automatically traced
        obj = s3.get_object(Bucket=bucket, Key=key)
        data = obj['Body'].read()

        # Add annotations (indexed, usable in trace filter expressions)
        xray_recorder.put_annotation('bucket', bucket)
        xray_recorder.put_annotation('key', key)

        # Add metadata (not indexed)
        xray_recorder.put_metadata('file_size', len(data))

        result = process_data(data)

    return {'statusCode': 200, 'result': result}

@xray_recorder.capture('process_data')
def process_data(data):
    # Automatically captured as subsegment
    return len(data)

View traces in the X-Ray Console:

  1. Go to X-Ray → Service map
  2. See all services involved in the request
  3. Click a service to view individual traces
  4. See timing breakdown: how long in Lambda, S3, DynamoDB, etc.

CloudWatch Lambda Insights

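Lambda Insights provides enhanced monitoring with pre-built dashboards. It runs as a Lambda extension that you add as a layer, and the execution role needs the CloudWatchLambdaInsightsExecutionRolePolicy managed policy: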

# SAM template
Parameters:
  LambdaInsightsVersion:
    Type: String
    Description: Lambda Insights layer version for your Region and architecture

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Layers:
        - !Sub 'arn:aws:lambda:${AWS::Region}:580254703988:layer:LambdaInsightsExtension:${LambdaInsightsVersion}'
      Policies:
        # Lets the extension write performance events to the /aws/lambda-insights log group
        - arn:aws:iam::aws:policy/CloudWatchLambdaInsightsExecutionRolePolicy

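Or via CLI. The --layers option replaces the function's entire layer list, so include any layers it already uses. The CLI also leaves the execution role unchanged, so attach the managed policy separately: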

aws lambda update-function-configuration \
--function-name my-function \
--layers "arn:aws:lambda:us-east-1:580254703988:layer:LambdaInsightsExtension:21"

The Lambda Insights layer automatically collects:

  • Memory utilization
  • CPU usage
  • Disk usage in /tmp
  • Network I/O
  • Duration
  • Cold starts

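View the dashboards in the CloudWatch console under Lambda Insights. The underlying performance events go to the /aws/lambda-insights log group, which you can also query with CloudWatch Logs Insights.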

Debugging Lambda Errors

Common error patterns and solutions:

Timeout Error:

2026-06-02T10:30:05.000Z END RequestId: ...
Task timed out after 30.00 seconds

Solution: Increase the timeout (the maximum is 900 seconds) or optimize the code:

aws lambda update-function-configuration \
--function-name my-function \
--timeout 60
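
For work that cannot simply be made faster, a handler can check how much time remains and stop cleanly before Lambda ends the invocation. The sketch below uses the real context.get_remaining_time_in_millis() method. The process_items and handle_item names and the 5-second safety margin are hypothetical.

```python
def process_items(items, context, handle_item, safety_margin_ms=5000):
    """Process items until fewer than safety_margin_ms remain; return (done, remaining)."""
    done = []
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            # Stop early so the function can report partial progress instead of timing out.
            return done, items[index:]
        done.append(handle_item(item))
    return done, []
```

What to do with the remaining items (re-queue them, or hand them back to an orchestrator) depends on your architecture.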

OutOfMemory Error:

fatal error: runtime: out of memory

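The exact message depends on the runtime (the line above is what the Go runtime prints). In any runtime, a REPORT line where Max Memory Used reaches Memory Size is the telltale sign. Solution: Increase memory allocation (Lambda allocates CPU in proportion to memory, so this can also shorten Duration):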

aws lambda update-function-configuration \
--function-name my-function \
--memory-size 512

Missing Dependency:

ModuleNotFoundError: No module named 'requests'

Solution: Add the dependency to a Lambda layer or to the function's deployment package (see the Lambda Layers article).

Permission Denied:

An error occurred (AccessDenied) when calling the GetObject operation

Solution: Add IAM permission to execution role:

Policies:
  - Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action:
          - s3:GetObject
        Resource: 'arn:aws:s3:::my-bucket/*'

Key Takeaways

  • CloudWatch Logs automatically capture all function output and errors; view in the Lambda Console or Logs Insights.
  • Structured logging (JSON) enables querying and aggregation across invocations.
  • CloudWatch Metrics track invocations, duration, errors; create alarms to notify on anomalies.
  • X-Ray tracing shows request flow across AWS services, identifying performance bottlenecks.
  • Lambda Insights provides enhanced dashboards with memory, CPU, and I/O metrics.
  • Common errors (timeout, out of memory, missing dependencies, permission denied) leave recognizable signatures in the logs, and each has a standard fix.

Frequently Asked Questions

How long are CloudWatch Logs retained?

By default, indefinitely. You can set retention policies: 1 day to 10 years. Longer retention increases storage costs. Typical production functions use 30–90 day retention.
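
Retention can also be set from code, for example in a deployment script. The sketch below uses the real CloudWatch Logs put_retention_policy call and assumes the function writes to the default /aws/lambda/<function-name> log group. set_log_retention is a hypothetical helper name, and retentionInDays must be one of the values CloudWatch Logs accepts (30 is one of them).

```python
def set_log_retention(logs_client, function_name, days=30):
    """Apply a retention policy to a function's default log group and return the group name."""
    # Default naming; a function configured with a custom log group would need that name instead.
    log_group = f"/aws/lambda/{function_name}"
    logs_client.put_retention_policy(logGroupName=log_group, retentionInDays=days)
    return log_group
```

For example: set_log_retention(boto3.client("logs"), "my-function", 30).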

How do I track a request across multiple Lambda functions?

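Lambda exposes the current trace header in the _X_AMZN_TRACE_ID environment variable, and HTTP services carry it in the X-Amzn-Trace-Id header. When one function calls another through an X-Ray-instrumented AWS SDK client or a service that propagates trace context, X-Ray correlates the segments under the same trace ID. Otherwise, pass the trace ID or a custom request ID through the events between functions.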
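
A minimal sketch for reading the trace header so its Root value can be copied into outgoing events as a correlation ID. The Root=...;Parent=...;Sampled=... layout is the standard X-Ray trace header format. The current_trace_context helper is hypothetical, and the env parameter exists only to make it easy to test.

```python
import os

def current_trace_context(env=None):
    """Parse the _X_AMZN_TRACE_ID variable into a dict such as {"Root": ..., "Sampled": ...}."""
    env = os.environ if env is None else env
    header = env.get("_X_AMZN_TRACE_ID", "")
    # Header format: "Root=1-...;Parent=...;Sampled=1"
    return dict(item.split("=", 1) for item in header.split(";") if "=" in item)
```

A handler could then add current_trace_context().get("Root") to the payload it sends to the next function.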

Can I search CloudWatch Logs efficiently with millions of entries?

Yes, use CloudWatch Logs Insights. It's optimized for querying large log volumes:

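fields @duration, @maxMemoryUsed
| filter @type = "REPORT" and @duration > 1000
| stats avg(@duration), max(@duration) by @log

For invocations that took longer than 1 second, this returns the average and maximum duration, in milliseconds, for each log group (one per function).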

What's the cost of CloudWatch and X-Ray?

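CloudWatch Logs: $0.50 per GB ingested and $0.03 per GB-month stored. X-Ray: $5.00 per million traces recorded and $0.50 per million traces retrieved or scanned, after a monthly free tier (the first 100,000 traces recorded are free). These are us-east-1 list prices. Custom metrics and alarms are billed separately, and the total depends mainly on log volume and how many requests you trace.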
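
A rough estimate helps when deciding how verbose to log and how many requests to trace. The sketch below hard-codes us-east-1 list prices at the time of writing as assumptions: $0.50 per GB ingested, $0.03 per GB-month stored, and $5.00 per million traces recorded after 100,000 free traces per month. It ignores X-Ray retrieval and scanning, custom metrics, and alarms.

```python
# Assumed list prices (USD); check current pricing for your Region.
LOG_INGEST_PER_GB = 0.50
LOG_STORAGE_PER_GB_MONTH = 0.03
XRAY_PER_MILLION_TRACES_RECORDED = 5.00
XRAY_FREE_TRACES_PER_MONTH = 100_000

def monthly_estimate(log_gb_ingested, log_gb_stored, traces_recorded):
    """Return an approximate monthly CloudWatch Logs + X-Ray cost in USD."""
    logs = log_gb_ingested * LOG_INGEST_PER_GB + log_gb_stored * LOG_STORAGE_PER_GB_MONTH
    billable_traces = max(0, traces_recorded - XRAY_FREE_TRACES_PER_MONTH)
    xray = billable_traces / 1_000_000 * XRAY_PER_MILLION_TRACES_RECORDED
    return round(logs + xray, 2)
```

For example, monthly_estimate(10, 10, 2_000_000) returns 14.8. That total is $5.00 for ingestion, $0.30 for storage, and $9.50 for 1.9 million billable traces.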

Can I disable CloudWatch Logs to reduce costs?

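Not entirely. Lambda writes logs only if its execution role allows logs:CreateLogStream and logs:PutLogEvents (plus logs:CreateLogGroup if the group does not exist yet). Removing those permissions stops log delivery, but you also lose your main debugging signal. A safer approach is to set retention policies that delete old logs automatically and to reduce logging verbosity in non-critical functions.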
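
One way to reduce verbosity is to read the log level from configuration. In this sketch, LOG_LEVEL is a hypothetical environment variable you would set per function (for example WARNING for non-critical functions). Records below the configured level are never written, so they are never ingested or billed.

```python
import logging
import os

# LOG_LEVEL is a hypothetical per-function environment variable; defaults to INFO.
logger = logging.getLogger()
logger.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())

def lambda_handler(event, context):
    logger.debug("Full event: %s", event)  # written only when LOG_LEVEL=DEBUG
    logger.info("Processing request")      # dropped when LOG_LEVEL=WARNING
    return {"statusCode": 200}
```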

Further Reading