Monitor & Debug Lambda: CloudWatch & X-Ray
Observability—the ability to understand what's happening inside running systems—is critical for production Lambda functions. CloudWatch Logs capture function output and errors; CloudWatch Metrics track invocation counts, duration, and errors; X-Ray provides distributed tracing for requests across services. Together, they enable you to diagnose issues, optimize performance, and ensure reliability.
CloudWatch Logs Basics
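Lambda automatically sends everything the function writes to stdout or stderr (print output, logger calls, and unhandled exception traces) to CloudWatch Logs, provided the execution role allows it. Access logs in the Lambda Console: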
- Select a function
- Click Monitor → Logs
- Click a log stream to view invocation details
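Each execution environment writes to its own log stream, so invocations served by the same warm environment share a stream. Every entry carries a timestamp and the request ID. Example log output: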
START RequestId: abc-123-def-456 Version: $LATEST
2026-06-02T10:30:00.123Z abc-123-def-456 Processing event: {'userId': 42, 'action': 'login'}
2026-06-02T10:30:00.456Z abc-123-def-456 Database query took 0.213s
END RequestId: abc-123-def-456
REPORT RequestId: abc-123-def-456 Duration: 666.14 ms Billed Duration: 667 ms Memory Size: 128 MB Max Memory Used: 85 MB Init Duration: 45.23 ms XRAY TraceId: 1-abc-def
The REPORT line shows:
- Duration: How long the handler ran for this invocation
- Billed Duration: Duration rounded up to the next 1 ms (Lambda bills in 1 ms increments; there is no 1,000 ms minimum)
- Memory Size and Max Memory Used: Configured memory and the peak actually consumed
- Init Duration: Cold-start initialization time (present only when the invocation started a new execution environment)
- XRAY TraceId: ID of the X-Ray trace for the invocation (present only when tracing is enabled)
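The fields described above can be pulled out of a REPORT line with a small parser, which is handy when exporting logs for offline analysis. This is a minimal sketch: the function name `parse_report_line` and the output key names are hypothetical, and the field names in the regular expression are assumed to match what Lambda currently emits.

```python
import re

# Longer names come first so "Billed Duration" is not read as plain "Duration".
# Field names are an assumption based on the REPORT format shown in Lambda logs.
_FIELD = re.compile(
    r"(Billed Duration|Init Duration|Max Memory Used|Memory Size|Duration): "
    r"([\d.]+) (ms|MB)"
)


def parse_report_line(line):
    """Return the numeric fields of a Lambda REPORT log line as a dict."""
    if not line.startswith("REPORT"):
        raise ValueError("not a REPORT line")
    fields = {}
    for name, value, unit in _FIELD.findall(line):
        key = name.lower().replace(" ", "_") + "_" + unit.lower()
        fields[key] = float(value)
    # Init Duration is only present when the invocation was a cold start
    fields["cold_start"] = "init_duration_ms" in fields
    return fields


sample = (
    "REPORT RequestId: abc-123-def-456\tDuration: 666.14 ms\t"
    "Billed Duration: 667 ms\tMemory Size: 128 MB\t"
    "Max Memory Used: 85 MB\tInit Duration: 45.23 ms"
)
report = parse_report_line(sample)
print(report["billed_duration_ms"], report["cold_start"])
```

Comparing `max_memory_used_mb` with `memory_size_mb` across many invocations gives a rough signal for whether a function is over- or under-provisioned.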
Structured Logging for Production
Use structured logging (JSON format) for easier parsing and querying:
import json
import logging
import traceback
from datetime import datetime, timezone

# Configure logging; each message below is serialized as one JSON object
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    request_id = context.aws_request_id

    # Log structured event
    logger.info(json.dumps({
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'requestId': request_id,
        'event': event,
        'source': 'lambda_handler'
    }))

    try:
        user_id = event['userId']
        action = event['action']

        # Simulate work
        result = process_action(user_id, action)

        # Log success
        logger.info(json.dumps({
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'requestId': request_id,
            'status': 'success',
            'result': result
        }))
        return {'statusCode': 200, 'body': json.dumps(result)}

    except Exception as e:
        # Log error with exception details (traceback must be imported above)
        logger.error(json.dumps({
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'requestId': request_id,
            'error': str(e),
            'errorType': type(e).__name__,
            'traceback': traceback.format_exc()
        }))
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}


def process_action(user_id, action):
    return {'userId': user_id, 'action': action, 'processed': True}
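CloudWatch Logs Insights can query these logs. The error entries carry an errorType field but no status field, so filter on the presence of errorType:
fields @timestamp, requestId, errorType, error
| filter ispresent(errorType)
| stats count() by errorType
This returns counts of errors grouped by type.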
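The repeated json.dumps calls in the handler can be folded into a logging formatter so that every record is emitted as JSON. The following is a minimal sketch rather than a definitive setup: `JsonFormatter`, `configure_json_logging`, and the `extra={"fields": ...}` convention are hypothetical names introduced here, and it assumes the Lambda Python runtime has already attached a handler to the root logger.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Extra fields arrive via logger.info(..., extra={"fields": {...}}).
        # This is a convention of this sketch, not a logging or Lambda feature.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["traceback"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def configure_json_logging(level=logging.INFO):
    root = logging.getLogger()
    root.setLevel(level)
    if not root.handlers:
        # Outside Lambda (for example in local tests) no handler exists yet
        root.addHandler(logging.StreamHandler())
    for handler in root.handlers:
        handler.setFormatter(JsonFormatter())
    return root


logger = configure_json_logging()
logger.info(
    "login processed",
    extra={"fields": {"requestId": "abc-123", "status": "success"}},
)
```

Passing `exc_info=True` to `logger.error` inside an except block adds the traceback to the JSON payload without calling the traceback module directly.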
CloudWatch Metrics and Alarms
Lambda automatically publishes metrics to CloudWatch:
- Invocations: Total number of invocations
- Duration: Execution time in milliseconds
- Errors: Count of errors
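- Throttles: Number of invocation requests rejected because no concurrency was available
- ConcurrentExecutions: Number of function instances processing events at the same time
- UnreservedConcurrentExecutions: Number of events being processed by functions that have no reserved concurrency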
Create custom metrics in your function:
import json
import time
from datetime import datetime, timezone

import boto3

# Create the client once per execution environment, outside the handler
cloudwatch = boto3.client('cloudwatch')


def lambda_handler(event, context):
    start = time.time()

    try:
        # Your business logic
        result = process_request(event)
        duration = (time.time() - start) * 1000  # Convert to ms

        # Publish custom metrics
        cloudwatch.put_metric_data(
            Namespace='MyServerlessApp',
            MetricData=[
                {
                    'MetricName': 'ProcessingTime',
                    'Value': duration,
                    'Unit': 'Milliseconds',
                    'Timestamp': datetime.now(timezone.utc)
                },
                {
                    'MetricName': 'SuccessfulRequests',
                    'Value': 1,
                    'Unit': 'Count'
                }
            ]
        )
        return {'statusCode': 200, 'body': json.dumps(result)}

    except Exception:
        # Count the failure, then re-raise so Lambda records the error
        cloudwatch.put_metric_data(
            Namespace='MyServerlessApp',
            MetricData=[
                {
                    'MetricName': 'FailedRequests',
                    'Value': 1,
                    'Unit': 'Count'
                }
            ]
        )
        raise


def process_request(event):
    return {'status': 'processed'}
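Calling put_metric_data on every invocation adds an API round trip to each request. One alternative, named plainly here because the example above does not use it, is the CloudWatch embedded metric format (EMF): the function prints a JSON line containing an `_aws` block and CloudWatch extracts the metrics from the log stream. The sketch below only builds that line; `emf_record` is a hypothetical helper, and the namespace and metric names are reused from the example above.

```python
import json
import time


def emf_record(namespace, metrics, dimensions=None, timestamp_ms=None):
    """Build one embedded-metric-format log line.

    metrics maps a metric name to a (value, unit) pair.
    dimensions maps dimension names to string values.
    """
    dimensions = dimensions or {}
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set containing every supplied dimension name
                "Dimensions": [list(dimensions)],
                "Metrics": [
                    {"Name": name, "Unit": unit}
                    for name, (_, unit) in metrics.items()
                ],
            }],
        },
    }
    # Dimension values and metric values sit at the top level of the record
    record.update(dimensions)
    record.update({name: value for name, (value, _) in metrics.items()})
    return json.dumps(record)


line = emf_record(
    "MyServerlessApp",
    {"ProcessingTime": (212.5, "Milliseconds"), "SuccessfulRequests": (1, "Count")},
    dimensions={"FunctionName": "my-function"},
    timestamp_ms=1780396200000,
)
print(line)  # in a handler, print() sends this line to CloudWatch Logs
```

The line should be written with print rather than through a logger that prefixes the message, since the log event is expected to be the JSON document itself.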
Create an alarm to notify on errors:
aws cloudwatch put-metric-alarm \
--alarm-name lambda-high-error-rate \
--alarm-description "Alert if Lambda reports more than 10 errors in 5 minutes" \
--metric-name Errors \
--namespace AWS/Lambda \
--statistic Sum \
--period 300 \
--evaluation-periods 1 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=FunctionName,Value=my-function \
--alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
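The command above alarms on an absolute error count. To alarm on an error rate instead, CloudWatch metric math can divide Errors by Invocations. The sketch below builds the arguments for boto3's put_metric_alarm; `error_rate_alarm_params` is a hypothetical helper, and the 5% threshold, the alarm name pattern, and the choice to treat missing data as not breaching are assumptions to adjust.

```python
def error_rate_alarm_params(function_name, topic_arn, threshold_pct=5.0, period=300):
    """Build put_metric_alarm arguments for an Errors/Invocations percentage."""

    def stat(metric_id, metric_name):
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "FunctionName", "Value": function_name}
                    ],
                },
                "Period": period,
                "Stat": "Sum",
            },
            # Inputs to the expression; the alarm does not evaluate them directly
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "AlarmDescription": f"Error rate above {threshold_pct}% over {period}s",
        "Metrics": [
            stat("errors", "Errors"),
            stat("invocations", "Invocations"),
            {
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,
            },
        ],
        "EvaluationPeriods": 1,
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
        # With no traffic there are no data points; do not alarm in that case
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


params = error_rate_alarm_params(
    "my-function", "arn:aws:sns:us-east-1:123456789012:alerts"
)
# To create the alarm:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

A rate-based alarm scales with traffic, whereas a fixed count of 10 errors may be noise for a busy function and a major outage for a quiet one.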
X-Ray Tracing for Distributed Requests
X-Ray traces requests across AWS services, showing performance bottlenecks. Enable X-Ray in your function:
# SAM template
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Tracing: Active  # Enable X-Ray tracing
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - xray:PutTraceSegments
                - xray:PutTelemetryRecords
              Resource: '*'
Or via CLI:
aws lambda update-function-configuration \
--function-name my-function \
--tracing-config Mode=Active
Instrument your code:
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all
import boto3

# Patch AWS SDK for X-Ray
patch_all()

s3 = boto3.client('s3')


def lambda_handler(event, context):
    # Add custom subsegment
    with xray_recorder.capture('process_file'):
        bucket = event['bucket']
        key = event['key']

        # This S3 call is automatically traced
        obj = s3.get_object(Bucket=bucket, Key=key)
        data = obj['Body'].read()

        # Add annotations (indexable metadata)
        xray_recorder.put_annotation('bucket', bucket)
        xray_recorder.put_annotation('key', key)

        # Add metadata (non-indexed)
        xray_recorder.put_metadata('file_size', len(data))

    result = process_data(data)
    return {'statusCode': 200, 'result': result}


@xray_recorder.capture('process_data')
def process_data(data):
    # Automatically captured as subsegment
    return len(data)
View traces in the X-Ray Console:
- Go to X-Ray → Service Map
- See all services involved in request
- Click a service to view individual traces
- See timing breakdown: how long in Lambda, S3, DynamoDB, etc.
CloudWatch Lambda Insights
Lambda Insights provides enhanced monitoring with pre-built dashboards. Install the Lambda Insights extension:
# SAM template
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Layers:
        # LambdaInsightsVersion must be declared as a template parameter
        - !Sub 'arn:aws:lambda:${AWS::Region}:580254703988:layer:LambdaInsightsExtension:${LambdaInsightsVersion}'
      Policies:
        # AWS managed policy that lets the extension write its telemetry to CloudWatch Logs
        - CloudWatchLambdaInsightsExecutionRolePolicy
Or via CLI:
aws lambda update-function-configuration \
--function-name my-function \
--layers "arn:aws:lambda:us-east-1:580254703988:layer:LambdaInsightsExtension:21"
The Lambda Insights layer automatically collects:
- Memory utilization
- CPU usage
- Disk I/O
- Network I/O
- Duration
- Cold starts
View the dashboards in the Lambda Insights section of the CloudWatch console.
Debugging Lambda Errors
Common error patterns and solutions:
Timeout Error:
2026-06-02T10:30:05.000Z END RequestId: ...
Task timed out after 30.00 seconds
Solution: Increase timeout or optimize code:
aws lambda update-function-configuration \
--function-name my-function \
--timeout 60
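When raising the timeout is not enough, for example with batches of unpredictable size, the handler can check how much time is left and stop cleanly before Lambda kills it. The sketch below relies on the context object's get_remaining_time_in_millis() method; `process_batch`, `handle`, and the 2,000 ms safety margin are hypothetical choices for illustration.

```python
def process_batch(items, context, handle, safety_margin_ms=2000):
    """Process items until done or until the invocation is close to timing out.

    Returns (processed_count, remaining_items) so the caller can re-queue
    whatever was not handled.
    """
    processed = 0
    for index, item in enumerate(items):
        # get_remaining_time_in_millis() is provided by the Lambda context object
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            return processed, items[index:]
        handle(item)
        processed += 1
    return processed, []
```

A handler could call `process_batch(event['records'], context, handle_record)` and send the remaining items back to a queue; the `records` key and `handle_record` function are placeholders.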
OutOfMemory Error:
fatal error: runtime: out of memory
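Solution: Increase memory allocation. The exact message depends on the runtime (the one above comes from Go; other runtimes typically report Runtime exited with error: signal: killed), and the REPORT line shows memory used at the configured limit: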
aws lambda update-function-configuration \
--function-name my-function \
--memory-size 512
Missing Dependency:
ModuleNotFoundError: No module named 'requests'
Solution: Add dependency to Lambda Layer or function ZIP file (see Lambda Layers article).
Permission Denied:
An error occurred (AccessDenied) when calling the GetObject operation
Solution: Add IAM permission to execution role:
Policies:
  - Version: '2012-10-17'
    Statement:
      - Effect: Allow
        Action:
          - s3:GetObject
        Resource: 'arn:aws:s3:::my-bucket/*'
Key Takeaways
- CloudWatch Logs automatically capture all function output and errors; view in the Lambda Console or Logs Insights.
- Structured logging (JSON) enables querying and aggregation across invocations.
- CloudWatch Metrics track invocations, duration, errors; create alarms to notify on anomalies.
- X-Ray tracing shows request flow across AWS services, identifying performance bottlenecks.
- Lambda Insights provides enhanced dashboards with memory, CPU, and I/O metrics.
- Common errors (timeout, OutOfMemory, missing dependencies) leave recognizable messages in the logs that point to the fix.
Frequently Asked Questions
How long are CloudWatch Logs retained?
By default, indefinitely. You can set retention policies: 1 day to 10 years. Longer retention increases storage costs. Typical production functions use 30–90 day retention.
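Retention can also be set from code, which helps when many functions need the same policy. This is a minimal sketch assuming the default log group name /aws/lambda/<function-name> and a log group that already exists; `set_log_retention` is a hypothetical helper, and the client is passed in so the function can be exercised without AWS access.

```python
def set_log_retention(logs_client, function_name, days=30):
    """Apply a retention policy to a function's default log group."""
    log_group = f"/aws/lambda/{function_name}"
    # CloudWatch accepts only specific values (1, 3, 5, 7, 14, 30, 60, 90, ...)
    logs_client.put_retention_policy(
        logGroupName=log_group,
        retentionInDays=days,
    )
    return log_group


# Usage against a real account:
#   import boto3
#   set_log_retention(boto3.client("logs"), "my-function", days=30)
```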
How do I track a request across multiple Lambda functions?
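Enable active tracing on every function in the chain. Lambda exposes the current trace header to your code in the _X_AMZN_TRACE_ID environment variable, and AWS SDK clients patched by the X-Ray SDK forward it on downstream calls, so X-Ray links the segments under one trace ID. For hops that do not carry the trace context, add your own correlation ID to the event payload and log it in each function.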
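To include the trace ID in your own log entries, the trace header can be read and split into its parts. Lambda exposes the header to function code through the _X_AMZN_TRACE_ID environment variable when tracing is active. The sketch below assumes the usual `Root=...;Parent=...;Sampled=...` layout; `parse_trace_header` and `current_trace_id` are hypothetical helper names.

```python
import os


def parse_trace_header(header):
    """Split 'Root=...;Parent=...;Sampled=1' into a dict of its parts."""
    parts = {}
    for pair in header.split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            parts[key.strip()] = value.strip()
    return parts


def current_trace_id(environ=os.environ):
    """Return the X-Ray trace ID for this invocation, or None if absent."""
    header = environ.get("_X_AMZN_TRACE_ID", "")
    return parse_trace_header(header).get("Root")


example = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
print(parse_trace_header(example)["Root"])
```

Logging the value returned by `current_trace_id()` next to the request ID makes it possible to jump from a log entry to the matching trace.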
Can I search CloudWatch Logs efficiently with millions of entries?
Yes, use CloudWatch Logs Insights. It's optimized for querying large log volumes:
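fields @duration, @maxMemoryUsed
| filter @type = "REPORT" and @duration > 1000
| stats avg(@duration), max(@duration) by @log
This returns the average and maximum duration in milliseconds per log group (by default one log group per function) for invocations that ran longer than one second.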
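Logs Insights queries can also be run programmatically, for example from a scheduled report. The sketch below uses the CloudWatch Logs start_query and get_query_results operations and polls until the query finishes; `run_insights_query`, the polling interval, and the poll limit are hypothetical choices, and the client is passed in so the function can be tested without AWS access.

```python
import time


def run_insights_query(logs_client, log_group, query, start_ts, end_ts,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it finishes.

    start_ts and end_ts are epoch seconds. Returns each result row as a dict.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=int(start_ts),
        endTime=int(end_ts),
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("query did not complete in time")


# Usage against a real account:
#   import boto3
#   now = time.time()
#   rows = run_insights_query(
#       boto3.client("logs"), "/aws/lambda/my-function",
#       "fields @timestamp, @message | limit 5", now - 3600, now)
```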
What's the cost of CloudWatch and X-Ray?
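CloudWatch Logs: about $0.50 per GB ingested and $0.03 per GB stored per month. X-Ray: about $5.00 per million traces recorded, after a monthly free tier. These are US East list prices and they change over time, so check the current pricing pages. For small, low-traffic applications the total is often under $10/month.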
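As a worked example of the log pricing above, the monthly CloudWatch Logs bill is ingestion volume times the ingestion rate plus stored volume times the storage rate. The sketch below takes the two rates quoted in the text as default arguments; `logs_monthly_cost` is a hypothetical helper and the volumes are made-up inputs, not measurements.

```python
def logs_monthly_cost(gb_ingested, gb_stored, ingest_rate=0.50, storage_rate=0.03):
    """Estimate the monthly CloudWatch Logs cost in USD.

    Rates default to the per-GB figures quoted in the text; pass current
    prices for your Region to get a realistic number.
    """
    return round(gb_ingested * ingest_rate + gb_stored * storage_rate, 2)


# 10 GB ingested this month with 30 GB retained: 10 * 0.50 + 30 * 0.03
print(logs_monthly_cost(10, 30))  # 5.9
```

Ingestion usually dominates the bill, which is why reducing log verbosity tends to save more than shortening retention.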
Can I disable CloudWatch Logs to reduce costs?
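Lambda has no switch to turn logging off, but it can write to CloudWatch Logs only if the execution role allows it, so removing the logs:CreateLogStream and logs:PutLogEvents permissions stops delivery at the cost of losing all log output. In most cases it is better to set a log retention policy so old logs are deleted automatically and to reduce logging verbosity for non-critical functions.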
Further Reading
- CloudWatch Logs Insights Query Syntax — Complete query reference
- X-Ray Documentation — Tracing and service maps
- Lambda Monitoring Best Practices — AWS recommendations
- AWS X-Ray SDK for Python — X-Ray instrumentation library