Production GraphQL: Security & Performance
Production GraphQL needs defenses against query complexity attacks, resource exhaustion, and data leaks. This article covers the security and performance patterns that turn a development API into a hardened, high-throughput service: query complexity limits, rate limiting, introspection controls, query caching, and monitoring. By the end, you'll have a checklist for deploying safely.
In 2023, I deployed a GraphQL API without complexity limits. An attacker discovered they could request deeply nested queries—100 levels deep, millions of fields—causing the server to hang. That one incident taught me that production GraphQL requires explicit safety guards. This article codifies those lessons.
Query Complexity: The Core Attack Vector
An attacker can craft a query that requests the same field millions of times or follows circular types (user → posts → author → posts) to an arbitrary depth:
# Infinite recursion attack: user → posts → author → posts → ...
query {
  user(id: 1) {
    posts {
      author {
        posts {
          author {
            posts {
              author { ... }
            }
          }
        }
      }
    }
  }
}
# Exponential field explosion attack.
query {
  user(id: 1) {
    posts {
      author { name email }
      comments { author { name email } }
      tags { name }
      metadata { ... }
    }
  }
}
These queries consume massive CPU and memory, crashing the server or causing timeouts. Mitigation: implement query complexity analysis.
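To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes every nested list field returns the same number of items (`fan_out`, a made-up uniform figure; real schemas vary per field) and counts how many objects the server would have to resolve. The function name and parameters are illustrative only.

```python
def resolved_objects(fan_out: int, list_levels: int) -> int:
    """Estimate objects resolved when each of `list_levels` nested list
    fields returns `fan_out` items (hypothetical, uniform fan-out)."""
    total = 0
    objects_at_level = 1  # The single root object.
    for _ in range(list_levels):
        objects_at_level *= fan_out  # Each parent yields fan_out children.
        total += objects_at_level
    return total


for levels in (1, 2, 4, 6):
    print(levels, resolved_objects(fan_out=20, list_levels=levels))
```

Only list fields multiply the work; a single-object field such as `author` adds one object per parent. Even so, a handful of nested lists is enough to turn one request into millions of resolver calls.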
Query Complexity Limiting
Strawberry doesn't include built-in complexity limiting, but you can implement it with a validation hook:
from typing import Any

import strawberry
from graphql import GraphQLError, parse


class ComplexityAnalyzer:
    """Analyze query complexity (count fields, depth, etc.)."""

    def __init__(self, max_complexity: int = 100):
        self.max_complexity = max_complexity

    def analyze(self, query_ast: Any) -> int:
        """Return the complexity score of a parsed query document."""
        # Simplified: count the number of fields in the query.
        # The parsed document has no selection_set of its own, so walk each
        # operation or fragment definition it contains.
        # In production, use graphql-core's validation rules.
        return sum(
            self._count_fields(selection)
            for definition in query_ast.definitions
            if getattr(definition, "selection_set", None)
            for selection in definition.selection_set.selections
        )

    def _count_fields(self, node: Any) -> int:
        """Recursively count a selection and everything nested under it."""
        selection_set = getattr(node, "selection_set", None)
        if not selection_set:
            return 1  # Leaf field (fragment spreads are not expanded here).
        return 1 + sum(self._count_fields(s) for s in selection_set.selections)


# Call this before execution to reject overly complex queries.
def validate_query_complexity(query_string: str, schema: strawberry.Schema) -> None:
    """Validate query complexity before execution."""
    try:
        ast = parse(query_string)
    except GraphQLError as exc:  # Syntax errors are GraphQLError subclasses.
        raise GraphQLError("Invalid query") from exc
    analyzer = ComplexityAnalyzer(max_complexity=100)
    complexity = analyzer.analyze(ast)
    if complexity > analyzer.max_complexity:
        raise GraphQLError(
            f"Query complexity {complexity} exceeds limit {analyzer.max_complexity}",
            extensions={"code": "QUERY_TOO_COMPLEX"},
        )
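In production, run the check through graphql-core's validation system. graphql-core does not ship a complexity rule, so a complexity check has to be a custom ValidationRule subclass passed to validate() alongside the standard rules:
from graphql import GraphQLError, build_schema, parse, validate

query_string = """
query {
  user(id: 1) {
    posts { title }
  }
}
"""
# Strawberry exports SDL; rebuild a graphql-core schema from it for validation.
graphql_schema = build_schema(schema.as_str())
ast = parse(query_string)
# validate() checks the document against the schema with the standard rules.
# Pass rules=[...] to add your own ValidationRule subclasses.
errors = validate(graphql_schema, ast)
if errors:
    raise GraphQLError("; ".join(error.message for error in errors))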
Depth Limiting
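Limit query nesting depth to stop deeply recursive queries. Strawberry also ships a QueryDepthLimiter extension in strawberry.extensions; the class here shows the underlying idea: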
from typing import Any


class DepthLimiter:
    """Limit query depth."""

    def __init__(self, max_depth: int = 10):
        self.max_depth = max_depth

    def validate(self, query_ast: Any) -> bool:
        """Return True if depth is within limit."""
        # The parsed document has no selection_set of its own, so measure
        # each operation or fragment definition and keep the deepest.
        depth = max(
            (self._calculate_depth(d) for d in query_ast.definitions),
            default=0,
        )
        return depth <= self.max_depth

    def _calculate_depth(self, node: Any, current_depth: int = 0) -> int:
        """Recursively calculate depth (fragment spreads are not followed)."""
        selection_set = getattr(node, "selection_set", None)
        if not selection_set:
            return current_depth
        return max(
            (self._calculate_depth(s, current_depth + 1) for s in selection_set.selections),
            default=current_depth,
        )
Rate Limiting and Throttling
Rate limit by user ID or IP address to prevent abuse:
from typing import Optional

import redis
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

# Redis-backed rate limiter (scales across multiple servers).
# In an async app, consider redis.asyncio so these calls do not block the event loop.
redis_client = redis.Redis(host="localhost", port=6379)


class GraphQLRateLimiter:
    """Rate limit GraphQL requests by user."""

    def __init__(self, rpm_per_user: int = 60):
        self.rpm_per_user = rpm_per_user  # Requests per minute.

    async def check_limit(self, user_id: Optional[int], ip: str) -> bool:
        """Check if user is within rate limit."""
        # Use user_id if authenticated; fall back to IP.
        # Separate prefixes keep a user ID from colliding with an IP key,
        # and "is not None" keeps user ID 0 from falling back to the IP.
        key = f"graphql:user:{user_id}" if user_id is not None else f"graphql:ip:{ip}"
        count = redis_client.incr(key)
        if count == 1:
            redis_client.expire(key, 60)  # Reset every minute.
        return count <= self.rpm_per_user


# Apply in FastAPI middleware.
app = FastAPI()
limiter = GraphQLRateLimiter(rpm_per_user=60)


@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    if request.url.path == "/graphql" and request.method == "POST":
        # extract_user_id is your own auth helper (not shown here).
        user_id = extract_user_id(request)
        client_ip = request.client.host if request.client else "unknown"
        if not await limiter.check_limit(user_id, client_ip):
            return JSONResponse(
                status_code=429,
                content={"error": "Rate limit exceeded"},
            )
    return await call_next(request)
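A fixed one-minute window like the one above lets a client send a full quota at the end of one window and another at the start of the next. A token bucket smooths that out. The following is a minimal in-memory sketch, not a drop-in replacement for the Redis version: `TokenBucket` and its parameters are hypothetical names, and the `cost` argument is an assumption that shows how a request could be charged in proportion to its query complexity.

```python
import time
from typing import Callable


class TokenBucket:
    """In-memory token bucket (illustrative; one bucket per user or IP)."""

    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity  # Start full so new clients are not throttled.
        self._updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False when throttled."""
        now = self._clock()
        elapsed = now - self._updated_at
        self._updated_at = now
        # Refill for the time that has passed, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=60, refill_per_second=1.0)
print(bucket.allow())          # A normal request costs one token.
print(bucket.allow(cost=100))  # A very expensive query exceeds the whole budget.
```

Passing the clock in as a parameter keeps the class easy to test; a multi-server deployment would still need shared state such as Redis.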
Disable Introspection in Production
GraphQL introspection lets clients discover your entire schema. Disable it in production to make reconnaissance harder; it complements authorization checks rather than replacing them:
import strawberry
from graphql.validation import NoSchemaIntrospectionCustomRule
from strawberry.extensions import AddValidationRules

# Setting an attribute on a schema subclass does not turn introspection off.
# NoSchemaIntrospectionCustomRule is graphql-core's validation rule that
# rejects __schema and __type; AddValidationRules registers it with Strawberry.
schema = strawberry.Schema(
    query=Query,
    mutation=Mutation,
    subscription=Subscription,
    extensions=[AddValidationRules([NoSchemaIntrospectionCustomRule])],  # Disable in production.
)
Alternatively, reject introspection queries in middleware:
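import re

from fastapi.responses import JSONResponse

# Match __schema or __type as whole words so the common __typename
# meta-field is not rejected by accident.
INTROSPECTION_FIELD = re.compile(r"\b__(?:schema|type)\b")


@app.middleware("http")
async def disable_introspection(request: Request, call_next):
    if request.url.path == "/graphql" and request.method == "POST":
        try:
            body = await request.json()
        except ValueError:
            body = None  # Malformed JSON: let the GraphQL handler report it.
        if isinstance(body, dict):
            query = body.get("query") or ""
            if INTROSPECTION_FIELD.search(query):
                return JSONResponse(
                    status_code=400,
                    content={"error": "Introspection disabled"},
                )
    return await call_next(request)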
Query Whitelisting and Persisted Queries
In high-security environments, only allow predefined queries:
import hashlib

from fastapi import Request
from fastapi.responses import JSONResponse

# Map query hashes to allowed queries.
ALLOWED_QUERIES = {
    hashlib.sha256(b"query { user { id name } }").hexdigest(): "GetUser",
    hashlib.sha256(b"query { posts { id title } }").hexdigest(): "GetPosts",
}


@app.post("/graphql")
async def graphql_endpoint(request: Request):
    body = await request.json()
    # Check if query is whitelisted. The hash covers the exact bytes, so any
    # whitespace difference from the registered text is rejected.
    query = body.get("query", "")
    query_hash = hashlib.sha256(query.encode()).hexdigest()
    if query_hash not in ALLOWED_QUERIES:
        return JSONResponse(
            status_code=400,
            content={"error": "Query not whitelisted"},
        )
    # Execute the query...
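In the strict form of persisted queries, clients send only a hash and the server looks up the stored query text, which blocks arbitrary queries and reduces bandwidth. The endpoint above still receives the full query text and only checks it against the allowlist, so it gains the security benefit but not the bandwidth saving.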
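The hash-only variant can be sketched as a registry built at deploy time. This is an illustration under assumptions: `build_registry`, `resolve_query`, and the operation names are hypothetical, and a real setup would usually generate the registry from the client's build output.

```python
import hashlib
from typing import Dict, Optional


def query_id(query: str) -> str:
    """SHA-256 of the exact query text; this is the ID clients send."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def build_registry(operations: Dict[str, str]) -> Dict[str, str]:
    """Map hash -> query text for operations known at build time."""
    return {query_id(text): text for text in operations.values()}


def resolve_query(registry: Dict[str, str], requested_id: str) -> Optional[str]:
    """Return the stored query text, or None if the ID is not registered."""
    return registry.get(requested_id)


REGISTRY = build_registry({
    "GetUser": "query { user { id name } }",
    "GetPosts": "query { posts { id title } }",
})

known = resolve_query(REGISTRY, query_id("query { user { id name } }"))
unknown = resolve_query(REGISTRY, query_id("query { user { id email } }"))
print(known, unknown)
```

Because the server executes only text it stored itself, a client cannot smuggle in a modified query by reusing a valid ID.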
Query Caching and Automatic Persisted Queries (APQ)
Cache query results to reduce server load:
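import hashlib
import json

import redis

# In an async app, consider redis.asyncio so cache calls do not block the event loop.
redis_client = redis.Redis(host="localhost", port=6379)


async def execute_graphql(query: str, variables: dict) -> dict:
    """Execute a GraphQL query with caching."""
    # Generate cache key from query + variables. sort_keys makes the key
    # independent of dict ordering. Include the user ID as well if results
    # depend on the caller, and do not route mutations through this cache.
    raw_key = json.dumps({"query": query, "variables": variables}, sort_keys=True)
    cache_key = f"graphql:{hashlib.sha256(raw_key.encode()).hexdigest()}"
    # Check cache.
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)
    # Execute query (if not cached).
    result = await schema.execute(query, variable_values=variables)
    if result.errors:
        # Do not cache failures; return them in the standard response shape.
        return {"data": result.data, "errors": [{"message": str(e)} for e in result.errors]}
    # The execution result object is not JSON-serializable, so cache a plain dict.
    payload = {"data": result.data}
    # Cache for 5 minutes.
    redis_client.setex(cache_key, 300, json.dumps(payload))
    return payload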
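Cache keys deserve some care. Building the key from the string form of the variables makes it depend on dict ordering, and a key without the caller's identity can serve one user's data to another. The sketch below shows one way to build a canonical key; the `cache_key` function and its `user_id` parameter are illustrative names, not part of any library.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def cache_key(
    query: str,
    variables: Mapping[str, Any],
    user_id: Optional[str] = None,
) -> str:
    """Build a cache key that ignores variable order and is scoped per user."""
    canonical = json.dumps(
        {"query": query, "variables": variables, "user": user_id},
        sort_keys=True,         # Same key regardless of dict insertion order.
        separators=(",", ":"),  # No incidental whitespace in the encoding.
    )
    return "graphql:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


QUERY = "query($id: ID!, $first: Int) { user(id: $id) { posts(first: $first) { title } } }"
key_a = cache_key(QUERY, {"id": 1, "first": 10})
key_b = cache_key(QUERY, {"first": 10, "id": 1})
print(key_a == key_b)
```

Pass `user_id=None` only for data that is identical for every caller.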
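Automatic Persisted Queries (APQ), a protocol from Apollo, let clients send a SHA-256 hash in place of the full query text. On a cache miss the server answers PersistedQueryNotFound, and the client retries once with both the query and its hash so the server can store it:
import hashlib


@app.post("/graphql")
async def graphql_endpoint(request: Request):
    body = await request.json()
    # If extensions.persistedQuery exists, it's an APQ request.
    ext = body.get("extensions") or {}
    if ext.get("persistedQuery"):
        query_hash = ext["persistedQuery"].get("sha256Hash")
        if body.get("query"):
            # Retry after a miss: verify the hash before registering the query.
            if hashlib.sha256(body["query"].encode()).hexdigest() != query_hash:
                return JSONResponse(status_code=400, content={"error": "Hash mismatch"})
            redis_client.set(f"apq:{query_hash}", body["query"])
        else:
            # Lookup persisted query from cache.
            cached_query = redis_client.get(f"apq:{query_hash}")
            if not cached_query:
                # Apollo clients look for this message and retry with the full query.
                return JSONResponse(
                    status_code=200,
                    content={"errors": [{"message": "PersistedQueryNotFound"}]},
                )
            body["query"] = cached_query.decode()  # Redis returns bytes.
    # Execute...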
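The register-on-retry handshake is easier to follow without the web framework around it. Here is a minimal in-memory sketch of the server side; `PersistedQueryStore` and the mismatch error text are assumptions made for illustration, while the `PersistedQueryNotFound` message is the one Apollo clients look for.

```python
import hashlib
from typing import Dict, Optional, Tuple


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PersistedQueryStore:
    """In-memory stand-in for the Redis lookup (illustrative only)."""

    def __init__(self) -> None:
        self._queries: Dict[str, str] = {}

    def resolve(
        self, sha256_hash: str, query: Optional[str]
    ) -> Tuple[Optional[str], Optional[str]]:
        """Return (query_text, error_message); exactly one of them is None."""
        if query is None:
            # Hash-only request: succeed only if the query was registered.
            stored = self._queries.get(sha256_hash)
            if stored is None:
                return None, "PersistedQueryNotFound"
            return stored, None
        # Hash plus query: verify before storing, so a client cannot
        # register arbitrary text under someone else's hash.
        if sha256_hex(query) != sha256_hash:
            return None, "provided sha does not match query"
        self._queries[sha256_hash] = query
        return query, None


store = PersistedQueryStore()
text = "query { posts { id title } }"
digest = sha256_hex(text)
first = store.resolve(digest, None)   # Miss: the client must retry.
second = store.resolve(digest, text)  # Retry with the text registers it.
third = store.resolve(digest, None)   # Later requests send the hash only.
print(first, second, third)
```

Note that APQ on its own is a bandwidth optimization: any client can register any query, so it does not replace an allowlist.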
Monitoring and Logging
Log all GraphQL operations for debugging and security auditing:
import json
import logging
import time

logger = logging.getLogger(__name__)


@app.post("/graphql")
async def graphql_endpoint(request: Request):
    start = time.perf_counter()  # Monotonic clock, unaffected by system time changes.
    body = await request.json()
    # Execute query.
    result = await schema.execute(body["query"], variable_values=body.get("variables"))
    duration_ms = (time.perf_counter() - start) * 1000
    errors = [str(e) for e in (result.errors or [])]
    # Log operation.
    logger.info(json.dumps({
        "operation": "graphql",
        "query": body.get("query", "")[:100],  # First 100 chars.
        "user_id": extract_user_id(request),
        "status": "success" if not result.errors else "error",
        "duration_ms": round(duration_ms, 2),
        "errors": errors,
    }))
    # The execution result is not a response object; return a JSON-serializable dict.
    response = {"data": result.data}
    if errors:
        response["errors"] = [{"message": message} for message in errors]
    return response
Monitor key metrics:
- Query latency: Slow queries indicate N+1 or missing indexes.
- Error rate: Spike in errors suggests an attack or bug.
- Query complexity: Track which queries are most expensive.
- User activity: Identify unusual patterns (user hammering an endpoint).
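As a rough illustration of turning those log records into the metrics listed above, the sketch below computes latency percentiles and an error rate. It assumes records shaped like the JSON log lines in this section (`duration_ms` and `status` keys); the function names are hypothetical, and a real deployment would normally leave this to a metrics backend.

```python
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # Integer ceil(pct * n / 100).
    return ordered[rank - 1]


def summarize(records: List[Dict]) -> Dict[str, float]:
    """Aggregate parsed log records into latency and error metrics."""
    durations = [record["duration_ms"] for record in records]
    error_count = sum(1 for record in records if record["status"] == "error")
    return {
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "error_rate": error_count / len(records),
    }


# Synthetic sample: 100 requests taking 1..100 ms, the slowest ten failing.
sample = [
    {"duration_ms": ms, "status": "error" if ms > 90 else "success"}
    for ms in range(1, 101)
]
summary = summarize(sample)
print(summary)
```

Percentiles are more useful than averages here because a few pathological queries can hide behind a healthy mean.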
Security Checklist for Production Deployment
Before deploying to production, verify:
- Authentication: All sensitive queries/mutations require valid JWT tokens. ✓
- Authorization: Field-level and mutation-level permissions are enforced. ✓
- Query limits: Complexity, depth, and rate limiting are in place. ✓
- Introspection: Disabled in production. ✓
- Input validation: All inputs (strings, numbers, enums) are validated. ✓
- Error handling: Errors don't leak sensitive data (stack traces, database queries). ✓
- HTTPS: GraphQL endpoints use TLS (https://, wss://). ✓
- CORS: Only allowed origins can access the API. ✓
- Logging: All operations are logged for auditing. ✓
- Monitoring: Alerts on latency spikes, error rates, or unusual patterns. ✓
- Database security: DB user has minimal required permissions; SQL injection is prevented by using parameterized queries throughout. ✓
- Secrets: API keys, JWT secrets, database passwords use environment variables (never hardcoded). ✓
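For the error handling item, one possible approach is to convert exceptions into client-safe payloads at a single point. The sketch below is an assumption about how that could look, not a prescribed pattern: `PublicError` and `to_client_error` are hypothetical names, and the correlation ID lets you match a masked response to the full traceback in your logs.

```python
import logging
import uuid
from typing import Any, Dict

logger = logging.getLogger("graphql.errors")


class PublicError(Exception):
    """Hypothetical marker for errors whose message is safe to show clients."""


def to_client_error(exc: Exception) -> Dict[str, Any]:
    """Convert an exception into a GraphQL-style error payload without leaking internals."""
    if isinstance(exc, PublicError):
        return {"message": str(exc)}
    # Anything else is logged in full server-side and masked for the client.
    error_id = uuid.uuid4().hex
    logger.error("Unhandled resolver error %s", error_id, exc_info=exc)
    return {
        "message": "Internal server error",
        "extensions": {"errorId": error_id},
    }


safe = to_client_error(PublicError("Post not found"))
masked = to_client_error(RuntimeError('relation "users" does not exist'))
print(safe, masked["message"])
```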
Performance Tuning Checklist
For high-throughput deployments:
- DataLoaders: Eliminate N+1 queries with batch loading. ✓
- Connection pooling: Reuse database connections. ✓
- Async/await: All I/O operations are async. ✓
- Caching: Frequently accessed data is cached (Redis, in-memory). ✓
- Indexes: Database tables have indexes on frequently queried fields. ✓
- Query analysis: Use EXPLAIN to find slow queries. ✓
- Instrumentation: Use APM tools (DataDog, New Relic) to track performance. ✓
- Load testing: Verify the API handles expected traffic (Apache JMeter, k6). ✓
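To make the DataLoader item concrete, here is a minimal sketch of the batching idea in plain asyncio. It is not Strawberry's implementation (in a real project you would likely use `strawberry.dataloader.DataLoader`); `BatchLoader` and `fetch_users` are hypothetical names, and error handling and caching are left out to keep it short.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional


class BatchLoader:
    """Collect keys requested in the same event-loop tick into one batch call."""

    def __init__(self, batch_fn: Callable[[List[int]], Awaitable[List[str]]]):
        self._batch_fn = batch_fn
        self._pending: Dict[int, asyncio.Future] = {}
        self._task: Optional[asyncio.Task] = None  # Keeps the dispatch task alive.

    def load(self, key: int) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        if not self._pending:
            # First key of this tick: schedule a single dispatch for the batch.
            loop.call_soon(self._start_dispatch)
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        return self._pending[key]

    def _start_dispatch(self) -> None:
        self._task = asyncio.get_running_loop().create_task(self._dispatch())

    async def _dispatch(self) -> None:
        pending, self._pending = self._pending, {}
        keys = list(pending)
        values = await self._batch_fn(keys)  # One call for all collected keys.
        for key, value in zip(keys, values):
            pending[key].set_result(value)


batches: List[List[int]] = []


async def fetch_users(ids: List[int]) -> List[str]:
    batches.append(ids)  # Stands in for one "WHERE id IN (...)" query.
    return [f"user-{i}" for i in ids]


async def main() -> List[str]:
    loader = BatchLoader(fetch_users)
    # Three resolver calls, two distinct keys, one backend round trip.
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))


names = asyncio.run(main())
print(names, batches)
```

Create one loader per request rather than one per process, so that results loaded for one user are never reused for another.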
Key Takeaways
- Query complexity attacks can crash servers; implement complexity and depth limits.
- Rate limiting prevents abuse; use Redis to scale across multiple servers.
- Disable introspection in production to prevent schema reconnaissance.
- Persisted queries reduce bandwidth and improve security.
- Cache query results and use APQ for high-traffic APIs.
- Log all operations and monitor key metrics (latency, errors, complexity).
- Follow the security and performance checklists before production deployment.
Frequently Asked Questions
How do I measure query complexity in production?
Log the query AST size (number of fields), execution time, and database query count. Identify slow queries and optimize them with indexes, DataLoaders, or caching.
What's a reasonable complexity limit?
Start with 100-1000 depending on your schema. Adjust based on legitimate client queries. A well-designed schema shouldn't require thousands.
Should I use GraphQL subscriptions over WebSocket in production?
Yes, but with caution: WebSocket connections are long-lived and consume memory. Use a message broker (Redis, RabbitMQ) to scale subscriptions across multiple servers.
How do I handle GraphQL downtime or version upgrades?
Use feature flags to disable problematic queries temporarily. Version your schema (v1, v2) to support old clients during upgrades. Use GraphQL composition tools (Apollo Federation) to split schemas across services.
Can I use GraphQL without databases (e.g., REST proxy)?
Yes. A GraphQL layer can proxy to REST APIs. Use DataLoaders to batch REST calls. This is common for legacy system integration.