Error Handling and Retry Logic

Network calls fail unpredictably: servers go down, requests time out, rate limits trigger, authentication lapses. A robust LLM application handles these failures gracefully, retries recoverable errors, and informs users of persistent problems. OpenAI's API returns specific error codes; understanding them and implementing appropriate retry strategies is the difference between a fragile prototype and a production-grade system.

Understanding OpenAI API Errors

The OpenAI API returns structured error responses with codes, messages, and retry hints. Common error types include:

| Error | Cause | Retry? |
| --- | --- | --- |
| RateLimitError (429) | Request rate or token quota exceeded | Yes (exponential backoff) |
| APIConnectionError | Network failure, DNS, or connection timeout | Yes (backoff) |
| APITimeoutError | Server took too long to respond | Yes (backoff) |
| AuthenticationError (401) | Invalid or expired API key | No (user action required) |
| PermissionDeniedError (403) | API key lacks access to model/feature | No (account/permission issue) |
| NotFoundError (404) | Model not found or resource does not exist | No (request is invalid) |
| ConflictError (409) | Request conflicts with existing state | No (user must resolve) |
| BadRequestError (400) | Malformed request (invalid JSON, etc.) | No (fix the request) |
| InternalServerError (5xx) | Server error | Yes (backoff) |

The OpenAI Python client raises typed exceptions for each error. Catch them specifically and retry only recoverable errors.
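The retry column of the table can be reduced to a small decision helper. The sketch below is only an illustration: it keys on HTTP status codes rather than on the client's exception classes, and the name `is_retryable_status` and the convention that `None` means "no HTTP response was received" are assumptions made for this example.

```python
from typing import Optional

# Non-5xx status codes that the table marks as worth retrying.
RETRYABLE_STATUSES = {429}


def is_retryable_status(status: Optional[int]) -> bool:
    """Return True if a failure with this HTTP status is worth retrying.

    `None` stands for failures that never produced an HTTP response
    (connection errors, timeouts), which the table treats as retryable.
    """
    if status is None:
        return True
    if status in RETRYABLE_STATUSES:
        return True
    # Any 5xx is a server-side error and may succeed on a later attempt.
    return 500 <= status <= 599


for code in (None, 400, 401, 403, 404, 409, 429, 500, 503):
    print(code, "retry" if is_retryable_status(code) else "do not retry")
```

Keeping the decision in one function means the retry loop and the logging code cannot drift apart on what counts as recoverable.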

Basic Error Handling

Wrap API calls in try-except blocks to catch specific errors:

from openai import OpenAI, APIError, RateLimitError, APITimeoutError, AuthenticationError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
        timeout=10,  # 10-second timeout
    )
    print(response.choices[0].message.content)

except AuthenticationError as e:
    print(f"Authentication failed: {e}. Check your API key.")
    # Do not retry; user must fix the key

except RateLimitError as e:
    print(f"Rate limited: {e}. Safe to retry after a delay.")
    # Retry is appropriate

except APITimeoutError as e:
    print(f"Request timed out: {e}. Network may be slow.")
    # Retry is appropriate

except APIError as e:
    # Catch-all for the remaining API errors; must come after the subclasses above
    print(f"API error: {e}")
    # Retry if it is a 5xx server error; otherwise, probably user's fault

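The timeout parameter limits how long the client waits for a response; when it is exceeded, an APITimeoutError is raised. Set it explicitly to prevent long hangs: the OpenAI Python client's default timeout is 10 minutes, and the client also retries some failures on its own (twice by default, configurable via max_retries), so the total wait can exceed a single timeout.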

Exponential Backoff Retry Pattern

Exponential backoff automatically retries with increasing delays. This is the OpenAI-recommended approach for transient failures:

import time
import random
from openai import OpenAI, APIError, APIConnectionError, RateLimitError

# Disable the client's built-in retries so they do not stack with the loop below
client = OpenAI(max_retries=0)

def api_call_with_exponential_backoff(
    messages,
    model="gpt-4o-mini",
    max_retries=5,
    base_wait_seconds=1,
    max_wait_seconds=60
):
    """Make an API call with exponential backoff retry."""

    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=30
            )
            return response

        except APIError as e:
            # Retry rate limits, connection failures (APITimeoutError is a
            # subclass of APIConnectionError), and 5xx server errors
            is_retryable = (
                isinstance(e, (RateLimitError, APIConnectionError)) or
                (getattr(e, "status_code", None) or 0) >= 500
            )

            if not is_retryable or attempt == max_retries - 1:
                raise

            # Calculate wait time with jitter to avoid thundering herd
            wait_time = min(
                base_wait_seconds * (2 ** attempt) + random.uniform(0, 1),
                max_wait_seconds
            )

            print(f"Attempt {attempt + 1} failed: {type(e).__name__}. "
                  f"Retrying in {wait_time:.1f} seconds...")
            time.sleep(wait_time)

    # Only reached if max_retries is 0 or negative
    raise RuntimeError("All retries exhausted")

# Usage
messages = [{"role": "user", "content": "What is machine learning?"}]

try:
    response = api_call_with_exponential_backoff(messages)
    print(f"Success: {response.choices[0].message.content[:100]}")
except Exception as e:
    print(f"Failed after retries: {e}")

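Exponential backoff with jitter (a small random delay) prevents the "thundering herd" problem, where all clients retry at the same moment. With the defaults above, the function makes up to 5 attempts and waits roughly 1, 2, 4, and 8 seconds between them, each plus up to 1 second of jitter; the fifth failure is raised instead of retried. The 60-second cap only takes effect with a larger max_retries.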
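To see the schedule that the retry function above actually produces, the delays can be computed on their own. This is a minimal sketch that leaves out jitter so the numbers are deterministic; `backoff_schedule` is a hypothetical helper name, and its parameters mirror those of `api_call_with_exponential_backoff`.

```python
def backoff_schedule(max_retries=5, base_wait_seconds=1, max_wait_seconds=60):
    """Return the jitter-free delays slept between attempts.

    There is one delay fewer than there are attempts, because the final
    failed attempt raises instead of sleeping.
    """
    return [
        min(base_wait_seconds * (2 ** attempt), max_wait_seconds)
        for attempt in range(max_retries - 1)
    ]


delays = backoff_schedule()
print(delays)       # [1, 2, 4, 8]
print(sum(delays))  # 15
```

With `max_retries=8` the list becomes `[1, 2, 4, 8, 16, 32, 60]`, which is the first point at which the 60-second cap changes a value.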

Timeout Handling

Timeouts prevent indefinite hangs. Set them appropriately: too short causes spurious failures; too long wastes user time:

from openai import OpenAI, APITimeoutError

client = OpenAI()

messages = [{"role": "user", "content": "Write a long story."}]

# Short timeout (3 seconds): likely to trigger for long responses
try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        timeout=3,
        max_tokens=2000
    )
except APITimeoutError:
    print("Request timed out with 3-second limit. Trying with longer timeout...")

    # Reasonable timeout (30 seconds), used only if the short one failed
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        timeout=30,
        max_tokens=2000
    )

print(response.choices[0].message.content[:100])

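The OpenAI Python client hands the timeout to its underlying HTTP library (httpx). A single number is applied separately to connecting and to each read, so for a streaming response it bounds the wait for the next chunk rather than the duration of the whole stream. For most non-streaming requests, 20–30 seconds is a reasonable starting point.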
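When timeouts are combined with retries, the wait the user sees is the sum of every attempt and every backoff sleep. One way to keep that bounded is an overall deadline from which each attempt's timeout is derived. The sketch below is illustrative only: `Deadline`, `remaining`, and `per_attempt_timeout` are hypothetical names, and the injectable `clock` exists just so the behaviour can be checked without waiting.

```python
import time


class Deadline:
    """Track a total time budget shared by several attempts."""

    def __init__(self, budget_seconds, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self):
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def per_attempt_timeout(self, preferred_seconds):
        """Timeout for the next attempt: the preferred value, capped by what is left."""
        return min(preferred_seconds, self.remaining())


# Demonstration with a fake clock so no real time passes.
now = [0.0]
deadline = Deadline(45, clock=lambda: now[0])
print(deadline.per_attempt_timeout(30))  # 30
now[0] = 35.0  # pretend the first attempt plus a backoff sleep took 35 s
print(deadline.per_attempt_timeout(30))  # 10.0
```

In a retry loop, the returned value would be passed as `timeout=` for the next request, and the loop would stop once `remaining()` reaches zero.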

Circuit Breaker Pattern

For high-volume applications, a circuit breaker prevents repeated requests to a failing API. It tracks consecutive failures and temporarily stops requests if they exceed a threshold:

import time
from enum import Enum
from openai import OpenAI, APIError

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Stop making requests
    HALF_OPEN = "half_open"  # Testing if service recovered

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.last_failure_time = None

    def is_available(self):
        """Check if the circuit allows requests."""
        if self.state == CircuitState.CLOSED:
            return True

        if self.state == CircuitState.OPEN:
            # Check if recovery timeout has elapsed
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
                self.failure_count = 0
                return True  # Allow one test request
            return False

        # HALF_OPEN: allow request to test recovery
        return True

    def record_success(self):
        """Record successful request."""
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def record_failure(self):
        """Record failed request."""
        self.failure_count += 1
        # monotonic() is unaffected by system clock adjustments
        self.last_failure_time = time.monotonic()

        # A failed test request in HALF_OPEN reopens the circuit immediately
        if (self.state == CircuitState.HALF_OPEN
                or self.failure_count >= self.failure_threshold):
            self.state = CircuitState.OPEN
            print(f"Circuit opened (failure count: {self.failure_count})")

# Usage
breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=30)
client = OpenAI()

messages = [{"role": "user", "content": "Hello!"}]

if not breaker.is_available():
    print("Circuit is open; skipping request")
else:
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            timeout=10
        )
        breaker.record_success()
        print(response.choices[0].message.content)
    except APIError as e:
        breaker.record_failure()
        print(f"Request failed. Circuit state: {breaker.state.value}")

The circuit breaker stops hammering a failing service after N consecutive failures. After a timeout, it allows one test request to see if the service recovered.

Validation and Input Sanitization

Prevent errors by validating inputs before sending them to the API:

from openai import OpenAI

client = OpenAI()

def validate_messages(messages):
    """Validate message structure before sending to API."""
    if not isinstance(messages, list) or len(messages) == 0:
        raise ValueError("Messages must be a non-empty list")

    # Deliberately strict: plain-text chats only. Extend the roles and the
    # content check if you send tool messages or multi-part content.
    valid_roles = {"system", "user", "assistant"}
    for msg in messages:
        if not isinstance(msg, dict):
            raise ValueError("Each message must be a dict")
        if "role" not in msg or "content" not in msg:
            raise ValueError("Each message must have 'role' and 'content'")
        if msg["role"] not in valid_roles:
            raise ValueError(f"Invalid role: {msg['role']}")
        if not isinstance(msg["content"], str):
            raise ValueError("Message content must be a string")

    return True

# Validate before API call
try:
    messages = [
        {"role": "user", "content": "Hello!"}
    ]
    validate_messages(messages)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
except ValueError as e:
    print(f"Invalid input: {e}")
except Exception as e:
    print(f"API error: {e}")

Validation catches schema errors early, reducing unnecessary API calls and cost.

Key Takeaways

  • Categorize errors: some are retryable (rate limits, timeouts, connection failures, 5xx); others are not (authentication and most other 4xx errors).
  • Use exponential backoff with jitter for retryable errors.
  • Set appropriate timeouts (20–30 seconds typical) to prevent hangs.
  • Implement circuit breakers to stop hammering failing services.
  • Validate inputs before sending to the API.
  • Always log errors with context (error code, message, retry count) for debugging.
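The last takeaway, logging with context, could look like the following. This is a sketch under stated assumptions: `describe_failure` and the logger name `llm_client` are hypothetical, and the status code is read with `getattr` because connection-level errors carry no HTTP status. A built-in `TimeoutError` stands in for an API exception so the example runs without network access.

```python
import logging

logger = logging.getLogger("llm_client")


def describe_failure(exc, attempt, max_retries):
    """Build a one-line, log-friendly summary of a failed attempt."""
    status = getattr(exc, "status_code", None)  # absent on connection-level errors
    return (
        f"error={type(exc).__name__} status={status} "
        f"attempt={attempt}/{max_retries} message={exc}"
    )


logging.basicConfig(level=logging.WARNING)
try:
    raise TimeoutError("simulated timeout")
except TimeoutError as exc:
    line = describe_failure(exc, attempt=2, max_retries=5)
    logger.warning(line)
    print(line)
```

Keeping the fields in a fixed `key=value` order makes the lines easy to search and to aggregate by error type.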

Frequently Asked Questions

How many times should I retry?

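Typically 3–5 attempts with exponential backoff. With a 1-second base delay, 5 attempts are separated by waits of 1 + 2 + 4 + 8 = 15 seconds in total (plus jitter and the time the requests themselves take); after that, give up. Most transient failures resolve within the first few retries.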

Should I retry on 400 Bad Request?

No. A 400 indicates a client error (malformed request, invalid JSON, invalid parameter). Retrying will not fix it; instead, log the error and fix the request code.

What timeout should I use?

For typical requests, 20–30 seconds is reasonable. For streaming responses that return tokens gradually, 30–60 seconds is better. For background jobs with no real-time constraint, you can leave the timeout unset and rely on the client's default (10 minutes in the OpenAI Python client).

Can I cancel a request mid-transmission?

Yes, in Python you can break out of a streaming loop or use a background task cancellation mechanism. If a streaming response times out, the connection is closed and no further tokens are received.

How do I differentiate transient from permanent errors?

Transient errors (network failures, timeouts, 429 rate limits, 5xx) tend to clear up on retry. Permanent errors (authentication and most other 4xx) do not. If a request that looks transient still fails after about 3 retries, treat it as persistent: log it and alert the user.

Further Reading