
Agent Loops: Thinking and Acting Explained

An agent loop is the core control mechanism that governs how an AI agent reasons and acts. Unlike a single LLM call that generates text once, an agent loop is iterative: the model thinks (analyzes context, plans), acts (calls tools), observes the results, and repeats until the goal is reached or a limit is hit. Understanding the loop's phases, stop conditions, and state management is essential for building agents that reliably solve complex multi-step problems without getting stuck or incurring runaway costs.

The fundamental cycle alternates between two states: a thinking state (the model is deciding what to do) and an acting state (the agent is executing tool calls). The loop terminates when the model outputs a final answer (no more tool calls) or when a resource limit (steps, tokens, time) is reached.

The Five Phases of an Agent Loop

Each iteration of the agent loop follows this sequence:

Phase 1: Observe. The agent gathers the current state: the user's original request, the history of all previous tool calls and their results, any constraints or context, and available tools. This becomes the input to the model.

Phase 2: Reason. The language model processes the observed context and decides: "What should I do next? Call a tool, call multiple tools, or return a final answer?" The model's output encodes this decision.

Phase 3: Decide. The agent parses the model's output to extract tool calls (if any). In the Anthropic Messages API, which the examples in this article use, the response's stop_reason field carries the decision: "end_turn" means the model is returning a final answer, and "tool_use" means it is requesting tool invocations.

Phase 4: Act. The agent executes the tool(s) the model requested and collects the results. Calls requested in the same turn do not depend on each other, so an agent can run, say, three of them in parallel. When a later call depends on an earlier result, the model requests it in a later iteration, after it has seen that result.

Phase 5: Update state. The agent appends the tool results to the conversation history and loops back to Observe (Phase 1) with expanded context.
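The five phases above can be sketched as a small, provider-independent loop. This is a minimal illustration, not a production implementation: ModelReply, agent_loop, and scripted_model are hypothetical names introduced here, and the scripted model stands in for a real LLM call so the control flow can be followed without an API key.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ModelReply:
    """Hypothetical stand-in for a model response."""
    stop_reason: str                                   # "tool_use" or "end_turn"
    text: str = ""
    tool_calls: list = field(default_factory=list)     # [(tool_name, kwargs), ...]


def agent_loop(
    request: str,
    model: Callable[[list], ModelReply],
    tools: dict,
    max_steps: int = 10,
) -> str:
    history = [{"role": "user", "content": request}]
    for _ in range(max_steps):
        context = list(history)                        # Phase 1: Observe
        reply = model(context)                         # Phase 2: Reason
        if reply.stop_reason == "end_turn":            # Phase 3: Decide
            return reply.text
        results = [                                    # Phase 4: Act
            {"tool": name, "result": tools[name](**kwargs)}
            for name, kwargs in reply.tool_calls
        ]
        # Phase 5: Update state, then loop back to Observe
        history.append({"role": "assistant", "content": reply.tool_calls})
        history.append({"role": "user", "content": results})
    return "Stopped: step limit reached"


def scripted_model(context: list) -> ModelReply:
    """Fake model: asks for one tool call, then answers from its result."""
    if len(context) == 1:
        return ModelReply("tool_use", tool_calls=[("add", {"a": 2, "b": 3})])
    total = context[-1]["content"][0]["result"]
    return ModelReply("end_turn", text=f"The sum is {total}")


answer = agent_loop("What is 2 + 3?", scripted_model, {"add": lambda a, b: a + b})
print(answer)
```

With a real provider, only the model callable and the message format would change; the observe, reason, decide, act, update skeleton stays the same.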

Here's a visual representation:

┌──────────────────────┐
│ Start: User Request  │
└──────────┬───────────┘
           │
┌──────────▼───────────┐
│ Phase 1: Observe     │
│ (Build context)      │
└──────────┬───────────┘
           │
┌──────────▼───────────┐
│ Phase 2: Reason      │
│ (Call model)         │
└──────────┬───────────┘
           │
┌──────────▼───────────┐
│ Phase 3: Decide      │
│ (Parse model output) │
└─────┬──────────┬─────┘
      │          │
  tool_use   end_turn
    (yes)      (no)
      │          │
┌─────▼─────┐    │
│ Phase 4:  │    │
│ Act       │    │
│ (Execute  │    │
│  tools)   │    │
└─────┬─────┘    │
      │          │
┌─────▼─────┐    │
│ Phase 5:  │    │
│ Update    │    │
│ state     │    │
└─────┬─────┘    │
      │          │
      └────┬─────┘
           │
┌──────────▼───────────┐
│ Loop or terminate?   │
│ (Check step/token    │
│  limits)             │
└──────────┬───────────┘
           │
┌──────────▼───────────┐
│ Return final answer  │
└──────────────────────┘

Stop Conditions and Termination

Agent loops must terminate to avoid infinite runs and excessive costs. There are several natural stop conditions:

Model-driven: The model outputs stop_reason: "end_turn", signaling it has answered the user's question or determined that tool use won't help further. This is the ideal termination—the model decided naturally that the task was complete.

Step limit: You set a maximum number of iterations (e.g., max_iterations=10). After 10 loops, the agent stops regardless of whether a final answer has been reached. This prevents runaway loops caused by bugs or poor prompting.

Token budget: Language model API calls are billed by tokens. A max_tokens limit caps the output of a single call; a total token budget caps the whole run and is enforced by summing the token usage reported for each response. When the budget is exhausted, the loop stops.

Time limit: A timeout (e.g., 30 seconds) ensures the agent doesn't run indefinitely if it enters a tight loop or waits for slow tool responses.

Error limit: If too many tool calls fail (errors, invalid parameters), the agent stops and reports the issue rather than retrying indefinitely.

Here's how to implement these stops in Python:

import time

import anthropic

# Assumed to be defined elsewhere:
#   tools                          - list of tool definitions passed to the API
#   execute_tool(name, tool_input) - runs one tool and returns its result


def run_agent_with_stops(
    user_message: str,
    max_iterations: int = 10,
    max_tokens_per_call: int = 2048,
    timeout_seconds: float = 60.0,
    max_failures: int = 3,
) -> str:
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]

    start_time = time.monotonic()
    failures = 0

    # Step limit: the loop runs at most max_iterations times
    for iteration in range(max_iterations):
        # Time limit (checked between iterations, so one slow call can overshoot)
        elapsed = time.monotonic() - start_time
        if elapsed > timeout_seconds:
            return f"Timeout after {elapsed:.1f}s"

        # Error limit: stop once tool failures exceed the threshold
        if failures > max_failures:
            return f"Too many failures ({failures})"

        try:
            response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=max_tokens_per_call,  # per-call output cap
                tools=tools,
                messages=messages,
            )
        except Exception as e:
            # An API error ends the run immediately
            return f"API error: {e}"

        # Model-driven stop: return the final answer text
        if response.stop_reason == "end_turn":
            text = "".join(
                block.text for block in response.content if block.type == "text"
            )
            return text or "Finished but no text returned"

        # Handle tool calls
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []

            for block in response.content:
                if block.type == "tool_use":
                    try:
                        result = execute_tool(block.name, block.input)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": str(result),
                        })
                    except Exception as e:
                        failures += 1
                        # Report the failure to the model instead of crashing
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": f"Tool error: {e}",
                            "is_error": True,
                        })

            messages.append({"role": "user", "content": tool_results})
        else:
            # Any other stop reason (for example "max_tokens") ends the run
            return f"Unexpected stop reason: {response.stop_reason}"

    return f"Reached max iterations ({max_iterations})"

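This implementation enforces four of the five stop conditions: the model's end_turn signal, the step limit, the time limit, and the failure limit. Tokens are capped only per call through max_tokens; a total token budget is not tracked. Note the is_error flag in tool results: many LLM APIs support this to help the model understand that a tool call failed and retry with different parameters.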
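The example above caps tokens per call but does not track a total budget across the run. One way to add that is a small accumulator, sketched below. TokenBudget is a hypothetical helper introduced here, not part of any SDK. With the Anthropic Python SDK, the per-response counts are available as response.usage.input_tokens and response.usage.output_tokens.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical helper: tracks total tokens spent across an agent run."""
    limit: int
    used: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Input and output both count toward the bill, so sum them
        self.used += input_tokens + output_tokens

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

    @property
    def exhausted(self) -> bool:
        return self.used >= self.limit


budget = TokenBudget(limit=5000)
budget.record(input_tokens=1200, output_tokens=300)   # first model call
budget.record(input_tokens=2900, output_tokens=700)   # second call, longer history
print(budget.used, budget.remaining, budget.exhausted)
```

In the loop, the agent would call budget.record(...) after each response and return early when budget.exhausted is true. Input tokens tend to grow with every iteration, because the full history is resent on each call.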

Planning and Reflection in Agent Loops

Sophisticated agents don't blindly call tools; they plan first. A planning phase before tool execution can increase success rates and reduce wasted tool calls. Here's a pattern called "reason-first planning":

import anthropic

# Assumed to be defined elsewhere:
#   tools                           - list of tool definitions passed to the API
#   run_agent_loop(messages, tools) - the standard agent loop, returns the final answer


def run_planning_agent(user_message: str) -> str:
    """Agent that plans before acting."""
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]

    # Phase 1: Plan (model thinks about strategy, no tools offered yet)
    planning_prompt = (
        f"User request: {user_message}\n\n"
        "Before acting, analyze: What is the goal? What tools do you need? "
        "What is the sequence of steps? Think step-by-step."
    )
    messages[0]["content"] = planning_prompt

    plan_response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=messages,
    )

    # Take the first text block as the plan (empty string if there is none)
    plan_text = next(
        (block.text for block in plan_response.content if block.type == "text"),
        "",
    )
    print(f"Plan: {plan_text}")

    # Phase 2: Execute (model calls tools following the plan)
    messages[0]["content"] = user_message  # Reset to original request
    messages.append({"role": "assistant", "content": plan_text})
    messages.append({
        "role": "user",
        "content": "Now execute your plan using tools. Call the tools you described above.",
    })

    # Run the standard agent loop with the plan in context
    return run_agent_loop(messages, tools)

Planning trades a few extra tokens upfront for fewer wasted tool calls later. This is especially valuable when tool calls are expensive (external API charges, slow services).

State Management: Maintaining Agent Context

The agent's state—its understanding of progress toward the goal—lives in the message history. Each tool result adds information. A well-designed agent leverages this history effectively.

One pattern is to include a "state summary" in the context:

import anthropic

# Assumed to be defined elsewhere: tools (list of tool definitions passed to the API)


def run_agent_with_state_summary(user_message: str, max_iterations: int = 10) -> str:
    """Agent that periodically summarizes its state."""
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        # Every 3 iterations, summarize progress
        if iteration > 0 and iteration % 3 == 0:
            summary_prompt = (
                "Summarize your progress so far: What have you learned? "
                "What is left to do? What is your next step?"
            )
            messages.append({"role": "user", "content": summary_prompt})

            summary_response = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=512,
                messages=messages,
            )
            # Append the summary so later iterations see it in the history
            messages.append({"role": "assistant", "content": summary_response.content})

        # Continue with normal loop...
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=2048,
            tools=tools,
            messages=messages,
        )
        # ... handle response and loop

    return f"Reached max iterations ({max_iterations})"

State summaries help keep the agent from "losing its way" in long conversations; the model refocuses on what matters.

Key Takeaways

  • An agent loop cycles through observe, reason, decide, act, and update phases repeatedly
  • Stop conditions (model finish signal, step/token/time limits, error thresholds) prevent infinite loops and runaway costs
  • Planning before acting improves success and reduces wasted tool calls
  • State management via conversation history and periodic summaries keeps complex agents focused on the goal
  • Robust agents handle tool failures gracefully (catching errors, retrying with different parameters)

Frequently Asked Questions

How many iterations is typical?

Simple tasks often finish in 2-5 iterations. Anything over 10 suggests the agent is struggling (poor tool definitions, an ambiguous goal, or a bug). Start with a low cap such as max_iterations=5 and raise it only if legitimate tasks hit the limit.

Should I parallelize tool calls?

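If the requested tools don't depend on each other, yes. Most LLM APIs let the model request several tool calls in one response; execute them concurrently and return all the results together in the next message. Sequential execution is simpler but slower.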
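As a rough sketch of concurrent execution, the helper below runs independent tool calls in a thread pool and returns results in request order. run_tools_in_parallel and demo_tool are hypothetical names introduced here; the result dictionaries follow the tool_result shape used in the earlier example. Threads suit I/O-bound tools such as HTTP requests; CPU-bound tools would need a different approach.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_tools_in_parallel(
    calls: list,
    execute_tool: Callable[[str, dict], Any],
    max_workers: int = 4,
) -> list:
    """Run (tool_use_id, name, input) calls concurrently; keep request order."""

    def run_one(call):
        tool_use_id, name, tool_input = call
        try:
            content = str(execute_tool(name, tool_input))
            return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
        except Exception as e:
            # A failing tool must not take down the other calls
            return {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": f"Tool error: {e}",
                "is_error": True,
            }

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs
        return list(pool.map(run_one, calls))


def demo_tool(name: str, tool_input: dict) -> float:
    """Toy tool used only for this demonstration."""
    if name == "divide":
        return tool_input["a"] / tool_input["b"]
    raise ValueError(f"unknown tool: {name}")


results = run_tools_in_parallel(
    [("id1", "divide", {"a": 6, "b": 3}), ("id2", "divide", {"a": 1, "b": 0})],
    demo_tool,
)
for r in results:
    print(r)
```

Catching exceptions inside run_one means one failed call is reported to the model as an error result while the successful results are still returned.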

How do I debug a stuck agent?

Log each iteration: What did the model decide? What were the tool results? Add is_error: true to tool results that fail, so the model knows to retry. Print the conversation history to see whether the model is repeating itself, for example by requesting the same tool with the same arguments several times.
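Repetition can also be detected automatically. The sketch below counts identical tool calls (same name, same arguments) from a log of the calls made so far. find_repeated_calls is a hypothetical helper introduced here, and it assumes tool inputs are JSON-serializable dictionaries.

```python
import json
from collections import Counter


def find_repeated_calls(tool_calls: list, threshold: int = 2) -> list:
    """Return (name, input, count) for calls seen at least `threshold` times."""
    counts = Counter(
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal
        (name, json.dumps(tool_input, sort_keys=True))
        for name, tool_input in tool_calls
    )
    return [
        (name, json.loads(args), n)
        for (name, args), n in counts.items()
        if n >= threshold
    ]


call_log = [
    ("search", {"q": "weather paris"}),
    ("search", {"q": "weather paris"}),
    ("fetch", {"url": "https://example.com"}),
    ("search", {"q": "weather paris"}),
]
repeats = find_repeated_calls(call_log)
print(repeats)
```

A non-empty result is a reasonable trigger for stopping the loop early or for adding a message telling the model that the repeated call is not making progress.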

Can agents plan without extra API calls?

Yes, if you phrase the initial request clearly: "List 3 steps you'll take, then take them." Some models will reason more carefully with explicit prompting instead of requiring a separate planning phase.

Further Reading