The giveaway is in the execution log. Open a runaway execution in the n8n editor, expand the AI Agent node, and scroll through the intermediate steps. In a healthy run you'll see something like:
Thought: I need the customer's order status.
Action: lookup_order
Action Input: { "order_id": "A-1042" }
Observation: { "status": "shipped", "carrier": "DHL" }
Thought: I have what I need.
Final Answer: Your order shipped via DHL on...
In a broken run, the Action and Observation pair just repeats. Same tool, same input, same output, sometimes 30 or 40 times before the node finally errors with a maxIterations exception. The model genuinely believes it hasn't done the work yet, because the tool description it was given overlaps with another tool, and the planner cannot tell which one was supposed to produce the answer.
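If all you have is the raw trace text, a few lines of Python will confirm whether an Action/Action Input pair is repeating. This is a minimal sketch, not an n8n or LangChain API. `find_repeated_actions` is a hypothetical helper, and it assumes the trace uses exactly the `Action:` / `Action Input:` line format shown above.

```python
from collections import Counter

def find_repeated_actions(trace: str) -> dict[tuple[str, str], int]:
    """Count (Action, Action Input) pairs in a ReAct-style text trace.

    Assumes each tool call appears as an 'Action: <tool>' line followed by an
    'Action Input: <json>' line. Returns only pairs that occur more than once.
    """
    counts = Counter()
    current_tool = None
    for raw in trace.splitlines():
        line = raw.strip()
        # Check the longer prefix first; Thought/Observation lines are ignored
        if line.startswith("Action Input:"):
            if current_tool is not None:
                args = line[len("Action Input:"):].strip()
                counts[(current_tool, args)] += 1
                current_tool = None
        elif line.startswith("Action:"):
            current_tool = line[len("Action:"):].strip()
    return {pair: n for pair, n in counts.items() if n > 1}

looping = """Action: lookup_order
Action Input: { "order_id": "A-1042" }
Observation: { "status": "shipped", "carrier": "DHL" }
Action: lookup_order
Action Input: { "order_id": "A-1042" }
Observation: { "status": "shipped", "carrier": "DHL" }"""
print(find_repeated_actions(looping))  # {('lookup_order', '{ "order_id": "A-1042" }'): 2}
```

Inputs are compared as raw strings, so the same JSON with different spacing counts as a different call.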
On my client's workflow, the overlap was between a tool called get_customer and another called lookup_account. Both had descriptions that mentioned "fetch customer details by email or ID". The agent kept calling one, getting back data, and then deciding it should probably try the other one too in case that was the one it was meant to use. Then it would loop back. The LangChain tool-calling docs make the underlying point: the model chooses a tool from its name, description, and argument schema, so when two descriptions claim the same job, nothing else in the request breaks the tie.
Diagnosis recipe: a 5-minute check you can run right now
Before changing anything, confirm you actually have an overlap problem and not a different failure mode (token-limit truncation, malformed JSON in a tool response, or a memory node that's evicting context mid-run). Here's the sequence I run every time.
Step one: make the agent show its work. In n8n you do this by enabling the "Return Intermediate Steps" option on the AI Agent node and re-running with a small test input. Then pipe the output through this little Code node:
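// Code node: summarise tool call frequency from agent output
const steps = $input.first().json.intermediateSteps || [];
const counts = {};
const toolsByInput = {};
for (const step of steps) {
  const toolName = step.action?.tool ?? 'unknown';
  const inputKey = JSON.stringify(step.action?.toolInput ?? {});
  // Same tool + same input: candidate loop
  const key = `${toolName}::${inputKey}`;
  counts[key] = (counts[key] || 0) + 1;
  // Same input sent to different tools: candidate overlap
  (toolsByInput[inputKey] ??= new Set()).add(toolName);
}
const repeats = Object.entries(counts)
  .filter(([, n]) => n > 1)
  .sort((a, b) => b[1] - a[1]);
const sharedInputs = Object.entries(toolsByInput)
  .filter(([, tools]) => tools.size > 1)
  .map(([input, tools]) => ({ input, tools: [...tools] }));
return [{ json: { totalSteps: steps.length, repeats, sharedInputs } }];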
If the repeats array has any entry with the same tool and same input fired more than twice, you have a loop. If the same input is being sent to two different tools alternately, you have the overlap variant I'm describing here.
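For offline analysis of exported runs, the same checks port to Python. This sketch separates the two failure modes described above: exact repeats of one tool and input, and one input bouncing between tools. `classify_steps` and its verdict labels are hypothetical, and it assumes each step mirrors the intermediate-steps shape the Code node reads (`action.tool`, `action.toolInput`).

```python
import json
from collections import Counter, defaultdict

def classify_steps(steps: list[dict]) -> dict:
    """Separate the plain loop from the overlap variant in exported agent steps."""
    calls = []
    for step in steps:
        action = step.get("action") or {}
        tool = action.get("tool", "unknown")
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal
        args = json.dumps(action.get("toolInput", {}), sort_keys=True)
        calls.append((tool, args))

    # Same tool + same input more than once
    repeats = {f"{t}::{a}": n for (t, a), n in Counter(calls).items() if n > 1}

    # Same input sent to more than one tool
    tools_per_input = defaultdict(set)
    for tool, args in calls:
        tools_per_input[args].add(tool)
    shared_inputs = {a: sorted(ts) for a, ts in tools_per_input.items() if len(ts) > 1}

    if shared_inputs:
        verdict = "overlap"
    elif any(n > 2 for n in repeats.values()):  # "more than twice" rule
        verdict = "loop"
    else:
        verdict = "ok"
    return {"repeats": repeats, "shared_inputs": shared_inputs, "verdict": verdict}

# Alternating calls like the get_customer / lookup_account incident
steps = [
    {"action": {"tool": tool, "toolInput": {"email": "jane@example.com"}}}
    for tool in ["get_customer", "lookup_account"] * 3
]
print(classify_steps(steps)["verdict"])  # overlap
```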
Step two: list every tool the agent has access to and dump their descriptions side by side. I literally paste them into a markdown table in my notes app. You're looking for any pair where a naive reader couldn't tell which tool to pick from a one-sentence task. If you can't tell, GPT-4o-mini definitely can't, and even Claude Sonnet 4.6 will hesitate.
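A crude automated first pass can catch the worst pairs before you read them side by side: flag any two tools whose descriptions claim the same input shape. This is a keyword heuristic, not a substitute for reading; `INPUT_TERMS` and the helper names are hypothetical, and text after a "Do NOT" clause is ignored because negative clauses deliberately mention the other tool's territory. It's checked here against the before and after descriptions from the get_customer / lookup_account case.

```python
import re
from itertools import combinations

# Hypothetical vocabulary of input shapes; extend it to match your own tools.
INPUT_TERMS = {"email", "id", "phone", "order", "account"}

def positive_clause(description: str) -> str:
    """Keep only the text before any 'Do NOT' clause."""
    return re.split(r"\bdo not\b", description, flags=re.IGNORECASE)[0]

def input_terms(description: str) -> set[str]:
    """Input-shape words the tool claims to accept."""
    words = set(re.findall(r"[a-z]+", positive_clause(description).lower()))
    return words & INPUT_TERMS

def ambiguous_pairs(tools: dict[str, str]) -> list[tuple[str, str, list[str]]]:
    """Tool pairs whose descriptions claim at least one shared input shape."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(tools.items(), 2):
        shared = input_terms(desc_a) & input_terms(desc_b)
        if shared:
            flagged.append((name_a, name_b, sorted(shared)))
    return flagged

before = {
    "get_customer": "Fetch customer details by email or ID",
    "lookup_account": "Look up an account record using email or account ID",
}
after = {
    "get_customer": "Use when you have an email address and need profile data "
                    "(name, signup date, plan tier). Do NOT use for billing or order history.",
    "lookup_account": "Use when you have an internal account ID (starts with ACC-) "
                      "and need billing status. Do NOT use for profile data or emails.",
}
print(ambiguous_pairs(before))  # [('get_customer', 'lookup_account', ['email', 'id'])]
print(ambiguous_pairs(after))   # []
```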
Step three: check the system prompt. If the prompt says something like "use the available tools to answer", that's not enough. The agent needs explicit routing rules. I'll cover the prompt fix in the prevention section below.
One thing to rule out early: if you're using the Memory sub-node and the buffer window is too small, the agent can genuinely forget that it already called a tool. That presents identically to an overlap loop. Bump the window to at least 20 messages and re-test before blaming tool descriptions. The n8n memory buffer documentation covers the window sizing tradeoffs.
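To see why a small window produces the same symptom, here is a deliberately simplified model of a window buffer. It is not n8n's memory implementation; it just assumes the model sees only the last `window` messages, so once the original tool call scrolls out of that window, nothing tells the agent it already ran. The helper names and message strings are hypothetical.

```python
from collections import deque

def visible_history(messages: list[str], window: int) -> list[str]:
    """Simplified window buffer: only the last `window` messages reach the model."""
    return list(deque(messages, maxlen=window))

def already_called(messages: list[str], window: int, call: str) -> bool:
    """True if the tool-call message is still inside the visible window."""
    return call in visible_history(messages, window)

# Hypothetical conversation; message strings are illustrative only.
history = [
    "user: Where is order A-1042?",
    "tool_call: lookup_order A-1042",
    "tool_result: shipped via DHL",
    "assistant: It shipped via DHL.",
    "user: Can you send the tracking link?",
    "assistant: Checking the order again.",
]
print(already_called(history, window=4, call="tool_call: lookup_order A-1042"))   # False
print(already_called(history, window=20, call="tool_call: lookup_order A-1042"))  # True
```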
Prevention pattern 1: disjoint tool descriptions with explicit "use when" clauses
The single highest-impact change. Rewrite every tool description to start with a "Use when..." clause that is mutually exclusive with every other tool. Here's the before-and-after from my client's workflow:
// BEFORE - ambiguous: both tools claim the same inputs and the same job
const toolsBefore = [
  {
    name: "get_customer",
    description: "Fetch customer details by email or ID",
  },
  {
    name: "lookup_account",
    description: "Look up an account record using email or account ID",
  },
];

// AFTER - disjoint: distinct input shapes, named outputs, explicit exclusions
const toolsAfter = [
  {
    name: "get_customer",
    description: "Use when you have an email address and need profile data (name, signup date, plan tier). Do NOT use for billing or order history.",
  },
  {
    name: "lookup_account",
    description: "Use when you have an internal account ID (starts with ACC-) and need billing status. Do NOT use for profile data or emails.",
  },
];
Notice three things. First, each description names a specific input shape (email vs ACC- prefix). Second, each description names specific output fields the tool returns. Third, each description has an explicit negative clause telling the agent what NOT to use it for. That negative clause is the part most people skip, and it's what kills the loop dead.
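The opening clause and the negative clause are mechanical enough to lint before a workflow ships. A minimal sketch; `lint_description` is a hypothetical helper, and it deliberately skips the input-shape and output-field checks, which still need a human read.

```python
import re

def lint_description(name: str, description: str) -> list[str]:
    """Flag tool descriptions missing the 'Use when' opener or the 'Do NOT' clause."""
    problems = []
    if not description.strip().lower().startswith("use when"):
        problems.append(f"{name}: should open with a 'Use when...' clause")
    if not re.search(r"\bdo not\b", description, re.IGNORECASE):
        problems.append(f"{name}: missing a 'Do NOT use for...' clause")
    return problems

tools = {
    "get_customer": "Fetch customer details by email or ID",
    "lookup_account": "Use when you have an internal account ID (starts with ACC-) "
                      "and need billing status. Do NOT use for profile data or emails.",
}
for name, desc in tools.items():
    for problem in lint_description(name, desc):
        print(problem)
```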
After this change alone, my client's runaway workflow went from averaging 8.4 tool calls per execution to 1.9. The agent picks the right tool the first time because the description tells it the boundary.
Prevention pattern 2: hard cap maxIterations and add a circuit breaker
Even with clean descriptions, you want a safety net. The AI Agent node has a maxIterations setting that defaults to 10 in n8n 1.74. I set this to 5 for any agent that should only need one or two tool calls, and I add an Error Trigger workflow that fires a Slack alert if any agent run ever actually hits the cap. A run that hits maxIterations is always a bug, never a feature.
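Outside n8n, the cap-plus-alert idea looks roughly like this. It is a toy stand-in for the agent's iteration loop, not n8n's or LangChain's implementation; `run_agent`, `plan_step`, `on_cap`, and `MaxIterationsReached` are hypothetical names, and `on_cap` is where the Error Trigger → Slack alert would go.

```python
from typing import Callable, Optional

class MaxIterationsReached(Exception):
    """Raised when the agent exhausts its iteration budget without answering."""

def run_agent(
    plan_step: Callable[[int], Optional[str]],
    max_iterations: int = 5,
    on_cap: Callable[[int], None] = lambda n: None,
) -> str:
    """Call plan_step until it returns a final answer or the cap is reached.

    plan_step(i) returns the final answer, or None to request another tool call.
    """
    for i in range(max_iterations):
        answer = plan_step(i)
        if answer is not None:
            return answer
    on_cap(max_iterations)  # hitting the cap is treated as a bug, so alert
    raise MaxIterationsReached(f"Agent stopped after {max_iterations} iterations")

# A planner stuck in a loop never returns an answer, so the cap trips.
try:
    run_agent(lambda i: None, on_cap=lambda n: print(f"ALERT: agent hit cap of {n}"))
except MaxIterationsReached as exc:
    print(exc)
```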
For belt-and-braces, I also wrap the agent in a meta-workflow that tracks total cost per upstream trigger. Here's the Python equivalent I use when prototyping the same logic outside n8n, which makes the circuit-breaker idea more explicit:
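from dataclasses import dataclass, field
import json

@dataclass
class AgentBudget:
    """Per-run safety net: caps tool calls and tokens, rejects exact repeats."""
    max_tool_calls: int = 5
    max_tokens: int = 20_000
    tool_calls: int = 0
    tokens_used: int = 0
    seen: set = field(default_factory=set)

    def check(self, tool_name: str, tool_input: dict) -> None:
        """Call before each tool invocation; raises if the run should stop."""
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise RuntimeError(f"Tool call cap exceeded ({self.max_tool_calls})")
        # Canonical signature so key order in tool_input doesn't matter
        sig = (tool_name, json.dumps(tool_input, sort_keys=True, default=str))
        if sig in self.seen:
            raise RuntimeError(f"Repeated tool call detected: {sig}")
        self.seen.add(sig)

    def add_tokens(self, n: int) -> None:
        """Call after each model response with the tokens it consumed."""
        self.tokens_used += n
        if self.tokens_used > self.max_tokens:
            raise RuntimeError(f"Token budget exceeded ({self.max_tokens})")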
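The same shape works as an n8n Code node placed inside a sub-workflow tool. Track the budget in workflow static data, throw early, and let the parent workflow's error branch handle it. One catch: n8n only saves static data for production executions of an active workflow, not for manual test runs, so test the breaker with a real trigger. You'll never get billed for a four-hour runaway again.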
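Because static data is shared by every execution of the workflow (and each call to a sub-workflow tool is its own execution), the counter needs a key that ties calls back to one agent run. Here is a Python sketch of that bookkeeping, assuming the parent passes a run identifier into the tool; a plain dict stands in for the object n8n's `$getWorkflowStaticData('global')` returns, and `charge_tool_call` and `agent_budgets` are hypothetical names.

```python
def charge_tool_call(static_data: dict, run_id: str, max_calls: int = 5) -> int:
    """Increment one agent run's call counter and fail fast past the cap.

    static_data: stand-in for the workflow's static data object (hypothetical).
    run_id: identifier the parent workflow passes to every tool call (assumed).
    """
    budgets = static_data.setdefault("agent_budgets", {})
    used = budgets.get(run_id, 0) + 1
    budgets[run_id] = used
    if used > max_calls:
        raise RuntimeError(f"Run {run_id} exceeded {max_calls} tool calls")
    return used

static_data = {}
for _ in range(5):
    charge_tool_call(static_data, "run-1")
try:
    charge_tool_call(static_data, "run-1")  # sixth call trips the breaker
except RuntimeError as exc:
    print(exc)
```

The counters never expire in this sketch; in practice you'd delete a run's entry once the parent finishes so static data doesn't grow without bound.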
Prevention pattern 3: route before you reason
For any agent with more than three tools, I no longer let the agent choose from the full tool list on the first hop. I put a classification step and a Switch node in front of the AI Agent: the classifier labels the inbound request, and the Switch sends it to an agent that exposes only the tools for that request class. The agent only ever sees the two or three tools it could possibly need.
This works because each AI Agent node only sees the tools wired to it. You can have one "support-triage" AI Agent node configured with tools A, B, C, and a separate "billing-question" AI Agent node configured with tools D, E, F, then a Switch node upstream that picks which agent to invoke. The classifier can be a cheap one-shot LLM call (gpt-4o-mini works fine) or even a regex if your inputs are structured enough. OpenAI's function calling guide recommends keeping the number of functions small for better accuracy, which matches what I see in practice.
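The regex end of that spectrum can be as small as this. It is a sketch, not a recommended rule set: the route labels and patterns are hypothetical, and the tool names are the examples from earlier in this post. In n8n the returned label is what the Switch node would branch on.

```python
import re

# Hypothetical request classes: (label, trigger pattern, tools that agent exposes).
ROUTES = [
    ("billing", re.compile(r"\b(invoice|refund|charge|billing|ACC-\d+)\b", re.IGNORECASE),
     ["lookup_account"]),
    ("orders", re.compile(r"\b(order|shipping|tracking|[A-Z]-\d{4})\b", re.IGNORECASE),
     ["lookup_order"]),
]
DEFAULT_ROUTE = ("support", ["get_customer"])

def route(request: str) -> tuple[str, list[str]]:
    """Pick the first matching request class; fall back to general support."""
    for label, pattern, tools in ROUTES:
        if pattern.search(request):
            return label, tools
    return DEFAULT_ROUTE

print(route("Where is my order A-1042?"))             # ('orders', ['lookup_order'])
print(route("Why was I charged twice on ACC-7781?"))  # ('billing', ['lookup_account'])
print(route("How do I change my email?"))             # ('support', ['get_customer'])
```

Note that `\bcharge\b` does not match "charged"; the ACC- ID is what routes the second example. Gaps like that are where a cheap LLM classifier earns its keep as a fallback.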
The cost of this pattern is a slightly more complex workflow graph. The benefit is that each agent only ever has to choose between truly disjoint options, and the planner can't even consider tools that would create overlap. I've never seen a routed agent enter an infinite loop, full stop.
Caveats and what I won't recommend
A few things I tried that didn't work, so you don't have to. Putting "do not call the same tool twice" in the system prompt: ignored about 30% of the time, especially under load. Lowering the model temperature: helped marginally, didn't fix the root cause. Switching from gpt-4o to Claude Sonnet 4.6: shifted the failure mode rather than eliminating it (Claude was more likely to stop early instead of looping, but still picked the wrong tool when descriptions overlapped).
Also: don't use the Auto-Fixing Output Parser as a loop guard. When parsing fails it sends the bad output back to an LLM for another attempt, which piles extra calls on top of a run that is already looping instead of stopping it. If you need structured output, define a single return_answer tool that the agent must call to finish, and validate the schema in the next node.
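For the return_answer approach, the validation node only has to check the arguments the agent passed to that tool. Here's a standard-library-only sketch; the field names (`answer`, `sources`, `confidence`) and `validate_return_answer` are hypothetical, and in a real workflow a JSON Schema validator would be the more usual choice.

```python
from typing import Any

# Hypothetical argument schema for a return_answer tool: field -> expected type.
RETURN_ANSWER_SCHEMA = {"answer": str, "sources": list, "confidence": float}

def validate_return_answer(args: dict[str, Any]) -> list[str]:
    """Return a list of schema errors; an empty list means the answer is usable."""
    errors = []
    for name, expected in RETURN_ANSWER_SCHEMA.items():
        if name not in args:
            errors.append(f"missing field: {name}")
        elif not isinstance(args[name], expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(args[name]).__name__}")
    # Reject fields the schema doesn't know about instead of silently passing them on
    errors.extend(f"unexpected field: {name}" for name in sorted(set(args) - set(RETURN_ANSWER_SCHEMA)))
    return errors

print(validate_return_answer({"answer": "Shipped via DHL", "sources": ["lookup_order"], "confidence": 0.9}))  # []
print(validate_return_answer({"answer": "Shipped"}))  # ['missing field: sources', 'missing field: confidence']
```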
One more thing for anyone running self-hosted n8n in queue mode. Each execution runs as a single job on one worker, so if your timeouts are generous, a single runaway agent can tie up a worker for hours. Set a per-workflow execution timeout (Workflow settings → Timeout Workflow) of something sane like 120 seconds for agent workflows. Related reading: my notes on n8n queue mode worker tuning cover the timeout interactions in more detail, and controlling OpenAI function-calling costs goes deeper on the budget circuit-breaker pattern.
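n8n enforces the workflow timeout itself; when prototyping outside n8n, the equivalent is a wall-clock deadline around the agent loop. A minimal sketch, with `run_with_deadline` and `step` as hypothetical names. It checks the deadline between steps, so it cannot interrupt a single tool call that hangs.

```python
import time
from typing import Callable, Optional

def run_with_deadline(
    step: Callable[[], Optional[str]],
    timeout_s: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call step() until it returns an answer; give up once timeout_s has elapsed.

    clock is injectable so the deadline logic can be tested without sleeping.
    """
    deadline = clock() + timeout_s
    while True:
        if clock() >= deadline:
            raise TimeoutError(f"Agent run exceeded {timeout_s:.0f}s")
        answer = step()
        if answer is not None:
            return answer

print(run_with_deadline(lambda: "done"))  # done
```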
Closing the loop, literally
The fix that ended my client's 11pm incident was three lines of edited tool descriptions and a maxIterations of 5. Total time from diagnosis to deployed fix was about 25 minutes once I understood what to look for. The pattern is so consistent across the cases I've seen that I now audit tool descriptions as the first step in any agent code review, before I even look at the prompt.
If you're building agent workflows in n8n in 2026, treat tool descriptions as production-critical strings. They are the planner's entire view of what your system can do, and ambiguity there costs real money. Write them like you'd write API documentation for a colleague who has to pick between two endpoints with no other context, because that's exactly what your model is doing on every single iteration.
FAQ
Does this happen with all models or just GPT-4?
I've reproduced it on gpt-4o, gpt-4o-mini, Claude Sonnet 4.6, and Gemini 2.0 Flash. Smaller and cheaper models loop more readily because they're worse at planning, but even frontier models will loop given sufficiently overlapping tool descriptions. It's a prompt-engineering problem, not a model problem.
Should I just set maxIterations to 2 and call it done?
Tempting, but no. A hard cap protects your wallet but masks the underlying bug. Your agent will silently fail to complete legitimate multi-step tasks. Fix the tool descriptions first, then set a cap that's just above your real expected maximum. Mine is usually 5 for support agents and 10 for research agents.
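One way to find "just above your real expected maximum" is to size it from past executions: take the most tool calls any healthy run needed and add a little headroom. A sketch, assuming you've exported a tool-call count per run; `suggest_cap` is a hypothetical helper, and it drops runs that hit the old cap on the basis that those were runaways, not real work.

```python
def suggest_cap(tool_calls_per_run: list[int], old_cap: int, headroom: int = 1) -> int:
    """Size maxIterations from history: the largest healthy run plus headroom.

    Runs that reached old_cap are treated as runaways and excluded.
    """
    healthy = [n for n in tool_calls_per_run if n < old_cap]
    if not healthy:
        raise ValueError("no healthy runs to size the cap from")
    return max(healthy) + headroom

print(suggest_cap([1, 2, 1, 3, 10, 2], old_cap=10))  # 4
```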
Can the agent itself detect that it's looping and stop?
Sort of. You can add a sentence to the system prompt like "If you find yourself about to call the same tool with the same input you already used, stop and return what you have." This works perhaps 70% of the time in my testing, which is not good enough for production. Use it as defence-in-depth, never as the primary fix.
Does this affect the new n8n native AI nodes added in 1.74?
Yes. The native AI Agent node uses the same LangChain.js planner under the hood as the older Tools Agent. The behaviour is identical. Anything I've written here applies to both, and I expect it to keep applying through at least the n8n 2.0 release later this year unless they ship a different planner architecture.