Coordinator & Hierarchical Patterns
The sequential and parallel patterns are predictable because the developer controls the topology — you decide, at design time, which agents run and in what order. That works when the task structure is stable.
But many real tasks do not have a stable structure. A coding task might touch one file or twenty. A research task might need three sources or fifteen. A customer request might require one specialist or four. You cannot hardcode a pipeline when you do not know the shape of the work until you see it.
The coordinator pattern solves this by putting an AI model in charge of the routing. A coordinator agent receives the task, decides how to break it down, dispatches sub-tasks to specialist worker agents, collects their results, and synthesizes a final answer. The topology is not drawn at design time — it emerges at runtime from the coordinator's reasoning about the specific input.
This is the orchestrator-workers pattern: one brain directing many hands. It is more flexible than a fixed pipeline but harder to predict, test, and debug. Everything that was deterministic in the sequential pattern — which agents run, in what order, how many times — becomes a decision the coordinator makes, and decisions made by language models are probabilistic.
The Coordinator Pattern #
A coordinator is an agent whose primary job is not to do the work itself but to figure out what work needs doing and who should do it. It receives a high-level goal, decomposes it into sub-tasks, selects which workers to invoke for each sub-task, monitors the results, and decides when the overall task is complete.
┌─────────────────────────────────────┐
│ User Task │
└──────────────────┬──────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Coordinator Agent │
│ │
│ 1. Analyze task │
│ 2. Decompose into sub-tasks │
│ 3. Select workers │
│ 4. Dispatch and collect results │
│ 5. Synthesize final answer │
│ │
└───────┬──────────┬──────────┬───────┘
│ │ │
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│Worker A│ │Worker B│ │Worker C│
│(code) │ │(search)│ │(write) │
└────────┘ └────────┘ └────────┘
The critical distinction from routing (covered alongside sequential and parallel patterns) is that routing picks one branch. A coordinator might pick one worker, or three, or five — and it might call the same worker multiple times with different inputs. The topology is fully dynamic.
How It Works #
The coordinator is an agent with a special set of tools: the workers themselves. Each worker is exposed to the coordinator as a callable tool with a description of what it can do. The coordinator's ReAct loop reasons about the task, decides which workers to call, calls them via tool invocations, reads the results, and decides what to do next — call more workers, ask a worker to redo something, or synthesize a final answer.
class WorkerTool:
"""Wraps a worker agent as a tool the coordinator can invoke."""
def __init__(self, agent, description):
self.agent = agent
self.name = agent.name
self.description = description
def schema(self):
return {
"name": self.name,
"description": self.description,
"parameters": {
"type": "object",
"properties": {
"task": {
"type": "string",
"description": "The sub-task to assign to this worker.",
}
},
"required": ["task"],
},
}
def execute(self, task):
return self.agent.run(task)
class Coordinator:
def __init__(self, system_prompt, workers, model):
self.system_prompt = system_prompt
self.model = model
self.worker_tools = [
WorkerTool(w, w.description) for w in workers
]
def run(self, task):
tools = [wt.schema() for wt in self.worker_tools]
messages = [{"role": "user", "content": task}]
while True:
response = call_model(
model=self.model,
system=self.system_prompt,
messages=messages,
tools=tools,
)
if not response.has_tool_calls:
return response.text
# Execute each worker the coordinator requested
results = []
for call in response.tool_calls:
worker_tool = self._find_worker(call.name)
result = worker_tool.execute(call.arguments["task"])
results.append({"tool_call_id": call.id, "output": result})
messages.append(response.to_message())
messages.append(tool_results_message(results))
def _find_worker(self, name):
for wt in self.worker_tools:
if wt.name == name:
return wt
raise ValueError(f"Unknown worker: {name}")
The coordinator is itself running a ReAct loop — the same reason-act-observe cycle we covered earlier. The only difference is that its "tools" are not APIs or databases but other agents. Each tool call dispatches a sub-task to a worker, and the tool result is that worker's full response. The coordinator sees these results, reasons about whether the task is complete, and either calls more workers or produces a final answer.
The Coordinator's System Prompt #
The coordinator's system prompt is its job description. It determines how the coordinator decomposes tasks, selects workers, and decides when to stop. A poorly written coordinator prompt leads to either under-delegation (the coordinator tries to do the work itself) or over-delegation (it splits simple tasks into unnecessary sub-tasks).
COORDINATOR_PROMPT = """You are a project coordinator. Your job is to accomplish
the user's goal by delegating work to your specialist workers.
Available workers:
- code_agent: Writes and modifies code. Use for implementation tasks.
- search_agent: Searches documentation and the web. Use for research and
fact-finding.
- review_agent: Reviews code or text for quality and correctness. Use for
validation.
- write_agent: Produces written content (docs, reports, summaries). Use for
any writing task that is not code.
Rules:
1. Break the task into the smallest meaningful sub-tasks.
2. Assign each sub-task to the most appropriate worker.
3. You may call multiple workers in sequence or in parallel.
4. After receiving results, check if the overall goal is met.
- If yes, synthesize the results into a final answer.
- If not, identify what is missing and delegate more work.
5. Do NOT do the work yourself. Your job is to coordinate, not to execute.
6. If a worker's output is insufficient, you may re-assign the same task
with additional guidance, or assign it to a different worker.
"""
Rule 5 is critical. Without it, large models tend to bypass the workers and answer directly — especially for tasks they could handle alone. The coordinator must be constrained to coordinate, not to perform the work itself. Otherwise it degenerates into a single agent with extra steps.
A Concrete Example: Multi-File Code Change #
Coding agents are the canonical use case for the coordinator pattern. When a user asks "add authentication to the API," the coordinator does not know in advance which files need to change, how many there are, or what the dependencies between changes look like. It has to figure that out at runtime.
code_worker = Agent(
name="code_agent",
description="Writes, modifies, and debugs code. Can read files, write "
"files, and run tests.",
system_prompt="""You are a software engineer. Given a specific coding task,
implement it. You have access to the file system and a test runner. Write
clean, tested code. If tests fail, fix the code until they pass.""",
tools=[read_file, write_file, run_tests, list_directory],
model="large-model",
)
search_worker = Agent(
name="search_agent",
description="Searches documentation, codebases, and the web for "
"information relevant to a question.",
system_prompt="""You are a research assistant. Given a question, find the
answer using your search tools. Return concise, factual answers with
sources.""",
tools=[web_search, code_search, doc_search],
model="medium-model",
)
review_worker = Agent(
name="review_agent",
description="Reviews code for correctness, security, and style. Returns "
"specific issues and suggestions.",
system_prompt="""You are a code reviewer. Given code or a diff, review it
for bugs, security issues, performance problems, and style violations.
Return a list of specific, actionable findings.""",
tools=[read_file, static_analysis],
model="large-model",
)
coordinator = Coordinator(
system_prompt=COORDINATOR_PROMPT,
workers=[code_worker, search_worker, review_worker],
model="large-model",
)
# The coordinator figures out the plan at runtime
result = coordinator.run(
"Add JWT-based authentication to the /api/users endpoint. "
"Tokens should expire after 1 hour. Include tests."
)
When the coordinator receives this task, its reasoning might go:
- First, I need to understand the current code. Call
search_agentto find the existing/api/usersendpoint and understand the project structure. - Now I know the layout. Call
code_agentto implement the JWT middleware. - Then call
code_agentto modify the users endpoint to use the middleware. - Then call
code_agentto add tests. - Finally, call
review_agentto review all the changes.
But this plan is not hardcoded. If the search reveals that the project already has a partially implemented auth system, the coordinator adapts. If the code agent's tests fail, the coordinator might re-dispatch. If the review agent flags a security issue, the coordinator sends the code agent back to fix it. The plan evolves as new information arrives — exactly like a human tech lead adapting to what they learn during implementation.
Parallel Dispatch #
A coordinator does not have to call workers one at a time. When the task decomposes into independent sub-tasks, the coordinator can dispatch multiple workers simultaneously — the same parallel pattern we already discussed, but initiated dynamically by the coordinator rather than statically by the developer.
import asyncio
class AsyncCoordinator(Coordinator):
"""Coordinator that can dispatch workers in parallel."""
async def run(self, task):
tools = [wt.schema() for wt in self.worker_tools]
messages = [{"role": "user", "content": task}]
while True:
response = call_model(
model=self.model,
system=self.system_prompt,
messages=messages,
tools=tools,
)
if not response.has_tool_calls:
return response.text
# Run all requested workers in parallel
async def execute_worker(call):
worker_tool = self._find_worker(call.name)
result = await asyncio.to_thread(
worker_tool.execute, call.arguments["task"]
)
return {"tool_call_id": call.id, "output": result}
results = await asyncio.gather(
*[execute_worker(call) for call in response.tool_calls]
)
messages.append(response.to_message())
messages.append(tool_results_message(results))
When the coordinator makes multiple tool calls in a single response — say, dispatching the code agent and the search agent simultaneously — the async coordinator runs them in parallel. This is a natural optimization: the model already decided these tasks are independent (otherwise it would have sequenced them), so running them concurrently reduces wall-clock time without changing the semantics.
Most model APIs support this directly: when a model response contains multiple tool calls, it is signaling that those calls are independent and can be executed concurrently.
The Hierarchical Pattern #
The coordinator pattern has one agent at the top directing workers at the bottom — a flat structure. The hierarchical pattern adds levels: a top-level coordinator delegates to mid-level coordinators, which delegate to workers. It is management all the way down.
┌──────────────────┐
│ Lead Coordinator│
│ (project mgr) │
└───┬─────────┬────┘
│ │
┌────────────┘ └────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Sub-Coordinator │ │ Sub-Coordinator │
│ (backend lead) │ │ (frontend lead) │
└──┬─────────┬────┘ └──┬─────────┬────┘
│ │ │ │
▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│API │ │DB │ │UI │ │Test │
│Worker │ │Worker │ │Worker │ │Worker │
└────────┘ └────────┘ └────────┘ └────────┘
Why add levels? The same reason organizations have middle management: span of control. A coordinator with fifteen workers is in the same position as a single agent with fifteen tools — the selection problem becomes unreliable. The coordinator has to understand what each worker does, remember what it already delegated, and keep track of all the outstanding results. Past a certain point (typically five to eight workers), coordination quality degrades.
Hierarchy solves this by giving each coordinator a manageable number of direct reports. The lead coordinator thinks in terms of high-level domains ("backend work" and "frontend work"), not individual implementation tasks. Each sub-coordinator thinks in terms of specific tasks within its domain ("API changes" and "database migrations"), not the overall project. This mirrors how humans handle complex projects: a VP assign projects to team leads, who assign tasks to engineers.
Implementation #
The hierarchical pattern is just the coordinator pattern applied recursively. A sub-coordinator is itself a coordinator — it has its own workers, its own system prompt, and its own ReAct loop. The parent coordinator sees it as a single tool (just like any worker), unaware that it internally manages multiple agents.
# Level 2: Workers
api_worker = Agent(
name="api_worker",
description="Implements and modifies API endpoints.",
system_prompt="You are a backend API developer...",
tools=[read_file, write_file, run_tests],
model="large-model",
)
db_worker = Agent(
name="db_worker",
description="Handles database schema changes and migrations.",
system_prompt="You are a database engineer...",
tools=[read_file, write_file, run_migration, query_db],
model="large-model",
)
ui_worker = Agent(
name="ui_worker",
description="Implements frontend UI components and pages.",
system_prompt="You are a frontend developer...",
tools=[read_file, write_file, run_build],
model="large-model",
)
test_worker = Agent(
name="test_worker",
description="Writes and runs integration and end-to-end tests.",
system_prompt="You are a QA engineer...",
tools=[read_file, write_file, run_tests, run_e2e],
model="medium-model",
)
# Level 1: Sub-coordinators
backend_coordinator = Coordinator(
system_prompt="""You are the backend team lead. Break backend tasks into
API work and database work. Coordinate between the api_worker and
db_worker to ensure changes are consistent. Verify that API changes
work with the database schema.""",
workers=[api_worker, db_worker],
model="large-model",
)
# Wrap it as a worker the lead coordinator can call
backend_coordinator.name = "backend_team"
backend_coordinator.description = (
"Handles all backend work: API endpoints, database changes, "
"and server-side logic."
)
frontend_coordinator = Coordinator(
system_prompt="""You are the frontend team lead. Break frontend tasks into
UI implementation and testing. Ensure components match the API contracts
provided. Run tests after implementation.""",
workers=[ui_worker, test_worker],
model="large-model",
)
frontend_coordinator.name = "frontend_team"
frontend_coordinator.description = (
"Handles all frontend work: UI components, pages, and "
"frontend integration tests."
)
# Level 0: Lead coordinator
lead = Coordinator(
system_prompt="""You are the project lead. You receive feature requests and
coordinate between the backend team and frontend team. Break the feature
into backend and frontend work. The backend team should complete its work
first so the frontend team can build against the real API. Verify that
the overall feature works end-to-end.""",
workers=[backend_coordinator, frontend_coordinator],
model="large-model",
)
# One call sets the whole tree in motion
result = lead.run(
"Add a user profile page that shows the user's name, email, and "
"order history. Include an edit button for name and email."
)
The lead coordinator sees two workers: "backend_team" and "frontend_team." It does not know — or need to know — that each of these is itself a coordinator managing multiple agents. This is encapsulation: each level hides its internal complexity from the level above.
When Hierarchy Helps #
Hierarchy earns its complexity in specific situations:
Large projects with natural domain boundaries. A feature that spans backend, frontend, mobile, and infrastructure naturally maps to four sub-coordinators, each managing two to four workers. Trying to coordinate eight or more workers from a single coordinator produces unreliable delegation.
Tasks requiring sequential dependencies between domains. The lead coordinator can enforce "backend first, then frontend" ordering that sub-coordinators cannot see — they only know their own domain. Cross-domain sequencing is the lead coordinator's job.
Different levels of abstraction. The lead coordinator thinks about features. Sub-coordinators think about components. Workers think about files. Each level reasons at the appropriate abstraction, which keeps prompts focused and decisions manageable.
Reusable sub-teams. Once you build a backend sub-coordinator with its workers, you can reuse that entire sub-tree for any task that involves backend work. The sub-coordinator knows how to manage API and database changes regardless of what the higher-level task is.
When Hierarchy Hurts #
Shallow tasks. If the task only needs two or three workers, a flat coordinator is simpler. Adding a middle layer for no reason just adds latency and cost without improving coordination quality.
Unclear domain boundaries. If most tasks require heavy collaboration between domains — backend and frontend changing together in tight iteration loops — hierarchy creates artificial boundaries. The backend sub-coordinator finishes, passes results up to the lead, the lead passes them down to the frontend sub-coordinator, and if the frontend needs a backend change, the request has to travel all the way back up and down again. Message-passing overhead dominates.
Cost sensitivity. Each level of coordination is at least one additional large-model call per sub-task. A two-level hierarchy on a task with four sub-tasks means: the lead coordinator reasons about the decomposition (1 call), dispatches to two sub-coordinators (2 calls each reason about their sub-decomposition), and each sub-coordinator dispatches to workers (another 4+ calls). That is a minimum of 7 model calls before any real work happens. For simple tasks, this overhead dwarfs the value.
The Coordinator's Decision Quality #
The entire system depends on the coordinator making good decisions. If it decomposes the task poorly, assigns sub-tasks to the wrong workers, or declares completion too early, the output is bad — and the failure mode is harder to diagnose than in a static pipeline because the topology is not predictable.
Failure Modes #
Under-decomposition. The coordinator assigns a large, complex task to a single worker instead of breaking it down. The worker gets overwhelmed, produces low-quality output, or gets stuck in long ReAct loops. This happens when the coordinator's system prompt does not emphasize decomposition or when the task looks superficially simple but has hidden complexity.
Over-decomposition. The coordinator splits a trivial task into many tiny sub-tasks, each assigned to a different worker. A simple "fix the typo in the README" becomes three worker calls (search for the file, make the edit, verify the edit). This wastes tokens and adds latency. A good coordinator prompt includes guidance on when not to split.
Wrong worker selection. The coordinator assigns a task to a worker that lacks the right tools or expertise. The search agent gets asked to write code; the code agent gets asked to research. This happens when worker descriptions are ambiguous or when the coordinator does not fully understand the workers' capabilities.
Premature completion. The coordinator sees partial results and declares the task done before all sub-tasks are addressed. This is common when the coordinator's synthesis reasoning is weak — it reads one good result and stops checking the others.
Infinite delegation loops. The coordinator is never satisfied with a worker's output and keeps re-dispatching. "The code needs more tests." The code agent adds tests. "The tests are not comprehensive enough." The code agent adds more. This continues until a token limit is hit. Iteration caps are the defense.
Mitigations #
Each failure mode has a corresponding defense:
COORDINATOR_PROMPT_WITH_GUARDS = """You are a project coordinator.
Decomposition rules:
- If the task can be completed in a single focused effort by one worker,
do NOT split it further. Send it directly.
- If the task involves multiple files, domains, or skills, split it into
one sub-task per natural unit of work.
- Maximum 5 sub-tasks per decomposition. If you need more, you are
splitting too finely.
Worker selection:
- code_agent: ONLY for writing, modifying, or debugging code.
- search_agent: ONLY for finding information. Cannot modify anything.
- review_agent: ONLY for evaluating existing code or text. Cannot modify.
- write_agent: ONLY for producing written content that is not code.
Completion criteria:
- Before declaring completion, verify that EVERY sub-task has received
a satisfactory result.
- List each sub-task and its status in your final synthesis.
Retry limits:
- You may retry a failed worker at most 2 times.
- After 2 retries, report the failure and move on.
"""
The prompt encodes explicit constraints: maximum sub-tasks, clear worker boundaries, completion checklists, and retry caps. These are guardrails that reduce the coordinator's decision space — trading flexibility for reliability.
Additionally, programmatic safeguards complement the prompt:
class GuardedCoordinator(Coordinator):
"""Coordinator with programmatic safety limits."""
def __init__(self, *args, max_iterations=10, max_worker_calls=20, **kwargs):
super().__init__(*args, **kwargs)
self.max_iterations = max_iterations
self.max_worker_calls = max_worker_calls
def run(self, task):
tools = [wt.schema() for wt in self.worker_tools]
messages = [{"role": "user", "content": task}]
iterations = 0
total_worker_calls = 0
while True:
iterations += 1
if iterations > self.max_iterations:
return self._force_synthesis(messages)
response = call_model(
model=self.model,
system=self.system_prompt,
messages=messages,
tools=tools,
)
if not response.has_tool_calls:
return response.text
total_worker_calls += len(response.tool_calls)
if total_worker_calls > self.max_worker_calls:
return self._force_synthesis(messages)
results = []
for call in response.tool_calls:
worker_tool = self._find_worker(call.name)
result = worker_tool.execute(call.arguments["task"])
results.append({"tool_call_id": call.id, "output": result})
messages.append(response.to_message())
messages.append(tool_results_message(results))
def _force_synthesis(self, messages):
"""Force the coordinator to produce a final answer."""
messages.append({
"role": "user",
"content": "You have reached the maximum number of iterations. "
"Synthesize the best answer from the results you have "
"so far. Do not call any more workers.",
})
response = call_model(
model=self.model,
system=self.system_prompt,
messages=messages,
tools=[], # No tools available — must produce text
)
return response.text
The _force_synthesis method is the safety valve. When the coordinator exceeds its iteration or call budget, it is forced to produce an answer from whatever results it has gathered so far. This prevents runaway costs and unbounded latency. The quality of the forced synthesis depends on how much useful work the workers completed before the budget ran out — which is why the budget should be generous enough for typical tasks but tight enough to catch pathological loops.
Coordinator vs Static Topology #
The choice between a coordinator and a static pipeline is the central design decision in multi-agent systems. Here is how to think about it:
| Dimension | Static (Sequential/Parallel) | Dynamic (Coordinator) |
|---|---|---|
| Topology | Fixed at design time | Emerges at runtime |
| Predictability | High — same agents, same order | Low — varies per input |
| Testability | Unit test each agent independently | Must test the coordinator's decisions |
| Debugging | Trace is deterministic | Trace varies per run |
| Flexibility | New task shapes need new pipelines | Same coordinator handles varied tasks |
| Cost predictability | Constant per run | Varies with task complexity |
| Latency predictability | Bounded and measurable | Varies with decomposition depth |
The pragmatic guideline: use static topologies when you know the structure of your tasks at design time, and use a coordinator when you do not. If 90% of your tasks follow the same three-step pattern, build a pipeline. If every task is different — different number of files, different domains involved, different depth of analysis needed — use a coordinator.
Production systems use a hybrid: a coordinator that operates within a larger static pipeline. For example, a fixed pipeline might be: (1) classify the request, (2) coordinator handles the work (dynamic), (3) validate the output. The outer structure is static and predictable; the middle section is dynamic and flexible. This gives you bounded behavior at the edges (classification and validation are always the same) with adaptive behavior in the core processing step.
Context Management #
The coordinator pattern creates a particular challenge for context management. The coordinator's context window must hold:
- Its own system prompt and worker descriptions
- The full conversation history (every worker dispatch and result)
- Enough working memory to track the overall plan
As the task progresses, worker results accumulate in the coordinator's context. A five-worker task where each worker produces a page of output adds five pages to the coordinator's conversation. After dispatching and collecting several rounds of work, the context fills up — and the coordinator starts losing track of earlier results or forgetting parts of the plan.
Summarization #
The primary defense is summarizing worker results before they enter the coordinator's context:
class SummarizingCoordinator(Coordinator):
"""Coordinator that summarizes worker outputs to save context."""
def __init__(self, *args, summarizer_model="small-model", **kwargs):
super().__init__(*args, **kwargs)
self.summarizer_model = summarizer_model
def _summarize_result(self, worker_name, full_output):
"""Compress a worker's output to its essential findings."""
summary = call_model(
model=self.summarizer_model,
system="Summarize the following worker output in 2-3 sentences. "
"Preserve key findings, numbers, and decisions. "
"Discard examples and verbose explanations.",
messages=[{"role": "user", "content": full_output}],
)
return f"[{worker_name} summary]: {summary.text}"
The trade-off is information loss. The coordinator sees a compressed version of the worker's output, which is usually fine for deciding what to do next but can miss important details. A common middle ground: store full outputs in an external scratchpad and pass summaries into the coordinator's context, with the ability to retrieve the full output if the coordinator specifically asks for it.
Scratchpad Pattern #
The scratchpad is an external data store (a dictionary, a database, a file) where full worker outputs are stored, indexed by sub-task. The coordinator's context contains only summaries and references, but it has a tool to retrieve full details when needed.
class ScratchpadCoordinator(Coordinator):
"""Coordinator with external memory for worker results."""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.scratchpad = {}
def run(self, task):
# Add a "retrieve" tool alongside worker tools
retrieve_tool = {
"name": "retrieve_result",
"description": "Retrieve the full output from a previously "
"completed worker task. Use when you need details "
"beyond the summary.",
"parameters": {
"type": "object",
"properties": {
"task_id": {
"type": "string",
"description": "The ID of the task to retrieve.",
}
},
"required": ["task_id"],
},
}
tools = [wt.schema() for wt in self.worker_tools] + [retrieve_tool]
messages = [{"role": "user", "content": task}]
while True:
response = call_model(
model=self.model,
system=self.system_prompt,
messages=messages,
tools=tools,
)
if not response.has_tool_calls:
return response.text
results = []
for call in response.tool_calls:
if call.name == "retrieve_result":
full = self.scratchpad.get(call.arguments["task_id"], "Not found")
results.append({"tool_call_id": call.id, "output": full})
else:
worker_tool = self._find_worker(call.name)
full_output = worker_tool.execute(call.arguments["task"])
task_id = f"{call.name}_{len(self.scratchpad)}"
self.scratchpad[task_id] = full_output
summary = self._summarize_result(call.name, full_output)
results.append({
"tool_call_id": call.id,
"output": f"{summary}\n[Full result stored as: {task_id}]",
})
messages.append(response.to_message())
messages.append(tool_results_message(results))
The coordinator normally works with summaries — quick, cheap, and context-efficient. When it needs the full detail (to synthesize a final report, to debug a failure, to cross-reference two workers' findings), it retrieves from the scratchpad. This keeps the coordinator's context lean without losing access to the raw information.
Trade-Offs #
The coordinator pattern is powerful but expensive in multiple dimensions:
Latency is unpredictable. A static pipeline always takes the same number of steps. A coordinator might solve a task in two worker calls or twenty. You cannot make latency guarantees to users because you do not control the decomposition. Setting iteration caps bounds the worst case but at the cost of potentially incomplete results.
Cost scales with task complexity. The coordinator itself uses tokens for reasoning at every step. Each worker call is a full agent invocation (potentially multiple model calls internally). A complex task might consume tens of thousands of tokens across the coordinator and all its workers. For tasks that the coordinator decomposes into many sub-tasks, costs can be an order of magnitude higher than a static pipeline that handles the same work.
Debugging requires trace reconstruction. When the output is wrong, you need to reconstruct the coordinator's decision tree: what did it plan? Which workers did it call? What did they return? Where did the logic break? Unlike a static pipeline where the trace is always the same shape, a coordinator's trace is unique per run. Log everything, and build tooling to visualize the dynamic traces.
The coordinator is a single point of failure. If the coordinator makes a bad decomposition decision early, everything downstream is wrong. Unlike a static pipeline where each stage can be tested and validated independently of routing logic, the coordinator's decisions affect the entire system's behavior. A bad coordinator prompt is worse than a bad worker prompt because it corrupts the overall plan, not just one step.
Model quality matters more for the coordinator. Workers can sometimes get by with medium-quality models because their tasks are narrow and focused. The coordinator needs the best model you have — it is doing the hardest cognitive work: understanding a complex task, reasoning about how to decompose it, tracking progress across multiple sub-tasks, and synthesizing results. Skimping on the coordinator's model quality to save cost usually backfires.
Practical Guidelines #
Based on the trade-offs, here are guidelines for building coordinator-based systems:
Start flat. Begin with a single-level coordinator and a small set of workers (three to five). Only add hierarchy if the coordinator's decision quality degrades as you add more workers.
Make worker descriptions precise. The coordinator picks workers based on their descriptions. Vague descriptions ("handles various tasks") lead to wrong selections. Each description should clearly state what the worker can do, what tools it has, and — critically — what it cannot do.
Log everything. Every coordinator decision, every worker dispatch, every result. You will need this for debugging. Include the coordinator's reasoning (its text before tool calls) in the logs — this is where you see why it made a particular decision.
Set budgets. Maximum iterations, maximum total worker calls, maximum tokens. Without budgets, pathological inputs can cause unbounded cost and latency.
Test the coordinator's decomposition separately from worker quality. Give the coordinator mock workers that return canned responses and verify that it decomposes tasks correctly, selects the right workers, and synthesizes results properly. This isolates coordinator logic from worker behavior.
Monitor the coordinator-to-worker call ratio. If the coordinator is making many more calls than workers (lots of reasoning, few dispatches), the task might be too simple for a coordinator. If workers are being called repeatedly on the same sub-task, the coordinator might be stuck in a retry loop.
Conclusion #
The coordinator and hierarchical patterns bring flexibility to multi-agent systems by putting an AI model in charge of task decomposition and routing. Instead of hardcoding which agents run in what order, a coordinator reasons about the task, selects workers, and adapts its plan based on results.
Key takeaways:
- A coordinator is an agent whose tools are other agents — it delegates work rather than performing it
- Workers are exposed to the coordinator as callable tools with clear descriptions of their capabilities
- The hierarchical pattern applies coordination recursively: sub-coordinators manage focused teams, while the lead coordinator manages at a higher abstraction level
- Dynamic coordination trades predictability for flexibility — use it when the task structure is not known at design time
- The coordinator is the single point of failure; invest in its system prompt, model quality, and programmatic guardrails (iteration caps, retry limits, forced synthesis)
- Context management is critical — use summarization and scratchpads to keep the coordinator's context lean while preserving access to full details
- Start with a flat coordinator and only add hierarchy when span of control degrades decision quality
- Always prefer a static pipeline when the task structure is predictable — coordinators earn their complexity only when the shape of the work varies per input