Planning & Task Decomposition
ReAct is reactive. It decides one step at a time, and that works well for focused tasks where three or four tool calls get you to an answer. But give a ReAct agent a goal like "research these five competitors, compare their pricing models, and draft a summary report" and you will watch it wander. It has no map. It picks the next step based on whatever just happened, with no sense of the overall shape of the work.
Planning fixes this. A planning agent looks at the goal before it starts acting, breaks it into sub-tasks, and then executes those sub-tasks in order. The model still reasons and calls tools, but now it is working from a structure it created upfront rather than improvising every step.
This is the difference between a cook who reads the whole recipe before turning on the stove and one who figures out each step by tasting the pot. Both can produce a meal, but the one with the plan handles complex dishes with less waste and fewer surprises.
ReAct Is Not Enough
ReAct's one-step-at-a-time approach has three failure modes that show up as tasks get larger.
Drift. Without a plan, the agent can chase an interesting tangent from a tool result and lose sight of the original goal. Five steps in, it is answering a different question than the one it was asked.
Redundancy. The agent may repeat work because it does not remember what it already decided to do. It fetches the same data twice, or solves a sub-problem it already solved three turns ago, because nothing in the loop says "you already handled that."
Poor ordering. Some tasks have dependencies — you need the budget numbers before you can calculate ROI. A reactive agent might try to calculate ROI first, fail, fetch the budget, and then redo the calculation. A plan would have caught the dependency upfront.
Planning does not replace ReAct. It adds a layer on top. The agent still uses a reason-act-observe loop for each sub-task, but now that loop is guided by a higher-level structure.
Plan-and-Execute
The most common planning pattern splits the work into two distinct phases: plan, then execute.
In the planning phase, the model receives the user's goal and produces a structured list of sub-tasks. Each sub-task has an identifier, a description, and, optionally, a list of the other sub-tasks it depends on. No tools are called during planning; this is pure reasoning.
In the execution phase, the runtime walks through the plan step by step. Each sub-task gets its own ReAct-style loop with access to tools. When a sub-task finishes, its result feeds into the context of the sub-tasks that depend on it.
  User Goal
      │
      ▼
┌───────────┐
│   Plan    │  (model reasons about the goal,
│ Generate  │   produces a list of sub-tasks)
└─────┬─────┘
      │
      ▼
┌───────────┐     ┌───────────┐     ┌───────────┐
│ Sub-task 1│────▶│ Sub-task 2│────▶│ Sub-task 3│
│  (ReAct)  │     │  (ReAct)  │     │  (ReAct)  │
└───────────┘     └───────────┘     └───────────┘
                                          │
                                          ▼
                                    ┌───────────┐
                                    │   Final   │
                                    │  Answer   │
                                    └───────────┘
Here is what the planning phase might look like in code:
def generate_plan(goal: str, tools: list[dict]) -> list[dict]:
    # call_model, parse_json, and validate_plan are helpers assumed to be defined elsewhere.
    prompt = f"""Break this goal into sub-tasks.
Return a JSON array where each item has:
- "id": a short identifier
- "description": what to do
- "depends_on": list of ids this task needs completed first
Goal: {goal}
Available tools: {[t["name"] for t in tools]}
"""
    response = call_model(prompt)
    plan = parse_json(response.text)
    validate_plan(plan)  # reject a malformed plan before any execution starts
    return plan
And the execution loop:
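def execute_plan(plan: list[dict], tools: list[dict]) -> str:
    # topological_sort, react_loop, and synthesize are helpers assumed to be defined elsewhere.
    results = {}  # sub-task id -> result of that sub-task's ReAct loop
    for task in topological_sort(plan):  # dependencies always run first
        context = {
            "task": task["description"],
            # pass along only the results this sub-task declared it needs
            "prior_results": {
                dep: results[dep] for dep in task.get("depends_on", [])
            },
        }
        results[task["id"]] = react_loop(context, tools, max_steps=8)
    return synthesize(results)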
A few things to notice. The plan is data: a list of dictionaries, not free-form text. That means you can validate it, check for cycles in the dependency graph, and, if each sub-task also names the tools it expects to use, reject plans that reference tools that do not exist. The execution loop respects dependencies by running tasks in topological order. Each sub-task gets its own ReAct loop with a bounded step limit, so a single difficult sub-task cannot consume the entire budget.
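The listings above call validate_plan and topological_sort without defining them. One way they might look is sketched below, using the standard library's graphlib module (Python 3.9+). The error messages and the choice to raise ValueError are assumptions made for illustration, not something the listings specify.

```python
from graphlib import CycleError, TopologicalSorter


def topological_sort(plan: list[dict]) -> list[dict]:
    """Return the tasks ordered so that every dependency comes first."""
    by_id = {task["id"]: task for task in plan}
    # graphlib expects a mapping of node -> set of predecessors.
    graph = {task["id"]: set(task.get("depends_on", [])) for task in plan}
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"plan has a dependency cycle: {exc.args[1]}") from exc
    return [by_id[task_id] for task_id in order]


def validate_plan(plan: list[dict]) -> None:
    """Raise ValueError if the plan cannot be executed as written."""
    ids = [task["id"] for task in plan]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate task ids in plan")
    known = set(ids)
    for task in plan:
        missing = set(task.get("depends_on", [])) - known
        if missing:
            raise ValueError(
                f"task {task['id']!r} depends on unknown ids: {sorted(missing)}"
            )
    # Sorting doubles as the cycle check: it raises if no valid order exists.
    topological_sort(plan)


plan = [
    {"id": "report", "description": "Draft the summary", "depends_on": ["pricing"]},
    {"id": "pricing", "description": "Collect pricing data", "depends_on": []},
]
validate_plan(plan)
ordered_ids = [task["id"] for task in topological_sort(plan)]
```

Checking for unknown ids before sorting matters here: the sorter would otherwise treat an unknown dependency as a real node, and the lookup back into the plan would fail with a less helpful KeyError.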
Replanning
No plan survives contact with reality. A sub-task might fail because a tool is down, or it might return information that changes the rest of the plan. Rigid plan-and-execute handles this poorly — it marches through the original plan even when the plan is now wrong.
Replanning adds a check after each sub-task: given the original goal, the plan so far, and the result of the sub-task just completed, should the plan change?
import json

def maybe_replan(goal: str, plan: list[dict], completed: dict, tools: list[dict]) -> list[dict]:
    # call_model and parse_json are helpers assumed to be defined elsewhere.
    prompt = f"""Original goal: {goal}
Current plan: {json.dumps(plan)}
Completed so far: {json.dumps(completed)}
Available tools: {[t["name"] for t in tools]}
Given the results, does the remaining plan still make sense?
If yes, return the remaining plan unchanged.
If no, return a revised plan for the remaining work.
Return JSON only.
"""
    response = call_model(prompt)
    return parse_json(response.text)  # the remaining plan, revised or unchanged
This costs an extra model call per sub-task, which is a real trade-off. You can mitigate it by only replanning when a sub-task fails or returns something unexpected, rather than after every step. The decision of when to replan is itself a design choice — too often and you burn tokens, too rarely and you execute a stale plan.
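The "replan only when something went wrong" policy can be sketched as a small trigger function plus a loop that consults it. Everything here is an assumption for illustration: the "status" and "unexpected" fields on a result, the run_task and replan callables (stand-ins for a ReAct loop and a wrapper around maybe_replan), and the max_replans cap.

```python
from typing import Callable


def should_replan(result: dict) -> bool:
    """Return True when a finished sub-task warrants a replanning call.

    Assumption: each result is a dict with a "status" of "ok" or "failed"
    and an optional "unexpected" flag. These fields are placeholders for
    whatever signal your own sub-task loop emits.
    """
    if result.get("status") != "ok":
        return True
    return bool(result.get("unexpected", False))


def run_with_replanning(
    plan: list[dict],
    run_task: Callable[[dict, dict], dict],
    replan: Callable[[list[dict], dict], list[dict]],
    max_replans: int = 3,
) -> tuple[dict, int]:
    """Execute an already-ordered plan, replanning only when triggered."""
    completed: dict[str, dict] = {}
    remaining = list(plan)
    replans = 0
    while remaining:
        task = remaining.pop(0)
        result = run_task(task, completed)
        completed[task["id"]] = result
        # Skip the check when nothing is left to revise or the cap is reached.
        if remaining and replans < max_replans and should_replan(result):
            remaining = list(replan(remaining, completed))
            replans += 1
    return completed, replans
```

The cap on replans is a guard against a plan that keeps failing and keeps being rewritten. Without it, a persistently broken tool could turn the replanning check into an unbounded loop.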
Task Decomposition Strategies
Planning is only as good as the decomposition. A plan with one giant sub-task is no better than no plan at all. A plan with twenty micro-tasks overwhelms the context window and adds overhead. The right granularity depends on the task, but a few strategies show up repeatedly.
Flat decomposition splits the goal into a handful of independent sub-tasks at one level. This works when the sub-tasks do not depend on each other much — "gather data from three sources, then combine" is a natural fit. It is simple to implement and easy to parallelize.
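Because flat sub-tasks do not depend on each other, they can run concurrently. The sketch below uses a thread pool from the standard library, on the assumption that each sub-task spends most of its time waiting on model and tool calls. The run_task callable is a hypothetical stand-in for a per-task ReAct loop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_flat(
    tasks: list[dict],
    run_task: Callable[[dict], str],
    max_workers: int = 4,
) -> dict[str, str]:
    """Run independent sub-tasks concurrently and collect results by id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with tasks.
        outputs = list(pool.map(run_task, tasks))
    return {task["id"]: output for task, output in zip(tasks, outputs)}


tasks = [
    {"id": "source_a", "description": "gather data from source A"},
    {"id": "source_b", "description": "gather data from source B"},
    {"id": "source_c", "description": "gather data from source C"},
]
# A trivial stand-in so the example runs without a model.
gathered = run_flat(tasks, lambda task: f"done: {task['description']}")
```

The combine step then runs once over the collected dictionary, which is the only point where the sub-tasks' results meet.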
Hierarchical decomposition breaks the goal into sub-goals, then breaks those into smaller sub-goals, recursively. A top-level agent decomposes the goal and delegates each piece to a sub-agent, which may decompose its piece further in turn. This handles complex, ambiguous problems well but adds significant depth to the execution, which means more model calls and harder debugging.
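The recursion can be sketched in a few lines. The decompose and execute callables are hypothetical: in a real system each would wrap a model call or a sub-agent, and the max_depth cap is an assumed safeguard against the unbounded depth described above.

```python
from typing import Callable


def solve(
    goal: str,
    decompose: Callable[[str], list[str]],
    execute: Callable[[str], str],
    depth: int = 0,
    max_depth: int = 2,
) -> dict:
    """Split a goal recursively until it is atomic or the depth cap is hit.

    decompose(goal) returns a list of sub-goals, or an empty list when the
    goal is small enough to execute directly.
    """
    sub_goals = decompose(goal) if depth < max_depth else []
    if not sub_goals:
        return {"goal": goal, "result": execute(goal)}
    children = [
        solve(sub_goal, decompose, execute, depth + 1, max_depth)
        for sub_goal in sub_goals
    ]
    return {"goal": goal, "children": children}


# A fixed lookup table stands in for model-driven decomposition.
tree = {
    "report": ["research", "draft"],
    "research": ["competitor A", "competitor B"],
}
outcome = solve("report", lambda g: tree.get(g, []), lambda g: f"done: {g}")
```

Returning a nested dictionary keeps the shape of the decomposition visible in the output, which helps with the debugging cost: you can see which branch produced which result.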
Dependency-aware decomposition is what the code examples above use. Each sub-task declares which other sub-tasks it depends on, and the runtime uses that graph to determine execution order. This is the sweet spot for most agent tasks — it captures the real structure of the work without requiring deep hierarchies.
The model is not always good at decomposition. Weaker models produce plans that are too vague ("research the topic") or too granular ("open the browser, type the URL, click search"). You can improve this by including examples of good plans in the prompt, constraining the output format with a schema, and validating that every sub-task maps to at least one available tool.
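Some of these checks can be automated. The sketch below lints a plan and returns a list of problems that could be fed back to the model for a retry. It assumes each sub-task carries an extra "tools" field naming the tools it expects to use, which the plan format shown earlier does not include. The task-count bounds are arbitrary placeholders.

```python
REQUIRED_KEYS = {"id": str, "description": str, "depends_on": list}


def lint_plan(
    plan: list[dict],
    tool_names: set[str],
    min_tasks: int = 2,
    max_tasks: int = 10,
) -> list[str]:
    """Return human-readable problems; an empty list means the plan passes."""
    problems = []
    if not min_tasks <= len(plan) <= max_tasks:
        problems.append(
            f"expected {min_tasks} to {max_tasks} sub-tasks, got {len(plan)}"
        )
    for index, task in enumerate(plan):
        label = task.get("id", f"#{index}")
        for key, expected_type in REQUIRED_KEYS.items():
            if not isinstance(task.get(key), expected_type):
                problems.append(f"{label}: {key!r} must be a {expected_type.__name__}")
        # "tools" is a hypothetical field, not part of the earlier schema.
        tools = task.get("tools", [])
        if not tools:
            problems.append(f"{label}: names no tool")
        unknown = set(tools) - tool_names
        if unknown:
            problems.append(f"{label}: unknown tools {sorted(unknown)}")
    return problems


available = {"web_search", "spreadsheet"}
good_plan = [
    {"id": "fetch", "description": "Fetch pricing pages",
     "depends_on": [], "tools": ["web_search"]},
    {"id": "compare", "description": "Tabulate prices by tier",
     "depends_on": ["fetch"], "tools": ["spreadsheet"]},
]
issues = lint_plan(good_plan, available)
```

A lint like this catches structural problems only. Whether a description is too vague to act on is a judgment that a count or type check cannot make.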
Trade-offs
Planning adds a phase before execution, and that phase has costs.
Latency. The planning call adds at least one round trip before any real work starts. For simple tasks, this overhead is not worth it — a ReAct loop would have finished before the plan was generated. Planning pays off when the task is complex enough that the upfront cost saves wasted steps during execution.
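Token cost. The plan itself consumes tokens. Replanning consumes more. For a five-step plan with replanning after each step, you are making at least five extra model calls that a pure ReAct loop would not need: one to generate the plan and one replanning check after each of the first four steps, or six if you also check after the final step.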
Brittleness of the plan. The model might produce a bad plan — missing a critical step, ordering things wrong, or including a step that is impossible given the available tools. If you execute a bad plan without checking, you waste the entire budget on the wrong path. Validation and replanning are the mitigations, but they add complexity.
Context window pressure. The plan, plus the results of completed sub-tasks, plus the current sub-task's ReAct trace all compete for space in the context window. For long plans, you need to summarize completed sub-task results rather than passing them through verbatim.
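One way to keep prior results from crowding the context is to compact them to a fixed budget before passing them on. The sketch below splits a character budget evenly across results and truncates by default. Both choices are assumptions: characters are a crude proxy for tokens, and a real system would likely pass in a model-backed summarize function instead of truncating.

```python
from typing import Callable, Optional


def truncate(text: str, limit: int) -> str:
    """Cut text to at most limit characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    if limit <= 3:
        return text[:limit]
    return text[: limit - 3] + "..."


def compact_results(
    results: dict[str, str],
    budget_chars: int,
    summarize: Optional[Callable[[str, int], str]] = None,
) -> dict[str, str]:
    """Shrink prior results so that together they fit within budget_chars."""
    if not results:
        return {}
    shrink = summarize or truncate
    per_task = budget_chars // len(results)  # even split across results
    return {task_id: shrink(text, per_task) for task_id, text in results.items()}


prior = {"fetch": "x" * 100, "compare": "short"}
compacted = compact_results(prior, budget_chars=40)
```

An even split is the simplest policy. Weighting the budget toward the results the current sub-task directly depends on would likely preserve more of what matters.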
The decision of whether to use planning comes down to task complexity. If the task takes fewer than four or five tool calls, ReAct alone is fine. If the task has multiple phases, dependencies between steps, or requires coordinating information from several sources, planning is worth the overhead.
Conclusion
Planning gives an agent something ReAct does not: the ability to think about the shape of the work before diving in. It reduces drift, avoids redundant steps, and handles dependencies between sub-tasks.
Key takeaways:
- Plan-and-execute separates thinking about what to do from doing it — the plan is pure reasoning, the execution uses ReAct loops
- Plans should be structured data (not free-form text) so you can validate, sort, and check dependencies
- Replanning after sub-tasks lets the agent adapt when reality does not match the original plan, at the cost of extra model calls
- Decomposition granularity matters — too coarse and you get no benefit, too fine and you drown in overhead
- Planning pays off for complex, multi-phase tasks; for simple tasks, the overhead is not worth it