Agent Orchestration · drafting · 2026-05-23
Agent orchestration, and the patterns that earn their complexity
Working notes on multi-agent systems. Which patterns I keep reaching for, which ones I've burned myself on, and the primitives I think are still missing.
Most of my work right now lives between "single agent in a loop" and "actually orchestrated system." That gap is where the interesting trade-offs are. Adding a second agent is a tax. It costs tokens, latency, complexity, and a new failure surface. So the question I keep asking is: when does the second agent pay rent?
Here's what I think I know so far, plus what's still moving.
Patterns I keep reaching for
- Parallel exploration. Three read-only agents searching different angles of the codebase, returning summaries to a planner agent. Cheap, fast, and the failure mode is bounded (they can't write). I use this for any "where is X defined" or "how is Y used" question that's bigger than a single grep.
- Plan / review / execute splits. One agent plans, a second adversarially reviews the plan, a third executes. The reviewer catches the things the planner skipped because it was already attached to its answer. The cost is one extra turn; the value is catching nearly every "looks reasonable but assumes the wrong constraint" plan I would otherwise have shipped.
- Durable orchestrators for anything that crosses a network boundary. If an agent step depends on an LLM call, an API call, or a long-running build, the orchestrator should be durable. Vercel Queues, Temporal, anything that resumes on crash. I've lost too many three-step workflows to a single 502 to keep doing this in-memory.
- Specialist /skills invoked by a generalist. Instead of a giant prompt that knows everything, a generalist that knows when to call /qa, /ship, /investigate, /codex. The routing is the orchestration.
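The parallel exploration pattern above reduces to a fan-out and a gather. Here is a minimal sketch in Python with asyncio, not a definitive implementation. `Finding`, `explore`, `parallel_explore`, and `build_planner_input` are hypothetical names, and the explorer is a stub standing in for a real read-only agent call.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    angle: str
    summary: str


async def explore(angle: str, question: str) -> Finding:
    """Hypothetical read-only sub-agent.

    A real one would call a model that only has search/read tools.
    This stub just echoes its assignment.
    """
    await asyncio.sleep(0)  # stand-in for model and tool latency
    return Finding(angle=angle, summary=f"[{angle}] notes on: {question}")


async def parallel_explore(question: str, angles: list[str]) -> list[Finding]:
    # Fan out one explorer per angle and wait for all of them.
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(explore(a, question) for a in angles))


def build_planner_input(question: str, findings: list[Finding]) -> str:
    # The planner receives only the short summaries,
    # never the explorers' full working context.
    lines = [f"Question: {question}", "Findings:"]
    lines += [f"- {f.summary}" for f in findings]
    return "\n".join(lines)


question = "where is X defined"
findings = asyncio.run(
    parallel_explore(question, ["definitions", "call sites", "tests"])
)
planner_input = build_planner_input(question, findings)
```

The failure mode stays bounded only if the explorers really have no write tools; the sketch assumes that is enforced wherever the real agent call is made.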
Failure modes I've actually hit
- Context bleed. Sub-agent finishes, returns a 2,000-token summary, parent agent now has both its own context and the summary, and the summary subtly contradicts what the parent was holding. The parent goes off the rails confidently. The fix is treating sub-agent output like input from a stranger: read it, decide what to keep, throw the rest away.
- Agent loops. Two agents handing work back and forth, each thinking the other is in charge. Always solvable with an explicit termination condition, never as obvious in design as it is in retrospect.
- Cost runaway. Parallel exploration with three sub-agents at $X per call, called twenty times in a session, is a real bill. I now budget per workflow and surface the number to myself.
- The orchestrator is wrong. The hardest one. The plan was good, the agents executed well, the result was the wrong thing because the orchestrator framed the problem badly. No amount of agent quality fixes a bad orchestrator.
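Two of the failure modes above, agent loops and cost runaway, can be guarded in the same place: the loop that hands work between agents. The following is a sketch under stated assumptions. `WorkflowBudget`, `run_handoffs`, the exception names, and the stub agents are all hypothetical, and costs are in whatever unit you bill in.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    pass


class HandoffLimitReached(Exception):
    pass


@dataclass
class WorkflowBudget:
    """Per-workflow spend cap that refuses a charge it cannot cover."""
    limit: float
    spent: float = 0.0

    def charge(self, cost: float) -> None:
        if self.spent + cost > self.limit:
            raise BudgetExceeded(
                f"would spend {self.spent + cost:.2f} of {self.limit:.2f}"
            )
        self.spent += cost


def run_handoffs(agents, task, budget, cost_per_call, max_handoffs=6):
    """Pass `task` between agents in turn until one reports done.

    Each agent is a callable returning (task, done). The loop ends when an
    agent returns done=True, when max_handoffs is reached, or when the
    budget refuses a charge.
    """
    for turn in range(max_handoffs):
        budget.charge(cost_per_call)  # pay before calling, so overruns fail early
        agent = agents[turn % len(agents)]
        task, done = agent(task)
        if done:
            return task, turn + 1
    raise HandoffLimitReached(f"no agent finished within {max_handoffs} handoffs")


# Stub agents: the reviewer approves once it has seen two drafts.
def writer(task):
    return task + ["draft"], False


def reviewer(task):
    approved = task.count("draft") >= 2
    return task + ["review"], approved


budget = WorkflowBudget(limit=5.0)
result, turns = run_handoffs([writer, reviewer], [], budget, cost_per_call=0.5)
```

Here the termination condition is explicit in three ways: a done flag, a handoff cap, and a spend cap. `budget.spent` is the number to surface at the end of the workflow.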
What I think is still missing
- First-class agent observability. I want a timeline view: which agent ran, what it saw, what it returned, what the parent did with it, where the context delta came from. The tooling exists in pieces (LangSmith, Helicone, the AI SDK telemetry) but not in the shape I want.
- Cheap intermediate evaluation. Right now I run an agent, see what came back, and judge it myself. I want a small, fast eval-as-you-go that catches "this output is plausibly broken" without me reading every line.
- A real story for human takeover mid-workflow. The handoff pattern in the gstack /browse skill is the closest thing I've seen to "stop, let me drive, resume." Most workflows fall back to "fail, ask the user, start over."
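One possible starting point for the cheap intermediate evaluation item above is a set of purely structural checks that run before anyone reads the output. This is a rough sketch, not an existing tool. `quick_check`, the length threshold, and the refusal phrases are all assumptions to tune per model and task.

```python
import json

# Assumed marker phrases; a real list would be tuned per model.
REFUSAL_MARKERS = ("i can't", "i cannot", "as an ai")


def quick_check(output: str, expect_json: bool = False, min_chars: int = 20) -> list[str]:
    """Return the reasons an output looks broken; an empty list means it passed."""
    problems = []
    text = output.strip()
    if len(text) < min_chars:
        problems.append("too short")
    if any(marker in text.lower() for marker in REFUSAL_MARKERS):
        problems.append("looks like a refusal")
    if expect_json:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            problems.append("invalid JSON")
    return problems


ok = quick_check('{"files": ["a.py", "b.py"]}', expect_json=True)
bad = quick_check("here is the list of files: a.py", expect_json=True)
```

Checks like these only catch the cheapest class of "plausibly broken." They say nothing about whether a well-formed answer is actually right.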
Drafting status, not settled. The patterns above are the ones I'd ship today. The list of patterns I'd ship in three months is probably different.