Agentic engineering, and what changes when the diff isn't yours
Working notes on the discipline shift from "AI helps me code" to "I run a system that produces code." What I'm reading, what I'm trying, and the open questions I haven't answered.
I keep landing on the same observation: the interesting part of using AI to build software isn't the autocomplete anymore. It's that the unit of work has changed. A year ago I was writing functions with an assistant in the side panel. Now I'm writing prompts, hook scripts, and /skill definitions, and the artifact that lands in the diff was authored by something else.
That's a different job. I'm calling it agentic engineering because the existing labels don't quite fit. "AI-assisted development" undersells it, "vibe coding" sells past it, and "prompt engineering" describes a slice of the surface rather than the discipline.
What I think the discipline actually is
Three things, roughly:
- Building the system that produces the work. Hooks, skills, plan-mode templates, sub-agent definitions, recovery loops. The codebase has a second codebase living next to it that nobody talks about.
- Driving the loop. Knowing when to stop the agent, when to let it run, when to start over, when to switch models. Most of this is taste, and the taste is learned by burning context windows.
- Owning the diff anyway. The agent wrote the code, but I land it. That means I have to read it like a reviewer who didn't write it (because I didn't), but with the accountability of the author (because the PR has my name on it).
The third one is the part I'm least settled on.
What I'm trying right now
- Living in plan mode for anything non-trivial. The plan-then-execute split forces me to think before the agent runs, instead of after.
- Hook-driven feedback loops on every change. Type-check + lint on stop, screenshot diff on every UI commit. The agent can't tell you it broke something, but the hook can.
- Sub-agents for the work I'd otherwise context-switch into. Code review as a separate agent with no implementation memory. Exploration as a separate agent with read-only tools. Letting them work in parallel when the questions are independent.
- A growing library of
/skillsfor the workflows I run more than once. The marginal cost of formalizing a workflow once you've done it twice is roughly zero now.
Open questions
- Where does the human stay in the loop? Not "should there be a human" (yes), but where. Plan approval? Diff review? Test results? All three? Different per task type? I keep moving the gate around.
- How do you measure agent reliability? I have strong intuitions about which prompts work and which collapse, and almost no quantified data. There's a benchmark suite hiding inside my session history that I haven't built yet.
- What does code review look like when most diffs are agent-authored? Reviewing your own code is hard. Reviewing agent code you watched generate is harder. I think the answer involves automated review agents and humans focused on intent rather than implementation, but I haven't shipped that workflow yet.
- What's the new senior-engineer skill set? Less "write the code," more "design the system that writes the code, and notice when it's lying." The interview should probably change.
This entry is going to keep moving. If you have answers to any of the open questions, the contact is on the About page.