Agentic Engineeringexploring2026-05-23

Agentic engineering, and what changes when the diff isn't yours

Working notes on the discipline shift from "AI helps me code" to "I run a system that produces code." What I'm reading, what I'm trying, and the open questions I haven't answered.

I keep landing on the same observation: the interesting part of using AI to build software isn't the autocomplete anymore. It's that the unit of work has changed. A year ago I was writing functions with an assistant in the side panel. Now I'm writing prompts, hook scripts, and /skill definitions, and the artifact that lands in the diff was authored by something else.

That's a different job. I'm calling it agentic engineering because the existing labels don't quite fit. "AI-assisted development" undersells it, "vibe coding" sells past it, and "prompt engineering" describes a slice of the surface rather than the discipline.

What I think the discipline actually is

Three things, roughly:

Building the system that produces the work. Hooks, skills, plan-mode templates, sub-agent definitions, recovery loops. The codebase has a second codebase living next to it that nobody talks about.
Driving the loop. Knowing when to stop the agent, when to let it run, when to start over, when to switch models. Most of this is taste, and the taste is learned by burning context windows.
Owning the diff anyway. The agent wrote the code, but I land it. That means I have to read it like a reviewer who didn't write it (because I didn't), but with the accountability of the author (because the PR has my name on it).

The third one is the part I'm least settled on.

What I'm trying right now

Living in plan mode for anything non-trivial. The plan-then-execute split forces me to think before the agent runs, instead of after. This site's own build ran that way from day one: PRD as executable spec.
Hook-driven feedback loops on every change. Type-check + lint on stop, screenshot diff on every UI commit. The agent can't tell you it broke something, but the hook can.
Sub-agents for the work I'd otherwise context-switch into. Code review as a separate agent with no implementation memory. Exploration as a separate agent with read-only tools. Letting them work in parallel when the questions are independent. The multi-agent version of this thinking has its own entry: agent orchestration.
A growing library of /skills for the workflows I run more than once. The marginal cost of formalizing a workflow once you've done it twice is roughly zero now.

What I've settled on so far

Not everything here is open. A few positions have survived enough sessions that I'd defend them:

Plan mode is non-negotiable past two files. The plan is where review happens cheaply. Skipping it trades five minutes of reading for forty-five minutes of unwinding.
Hooks beat instructions. A type-check-on-stop hook has caught more regressions than any prompt line ever did. Rules you want enforced belong outside the model's decision loop.
Read agent diffs as a reviewer, not an author. If I can't explain a diff back in two sentences, it doesn't land. No exceptions for "but it passed the tests."
Formalize any workflow the second time it runs. Skills are cheaper than memory, and the third run is always sooner than you think.

Open questions

Where does the human stay in the loop? Not "should there be a human" (yes), but where. Plan approval? Diff review? Test results? All three? Different per task type? I keep moving the gate around.
How do you measure agent reliability? I have strong intuitions about which prompts work and which collapse, and almost no quantified data. There's a benchmark suite hiding inside my session history that I haven't built yet.
What does code review look like when most diffs are agent-authored? Reviewing your own code is hard. Reviewing agent code you watched generate is harder. I think the answer involves automated review agents and humans focused on intent rather than implementation, but I haven't shipped that workflow yet.
What's the new senior-engineer skill set? Less "write the code," more "design the system that writes the code, and notice when it's lying." The interview should probably change.

This entry is going to keep moving. The public versions of these loops are in the labs: Feature Delivery for the PRD-first split, and Ask This Codebase for watching an agent loop run with the seams showing. If you have answers to any of the open questions, the contact is on the About page.