How I Use AI to Write Code

Ralph Orchestrator has grown in popularity far more than I expected. The question I get most often is some variation of “how do you actually use this thing?” In a previous post I covered my code-assist workflow, and this is the deeper dive into my full AI-native development process. Fair warning: this will be outdated by the time you read it. 😉

This isn’t a step-by-step tutorial. It’s a set of principles I’ve landed on after months of iterating on how I work with coding agents.

The ground shifted

If you’ve been paying attention since the Opus 4.6 and gpt-5.3-codex releases you already know: software engineering changed. A year ago, getting a coding agent to reliably edit code required an absurd amount of scaffolding. AGENTS.md files, elaborate prompt engineering, custom workflows. Now a simple plan-and-execute loop gets you a long way. That said, I still think there are techniques that improve outcomes. Here’s what’s worked for me.

Always start with a plan

For anything beyond a trivial change, I work with an agent to build a plan first. This sounds obvious, but the nuance is in how you ask.

Words matter to these models. I’ve found that using the verb “study” drives the agent to seek deeper understanding of a problem compared to “how does X work” or “go look at Y.” It reads the code more carefully, traces more call paths, asks better clarifying questions. “Research” is another keyword I rely on, but it appears to push the model toward breadth over depth. This distinction sounds minor. In practice, it’s the difference between an agent that understands the system it’s about to modify and one that confidently hallucinates its way through an implementation.
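To make the distinction concrete, here is a hypothetical contrast (the file path and wording are illustrative, not prompts from my actual sessions):

```
Breadth-leaning:
  Research how authentication works in this codebase.

Depth-leaning:
  Study the authentication flow in src/auth/ before proposing any changes.
  Trace how a request is validated end to end.
```

The second form gives the agent an explicit reason to read deeply before it writes anything.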

I don’t have hard metrics on this, but this simple preflight step has reduced hallucinations and wasted time more than any other technique I’ve adopted. The models are good enough now that the bottleneck isn’t capability. It’s context. Give the agent a reason to build the right context and it will.

Three buckets

Once a plan exists, the next question is: how big and complex is this task? If you don’t know, ask the agent. The answer drives everything about how I orchestrate execution. Tasks fall into three buckets.

The first is ad-hoc. Rename something, refactor a single function, update a config. These changes touch one or two files and the goal is obvious. No ceremony needed. Just do it.

The second is what I call a code-task. Bug fixes, improvements, refactors, polish. For these I use ralph code-task to generate a structured markdown file that acts as a contract between me and the agent. It defines the goal, the technical requirements, the implementation approach, and most importantly, the acceptance criteria. The acceptance criteria are where I spend most of my time and where I catch the most incorrect assumptions before they become incorrect code. Once I’m satisfied with the task, I feed it into the code-assist SOP or, more recently, ralph run -H code-assist.
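As a sketch, a contract like that might look something like this. The section names and the task itself are illustrative, not the exact output of ralph code-task:

```markdown
# Task: Fix stale cache reads after logout

## Goal
A user should never see another account's data after switching sessions.

## Technical Requirements
- Invalidate session-scoped cache entries on logout
- No changes to the public API

## Implementation Approach
Hook cache invalidation into the existing logout handler.

## Acceptance Criteria
- [ ] Logging out clears all session-scoped cache entries
- [ ] Existing integration tests still pass
- [ ] New test covers login -> logout -> login as a different user
```

The acceptance criteria are the part worth sweating over: each checkbox is a claim the agent can be held to.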

The third bucket is full feature implementation. This is where I use ralph plan to generate a complete design and implementation plan. From there I hand it off to code-assist, which breaks the feature into small chunks, each implemented in its own fresh context window. After each task completes, the work is reviewed by an adversarial agent. Inevitably, the loop will complete with gaps and bugs. This is where I refine via ad-hoc and code tasks. Rinse and repeat until done.

Ralph loops are useful even for simple code-tasks. The value is the adversarial nature of the loop: multiple personas validate the completeness of an implementation autonomously. One hat builds, another tears it apart, a third decides if it’s actually done. This works well for delegation, and when functionality matters more than code quality. It takes longer to complete, and a bad prompt can waste time, but if the spec is clearly defined, the outputs are quite good and require less hands-on steering.
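The shape of that loop can be sketched in Python. Everything below is illustrative: the three persona functions are trivial stand-ins for real agent invocations, and none of the names reflect Ralph’s actual API.

```python
# Illustrative sketch of a build / critique / judge loop.
# The persona functions are stubs standing in for real agent calls.

def build(spec: str, feedback: list[str]) -> str:
    """Builder hat: produce or revise an implementation from the spec."""
    if not feedback:
        return spec
    return spec + " + " + "; ".join(feedback)

def critique(work: str) -> list[str]:
    """Adversarial hat: try to tear the work apart, return issues found."""
    if "error handling" not in work:
        return ["add error handling"]
    return []

def is_done(issues: list[str]) -> bool:
    """Judge hat: decide whether the task is actually complete."""
    return not issues

def ralph_loop(spec: str, max_iterations: int = 5) -> str:
    feedback: list[str] = []
    for _ in range(max_iterations):
        work = build(spec, feedback)   # fresh attempt each pass
        feedback = critique(work)      # adversarial review
        if is_done(feedback):
            return work
    raise RuntimeError("loop exhausted without passing review")
```

The point of the sketch is the control flow, not the stubs: the builder never grades its own work, and the loop only exits when the adversarial pass comes back clean or the iteration budget runs out.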

Where this is heading

The real engineering work is shifting to specs and agent harnesses. How well can you structure and define your codebase, with proper backpressure mechanics, so that agents do the work while you focus on what’s next?