January 29, 2026
Multi-Model Ralph + Friends
I've been running the Ralph Wiggum Loop for a couple months now. If you haven't seen it, it's Geoffrey Huntley's technique for autonomous AI coding that has been making the rounds. A bash loop keeps feeding the same prompt to an AI agent until the work is done. Progress lives in git and files, not in one model run's context window. Each iteration gets a fresh context. Huntley calls it "deterministically bad in an undeterministic world."
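For anyone who thinks better in code, here is a rough sketch of that loop in Python instead of bash. It is not Huntley's script. The `ralph_loop` name, the `run_agent` callable, and the top-level `stories` list in prd.json are all assumptions made for illustration; the only thing taken from the technique itself is the shape: same prompt, fresh run, state on disk.

```python
import json
from pathlib import Path
from typing import Callable


def ralph_loop(
    prd_path: Path,
    prompt: str,
    run_agent: Callable[[str], None],  # hypothetical: starts one fresh agent run
    max_iterations: int = 50,
) -> int:
    """Feed the same prompt to a fresh agent run until every story passes.

    Returns the number of agent runs that were started.
    """
    for runs_so_far in range(max_iterations):
        # Progress lives in the file on disk, not in any model's context.
        # Assumption: prd.json looks like {"stories": [{..., "passes": bool}]}.
        stories = json.loads(prd_path.read_text())["stories"]
        if all(story.get("passes", False) for story in stories):
            return runs_so_far
        # Every iteration gets the identical prompt and a clean context.
        run_agent(prompt)
    return max_iterations
```

In practice `run_agent` would shell out to whatever coding agent you use; it is left as a parameter here so the sketch doesn't guess at any tool's command-line flags.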
I should mention, I'm not a developer. The last time I wrote code was CSS and HTML in the mid-2000s building a website for my Spanish teacher who owned a dog breeding business on the side and never paid me the $20 for the site, now that I'm thinking about it. Clients gonna client. Anyway, since then I've designed websites and started businesses. For more technical deep dives, I point you toward Geoffrey and Matt Pocock who have been very helpful for me.
After using Ralph for a while (I have come to fully personify this thing), I kept running into a problem. Codex is great at backend work and heavy tasks but makes shit design decisions. Claude Code (specifically Opus 4.5) is the best model I've found for creative tasks and frontend, but it often won't think deeply enough about important decisions (your experience may vary). I'd set Ralph running overnight and wake up to one of two things: 1) the app worked perfectly but looked like a developer designed it (sorry devs), or 2) the app was beautiful and crashed constantly.
My first solution was to switch between the two manually. I'd set the iteration count to the most runs Codex could do before it reached a Claude task. That was a pain in the ass and stupid. Babysitting the model is exactly the opposite of why I'm using Ralph. It also meant runs stopped in the middle of the night whenever Codex reached a Claude task at 2am.
Then I tried batching. Group all the Codex tasks together, run them overnight. Group the Claude tasks, run those the next night. But this fell apart fast because I'm a dog using a keyboard and these models are actually better at planning when tasks get done than I am. You can't build all the backend before any frontend exists, Codex complains. Some tasks depend on each other, and one of the benefits of using Ralph is that he decides what to tackle when.
Finally I realized I could just label the tasks and let Ralph decide which model to use.
I added an assignedModel field to the stories in my prd.json:
{
  "id": "UI-001",
  "title": "Home screen layout",
  "assignedModel": "claude",
  "passes": false
}
Before each iteration, the script reads prd.json, finds the next story that isn't done, and checks which model it wants. Claude gets frontend, UI, design work. Codex gets backend, infrastructure, APIs. If a story doesn't specify, it defaults to Codex.
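The routing step is small enough to sketch in a few lines of Python. This is an illustration under assumptions, not the actual script: `next_story`, `pick_model`, and the extra example stories are made-up names and data, and only the `assignedModel` and `passes` fields come from the prd.json shown above.

```python
from typing import Optional

DEFAULT_MODEL = "codex"  # stories without an assignedModel fall back to Codex


def next_story(stories: list[dict]) -> Optional[dict]:
    """Return the first story that has not passed yet, or None if all are done."""
    for story in stories:
        if not story.get("passes", False):
            return story
    return None


def pick_model(story: dict) -> str:
    """Use the story's label if it has one, otherwise the default."""
    return story.get("assignedModel") or DEFAULT_MODEL


# Hypothetical stories; only UI-001 mirrors the example above.
stories = [
    {"id": "DB-001", "title": "Schema", "assignedModel": "codex", "passes": True},
    {"id": "UI-001", "title": "Home screen layout", "assignedModel": "claude", "passes": False},
    {"id": "API-002", "title": "Auth endpoint", "passes": False},
]

story = next_story(stories)  # UI-001, since DB-001 already passes
model = pick_model(story)    # "claude"
```

Because the choice is made fresh before every iteration, the order of work still comes from the plan, and the label only decides who does it.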
Like Ralph itself, it's a pretty simple trick.
Then Ralph needed some friends.
Ralph is the executor. Calm, methodical, does one story per run, proves it works, moves on. With an insane amount of persistence. My prompt tells him to prefer "simple, boring solutions" and to not ask questions unless he literally cannot proceed. He's not here to do the smartest work, he just finishes things.
Marge helps plan projects. She takes a brain dump and turns it into a structured PLAN.md that Ralph can execute. She asks clarifying questions in small batches and makes sure the plan is specific enough that Ralph can run all night without hitting a blocker. I originally built this as a Claude Project but find it simpler to have her in the repo with Ralph and Lisa.
Lisa comes in for QA and polishing. Her prompt tells her "Ralph ships fast. You ship correct." She runs stricter verification, adds tests for risky stuff, and has a quality checklist that includes things like "do not weaken security to make things work" and "remove obvious AI slop you introduced." She catches some of Ralph's dumb mistakes.
I have an idea to use Bart as a creative brainstorming partner who may wreck the entire project, but not quite sure where he fits yet.
Download the prompts for Ralph + Friends
I've built a few things with this approach. It's not been seamless, but things are getting built.
Mira: My therapist insists that you have to truly listen to people to understand them. That sounds like a thing I'd like to get better at. But there are only two ways to get reps at a thing like this: you either hear about people's emotional issues a lot or you build an AI tool that will feed you emotional issues on command. Yes, I know how terrible this all is. Anyway, using OpenAI Whisper + Opus 4.5, I have a daily conversation with Mira with a goal to echo back "What I'm hearing is…" at 95% accuracy to the AI's original stated view. This is the type of thing I never would have built before LLMs. Now it took a couple nights and I have a working prototype.
Pickle Up: As we learn how to use these new approaches at Mostly Serious, we need trial projects. Our Director of Design is obsessed with Pickleball (so is our Controller, and they are unsurprisingly the two oldest people in our office), and his herd of pickleballers are up in arms that their favorite game scheduling app is getting ready to charge a subscription. So, we planned with Marge during our weekly 1:1 and then set Ralph off on the project for the rest of the day. This was actually the project that convinced me to introduce model routing, because Codex produced a functioning nightmare the first go. After implementing this approach, it produced a beautiful, minimal, shadcn/ui app that will get the ballers pickling again.
Proposal automation: This one is a work in progress. I'm very excited by an agent-driven digital + PDF proposal generation tool that is fully integrated into our Ralph-built in-house CRM. There's a great slide deck skill for Claude Code I intend to build off of to allow one-click proposals generated from our CRM's client research (Gemini), meeting notes (Granola), any other documentation added to the CRM, and our existing templates.
This weekend, a friend looped me in on Gas Town, which he had been toying with. It's a system built by Steve Yegge for running 20-30 parallel agents with seven distinct roles and complex orchestration. It's super interesting and complicated and reminds me a bit of the Tools for Thought crowd who spend more time tinkering with their tools than using them to get real work done.
I like simplicity. Maybe someday I'll be running a town. For now, I'm happy with Ralph + Friends.


