Ryan Lopopolo's post on harness engineering is the clearest articulation yet of what engineering looks like when agents write the code. In short: when Codex writes every line, the human's job shifts from writing code to designing the environment in which agents can do reliable work: tools, feedback loops, invariants, legibility. His team shipped on the order of a million lines in five months with no manually written code. The harness made that possible.
The important part is the loop. Given a single prompt, Codex can now validate the codebase, reproduce a reported bug, implement a fix, validate it by driving the application, open a pull request, respond to human and agent feedback, detect and remediate build failures, escalate when judgment is needed, and merge the change. End-to-end autonomy inside a repository. That loop works because the environment it runs in is fully under the harness's control: bootable per worktree, observable end-to-end, disposable per task. All local.
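A minimal sketch of that control flow, assuming nothing about the real system; every name below is a stub standing in for a step the post lists, not a real Codex or harness API:

```python
# Hypothetical sketch of the end-to-end loop. No name here is a real
# Codex or harness API; each step is a stub so the shape is explicit.

from dataclasses import dataclass


@dataclass
class Env:
    worktree: str

    def dispose(self) -> None:
        print(f"tearing down {self.worktree}")   # disposable per task


def validate(env: Env) -> None:
    print(f"validating in {env.worktree}")       # build, lint, tests


def reproduce_bug(env: Env, report: str) -> str:
    print(f"reproducing: {report}")              # confirm the failure first
    return "failing repro"


def implement_and_verify(env: Env, repro: str) -> str:
    print(f"fixing {repro}, then driving the app to confirm")
    return "patch"


def review_loop(patch: str) -> bool:
    # Respond to human and agent feedback, remediate build failures,
    # escalate when judgment is needed. True means mergeable.
    print(f"iterating on feedback for {patch}")
    return True


def run_task(report: str) -> None:
    env = Env(worktree="wt-0001")                # bootable per worktree
    try:
        validate(env)
        repro = reproduce_bug(env, report)
        patch = implement_and_verify(env, repro)
        print("merging" if review_loop(patch) else "escalated to a human")
    finally:
        env.dispose()


run_task("login form 500s on submit")
```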
We want that same loop for cloud applications. The one where "validate the codebase" means validate against real managed services. Where "reproduce the bug" means reproduce it against the infrastructure where the bug actually lives. Where "merge the change" actually updates running production. That loop is much harder to build than the local one — a cloud environment doesn't torch and rebuild per task unless something underneath knows how to do it safely. That's the other half of the harness. That's what Monk provides.
From source to a capsule per branch
Capsules turn every git branch into a full, production-shaped cloud environment with its own HTTPS preview URL. The flow looks like this (a code sketch follows the list):
- Point Monk at your repo. Monk reads your source — framework, services, dependencies, data stores, external integrations — and generates a manifest that you check into git. That manifest, plus a small set of actions, is enough to stand up the whole stack on your cloud, wired end to end. The graph isn't hand-authored. It's inferred from the code.
- Run it against your main branch and attach a domain. That's production, if you want it to be.
- Turn on capsules. From that point on, every branch gets its own fully deployed environment on your cloud with a unique HTTPS preview URL. No per-branch manifests. No cluster to pre-provision.
- The coding agent works against the capsule. The same Chrome DevTools integration the OpenAI team wired into Codex for local work now points at a real app, talking to real managed services, reachable over HTTPS. The browser is only one surface. The agent can also hit API endpoints directly, read logs, and inspect the live state of every service in the graph — the same observability a human operator would have, but in context. It pushes a commit, the capsule redeploys, and it sees exactly what changed.
- Merge when the preview works. The main branch redeploys. Production updates. The PR's proof is a URL the reviewer can click.
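Condensed into code, the per-branch lifecycle looks roughly like this. `CapsuleClient`, `Capsule`, and the preview URL shape are hypothetical stand-ins, not Monk's actual API:

```python
# Hypothetical sketch of the capsule lifecycle; CapsuleClient and its
# methods are illustrative stand-ins, not Monk's actual API.

from dataclasses import dataclass


@dataclass
class Capsule:
    branch: str
    url: str                 # unique HTTPS preview URL for this branch


class CapsuleClient:
    def deploy(self, branch: str) -> Capsule:
        # In the real flow, the manifest checked into git plus the
        # orchestrator's entity library stand up the whole stack here.
        slug = branch.replace("/", "-")
        return Capsule(branch, f"https://{slug}.preview.example.com")

    def teardown(self, capsule: Capsule) -> None:
        print(f"tearing down {capsule.branch}")


client = CapsuleClient()

# Every pushed branch gets its own production-shaped environment.
capsule = client.deploy("feature/fix-oauth-callback")

# The agent works against the live URL: browser automation, direct API
# calls, log reads. A new commit redeploys the same capsule in place.
print(f"agent targets {capsule.url}")

# Branch merged or deleted: the capsule goes with it.
client.teardown(capsule)
```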
You don't have to put your main branch on Monk to get this. Capsules work on their own. Leave existing production untouched and point Monk at feature branches — the harness benefit shows up immediately, and the production migration can wait.
Infrastructure changes become pull requests
The graph Monk inferred from your source isn't read-only. The coding agent can modify it, and Monk owns the lifecycle.
The agent can ask Monk to add a hosted auth provider. Monk provisions it inside the capsule, wires the environment variables, and opens the flow for the agent to test against. The PR now contains both the code change and the infrastructure change. When it merges, the same modification propagates to the main branch, and the new provider appears in production.
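Sketched as data, the change the agent proposes is just an edit to the inferred graph. The entity kinds, names, and fields below are illustrative, not Monk's manifest schema:

```python
# Hypothetical sketch: the infrastructure graph as editable data.
# Kinds, names, and fields are illustrative, not Monk's schema.

from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str
    name: str
    env: dict[str, str] = field(default_factory=dict)


graph: list[Entity] = [Entity("web-service", "api")]

# The agent's requested change: a hosted auth provider, provisioned
# inside the capsule and wired into the app via environment variables.
graph.append(
    Entity("auth-provider", "auth", env={"AUTH_ISSUER_URL": "set-on-provision"})
)

# Serialized back into the manifest, this edit rides in the same PR as
# the code change; on merge it propagates to the main branch's deployment.
for entity in graph:
    print(entity.kind, entity.name, entity.env)
```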
The same mechanism works for swapping the cache engine, changing the queue backend, benchmarking three databases in parallel, or trying a smaller VM size. It also covers running security scanners against every branch, or catching a bad query plan at PR time instead of at 3 AM.
Each of those used to be a project. Now each is a git push.
It takes a substrate
This isn't a feature you bolt on, and it isn't an MCP you install. It's a property of a system with the right substrate underneath. Three things have to be true at once (sketched in code after the list):
- Typed, lifecycle-aware primitives for every cloud resource. Each one knows how to provision, configure, compose, and tear itself down. The graph builds itself by reading your code against this library — not hand-authored, not templated.
- A persistent orchestrator with state. Capsules need to know what they created so they can tear themselves down cleanly. The orchestrator needs to know what's deployed so it can reason about drift, rotation, and lifecycle over time. An LLM with shell access and a handful of MCPs cannot do this. Conversations end. Infrastructure doesn't.
- Scoped credentials as an architectural primitive. Per-branch environments are worthless if creating one means handing every branch the full production secret bundle. Credentials are scoped per entity, per capsule, and per API surface. No branch ever holds the full bundle.
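Compressed into code shapes, and assuming nothing about the implementation, the three properties look something like this; none of it is Monk's actual code:

```python
# Hypothetical shapes for the three substrate properties; none of this
# is Monk's implementation.

from dataclasses import dataclass, field
from typing import Protocol


class EntityLifecycle(Protocol):
    """1. A typed, lifecycle-aware primitive: every resource knows how
    to provision, configure, compose, and tear itself down."""

    def provision(self) -> None: ...
    def configure(self, wiring: dict[str, str]) -> None: ...
    def teardown(self) -> None: ...


@dataclass
class Orchestrator:
    """2. Persistent state: remember what was created, so capsules tear
    down cleanly and drift is visible over time."""

    deployed: dict[str, list[str]] = field(default_factory=dict)

    def record(self, capsule: str, resource: str) -> None:
        self.deployed.setdefault(capsule, []).append(resource)

    def teardown_capsule(self, capsule: str) -> list[str]:
        # Exactly what this capsule created, nothing more.
        return self.deployed.pop(capsule, [])


@dataclass(frozen=True)
class ScopedCredential:
    """3. Credentials scoped per entity, per capsule, per API surface.
    No branch ever holds the full production secret bundle."""

    capsule: str
    entity: str
    surface: str      # e.g. "read-logs", never "admin-everything"
    token: str


orch = Orchestrator()
orch.record("feature-x", "postgres/feature-x")
cred = ScopedCredential("feature-x", "postgres", "migrate", token="redacted")
print(orch.teardown_capsule("feature-x"), cred.surface)
```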
The closest things on the market are Kubernetes-based preview tools. They're good. They also require a pre-existing cluster, hand-authored manifests, and real per-team blueprinting. That's fine if you have the team for it — though even then, the coding agent probably shouldn't be the one authoring your cluster manifests.
Back to the harness
The loop converges because the environment is legible and cheap to reproduce. OpenAI built that environment for themselves, by hand, for one product. In the cloud it has to be built differently, but once it's built into the substrate, every team that connects a coding agent gets the same loop.
Once per-branch production exists, the harness economics carry over:
- The agent tests against reality, not a Docker Compose approximation. OAuth callbacks, CORS, security groups, cert expiry — the whole class of bugs that only appear when real services talk to each other over real networks becomes visible in the dev loop, not in a postmortem.
- Throughput economics flip for infra the same way they flipped for code. "Corrections are cheap, waiting is expensive" becomes true for infrastructure changes too.
- Parallel autonomy works. Ten agents on ten branches, ten isolated productions, no coordination overhead. The assembly line becomes a factory floor.
- The PR is a working preview URL, not a green checkmark. Reviewers click the link. Proof of work, not trust.
A six-hour unattended Codex run against a local observability stack is a remarkable result on its own. The same six-hour run against a capsule — where the agent is observing real managed services, real networks, real third-party APIs — reaches a different class of problem.
What a capsule isn't
Capsules are production-shaped, not production. No real user traffic. No production data volume. No rate-limit hits against live third-party keys — and you wouldn't want any of those in a preview environment.
The gap between "works in the capsule" and "works in production" is the gap between two Monk deployments using the same entities, the same orchestrator, and the same wiring logic. The agent's fix in the capsule is, almost always, the fix in production. Closing the remaining sliver would mean routing live users through a test environment — at which point it isn't a test environment anymore.
The factory
OpenAI built a harness for one product and was honest about what it cost. That work matters. It's also work that shouldn't have to be redone by every team trying to do the same thing for a cloud application.
We built the cloud half of the harness once, as a substrate. A team that connects Monk to a coding agent gets per-branch production on day one — inferred from source, running on their cloud, torn down when the branch is gone. That's the factory floor. The assembly line runs in parallel. Humans walk it, inspect the output, and approve shipment.
Per-change production used to be the life's work of a platform team. Now it's a prompt.
Connect a coding agent. Spin up some capsules.