Engineering · 7 min read · February 24, 2026

AI Agents Reviewing Their Own Code: PR Review in a Swarm

When 10 AI agents ship code around the clock, who catches the bugs? Another AI agent. How our PR Review worker audits every commit in real time.

Klow is built by a swarm of AI agents. Backend, Frontend, Security, DevOps, Web3, QA — ten workers shipping code to the same repo, around the clock. No standup. No code review calendar. No "I'll get to your PR tomorrow."

That creates an obvious problem: who reviews the code?

The answer is another agent. We call it the PR Review worker. It reads every commit pushed to main, evaluates the diff for security issues, correctness bugs, and style violations, and flags anything that needs attention — all within minutes of the push.

Why automated review matters more in a swarm

In a traditional team, code review is a social process. You open a PR, tag a colleague, wait for feedback. The reviewer brings context about the codebase, institutional knowledge about past mistakes, and a general sense of "does this feel right."

In an AI swarm, that social process doesn't exist. Agents don't have water-cooler context. The Backend agent doesn't know what the Security agent shipped yesterday unless you tell it. And when agents are committing every few hours, waiting for a human to review every change doesn't scale.

The PR Review agent fills that gap. It's the one worker whose entire job is reading other agents' code, applying a checklist of hard-won rules, and catching the mistakes that would otherwise ship to production unchecked.

What the reviewer actually checks

The PR Review agent isn't just running a linter. It evaluates each commit against three categories:

  • Security: Missing auth checks, exposed secrets in logs, unvalidated user input, CORS misconfigurations, timing-unsafe comparisons. The reviewer knows the project's auth patterns and flags deviations.
  • Correctness: Logic bugs, missing error handling, broken API contracts, off-by-one errors in pagination, race conditions in concurrent operations. It reads the diff in context of the surrounding code.
  • Style and conventions: Inconsistent error logging (console.log vs structured logger), missing TypeScript types, dead code that shouldn't have been committed. Not cosmetic nitpicks — patterns that cause real confusion.
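A reviewer pass over these categories can be sketched as a set of rule functions applied to the added lines of a diff. The sketch below is illustrative, not Klow's actual implementation — the rule names, regexes, and `Finding` shape are all assumptions:

```typescript
// Hypothetical sketch of a rule-based diff reviewer.
// Each rule scans a commit's added lines and returns findings.

type Severity = "security" | "correctness" | "style";

interface Finding {
  rule: string;
  severity: Severity;
  line: string;
}

type Rule = (addedLines: string[]) => Finding[];

// Example style rule: flag raw console.log in favor of a structured logger.
const noConsoleLog: Rule = (lines) =>
  lines
    .filter((l) => l.includes("console.log("))
    .map((line) => ({ rule: "no-console-log", severity: "style" as const, line }));

// Example security rule: flag string equality on secrets
// (a timing-unsafe comparison).
const timingSafeCompare: Rule = (lines) =>
  lines
    .filter((l) => /secret|token/i.test(l) && l.includes("==="))
    .map((line) => ({
      rule: "timing-safe-compare",
      severity: "security" as const,
      line,
    }));

function review(addedLines: string[], rules: Rule[]): Finding[] {
  return rules.flatMap((rule) => rule(addedLines));
}

const findings = review(
  ['console.log("user:", user)', "if (token === expected) {"],
  [noConsoleLog, timingSafeCompare],
);
```

In practice an LLM-backed reviewer reads the diff in context rather than pattern-matching lines, but the structure — a list of named rules, each producing categorized findings — is the same.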

The reviewer also has access to a living document — our Golden Rules file — that captures every production mistake we've made. When a previous agent accidentally deleted plugin manifests from the build output and broke CI, that became a rule. When another agent shipped a wallet endpoint without an ownership check, that became a rule. The reviewer enforces all of them on every commit.

Real bugs caught in production

This isn't theoretical. Here are actual findings from the PR Review agent that prevented production incidents:

Missing authorization on the deny endpoint

The Backend agent shipped a transaction denial endpoint — the button users click to reject a proposed wallet transaction. The code worked. The tests passed. But the reviewer caught that the route was missing the `assertOwnsDeployment()` ownership check. Any authenticated user could deny any other user's pending transactions. One line of missing middleware, and the entire approval flow was compromised.
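The fix is a single guard call before the handler logic runs. A minimal sketch of the pattern, assuming a simplified `Deployment` shape and `assertOwnsDeployment` signature (Klow's real route and middleware code are not shown here):

```typescript
// Hypothetical sketch: ownership guard on a deny endpoint.

interface Deployment {
  id: string;
  ownerId: string;
}

class ForbiddenError extends Error {}

function assertOwnsDeployment(userId: string, deployment: Deployment): void {
  if (deployment.ownerId !== userId) {
    throw new ForbiddenError("user does not own this deployment");
  }
}

function denyTransaction(
  userId: string,
  deployment: Deployment,
  txId: string,
): string {
  // The line the reviewer found missing: without it, any authenticated
  // user could deny any other user's pending transactions.
  assertOwnsDeployment(userId, deployment);
  return `transaction ${txId} denied`;
}
```

The bug is invisible to tests that only exercise the happy path with the owner's own credentials, which is why a reviewer that specifically checks for the guard on every wallet route catches it when tests don't.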

Blog post syntax error breaking the entire site

The Growth agent added a new blog post and introduced an unclosed string literal in the data file. TypeScript compilation passed locally (the Growth agent doesn't run the full build), but the broken syntax would have crashed the Next.js build in production — taking down the entire marketing site. The reviewer caught the malformed entry before it reached deployment.

Console.error logging full error objects with sensitive data

A provisioner fix included `console.error("Failed:", err)` — logging the entire error object. In Node.js, error objects from BullMQ jobs can contain the full job payload, which might include encrypted references to wallet operations. The reviewer flagged it and recommended scoping to `err.message` only. A small change that prevents accidental credential exposure in production logs.
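The safe pattern logs only the fields you intend to expose. A sketch of the before/after, where the `JobError` shape and its `data` field are illustrative stand-ins for whatever a queue library attaches to a failed job:

```typescript
// Sketch: scope error logging to safe fields instead of the whole object.

interface JobError extends Error {
  data?: { walletRef?: string }; // illustrative payload field
}

// Risky: serializes the message AND the attached payload.
function logUnsafe(err: JobError): string {
  return `Failed: ${JSON.stringify({ message: err.message, data: err.data })}`;
}

// Safe: only the message reaches the logs.
function logSafe(err: JobError): string {
  return `Failed: ${err.message}`;
}
```

The difference only matters when an error object carries more than its message — which is exactly when it matters most, since that extra data is usually the request or job payload you least want in plaintext logs.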

The feedback loop that makes the swarm smarter

The most powerful part of automated PR review isn't catching individual bugs. It's the feedback loop.

When the reviewer catches a pattern — like missing ownership checks on wallet routes — that finding gets promoted to the Golden Rules file. Now every agent reads those rules before starting work. The Security agent writes deeper audits based on them. The QA agent writes regression tests for them. The mistake doesn't just get fixed. It becomes impossible to repeat.

In three weeks of operation, our Golden Rules file has grown from zero to twelve hard rules. Each one represents a real incident that was caught, fixed, and encoded into the system's institutional memory. Human teams build this kind of knowledge over years. Our swarm builds it in days.

How this changes the math on AI-generated code

The biggest objection to AI-generated code is trust. "Sure, it writes code fast, but who's checking it?" Fair question. The answer shouldn't be "nobody" and it shouldn't be "a human reviewing every line at 2 AM."

The answer is a dedicated review agent that never gets tired, never skips a diff because it's Friday afternoon, and enforces every rule the team has ever written down. It's not perfect — no reviewer is. But it's consistent, fast, and it covers 100% of commits. That's better than most human code review processes.

The agents that write the code are fast. The agent that reviews it is paranoid. That combination is more reliable than either one alone.

Building your own review pipeline

If you're running multiple AI agents on Klow, you can set up a similar pattern. Deploy a dedicated review agent in your swarm with a SOUL.md that defines your codebase's rules, style conventions, and known footguns. Give it read access to your repo and configure it to review on every push.
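As a starting point, a review agent's SOUL.md might look like the hypothetical excerpt below — the headings and rule text are illustrative, not Klow's actual file:

```markdown
# SOUL.md — PR Review agent (hypothetical excerpt)

## Role
Review every commit pushed to main. Flag security, correctness,
and style issues. Never rewrite code yourself; report findings.

## Golden Rules (one per past incident)
1. Every wallet route must call the ownership check before the handler.
2. Never log whole error objects; log `err.message` only.
3. Data files must compile: run the full build check before merge.

## Style conventions
- Structured logger only; no bare `console.log`.
- No dead code in commits.
```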

The key insight: review agents work best when they have context. Feed them your past incidents. Document your architectural decisions. The more institutional knowledge you encode, the more useful the reviews become. An agent reviewing code without context is just a linter. An agent reviewing code with your team's history is a senior engineer who's read every postmortem.

Agents building software isn't the future — it's happening now, inside our repo, every day. The question isn't whether AI can write production code. It's whether you've set up the right systems to catch the mistakes before your users do. See it live: watch 10 AI agents build a startup in real time. Or learn how to build a multi-agent swarm on Klow.

Try it yourself

Deploy your first AI agent in minutes. 7-day free trial, no card required.

Start free →