Agent PR reviewers

Agents let you ship more. Reviews become the bottleneck, and agent reviewers aren't there yet.

When an org really starts using agents day-to-day, your PR volume grows. Even if the number of PRs doesn't grow, the size of your PRs might. It becomes easy to do bigger sweeping changes, refactors, and more complete POCs. Eventually, PR reviews become a bottleneck and a risk; we definitely started feeling this at Replicated.

There are more impactful things you can do (like reorienting your integration tests to be more easily run by agents), but pulling in an agent reviewer is a low-friction way to take the edge off. Humans are fallible and miss things, especially on larger PRs, and robots aren't good enough (yet) to be the sole gatekeeper. Still, agent reviewers can help. We used them as a self-review gate: when you open a PR, you clear the agent's feedback first, then you ask a human. Like unit tests and linting.

There's tons of room for improvement, but here's how we used them, what I liked, what I didn't, and what I really want from an AI PR reviewer.

How we used agent reviewers

  • Open PR, let the agent review.
  • Fix what it flags (nits, easy bugs, migration footguns, whatever).
  • Re‑run the agent if needed.
  • Then request a human review.

That's it. Not much to it.

This shaves off some of the avoidable back-and-forth and makes human reviews higher-signal - especially on larger PRs. We used Bugbot from Cursor for this; it's fine, I guess. It's better than some of the other options I played with, but it still feels half-baked.

Bugbot's "ok"

Bugbot was fine. No fluff, just line‑level stuff. It doesn’t spam your PR with summaries and bullshit diagrams like some of the other agent reviewers. It drops direct, inline comments like a human: “this thing is wrong, because xyz.”

There's some noise, but it catches enough real issues that the trade-off is worth it. Like, I wanted to complain, but I've definitely seen it catch bugs that human reviewers missed.

You can scope guidance by area using a hierarchy of rule files: keep high-level guidance broad at the root, and get very nitty-gritty in high-risk areas like migrations. Neat.

  project/
    .cursor/BUGBOT.md
    frontend/
      .cursor/BUGBOT.md
    backend/
      .cursor/BUGBOT.md
      migrations/
        .cursor/BUGBOT.md
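
For example, the migrations-level file can be way more specific than the root one. To be clear, this is just an illustration - the rules below are made up for this post, not our actual file:

  backend/migrations/.cursor/BUGBOT.md

    # Migration review rules (illustrative)
    - Every schema migration ships with a working down/rollback migration.
    - New NOT NULL columns on existing tables need a default or an explicit backfill step.
    - Call out any destructive change (DROP TABLE, DROP COLUMN, data rewrites) in the PR description.
    - Flag index builds on large tables that could block writes.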

But that's kinda it. Not exactly revolutionary. It still pales in comparison to what I can do by pulling down a PR locally, and running my /review-this-thing command against it.

So Bugbot ain't it...

It also has some quirks:

  • It typically just leaves a handful of comments per pass, so later commits sometimes surface “surprise” feedback, especially on larger PRs with multiple commits. Like sometimes I'll tweak something, and then the agent will suddenly come back and flag something from an earlier commit that it missed the first time.
  • It doesn’t “learn” from your PRs unless humans encode learnings into the rules.
  • Why are Bugbot and Cursor rules separate files? Generally, the same shit I want Bugbot to check is the same stuff I want Cursor to follow when writing code. So now I'm maintaining two sets of rules files?
  • The checks are opaque. You can add a rule like “all API endpoints must validate input,” but you can’t tell if the agent explicitly considered it, or if there were too many other feedback items and it just hasn't come up yet, or the model just - didn’t catch it. Or whatever. ¯\_(ツ)_/¯
  • There are some rules that I really want checked every single time.

What I want from an AI PR reviewer

Any PR comment → one-click rule creation (human- or agent-authored). I want to be able to easily create a new rule from any PR comment, whether it’s mine or the agent’s. That makes building out the rules and review guidelines way faster because it reduces the friction.

Periodic rule mining from recent PRs. The agent should scan the last X PRs and suggest rule updates. When recurring review or bug patterns emerge, suggest a rule with examples and a rationale. And please - just open a PR.

Explicit subagent reviews as GitHub Action checks. For some instructions, I want a focused and explicit review pass. Honestly, this is basically how I use bespoke slash commands locally. These are things that should be invoked independently and reported as statuses - like a vanilla GitHub Action check:

> “Ensure every SQL/ORM call that reads/writes customer data is scoped to team_id.” 

I don't want to have to hope that this was something the agent considered in its freeform review comments. Let me just give you a list of prompts/sub-agents to run as checks on every PR.
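
I'm imagining something like a folder of named check prompts, each one surfaced as its own pass/fail status on the PR. This is a sketch of the feature I want, not something Bugbot supports today - the paths and prompts are made up:

  project/
    .cursor/review-checks/
      tenant-scoping.md      # “Every SQL/ORM call that reads/writes customer data is scoped to team_id.”
      migration-safety.md    # “Every migration ships with a rollback; flag destructive changes.”
      input-validation.md    # “All new or changed API endpoints validate their input.”

One file per prompt, one prompt per check, and each check shows up green or red next to lint and the unit tests.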

Tools access. If you enable tools access AND give me the ability to define review agents, then I can coerce the review agent to do all sorts of cool shit - like running a targeted UX probe with Playwright, e.g. “does the flow described in the PR body actually work as spec'd, not just as tested?”

Why even have a PR reviewer?

Why can't we just have an org-level /review-this-thing command that people must run before opening a PR? Maybe a PR reviewer isn't needed, but...

  1. Seeing the original bugs and feedback is valuable in itself. It's one of the best signals for improving your rules for agents - which in turn will make you more productive with those agents.
  2. At Replicated, we were free to use whatever agent tooling worked for us or was best in class at the time. Something like Bugbot lets us apply relatively consistent rules, at least at the PR level. So even if you were using Conductor/Claude/Codex/Roo etc. and didn't pull our Cursor rules into your context, your PR still got reviewed against our best-tuned rules.
  3. Why do you have lint rules? Because sometimes humans and robots forget to lint. Same deal - people will forget to run /review-this-thing locally.
  4. My Go code is reviewed by humans, but also goes through static analysis tools, linters, security scanners, etc. An AI PR reviewer is just another layer of defense.
