Podcast
Building AI Code Review: A Conversation with Richie
Brandon Waselnuk · January 30, 2026

I sat down with Richie, one of our founding engineers, to talk about building Unblocked's code review product. We covered everything from why code review is broken, what happens when an LLM inherits human biases, to the moment he saw his own voice reflected back at him during a review.

Here's an overview of our conversation. 

Why code review is such a mess

The volume of open pull requests has increased dramatically as developers lean on code generation and agentic coding tools. That has made code review more critical, and harder to keep up with, than ever before. Developers want something to help them fix the problem, but they're also skeptical of promises that other tools in this space haven't kept. It's going to be their name in the `git blame`, after all, not Claude Code's or Cursor's.

Additionally, at larger companies, getting a change merged can take up to a week. Someone goes to lunch, and your review slips. They're bottlenecked by other reviews, and it takes even longer. Add a weekend or a vacation, and it spirals.

But latency is only part of the problem. As Richie put it: "I don't think developers like reviewing code. It's good for information sharing, but it's a burden. It's much more fun to build new things than review someone else's changes. And it's even more of a burden nowadays because you know that code hasn't even been written by a human half the time."

Then there are the biases. Authority bias is one of the biggest: an intern's PR gets scrutinized differently than a principal engineer's, even with identical code. There's bikeshedding: you're confronted with a giant React change touching 50 files, but someone changed a colour, so five people weigh in on the colour while the substance of the change gets overlooked.

And there's satisfaction of search: you find one nasty bug, feel accomplished, and your sensitivity drops for the rest of the review. You've checked three of ten files and you go, "cool, done."

LLMs have the same biases (and some new ones)

Here's where it got interesting. When Richie started building the code review system, he discovered the model exhibited many of the same biases that plague human reviewers.

"Surprisingly, this is something I actually found happened in our LLM-based review. I think it's because the models are trained on human reviews, human text, and so it's subject to some of the same biases."

For example, the models showed satisfaction of search: finding one issue and not bothering with the rest. But they also showed something Richie hadn't seen documented anywhere, which he started calling "null finding anxiety."

"If your job is to find a bug in a system, and your success depends on you finding that bug, then you're gonna find something. The first time I saw this, it was a very simple code review—a single line change. Turn after turn, the agent would agonize and be anxious about finding something until it found something pretty obscure that isn't really a bug, and reported that."

The fix came from an unexpected place: reliability engineering. Aviation uses checklists not because pilots are incompetent, but because systematic processes catch what intuition misses. Richie applied the same thinking: structured review passes, adversarial validation, explicit bias awareness, and more.
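
To make the checklist analogy concrete, here's a minimal sketch of what structured passes plus an adversarial validation step could look like. The `ReviewModel` interface, the pass instructions, and the control flow are illustrative assumptions, not the actual implementation.

```typescript
// A minimal sketch of checklist-driven review passes with an adversarial
// validation step. The ReviewModel interface, pass instructions, and control
// flow are illustrative assumptions, not Unblocked's implementation.
interface Finding {
  file: string;
  line: number;
  message: string;
}

interface ReviewModel {
  // Run one focused pass over the diff and return candidate findings.
  runPass(diff: string, instructions: string): Promise<Finding[]>;
  // Try to knock a candidate finding down; resolve true only if it survives.
  challenge(diff: string, finding: Finding): Promise<boolean>;
}

// Each pass looks for one class of issue, so a single early "win" can't end
// the whole review (the satisfaction-of-search failure mode).
const PASSES = [
  "Look only for correctness bugs: logic errors, off-by-one, broken invariants.",
  "Look only for concurrency and error-handling problems.",
  "Look only for deviations from established patterns in the surrounding code.",
];

async function structuredReview(model: ReviewModel, diff: string): Promise<Finding[]> {
  const accepted: Finding[] = [];
  for (const instructions of PASSES) {
    // Telling the model that an empty result is a valid outcome pushes back
    // against "null finding anxiety".
    const prompt = `${instructions} Reporting nothing is an acceptable outcome.`;
    const candidates = await model.runPass(diff, prompt);
    for (const finding of candidates) {
      // Adversarial validation filters out obscure non-bugs reported just to
      // have something to say.
      if (await model.challenge(diff, finding)) {
        accepted.push(finding);
      }
    }
  }
  return accepted;
}
```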

What we learned from deleted comments

While building the code review product, we gave an early release to beta customers who had used our PR Failure Agent, a tool that analyzed CI failures and connected them to specific code changes in order to fix the issue. Users could react to review comments with GitHub emojis, giving us a feedback signal.
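
As an aside on the mechanics: GitHub's REST API exposes reactions on pull request review comments, which is the kind of signal described here. The sketch below assumes a personal access token and skips pagination; the upvote/downvote rollup is purely illustrative.

```typescript
// Sketch of reading emoji reactions on a PR review comment via GitHub's REST
// API (works with Node 18+ fetch). Pagination and retries are omitted; the
// upvote/downvote aggregation is our own illustrative choice.
type ReactionSummary = { commentId: number; upvotes: number; downvotes: number };

async function summarizeReactions(
  owner: string,
  repo: string,
  commentId: number,
  token: string
): Promise<ReactionSummary> {
  const url = `https://api.github.com/repos/${owner}/${repo}/pulls/comments/${commentId}/reactions`;
  const res = await fetch(url, {
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
    },
  });
  if (!res.ok) throw new Error(`GitHub API error: ${res.status}`);
  const reactions: Array<{ content: string }> = await res.json();
  return {
    commentId,
    upvotes: reactions.filter((r) => r.content === "+1").length,
    downvotes: reactions.filter((r) => r.content === "-1").length,
  };
}
```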

Then something strange happened. People started downvoting comments that appeared entirely correct.

When our team investigated, they couldn't find anything wrong with the comments. So they reached out directly. The first piece of feedback: the comment was fine, it was just obvious. Detecting lint failures or pointing out that a test expected 6 but the function now returns 5—technically correct, but no value added for human reviewers.

The second discovery was weirder. People were deleting comments entirely. Our team assumed this meant users hated them.

"We reached out to some of these people and they responded with feedback. The first thing they said was, 'Oh yeah, I love Unblocked. These comments were great.' And I was like, okay, so what's up here? And they said, 'Well, we deleted the comments because I wanted to clean it up before one of the senior engineers had a look at it.'"

This changed how our team thought about code review. Every comment has a social cost. In large companies, PRs already have multiple bots commenting—some teams have five different classes of automation. Adding noise makes things worse, even if the noise is technically accurate.

"Does it do the right thing" vs. "Does it do the thing right"

Richie frames good code review as answering two questions. Most tools focus on the second one: given the code, does it look correct? That's table stakes.

The first question is harder: is this change doing the right thing in the first place? That requires knowing the purpose of the change—which often isn't in the PR description or codebase. Maybe there's a Jira or Linear link. Maybe there's a side conversation in Slack where people discussed the approach. Maybe there's documentation from six months ago that established a pattern everyone's supposed to follow.

"People will have side discussions in Slack. You reach out for help and say, 'Can somebody please review this? It's really urgent.' And rather than people actually commenting on the PR, they tend to just comment in the Slack channel on the same thread."

Unblocked is uniquely positioned to pull all of that in, because our context engine is already in place. The result is a review that can tell you: are you sure you want to go down this path? There's a body of documentation and discussion pointing in a different direction.
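
The context engine's internals aren't covered in this post, so treat the following as a sketch of the general shape: index the side channels (Slack threads, tickets, docs, past PRs), retrieve what's relevant to the PR, and hand it to the review with links back to the originals. The types, the threshold, and the `ContextIndex` interface are assumptions.

```typescript
// Illustrative sketch only: the source types, threshold, and ContextIndex
// interface are assumptions. The point is that "does it do the right thing"
// needs evidence beyond the diff itself.
type ContextSource = "slack" | "jira" | "linear" | "docs" | "past_prs";

interface ContextSnippet {
  source: ContextSource;
  url: string;       // link back to the original thread, ticket, or doc
  text: string;
  relevance: number; // similarity score against the PR title and description
}

interface ContextIndex {
  search(query: string, limit: number): Promise<ContextSnippet[]>;
}

// Gather the material a reviewer would otherwise hunt down by hand, so the
// review can ask "are you sure about this approach?" with citations attached.
async function gatherReviewContext(
  index: ContextIndex,
  prTitle: string,
  prDescription: string
): Promise<ContextSnippet[]> {
  const snippets = await index.search(`${prTitle}\n${prDescription}`, 20);
  return snippets
    .filter((s) => s.relevance > 0.7) // assumed cutoff
    .sort((a, b) => b.relevance - a.relevance);
}
```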

When Richie reviewed his own voice

One of the stranger moments came when Richie was using the product on his own code. Unblocked left a comment that read like something he would say. When he checked the sources, they were his own comments from previous PRs.

"It felt like almost an inception moment."

The feature is called memories. The system records conversations between humans on PRs, clusters similar discussions, and, when it encounters a comparable issue in a new review, surfaces how that discussion was resolved, citing the original sources, often as a direct link to the PR comment.

So the 50th time someone needs to hear "stop using useEffect for this pattern," the system can say it for you, grounded in your team's own history.
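
If you're curious how a feature like memories could work in principle, one simple approach is nearest-neighbour search over embeddings of past PR discussions. Everything below (the `Memory` shape, cosine similarity, the threshold) is an assumption for the sake of the sketch, not a description of Unblocked's internals.

```typescript
// Rough sketch of the memories idea: embed resolved PR discussions, find the
// closest past discussion for a new finding, and cite it.
interface Memory {
  prCommentUrl: string; // citation back to the original PR comment
  resolution: string;   // how the team settled the discussion
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Surface the most similar past discussion, if any, so the new review comment
// can say "this came up before, here's how it was resolved" with a direct link.
function recallMemory(
  memories: Memory[],
  findingEmbedding: number[],
  minSimilarity = 0.8 // assumed threshold
): Memory | undefined {
  let best: Memory | undefined;
  let bestScore = minSimilarity;
  for (const m of memories) {
    const score = cosineSimilarity(findingEmbedding, m.embedding);
    if (score > bestScore) {
      best = m;
      bestScore = score;
    }
  }
  return best;
}
```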

The problem with benchmarks

There are a lot of AI code review tools now. Some are trying to establish benchmarks to help teams choose between them. Richie sees a problem with how these are being designed.

"If you measure the number of potential bugs that you find on a PR, and that is your measure for success, then you're gonna optimize for that KPI. You're gonna find more and more issues. And for various reasons we've already discussed, this is not what developers want."

The goal isn't to maximize findings. It's to find real bugs, avoid nitpicks, and keep the conversation moving forward. The acceptance rate of comments matters more than the volume.
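
As a back-of-the-envelope illustration, the shift is from counting findings to measuring how many comments a team actually accepts. The input shape below is assumed, reusing reaction counts and deletions as the acceptance signal.

```typescript
// Tiny illustration of the metric shift: acceptance rate of comments rather
// than raw finding count. Treating deletions and downvotes as rejection is an
// assumption, not a published benchmark definition.
interface CommentOutcome {
  deleted: boolean;
  upvotes: number;
  downvotes: number;
}

function acceptanceRate(comments: CommentOutcome[]): number {
  if (comments.length === 0) return 0;
  const accepted = comments.filter(
    (c) => !c.deleted && c.upvotes >= c.downvotes
  ).length;
  return accepted / comments.length;
}
```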

Listen to the full conversation

This post explores some of the highlights, but the full conversation goes deeper into bias in LLMs, lessons from reliability engineering, and what makes a code review actually helpful.

Give it a listen.

Unblocked Code Review is available now. Read our docs or reach out to see it in action on your codebase.
