AI-assisted code review is changing what ‘reviewer’ means
Three months of running an AI-assisted review workflow on a real team. What’s better, what’s worse, and what I’m watching.
I’ve been running an AI-assisted code review setup with my team for about a quarter now. Not the marketing version — the real version, with arguments about it in retros and one engineer who initially refused to use it. Here’s where I’ve landed.
The setup
Every PR gets two reviews before merge:
- A pre-review pass by an LLM, configured with the team’s style guide and our list of “things we don’t ever do.” It comments inline and posts a top-level summary. This runs automatically on PR open.
- A human review from a teammate. Same as before, except now the human is reading a PR that’s already had the obvious stuff caught.
We turn off the bot for hotfixes and security-sensitive code paths.
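To make the bot pass concrete, here is a minimal sketch of what it amounts to. This is not our actual implementation: the model name, the hotfix/security label convention, the style-guide path, and the PR_NUMBER environment variable are placeholders, and the real thing also posts inline comments via the pull-request review API, which I’ve left out to keep this short.

```python
# Hypothetical pre-review pass: fetch the PR diff, send it to an LLM along with
# the team style guide, and post the result as a top-level PR comment.
import os
import requests
from openai import OpenAI

GITHUB_API = "https://api.github.com"
REPO = os.environ["GITHUB_REPOSITORY"]   # e.g. "acme/payments", set by CI
PR_NUMBER = os.environ["PR_NUMBER"]      # assumed to be passed in by the workflow
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

SKIP_LABELS = {"hotfix", "security"}     # assumed label convention for the carve-outs


def pr_labels() -> set[str]:
    r = requests.get(f"{GITHUB_API}/repos/{REPO}/pulls/{PR_NUMBER}", headers=HEADERS)
    r.raise_for_status()
    return {label["name"] for label in r.json()["labels"]}


def pr_diff() -> str:
    # Requesting the diff media type returns the raw unified diff for the PR.
    r = requests.get(
        f"{GITHUB_API}/repos/{REPO}/pulls/{PR_NUMBER}",
        headers={**HEADERS, "Accept": "application/vnd.github.v3.diff"},
    )
    r.raise_for_status()
    return r.text


def review(diff: str, style_guide: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a code reviewer. Apply this style guide and the list "
                    "of things we never do. Flag issues with file and line "
                    "references, then write a short top-level summary.\n\n"
                    + style_guide
                ),
            },
            {"role": "user", "content": diff},
        ],
    )
    return resp.choices[0].message.content


def post_summary(body: str) -> None:
    # Top-level PR comments go through the issues endpoint.
    r = requests.post(
        f"{GITHUB_API}/repos/{REPO}/issues/{PR_NUMBER}/comments",
        headers=HEADERS,
        json={"body": body},
    )
    r.raise_for_status()


if __name__ == "__main__":
    if pr_labels() & SKIP_LABELS:
        print("Hotfix or security-sensitive PR; skipping the bot pass.")
    else:
        style_guide = open("docs/STYLE_GUIDE.md").read()  # assumed path
        post_summary(review(pr_diff(), style_guide))
```

Wire something like this to run on PR open (a pull_request trigger in your CI of choice) and you have the first half of the setup; the human review stays exactly as it was.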
What got better
Time-to-first-review collapsed. Used to be a median of four hours. Now it’s under five minutes for the bot pass and the human review tends to come faster too — probably because reviewers know the trivial stuff is already flagged, so it feels less like a chore.
Style nits are gone. Bracket placement, naming conventions, missing tests for edge cases — all caught by the bot, none of it by humans anymore. Reviewers got their attention back for the things that actually require taste.
Junior engineers ship faster. The bot is patient in a way humans aren’t. It will explain why a pattern is suboptimal at 11pm without sighing. New hires went from two weeks to first-merged-PR to four days.
What got worse, or weirder
Reviewer skill is decaying. I noticed this around month two. Senior engineers were skimming bot summaries and rubber-stamping. Bugs got through that a careful human read would have caught — not because the bot was wrong, but because the bot’s confidence was disarming. I’m now requiring reviewers to write a one-line summary of what the PR does in their own words before approving. That fixed most of it.
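We enforce the one-line summary socially for now, but if you wanted it mechanical, a small check along these lines would do it. A sketch under assumed names, not something we actually run:

```python
# Hypothetical CI check for the "summary before approval" rule: fail the build
# if any approval on the PR was submitted with an empty review body.
# Assumes the same GITHUB_REPOSITORY / PR_NUMBER / GITHUB_TOKEN variables as above.
import os
import sys

import requests

REPO = os.environ["GITHUB_REPOSITORY"]
PR_NUMBER = os.environ["PR_NUMBER"]
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

reviews = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}/reviews",
    headers=HEADERS,
).json()

silent_approvers = [
    review["user"]["login"]
    for review in reviews
    if review["state"] == "APPROVED" and not (review["body"] or "").strip()
]

if silent_approvers:
    print(f"Approvals without a written summary: {', '.join(silent_approvers)}")
    sys.exit(1)
```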
The team’s collective code memory is fragmenting. Code review used to be how you learned your teammates’ patterns. When the bot does most of the catching, you stop reading their PRs as carefully. Six months from now I expect to see “I had no idea you were working on that” moments that we wouldn’t have had before. I don’t have a fix yet — I’m experimenting with mandatory weekly “human-only” review days.
Hiring signal is harder to get. Take-home coding tests are basically dead. Live coding now means evaluating how someone collaborates with an LLM, which is a different skill from the one we used to test. I haven’t fully reworked our interview loop yet. If you have, tell me how.
What I’m watching
Three things, in order of how much they keep me up at night:
- Liability for AI-introduced bugs. Whose name is on the PR, the human or the bot? Today: the human. In two years? Genuinely unsure.
- Pull-request size norms shifting. Today, a 200-line PR is normal. With LLM assistance, 2,000-line PRs are getting normalized, and I think that’s a bad direction: nobody reviews 2,000 lines with the care they’d give 200.
- Junior career arc. If juniors never have to write the boring code, they may never learn the boring code. I don’t know what to do about this yet beyond being noisy about it on the team.
The TL;DR
It’s net positive. It’s not a free win. And the second-order effects on team culture are bigger than the first-order productivity effects, which means most of the discourse about it is missing the point.
If you’re running something similar and seeing different patterns, I want to hear about it.