Best-in-Class Signal-to-Noise

Tanagram Canon meaningfully outperforms other code review bots in signal-to-noise, with some users seeing a greater than 3× improvement.

A chart showing "Comment addressed rate" for various code review bots: Qodo at 5.9%, Greptile at 12.0%, Coderabbit at 13.1%, Cubic at 15.3%, Bugbot at 15.4%, and Tanagram ranging from 16.5–48%, with an outlier at 64% for one customer who mandated that all Tanagram comments be addressed before people reviewed a PR.

Methodology

We consider a comment on file f at line n on commit C_i "addressed" if there exists a later commit C_j (j > i) that also modifies line n in file f. This is admittedly a loose definition of "addressed", but we found that it correlated well with bots' self-reporting of whether a given comment was resolved.
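In code, the predicate looks roughly like this (a minimal sketch, assuming each later commit's diff has already been reduced to a mapping from file path to the set of touched line numbers; `is_addressed` and the data shapes are illustrative, not part of our actual pipeline):

```python
def is_addressed(path, line, later_diffs):
    """Return True if any later commit C_j (j > i) modifies `line` in `path`.

    `later_diffs` is assumed to hold one dict per subsequent commit,
    mapping each modified file path to the set of line numbers it touched.
    """
    return any(line in diff.get(path, set()) for diff in later_diffs)

# Hypothetical example: a comment on line 42 of app.py, followed by two commits.
later = [
    {"app.py": {41, 42, 43}},  # this commit touches the commented line
    {"util.py": {7}},          # this one does not
]
print(is_addressed("app.py", 42, later))  # True
```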

For non-Tanagram data, we started by searching open-source projects on GitHub to identify repos with between 100 and 10,000 stars that contained comments from any of the identified bots. We ended up with 65 arbitrary repos, most of which had only one of the bots installed.

For each such repo, we selected up to 50 comments per bot, drawing only from merged PRs and taking no more than one comment from any given PR to avoid per-PR bias (e.g. a hotfix PR that attracted many comments but was merged in haste).
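The sampling constraints can be sketched roughly as follows (assuming comments have already been fetched into dicts with `bot`, `pr`, and `pr_merged` fields; these names are illustrative, not GitHub's schema):

```python
import random

def sample_comments(comments, per_bot_cap=50, seed=0):
    """Sample bot comments for one repo: merged PRs only, at most one
    comment per PR (to avoid per-PR bias), at most `per_bot_cap` per bot."""
    rng = random.Random(seed)
    pool = [c for c in comments if c["pr_merged"]]
    rng.shuffle(pool)  # avoid ordering bias when applying the caps

    sampled, seen_prs, per_bot = [], set(), {}
    for c in pool:
        bot, pr = c["bot"], c["pr"]
        if pr in seen_prs or per_bot.get(bot, 0) >= per_bot_cap:
            continue
        sampled.append(c)
        seen_prs.add(pr)
        per_bot[bot] = per_bot.get(bot, 0) + 1
    return sampled
```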

Given rate-limit constraints, we ended up with 2052 comments:

  • 502 from Greptile
  • 947 from Coderabbit
  • 17 from Qodo
  • 262 from Cubic
  • 324 from Bugbot

In GitHub's API, each comment is anchored to a specific commit C_i in a PR. For each comment, we listed the commits in its corresponding PR, generated the git diff for all subsequent commits C_j (j > i), and checked whether the comment's file and line were contained in that diff.
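The line-membership check depends on parsing the unified-diff hunks GitHub returns in each file's `patch` field. A simplified sketch (it tracks old-side numbers for deletions and new-side numbers for additions, and ignores renames and other edge cases):

```python
import re

# Hunk header, e.g. "@@ -10,3 +10,3 @@": captures old and new start lines.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def touched_lines(patch):
    """Collect line numbers modified by a unified-diff patch: old-side
    numbers for deletions, new-side numbers for additions."""
    touched, old_line, new_line = set(), 0, 0
    for raw in patch.splitlines():
        m = HUNK_RE.match(raw)
        if m:
            old_line, new_line = int(m.group(1)), int(m.group(2))
            continue
        if raw.startswith("\\"):   # "\ No newline at end of file"
            continue
        if raw.startswith("-"):
            touched.add(old_line)
            old_line += 1
        elif raw.startswith("+"):
            touched.add(new_line)
            new_line += 1
        else:                      # context line advances both sides
            old_line += 1
            new_line += 1
    return touched

patch = "@@ -10,3 +10,3 @@\n context\n-old line\n+new line\n context"
print(touched_lines(patch))  # {11}
```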

For Tanagram, we started from our own database records of the comments we generated rather than searching GitHub, but the data was otherwise derived the same way. It is segmented by user, hence the range in the results.
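Per-user rates then reduce to a simple tally (a sketch; representing records as `(user, addressed)` pairs is our shorthand here, not a real schema):

```python
from collections import defaultdict

def addressed_rates(records):
    """Compute the fraction of addressed comments per user from
    (user, addressed_bool) pairs."""
    tallies = defaultdict(lambda: [0, 0])  # user -> [addressed, total]
    for user, addressed in records:
        tallies[user][0] += int(addressed)
        tallies[user][1] += 1
    return {user: hit / total for user, (hit, total) in tallies.items()}

print(addressed_rates([("u1", True), ("u1", False), ("u2", True)]))
# {'u1': 0.5, 'u2': 1.0}
```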

What Makes Tanagram Different

With so many players in the market, code review is commoditized¹. Everyone has access to the same market beta in model intelligence and harnesses.

Our alpha comes from what we choose to evaluate.

Most other bots are designed to find a bug, any bug. That's certainly useful — we use Bugbot and it's caught major issues — but it leads to noisy, inconsistent results:

  • Low-priority nits
  • Overlooked problems
  • Different problems coming up every time you push a change

In contrast, Tanagram focuses on the exact rules your team cares about. Because every check is something your team has explicitly chosen, we can give precise instructions to our agent, improving both precision and recall.

Why It Matters

Writing software is easier than ever, but knowing what to write (architecture, design, evolving patterns) becomes the bottleneck. In an org of 20 engineers, one or two of them are the subject-matter experts who get pulled onto every project to offer their expertise and judgment.

This doesn't scale. As an industry, we've sped up other aspects of our software factory, but expert review remains the rate-limiting step.

Tanagram solves this bottleneck by indexing every team's history, insights, and expertise, and automatically using that context to guide engineering output across the development lifecycle. It's a copy of your best principal engineer, available to every engineer.

Try Tanagram

Tanagram Canon is a repository of team-specific rules that powers code reviews on GitHub.

It works alongside our CLI, which uses those same rules to steer agents while they're generating code, before PR time.

The CLI also powers Lore, which enables teams to archive and collaborate on coding agent threads.

We encourage you to explore the documentation or try it out.

Footnotes

  1. That's why we consider code review to be a feature within our broader product offering.