
AI Code Review for OpenCode with K-LLM Orchestration

Last updated on Mar 8, 2026

I’ve been running Claude Code’s code review GitHub Action on my PRs for months now.

It’s fast, catches real issues, and fits into the existing PR workflow.

But it’s one model giving you one opinion.

Good, not great.

So I built k-review, a K-LLM orchestrated code review tool for OpenCode.

Inspiration

Two things pushed me to build this.

First, Cursor’s Bugbot.

Their post on running multiple models against the same diff and synthesizing the results showed how much a single reviewer misses.

Second, this framing from Palantir’s CTO:

Send one prompt to K LLMs. Each returns a full answer. A synthesis step reads all outputs, compares and reconciles them, then produces one best combined response.

  • Palantir CTO @ssankar on LLM orchestration (source)

That’s the core pattern.

k-review applies it to code review with shuffled diff orderings and majority voting on top.
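The fan-out/fan-in shape of that pattern can be sketched in a few lines. Here `ask_model` is a hypothetical stand-in for a real LLM API call, and the synthesis step is just concatenation; in practice it would be another LLM call that compares and reconciles the K answers:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_model(model: str, prompt: str) -> str:
    # Hypothetical placeholder for a real LLM API call.
    return f"{model}: answer to {prompt!r}"

def k_llm(prompt: str, models: list[str]) -> str:
    # Fan out: each of the K models answers the same prompt independently.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda m: ask_model(m, prompt), models))
    # Fan in: a synthesis step reads all K outputs and produces one
    # combined response. Concatenation stands in for real reconciliation.
    return "\n".join(answers)

combined = k_llm("review this diff", ["opus", "gpt", "gemini"])
```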

How it works

k-review runs a four-stage pipeline:

  1. Generate 6 shuffled variants of the git diff using deterministic seeds, so each reviewer sees the changes in a different order
  2. Send all 6 reviews concurrently to different LLM subagents (minimum 3 successful passes required)
  3. Cluster findings by file region and root cause, apply agreement thresholds (4+/6 strong, 2-3/6 moderate, 1/6 weak), rank by severity
  4. Run a validation pass that traces execution paths against the canonical diff to filter false positives
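Step 1 hinges on the shuffling being deterministic, so a run is reproducible. A minimal sketch of the idea, shuffling diff hunks with seeded RNGs (k-review's actual implementation may split and order the diff differently):

```python
import random

def shuffled_variants(hunks: list[str], k: int = 6) -> list[list[str]]:
    """Produce k orderings of the same diff hunks using deterministic
    seeds, so each reviewer reads the changes in a different (but
    reproducible) order while seeing identical content."""
    variants = []
    for seed in range(k):
        rng = random.Random(seed)   # same seed -> same order every run
        order = hunks[:]            # copy; content is never changed
        rng.shuffle(order)
        variants.append(order)
    return variants

hunks = ["diff --git a/app.py", "diff --git a/db.py", "diff --git a/api.py"]
variants = shuffled_variants(hunks)
```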

The shuffling matters.

If every model reads the same diff top-to-bottom, they tend to fixate on the same obvious issues.

Shuffling means they don’t all latch onto the same things.
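The agreement thresholds from step 3 reduce to a counting exercise once findings are clustered under shared keys. A sketch, with made-up reviewer names and finding keys (the clustering itself, by file region and root cause, is assumed to have already happened):

```python
from collections import Counter

def vote(findings: list[tuple[str, str]]) -> dict[str, str]:
    """findings: (reviewer, finding_key) pairs. Returns an agreement
    label per finding key based on how many reviewers reported it."""
    counts = Counter(key for _, key in set(findings))  # dedupe per reviewer
    labels = {}
    for key, n in counts.items():
        if n >= 4:
            labels[key] = "strong"    # 4+ of 6 reviewers agree
        elif n >= 2:
            labels[key] = "moderate"  # 2-3 of 6
        else:
            labels[key] = "weak"      # only 1 of 6
    return labels

labels = vote([("m1", "sql-injection"), ("m2", "sql-injection"),
               ("m3", "sql-injection"), ("m4", "sql-injection"),
               ("m5", "null-deref")])
```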

The 6 models

| # | Model | Provider | Temp |
|---|-------|----------|------|
| 1 | Claude Opus 4.6 | Anthropic | 0.3 |
| 2 | Claude Sonnet 4.6 | Anthropic | 0.4 |
| 3 | GPT 5.2 | OpenAI | 0.35 |
| 4 | GPT 5.3 Codex | OpenAI | 0.45 |
| 5 | Gemini 3 Pro | Google | 0.5 |
| 6 | Gemini 3 Flash | Google | 0.55 |

All reviewers are read-only.

They can only use read, glob, and grep operations.

You can remove models you don’t have API keys for (minimum 3 required for voting) or add your own.
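Each reviewer lives in its own agent file. The snippet below is only an illustrative sketch of what one could look like, assuming OpenCode's markdown-with-YAML-frontmatter agent format; the model identifier, field names, and tool flags are assumptions, so check the repo's agents/ directory for the real definitions:

```markdown
---
description: Read-only code reviewer (hypothetical example)
mode: subagent
model: anthropic/claude-opus-4-6
temperature: 0.3
tools:
  read: true
  glob: true
  grep: true
  write: false
  edit: false
  bash: false
---
Review the supplied diff. Report findings with file, line, severity,
and a suggested fix. Do not modify any files.
```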

Installation and usage

Clone into your project’s .opencode directory:

git clone https://github.com/josescasanova/k-review.git .opencode

Or install globally to ~/.config/opencode/ so it’s available in all projects:

git clone https://github.com/josescasanova/k-review.git /tmp/k-review
cp -r /tmp/k-review/agents/ ~/.config/opencode/agents/
cp -r /tmp/k-review/commands/ ~/.config/opencode/commands/
cp -r /tmp/k-review/scripts/ ~/.config/opencode/scripts/

Then in OpenCode:

/k-review                  # Reviews current branch against main
/k-review develop          # Reviews against different base
/k-review HEAD~5           # Reviews specific revision range

You’ll get a summary table ranked by severity, descriptions with affected lines and suggested fixes, plus the raw JSON if you want to process it programmatically.
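If you do want to process the output programmatically, filtering by severity and agreement is straightforward. The schema below is hypothetical, not k-review's actual JSON layout; adjust the field names to whatever the tool really emits:

```python
import json

# Hypothetical findings schema for illustration only.
raw = """
{"findings": [
  {"file": "app.py", "line": 42, "severity": "high", "agreement": "strong",
   "description": "connection leaked on error path"},
  {"file": "db.py", "line": 7, "severity": "low", "agreement": "weak",
   "description": "unused import"}
]}
"""

findings = json.loads(raw)["findings"]
# Keep only high-severity findings with strong consensus.
blocking = [f for f in findings
            if f["severity"] == "high" and f["agreement"] == "strong"]
for f in blocking:
    print(f'{f["file"]}:{f["line"]} {f["description"]}')
```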

Wrapping up

The repo is at github.com/josescasanova/k-review. It’s MIT licensed and takes a few minutes to set up. If you’re already using OpenCode, give it a try.

Frequently Asked Questions

What is the K-LLM pattern for code review?
The K-LLM pattern sends the same prompt to K different language models, each returning a full independent answer. A synthesis step then reads all outputs, compares and reconciles them, and produces one best combined response. For code review, this means multiple models independently analyze the same diff, reducing the chance that all reviewers fixate on obvious issues while missing subtle bugs.
How does k-review work?
k-review runs a four-stage pipeline: first it generates shuffled diff variants so each model sees changes in a different order, then it runs six parallel review passes across Claude, GPT, and Gemini models, combines findings through majority voting with agreement thresholds, and finally validates results by tracing execution paths against the canonical diff to filter false positives.
How does multi-model code review compare to single-model review?
Single-model review reflects one model's biases and blind spots. Multi-model review catches things any single model might miss because each approaches the code differently. k-review uses majority voting — findings flagged by 4+ of 6 models are marked strong consensus, while findings from only 1 model are marked weak. This prioritizes precision and reduces false positives compared to relying on a single reviewer.
