I wanted a confidence rating without asking an AI

Leif · June 15, 2026

I ask AI for confidence ratings all the time. How sure are you about this change? Rate your confidence in this file, in this edit, in this paragraph. I find it useful, a quick read on how much to trust what just happened. The problem is it needs a model every time. That costs money, and a small local model is not good at it, so the answer you get back is itself low-confidence. I wanted the rating without the model.

So I built augur. It scores the risk of a diff and never calls an AI. Same diff in, same verdict out, every time. It reads two things you already have, your git history and the files on disk, computes eight signals from them, collapses those to a single risk score from 0 to 100, and returns one of three verdicts: proceed, review, or block.

A baseline, not an oracle

What I care about most is that augur is a floor you get for free. No API key, no token bill, no network call. You can run it on every diff without thinking about the cost, in CI, offline, on a plane. That out-of-the-gate number is the baseline. Then you tune it. An optional .augur.toml sets per-signal weights, the verdict thresholds, and [exclude] globs. A human can adjust it by hand. An AI can adjust it too, because it is just a config file and a documented set of signals. The model, if you bring one at all, sits on top of a number that already means something instead of being the only thing that produces it.

Why no model is the point

In a year where the answer to everything is "call an LLM," augur deliberately does not. A gate has to be reproducible. You cannot have the thing that decides whether code merges hand back a different score every run, or a confident-sounding guess from a model that knows nothing about your repo. augur's signals come from real history: which files churn, which ones have a track record of reverts and hotfixes, how big and how scattered the change is. The longer it watches a repo, the sharper it gets, and it never stops being explainable. When it flags a file, it tells you which signal fired.

The same number for both of us

augur reads the same on a human PR and an agent's diff. A person scans the riskiest-first table and spends attention on the risky tenth of a forty-file change. An agent gates its loop on the verdict and stops for review when the number says review. Same signals, same score, same line in the sand, whoever wrote the code. You do not even need a model to use it.

← All posts