Why Traceability Beats Accuracy Alone

A model that spits out a perfect prediction with zero explanation is a liability in a high-stakes claim. Trust requires knowing exactly which medical record or pleading drove the math.

TL;DR — Accuracy metrics hide the reality of claims prediction. If an adjuster cannot trace a model output back to a specific line in a plaintiff demand, they cannot defend their reserve or negotiate effectively. Traceability is what makes a prediction actionable.

A claims adjuster sits looking at a dashboard. The machine learning model outputs a settlement prediction of $850,000 for a bodily injury claim. The adjuster has currently reserved $150,000 based on the initial file. The model is a black box. It offers no explanation for the $700,000 delta. What does the adjuster do with this information? They ignore it. They have to. An adjuster must defend a reserve change to a committee. Defense counsel must explain a settlement strategy to a carrier. A naked number provides no defense. This is the primary failure mode of pure prediction in insurance litigation. We obsess over aggregate accuracy metrics while ignoring the operational reality of how human beings make high-stakes financial decisions.

Accuracy is a dangerous metric in isolation. A model that achieves high accuracy by memorizing spurious correlations is a liability. A system that hides its reasoning behind billions of parameters is useless when millions of dollars and a bad-faith lawsuit are on the line. The insurance industry is currently flooded with vendors selling accuracy. They feed raw claim files into massive language models and ask for a settlement value. This approach conflates two fundamentally different mathematical tasks. Parsing language is not the same thing as modeling probability. Large language models are text generators. They predict the next token based on the statistical distribution of words in their training data. They do not calculate calibrated forecasts. When a text generator outputs a dollar figure, it is producing a number that looks plausible, not one derived from resolved-case outcomes.

The Anatomy of a Defensible Forecast

At Canotera, we enforce a strict separation of concerns between reading and predicting. Generative AI is assigned exclusively to the reading phase. It digests the unstructured reality of a claim file. These files are often thousands of pages of dense legal pleadings, disorganized medical chronologies, and fragmented correspondence. The reading model extracts the facts. It identifies the specific injuries, the jurisdiction, the plaintiff attorney history, and the treatment timelines. It maps this chaos into a rigid, neural-symbolic representation. Every extracted fact contains a hard pointer back to the exact page and paragraph in the source document. This ensures the foundational data is grounded in verifiable reality. The generative model does no guessing. It only structures what is already there.
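To make the idea of a fact with a hard pointer concrete, here is a minimal sketch in Python. The class names, field names, file names, and sample values are all hypothetical, chosen for illustration only; the post does not describe the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePointer:
    """Hypothetical pointer to the evidence behind a fact."""
    document_id: str
    page: int
    paragraph: int


@dataclass(frozen=True)
class ExtractedFact:
    """Hypothetical structured fact: a name, a value, and where it came from."""
    name: str
    value: object
    source: SourcePointer


def to_feature_row(facts):
    """Flatten facts into a name -> value mapping for a downstream model."""
    return {fact.name: fact.value for fact in facts}


def provenance(facts, name):
    """Return the source pointer for a named fact, or raise KeyError."""
    for fact in facts:
        if fact.name == name:
            return fact.source
    raise KeyError(name)


# Illustrative data only; not taken from any real claim file.
facts = [
    ExtractedFact("venue", "Example County", SourcePointer("complaint.pdf", 1, 2)),
    ExtractedFact("surgery_performed", True, SourcePointer("medical_records.pdf", 212, 4)),
]

features = to_feature_row(facts)
pointer = provenance(facts, "surgery_performed")
print(features)
print(pointer.document_id, pointer.page, pointer.paragraph)
```

The point of the design is that the feature row handed to the predictive model and the pointer shown to the adjuster come from the same record, so a variable can never exist without its evidence.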

Once the claim is structured, a completely different system takes over. Mathematical, geometric machine learning models calculate the actual forecast. We train these models exclusively on massive datasets of resolved cases with known outcomes. Because these predictive models operate on structured variables rather than raw text, their internal logic is mathematically transparent. This architectural choice is fundamental. The reader reads. The calculator calculates. If a single monolithic model attempts to do both simultaneously, the causal chain breaks. You can never be sure if the model arrived at a high settlement value because of a severe injury or because the medical report contained an unusual font.

Traceability is the ability to walk the math backward. When our prediction model outputs a reserve delta or flags a high probability of escalation, we compute exactly which variables forced that shift. We calculate the exact feature importance for every single inference. The output interface shows the claims professional the specific drivers behind the number. Driver one is a venue known for outsized jury awards. Driver two is a specific surgery code. The adjuster clicks the surgery code and the original medical PDF opens to the exact highlighted paragraph. The math is tied directly to the text. The text is tied directly to the evidence. This turns a prediction from a dictate into a navigable map of the case.
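One simple way to walk the math backward is shown below for a linear model, where each feature's contribution is its weight times its deviation from a baseline claim. This is a sketch under stated assumptions: the post does not say which attribution method is used, and the weights, baseline, and feature names here are invented for illustration.

```python
def linear_contributions(weights, baseline, features):
    """Per-feature contribution to (prediction - baseline prediction).

    For a linear model the contributions are exact and additive:
    they sum to the difference between the two predictions.
    """
    return {
        name: weights[name] * (features[name] - baseline[name])
        for name in weights
    }


# Hypothetical weights (dollars per unit of each feature).
weights = {"high_award_venue": 120_000.0, "spinal_surgery": 300_000.0, "claimant_age": -500.0}
# Hypothetical "typical claim" used as the reference point.
baseline = {"high_award_venue": 0.0, "spinal_surgery": 0.0, "claimant_age": 40.0}
# Hypothetical claim being explained.
claim = {"high_award_venue": 1.0, "spinal_surgery": 1.0, "claimant_age": 50.0}

contribs = linear_contributions(weights, baseline, claim)

# Rank drivers by absolute impact, largest first.
ranked = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, dollars in ranked:
    print(f"{name}: {dollars:+,.0f}")
```

Nonlinear models need a different attribution technique, but the interface contract is the same: every driver shown to the adjuster is a named variable with a signed dollar effect, and each variable links back to its source text.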

Calibrated Ranges Over Point Estimates

Honest error reporting is the necessary companion to traceability. A single point guess of a settlement value is mathematically arrogant. The litigation environment is defined by irreducible uncertainty. Social inflation, third-party litigation funding, and the rising threat of nuclear verdicts constantly warp the baseline of what a claim is worth. These forces introduce volatility that no algorithm can perfectly erase. A forecasting platform must quantify this uncertainty rather than hide it. If a model projects absolute certainty in an uncertain environment, it is poorly calibrated. Calibration means the model knows what it does not know.
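Calibration can be checked empirically: on held-out resolved cases, the share of actual outcomes that fall inside the predicted ranges should be close to the stated coverage level. The sketch below assumes ranges are (low, high) pairs; the function name and the sample numbers (in thousands of dollars) are hypothetical.

```python
def empirical_coverage(intervals, outcomes):
    """Fraction of actual outcomes that fall inside their predicted range."""
    hits = sum(1 for (low, high), actual in zip(intervals, outcomes) if low <= actual <= high)
    return hits / len(outcomes)


# Illustrative held-out cases, values in thousands of dollars.
predicted_ranges = [(100, 200), (150, 300), (50, 120), (400, 900)]
actual_settlements = [180, 310, 90, 650]

coverage = empirical_coverage(predicted_ranges, actual_settlements)
print(coverage)
```

If a model advertises 90% ranges and this number comes back far below 0.9, the ranges are overconfident. If it comes back far above, the ranges are too wide to be useful.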

We output calibrated settlement ranges using conformal prediction techniques. A conformal range is a statement of probability grounded in empirical data. It tells the claims professional that, based on the geometry of similar resolved cases, a stated percentage of claims with comparable extracted features settled within a defined bracket. This is not a confidence interval based on a normal distribution assumption. It is a boundary built on the actual outcomes of comparable resolved cases, and its coverage holds on average across such cases rather than as a promise about any single claim. The system surfaces these comparable cases alongside the range. The claims executive sees the historical anchors that define the upper and lower bounds. They understand exactly why the worst-case scenario looks the way it does.
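For readers unfamiliar with conformal prediction, here is a minimal sketch of one common variant: split conformal with absolute-residual scores. The post does not say which variant is used, so treat this as an illustration of the general idea. The function names and calibration numbers (in thousands of dollars) are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest residual.

    This is the finite-sample corrected quantile used in split conformal
    prediction. If the calibration set is too small for the requested
    coverage, the range is unbounded.
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(residuals)[k - 1]


def conformal_interval(point_forecast, calib_predictions, calib_actuals, alpha=0.1):
    """Symmetric range around a point forecast with target coverage 1 - alpha."""
    residuals = [abs(actual - pred) for pred, actual in zip(calib_predictions, calib_actuals)]
    q = conformal_quantile(residuals, alpha)
    return point_forecast - q, point_forecast + q


# Illustrative calibration set of resolved cases, in thousands of dollars.
calib_predictions = [100, 150, 200, 250, 300, 350, 400]
calib_actuals = [110, 130, 205, 290, 270, 350, 460]

low, high = conformal_interval(500, calib_predictions, calib_actuals, alpha=0.25)
print(low, high)
```

The range width comes entirely from how far past forecasts missed on resolved cases, with no assumption about the shape of the error distribution. The coverage guarantee relies on the new claim being exchangeable with the calibration cases, which is why the choice of comparable cases matters.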

Negotiating From Evidence

This level of traceability changes the fundamental dynamic of a negotiation. If you know exactly why a claim is valued the way it is, you can attack the plaintiff's arguments at their weakest points. You allocate defense spend where it actually matters rather than blanketing the case with unnecessary billable hours. You spot the indicators of a runaway claim on day one. You detect the specific phrasing in a demand letter that correlates with third-party litigation funding before the plaintiff's attorney solidifies their narrative. Setting realistic reserves early prevents the capital bleed associated with late-term reserve step-ups. The defense negotiates from a position of structured data rather than gut instinct.

The value of artificial intelligence in claims forecasting is not its ability to guess the right answer. The value is its ability to surface the hidden structure of a claim and quantify the probability of different outcomes based on historical reality. Accuracy without explainability is a parlor trick. It works until the moment it fails, and when it fails, it leaves the user with no recourse and no defense. A forecasting platform must earn the trust of the professionals who rely on it by exposing its reasoning at every step of the workflow. A prediction you cannot prove is just a guess with a decimal point.
