Geometric Machine Learning on Resolved Cases

Large language models are word guessers, not calculators. To predict the financial outcome of a lawsuit, you must separate the extraction of text from the mathematics of risk.

By David H. Silver
Head of AI
17 June 2026
5 min read

TL;DR — Generative AI reads the case file, but a separate geometric model calculates the settlement range by measuring the mathematical distance between the new claim and thousands of historically resolved cases.

If you ask a large language model to value a complex bodily injury claim, it will give you a number. That number is mathematically meaningless. Large language models are autoregressive text generators. They predict the next likely token based on the statistical distribution of words in their training data. They do not reason about policy limits. They do not calculate risk. They do not understand the financial mechanics of a nuclear verdict.

When an insurance carrier relies on a generative model to predict a settlement, it is asking a language engine to do geometry. It fails every time. Prediction requires a different architecture entirely.

At Canotera, we enforce a strict separation between generation and prediction. We use generative AI exclusively for what it does best: reading. A typical case file contains thousands of pages of unstructured data. Pleadings, medical records, demand letters, and internal correspondence are messy and dense. The generative layer reads this material and extracts the facts. It identifies the jurisdiction, the plaintiff firm, the specific medical procedures, and the chronological progression of the claim. It imposes a neural-symbolic structure on human text. It stops there. It does not guess the outcome.

The geometry of resolved cases

The actual valuation of the claim happens in a separate, purely mathematical environment. This is where geometric machine learning takes over. A lawsuit is not a story. Statistically, it is a complex set of interacting variables. To predict how these variables will resolve, we translate the structured data extracted by the generative layer into a continuous, high-dimensional vector space.

Think of this as a vast coordinate system. Every possible feature of a claim is assigned a dimension. The severity of a spinal injury is a coordinate. The historical behavior of a specific plaintiff attorney is a coordinate. The jurisdiction is a coordinate. When we map a claim into this space, we fix its position relative to every other claim.

Geometric machine learning models operate on the principle that mathematical proximity equals substantive similarity: the shorter the distance between two claims, the more alike they are. We do not rely on simple keyword matching. We use manifold learning to position claims based on their underlying risk factors. Claims that share structural similarities cluster together. Claims with divergent fact patterns are pushed apart.
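A minimal sketch of the idea, under loud assumptions: the feature names, the venue labels, and the hand-built encoding below are hypothetical, and plain Euclidean distance stands in for the learned manifold embedding described above. It shows only how structured facts become coordinates and how distance then orders claims by similarity.

```python
import math

# Hypothetical feature schema. The real feature set is not described in this article.
VENUES = ["venue_a", "venue_b", "other"]

def encode_claim(injury_severity, attorney_trial_rate, venue):
    """Map structured claim facts to a point in feature space.

    Numeric features are assumed to be pre-scaled to [0, 1];
    the categorical venue is one-hot encoded.
    """
    one_hot = [1.0 if venue == v else 0.0 for v in VENUES]
    return [injury_severity, attorney_trial_rate, *one_hot]

def distance(a, b):
    """Euclidean distance, a simple stand-in for a learned metric."""
    return math.dist(a, b)

new_claim = encode_claim(0.8, 0.6, "venue_a")
similar = encode_claim(0.7, 0.6, "venue_a")
different = encode_claim(0.1, 0.05, "venue_b")

print(distance(new_claim, similar) < distance(new_claim, different))
```

In a production system the embedding would be learned rather than hand-built, so that the distances reflect risk factors and not the arbitrary scale of each raw feature.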

This geometric space is not empty. We populate it with a massive dataset of fully resolved cases. These are historical claims with known, immutable outcomes. We know exactly what they settled for, how long they took to litigate, and whether they escalated to trial. We track the specific motions filed, the duration of the discovery phase, and the precise settlement figures. This is not a static database. It is a dynamic, high-dimensional map of litigation risk. These resolved cases form the empirical bedrock of the predictive model.

When a new claim is ingested, the system reads the file, structures the facts, and plots the new claim into this geometric space. We then analyze its immediate neighborhood. The model does not invent a settlement value out of thin air. It measures the distance between the new claim and the historical cases surrounding it. The prediction is derived directly from the actual financial outcomes of its nearest mathematical neighbors.
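The neighborhood lookup can be sketched as a k-nearest-neighbor estimate. This is an illustration, not the production model: the two-dimensional feature vectors, the settlement figures, and the choice of the median as the aggregate are all assumptions made for the example.

```python
import math
from statistics import median

# Toy resolved cases: (feature vector, settlement in USD).
# All values are invented for illustration.
RESOLVED = [
    ([0.8, 0.6], 250_000),
    ([0.7, 0.5], 220_000),
    ([0.9, 0.7], 300_000),
    ([0.1, 0.1], 15_000),
    ([0.2, 0.1], 20_000),
]

def nearest_neighbors(query, cases, k):
    """Return the k resolved cases closest to the query point."""
    return sorted(cases, key=lambda case: math.dist(query, case[0]))[:k]

def predict_settlement(query, cases, k=3):
    """Estimate a settlement as the median outcome of the k nearest cases."""
    neighbors = nearest_neighbors(query, cases, k)
    return median(amount for _, amount in neighbors), neighbors

estimate, comparables = predict_settlement([0.75, 0.6], RESOLVED)
print(estimate)  # → 250000
```

The estimate is never a number the model made up. It is always one of, or an average of, outcomes that actually happened.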

Honest error reporting and conformal ranges

This architecture solves the most dangerous problem in modern claims forecasting: false precision. A single-point settlement guess is worse than useless. It actively misleads adjusters and actuaries. In an environment defined by social inflation and the aggressive tactics of third-party litigation funding, claim values are highly volatile. A model that outputs a flat dollar amount is hiding the underlying variance. You cannot allocate capital based on a hallucination of certainty.

Because our predictive engine is built on the geometry of resolved cases, it produces calibrated outputs. We generate a conformal settlement range. If a new claim lands in a dense cluster of historical cases that all settled between two narrow figures, the model reports a tight range. The statistical confidence is high. The math indicates a predictable legal environment.
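One way to produce such a range is split conformal prediction with locally normalized scores. The sketch below assumes that technique; the article does not say which conformal variant is used in practice. The data, the `floor` parameter, and every function name are hypothetical. Residuals on a held-out calibration set are divided by the dispersion of each case's neighborhood, so the resulting range is tight in a consistent cluster and wide in a dispersed one.

```python
import math
from statistics import median

def knn(query, cases, k):
    return sorted(cases, key=lambda case: math.dist(query, case[0]))[:k]

def predict_with_spread(query, cases, k=3):
    """Median neighbor settlement and the neighbors' mean absolute deviation."""
    values = [amount for _, amount in knn(query, cases, k)]
    center = median(values)
    spread = sum(abs(v - center) for v in values) / len(values)
    return center, spread

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]

def calibrate(train, calibration, alpha=0.25, k=3, floor=1_000.0):
    """Score each held-out case by its residual scaled by local dispersion."""
    scores = []
    for features, outcome in calibration:
        center, spread = predict_with_spread(features, train, k)
        scores.append(abs(outcome - center) / (spread + floor))
    return conformal_quantile(scores, alpha)

def settlement_range(query, train, q, k=3, floor=1_000.0):
    center, spread = predict_with_spread(query, train, k)
    half_width = q * (spread + floor)
    return max(0.0, center - half_width), center + half_width

# Invented data: one tight cluster and one dispersed cluster.
TRAIN = [
    ([0.20, 0.20], 48_000), ([0.22, 0.18], 50_000), ([0.18, 0.22], 52_000),
    ([0.80, 0.80], 100_000), ([0.82, 0.78], 400_000), ([0.78, 0.82], 900_000),
]
CALIBRATION = [
    ([0.21, 0.21], 51_000), ([0.19, 0.20], 47_000),
    ([0.81, 0.80], 600_000), ([0.79, 0.79], 250_000),
]

q = calibrate(TRAIN, CALIBRATION)
tight = settlement_range([0.20, 0.20], TRAIN, q)
wide = settlement_range([0.80, 0.80], TRAIN, q)
print(tight, wide)
```

Under the usual exchangeability assumption, a range built this way covers the true outcome with probability of at least 1 - alpha on average across claims. That guarantee is marginal, not per claim, and a calibration set of four cases is far too small for real use.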

If the claim involves a volatile venue and a plaintiff firm known for pushing cases to trial, it lands in a sparse or highly dispersed neighborhood. The nearby historical cases exhibit wildly different settlements. In this scenario, the model reports a wide settlement range and a high escalation probability. The uncertainty is the signal. This is honest error reporting. The model quantifies its own doubt.

This calibration extends to the specific drivers of the prediction. Claims executives need to know exactly why a reserve delta is being recommended against the current reserve. Because the geometric model relies on distance to resolved cases, the output is entirely traceable. When the model calculates a reserve delta, it provides the receipts. The system presents the specific comparable cases that define the settlement range. It highlights the exact variables pulling the prediction upward. Every number traces back to the source documents extracted by the generative layer.
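A traceable output might look like the sketch below. The record fields, case identifiers, and figures are hypothetical, and the escalation probability is assumed here to be the simple share of neighbors that went to trial. The point is only that each reported number can be recomputed from a named list of comparables.

```python
import math
from dataclasses import dataclass
from statistics import median

@dataclass(frozen=True)
class ResolvedCase:
    case_id: str          # hypothetical identifier linking back to source documents
    features: tuple
    settlement: int
    went_to_trial: bool

# Invented cases for illustration.
CASES = [
    ResolvedCase("R-001", (0.8, 0.6), 250_000, False),
    ResolvedCase("R-002", (0.7, 0.5), 220_000, False),
    ResolvedCase("R-003", (0.9, 0.7), 300_000, True),
    ResolvedCase("R-004", (0.1, 0.1), 15_000, False),
]

def explain(query, current_reserve, cases, k=3):
    """Return an estimate together with the comparables that produced it."""
    comps = sorted(cases, key=lambda c: math.dist(query, c.features))[:k]
    estimate = median(c.settlement for c in comps)
    return {
        "estimate": estimate,
        "reserve_delta": estimate - current_reserve,
        "escalation_probability": sum(c.went_to_trial for c in comps) / len(comps),
        "comparables": [c.case_id for c in comps],
    }

report = explain((0.75, 0.6), current_reserve=180_000, cases=CASES)
print(report["reserve_delta"], report["comparables"])
```

Attributing the prediction to individual variables would need an additional step that this sketch does not attempt.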

Defending the balance sheet

The purpose of this architecture is to force reality into the reserve process on day one. Historically, carriers have relied on gut instinct and rolling averages to set initial reserves. This approach fails entirely when underlying severity trends shift. By the time a carrier detects a pattern of escalating verdicts in a specific jurisdiction, the capital has already bled out. Reserve volatility destroys balance sheets. A nuclear verdict is rarely a surprise to the mathematics; it is only a surprise to the adjuster who missed the early warning signs buried in a dense medical file.

Geometric machine learning on resolved cases allows a carrier to negotiate from data. When you have a calibrated settlement range, a precise escalation probability, and the historical comparables to back it up, you stop reacting to plaintiff demands. You allocate defense spend where the math dictates it will actually alter the outcome. You identify the claims that require immediate settlement before the plaintiff firm secures third-party funding. You see the true shape of the risk before the litigation matures.

Language models simulate fluency. They do not simulate reality. To calculate the future cost of a claim, you have to measure its exact mathematical distance to the past.
