Generation Is Not Prediction

Large language models are built to produce plausible text, not accurate forecasts. Confusing a statistical parrot for a mathematical pricing engine is a fast way to misprice your entire claims portfolio.

  • By David H. Silver
  • Head of AI
  • 9 April 2026

TL;DR — Use LLMs to read and structure your unstructured claim files, but rely on specialized geometric machine learning models to forecast settlement ranges and escalation risk. Combining the two gives you traceable facts and calibrated math without the hallucinations.

A claims adjuster sits down with a 4,000-page file of medical records, deposition transcripts, and demand letters. They feed the PDFs into a popular large language model and ask what the claim is worth. The model spits out $450,000. It sounds authoritative. It references a specific back injury from page 1,204. It notes the correct legal venue. The number itself is entirely fabricated.

This is the fundamental error happening across the insurance industry right now. Claims executives are trying to force text generators to act as actuarial engines. A large language model is an autoregressive engine. It predicts the next token in a sequence based on the statistical distribution of its training data. It is engineered for fluency, not for calibration. When you ask it for a settlement value, it generates a figure that looks plausible as text, shaped by the settlement language it saw in training. It does not calculate risk. It mimics certainty. Generation is not prediction. These are distinct mathematical tasks requiring completely different architectures.
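To make the mechanism concrete, here is a toy sketch of next-token sampling. The probability table is invented for illustration and stands in for whatever a real model would assign after a prompt such as "The claim is worth $". Nothing in it reads the claim file or computes risk; it only draws a string in proportion to how likely that string is as text.

```python
import random

# Hypothetical next-token distribution after the prompt
# "The claim is worth $". All values are invented for illustration.
NEXT_TOKEN_PROBS = {
    "450,000": 0.30,
    "125,000": 0.25,
    "75,000": 0.25,
    "1,200,000": 0.20,
}


def sample_next_token(probs, rng):
    """Draw one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(5)]
print(samples)
```

Ask the same question five times and you can get several different figures. Each one is a draw from a distribution over text, not an estimate of what the claim will settle for.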

The insurance industry is under mounting pressure from social inflation and the aggressive deployment of third-party litigation funding. Surviving this environment requires precision. You have to separate the reading mechanism from the forecasting mechanism. Blurring the line between the two leaves your balance sheet exposed to massive reserve volatility. Setting realistic reserves on day one requires an architecture that respects the boundary between language and math.

The boundary between reading and reasoning

Unstructured text is the primary barrier in modern claims management. A complex bodily injury claim arrives as a chaotic stack of pleadings, medical histories, and correspondence from opposing counsel, with the facts that matter buried in raw text. Here, generative artificial intelligence is exactly the right tool for the job. Language models excel at extraction, summarization, and normalization. They can ingest thousands of pages, locate the specific medical billing codes, identify the treating physicians, and structure the plaintiff's demands into a standard format. At Canotera, we use generative models to do exactly this. They read the case file. They map the messy reality of litigation into a clean, structured schema. They do the heavy lifting of parsing the documents so the claims professional does not have to spend three days hunting for a single independent medical examination report. The model turns unstructured chaos into a standardized factual matrix.
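What a "clean, structured schema" might look like can be sketched in a few lines. The class names, field names, and sample values below are assumptions made for illustration, not Canotera's actual schema. The one design point the sketch tries to show is that every extracted fact carries a pointer back to the document and page it came from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ExtractedFact:
    """One fact pulled from the file (hypothetical structure)."""
    name: str             # e.g. "treating_physician"
    value: str
    source_document: str  # file the fact was read from
    page: int             # 1-based page number in that file


@dataclass
class ClaimRecord:
    """Structured view of one claim file (hypothetical structure)."""
    claim_id: str
    facts: List[ExtractedFact] = field(default_factory=list)

    def add_fact(self, fact: ExtractedFact) -> None:
        # Reject any fact that cannot be traced to a page in a document.
        if not fact.source_document or fact.page < 1:
            raise ValueError(f"fact {fact.name!r} has no usable source")
        self.facts.append(fact)

    def get(self, name: str) -> List[ExtractedFact]:
        return [f for f in self.facts if f.name == name]


# Invented sample values, for illustration only.
record = ClaimRecord("CLM-001")
record.add_fact(ExtractedFact("treating_physician", "Dr. Example", "medical_records.pdf", 1204))
record.add_fact(ExtractedFact("ime_report", "present", "medical_records.pdf", 2310))
print(record.get("ime_report"))
```

Refusing facts without a source at write time is one way to keep the later forecasting step from ever seeing an untraceable input.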

That is where the generative model's job ends. It stops at the boundary of language. It does not output a reserve recommendation. Asking a language model to price a claim is like asking a dictionary to calculate a bridge's load capacity. The structured data is instead handed off to a separate system built entirely for prediction. Real forecasting requires geometric machine learning trained on large numbers of resolved cases with known outcomes. These predictive models map the extracted claim features into a high-dimensional space. They measure the distances between the current open claim and thousands of historical cases. This is not about generating plausible sentences. It is about calculating mathematical similarity, variance, and expected value. The predictive model looks at the geometry of the claim, compares it to the geometry of resolved claims, and computes the likely financial outcome based on actual historical settlements.
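The idea of measuring distance between an open claim and resolved ones can be illustrated with a plain nearest-neighbour lookup. This is a deliberately simple stand-in, not the geometric model described above: the three-number feature vectors, the settlement amounts, and the choice of Euclidean distance and a median are all assumptions for the sake of the example.

```python
import math
from statistics import median

# Hypothetical resolved claims: (feature vector, settlement in USD).
# Features and amounts are invented for illustration.
RESOLVED = [
    ((0.10, 0.20, 0.0), 15_000),
    ((0.20, 0.10, 0.0), 22_000),
    ((0.15, 0.25, 0.0), 18_000),
    ((0.90, 0.80, 1.0), 640_000),
    ((0.80, 0.90, 1.0), 910_000),
]


def nearest_cases(query, resolved, k):
    """Return the k resolved claims closest to the query vector."""
    ranked = sorted(resolved, key=lambda case: math.dist(query, case[0]))
    return ranked[:k]


def estimate(query, resolved, k=3):
    """Median settlement of the k nearest resolved claims, plus the claims used."""
    neighbours = nearest_cases(query, resolved, k)
    amounts = [amount for _, amount in neighbours]
    return median(amounts), neighbours


value, neighbours = estimate((0.12, 0.20, 0.0), RESOLVED)
print(value, [amount for _, amount in neighbours])
```

The number that comes out is computed from settlements that actually happened, and the function returns the comparable cases alongside it so the result can be inspected.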

Conformal ranges and honest error reporting

Claims executives do not need a single-point guess. A model that says a claim will settle for exactly $125,000 is practically useless because litigation is inherently probabilistic. You need to know the shape of the risk. You need calibration. A calibrated model knows when it is uncertain and reports that uncertainty honestly to the user. This is why we rely on conformal ranges. Instead of a fragile point estimate, a calibrated predictive model outputs a settlement range backed by a coverage guarantee: at a chosen level, say 90%, the range contains the actual outcome at least that often, provided new claims resemble the resolved claims used for calibration. If the claim involves a common rear-end collision with standard soft-tissue injuries, the historical data is dense, and the resulting range is tight. If the claim involves a traumatic brain injury in a jurisdiction known for volatile juries, the historical data is sparse and highly variable. The model expands the range. It quantifies the variance. It shows the adjuster how much risk sits in the tail.
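For readers who want to see where a conformal range comes from, here is a minimal sketch of split conformal prediction, calibrated separately per claim type. It assumes some upstream model already produces a point forecast; the calibration pairs are invented, and absolute error is just one possible choice of score. Under the assumption that new claims are exchangeable with the calibration claims of the same type, a range built this way contains the true outcome with probability at least 1 - alpha.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration cases for this alpha
    return sorted(scores)[rank - 1]


def conformal_range(point_forecast, calibration, alpha=0.2):
    """calibration: (forecast, actual) pairs from resolved claims of one type."""
    scores = [abs(actual - forecast) for forecast, actual in calibration]
    q = conformal_quantile(scores, alpha)
    return max(0.0, point_forecast - q), point_forecast + q


# Invented calibration pairs (forecast, actual settlement) in USD.
SOFT_TISSUE = [(20_000, 21_000), (18_000, 17_000), (25_000, 27_000),
               (22_000, 22_500), (19_000, 20_500), (30_000, 28_000),
               (16_000, 16_500), (24_000, 25_500), (21_000, 20_000)]
BRAIN_INJURY = [(600_000, 900_000), (750_000, 400_000), (500_000, 520_000),
                (800_000, 1_500_000), (650_000, 300_000), (700_000, 710_000),
                (900_000, 450_000), (550_000, 1_000_000), (620_000, 640_000)]

tight = conformal_range(20_000, SOFT_TISSUE)
wide = conformal_range(700_000, BRAIN_INJURY)
print(tight, wide)
```

The width of each range is driven by how badly past forecasts missed on that type of claim. Where the history is consistent, the range is narrow. Where it is erratic, or where there are too few resolved cases to support the requested coverage, the range widens, in the extreme to an unbounded upper end.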

You cannot get a mathematically rigorous conformal range from a prompt. Language models are famously uncalibrated. They will deliver a wild guess with the exact same authoritative tone they use to state a known fact. In a world of rising nuclear verdicts and aggressive plaintiff tactics, deploying uncalibrated models to set early reserves is a dangerous game. You end up under-reserving the dangerous files and over-allocating defense spend to the routine ones.

Neural-symbolic structure and traceability

A forecast only has value if the claims professional can defend it. When our predictive models output a reserve delta compared to the current reserve, or an escalation probability, they also surface the specific drivers behind those numbers. Because we maintain a strict neural-symbolic boundary between the generative reading step and the predictive math step, every output is traceable directly to the source documents. If the model flags a high probability of litigation escalation, it points back to the structured facts. Perhaps the plaintiff attorney has a history of taking similar premises liability cases to trial. Perhaps a specific medical code correlates highly with late-stage surgical interventions. The mathematical model surfaces the comparable resolved cases that drove its prediction. The adjuster can click through and read the actual source documents from the historical files to verify the context. The data drives the negotiation. The gut feeling is replaced by a verifiable baseline.
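Traceability is easier to picture as a data structure. The sketch below is hypothetical: the field names are invented, and defining the reserve delta as the midpoint of the forecast range minus the current reserve is an assumption made for illustration. What it tries to capture is that a forecast travels together with its drivers, and each driver names the document and page behind it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Driver:
    description: str      # the structured fact that moved the forecast
    source_document: str  # file the fact was extracted from
    page: int


@dataclass(frozen=True)
class Forecast:
    claim_id: str
    current_reserve: float
    range_low: float
    range_high: float
    escalation_probability: float
    comparable_case_ids: List[str]
    drivers: List[Driver]

    @property
    def reserve_delta(self) -> float:
        # Assumption: delta = midpoint of the forecast range minus current reserve.
        return (self.range_low + self.range_high) / 2 - self.current_reserve


def audit_trail(forecast: Forecast) -> List[str]:
    """Render each driver with the document and page that back it."""
    if not forecast.drivers:
        raise ValueError("a forecast without drivers cannot be defended")
    lines = [f"{d.description} [{d.source_document}, p. {d.page}]"
             for d in forecast.drivers]
    lines.append("comparables: " + ", ".join(forecast.comparable_case_ids))
    return lines


# Invented example values.
forecast = Forecast(
    claim_id="CLM-001",
    current_reserve=70_000,
    range_low=80_000,
    range_high=120_000,
    escalation_probability=0.35,
    comparable_case_ids=["R-1042", "R-2210"],
    drivers=[Driver("plaintiff counsel tried similar premises cases",
                    "counsel_history.pdf", 3)],
)
print(forecast.reserve_delta)
print("\n".join(audit_trail(forecast)))
```

An adjuster reading the audit trail gets the page to open and the resolved cases to compare against, which is what makes the number defensible in a negotiation.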

This two-system approach is the only way to build enterprise trust. The language model handles the unstructured text. The geometric model handles the pricing math. Neither system is asked to perform a task outside its mathematical design. When you force a text generator to predict financial outcomes, you get hallucinated numbers wrapped in confident prose. When you separate the tasks, you get a calibrated engine that allows your team to allocate resources efficiently and spot the dangerous claims before they explode. Fluency is not accuracy, and a plausible sentence will never price a claim.
