AI in Insurance: Data Readiness

Summary

Most carriers cannot deploy the AI they want because their data will not support it. Policy administration, claims, and actuarial systems grew up as separate stacks, often on different platforms per line of business, so a single customer or risk is scattered across silos with no common key. Telematics and third-party data arrive in formats nobody governs. The richest signal in a claim (adjuster notes, photos, and documents) sits as unstructured text and images. Data readiness is the real bottleneck: carriers that fix lineage and integration first get durable value, while the rest keep re-cleaning data for every project.

Context

The data is there, but it is fragmented and mostly unstructured

A typical mid-size US carrier runs multiple policy administration systems, often one for each line of business acquired over the years, plus a separate claims system and an actuarial data warehouse that reconciles to neither in real time. There is frequently no enterprise key that reliably ties a policyholder across auto, home, and umbrella. A model that needs a full customer view therefore has to be built on a stitched-together extract that goes stale the moment it is created.

The signal that matters most is often unstructured. Industry estimates put the unstructured share of insurance data (adjuster narratives, medical records, police reports, photos, and PDFs) at roughly 80 percent of the total. Models that only see structured fields ignore the majority of what the business actually knows about a risk or a claim. Getting that content into a usable, governed form is where readiness is won or lost.
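As a rough illustration of what "usable, governed form" can mean for an adjuster narrative, the sketch below turns free text into a small structured record and stamps it with an extractor version. The field names, the keyword patterns, and the EXTRACTOR_VERSION tag are all assumptions made for this example; a production pipeline would more likely use a trained extraction model than regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical version tag so downstream features can be traced to the
# exact extraction logic that produced them.
EXTRACTOR_VERSION = "0.1-sketch"

# Illustrative keyword patterns only; not a real claims vocabulary.
_INJURY = re.compile(r"\b(injur\w*|whiplash|fracture\w*)\b", re.IGNORECASE)
_POLICE = re.compile(r"\bpolice report\b", re.IGNORECASE)
_DOLLARS = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")


@dataclass(frozen=True)
class NoteFeatures:
    extractor_version: str
    mentions_injury: bool
    mentions_police_report: bool
    dollar_amounts: tuple


def extract_note_features(note: str) -> NoteFeatures:
    """Convert one adjuster note into structured, versioned features."""
    amounts = tuple(
        float(raw.replace(",", "")) for raw in _DOLLARS.findall(note)
    )
    return NoteFeatures(
        extractor_version=EXTRACTOR_VERSION,
        mentions_injury=bool(_INJURY.search(note)),
        mentions_police_report=bool(_POLICE.search(note)),
        dollar_amounts=amounts,
    )


# Made-up note text for illustration.
sample_note = (
    "Insured reports whiplash after rear-end collision. "
    "Police report filed. Body shop estimate $4,250.00."
)
features = extract_note_features(sample_note)
```

Carrying the extractor version on every record is the design choice that matters here: when the extraction logic changes, features built with the old logic remain identifiable.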

The framework

Four data domains and their readiness gaps

Assess readiness by domain rather than by system, because the same fix rarely serves all four. Each row names the gap that most often blocks an AI use case in that domain.

Domain	Primary gap	Readiness priority
Policy administration	No common customer or risk key across lines and legacy systems	Master data and entity resolution
Claims	Rich content locked in narratives, photos, and PDFs	Document and image extraction
Actuarial and pricing	Warehouse reconciles late; loss data lags exposure data	Timeliness and lineage
External (telematics, third party)	Ungoverned formats, unclear consent and retention	Ingestion standards and governance

Recommended actions

Fix the foundation before the next model

Establish entity resolution to create a stable customer and risk key across policy systems, so any model can assemble a full view without a bespoke extract each time.
Stand up a document and image extraction pipeline for claims so adjuster notes, medical records, and photos become structured features under version control.
Define ingestion standards and a consent and retention policy for telematics and third-party data before you build on it, so a downstream model does not inherit a compliance problem.
Instrument data lineage from source system to model input, so you can prove to an examiner exactly where a feature came from.
Prioritize timeliness on loss and exposure data feeding pricing, because a model trained on stale loss data misprices the book.
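The first action, a stable customer key across policy systems, can be sketched in a few lines. This is a deliberately simple deterministic match on normalized name, date of birth, and postal code; the record layout, system names, and policy IDs are invented for the example, and real entity resolution usually adds fuzzy or probabilistic matching to handle typos, name changes, and moves.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def customer_key(name: str, dob: str, postal_code: str) -> str:
    """Derive a stable key from normalized identifying fields."""
    basis = "|".join([normalize(name), dob.strip(), normalize(postal_code)])
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]


def resolve(records: list) -> dict:
    """Group policy IDs from different systems under one customer key."""
    by_key = defaultdict(list)
    for rec in records:
        key = customer_key(rec["name"], rec["dob"], rec["postal_code"])
        by_key[key].append(rec["policy_id"])
    return dict(by_key)


# Hypothetical rows from three separate policy administration systems.
records = [
    {"system": "auto_pas", "policy_id": "A-1001",
     "name": "Maria  Lopez", "dob": "1980-04-12", "postal_code": "60614"},
    {"system": "home_pas", "policy_id": "H-2002",
     "name": "maria lopez.", "dob": "1980-04-12", "postal_code": "60614"},
    {"system": "umbrella_pas", "policy_id": "U-3003",
     "name": "John Smith", "dob": "1975-09-30", "postal_code": "10001"},
]

resolved = resolve(records)
```

Here the auto and home policies land under the same key despite differences in spacing, case, and punctuation, while the umbrella policy for a different person gets its own key. Any model can then join on the key instead of rebuilding a bespoke extract.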

Common pitfalls

Where readiness projects go wrong

Building a data lake and calling it readiness, when the real gap is a missing customer key and no lineage.
Ignoring unstructured claims content, so the model sees a fraction of what the adjuster knew.
Ingesting telematics and third-party data without consent, retention, and provenance controls, creating a governance liability.
Cleaning data per project instead of once at the source, so every new use case pays the same cleanup cost again.
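The third pitfall can be guarded against with a simple gate that refuses an external feed until its consent, retention, and provenance metadata are present. The sketch below is one possible shape for such a check; the FeedMetadata fields and the example values are assumptions for illustration, not a regulatory standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FeedMetadata:
    """Hypothetical governance metadata attached to an external feed."""
    source: str
    consent_basis: Optional[str]      # e.g. a reference to the consent terms
    retention_until: Optional[date]   # date after which data must be purged
    provenance: Optional[str]         # where the vendor obtained the data


def governance_gaps(meta: FeedMetadata, today: date) -> List[str]:
    """Return the list of missing or expired controls; empty means pass."""
    gaps = []
    if not meta.consent_basis:
        gaps.append("consent")
    if meta.retention_until is None:
        gaps.append("retention")
    elif meta.retention_until < today:
        gaps.append("retention expired")
    if not meta.provenance:
        gaps.append("provenance")
    return gaps


# Made-up feeds: one fully documented, one missing two controls.
telematics = FeedMetadata(
    source="telematics_vendor",
    consent_basis="policyholder opt-in",
    retention_until=date(2030, 1, 1),
    provenance="device data collected under vendor contract",
)
third_party = FeedMetadata(
    source="third_party_scores",
    consent_basis=None,
    retention_until=date(2030, 1, 1),
    provenance=None,
)

check_date = date(2025, 6, 1)
telematics_gaps = governance_gaps(telematics, check_date)
third_party_gaps = governance_gaps(third_party, check_date)
```

Running the check before ingestion, rather than during a later audit, keeps the compliance question out of every downstream model.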

Metrics that matter

Readiness has leading indicators

Share of policyholders resolvable to a single customer key across all lines.
Percentage of claims documents and images machine-extracted into structured features.
Data lineage coverage from source to model input for production models.
Freshness lag between loss or exposure events and their availability to actuarial models.
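The four indicators above reduce to simple ratios and a time difference, which makes them cheap to track. The sketch below shows one way to compute them; every count and timestamp is invented for the example, and the choice of median for freshness lag is an assumption, not something the framework prescribes.

```python
from datetime import datetime
from statistics import median


def share(numerator: int, denominator: int) -> float:
    """Fraction in [0, 1]; returns 0.0 when there is nothing to measure."""
    return numerator / denominator if denominator else 0.0


def freshness_lag_days(event_times: list, available_times: list) -> float:
    """Median days between an event and its availability to models."""
    lags = [
        (available - event).total_seconds() / 86400
        for event, available in zip(event_times, available_times)
    ]
    return median(lags)


# Illustrative inputs only.
events = [datetime(2025, 1, 1), datetime(2025, 1, 2), datetime(2025, 1, 3)]
available = [datetime(2025, 1, 3), datetime(2025, 1, 6), datetime(2025, 1, 12)]

readiness = {
    # Policyholders resolvable to a single key / all policyholders
    "customer_key_share": share(8200, 10000),
    # Claims documents and images extracted / all documents and images
    "extraction_share": share(3500, 14000),
    # Production model inputs with documented lineage / all inputs
    "lineage_coverage": share(45, 60),
    # Median lag from loss or exposure event to actuarial availability
    "freshness_lag_days": freshness_lag_days(events, available),
}
```

Tracked over time, movement in these numbers shows whether foundation work is paying off before any individual model result does.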

FAQ

Frequently asked questions

Why do our AI projects keep re-cleaning the same data?

Because cleanup is done inside each project instead of once at the source. Without a governed customer key, entity resolution, and lineage, every new model rebuilds the same extract. Fixing the foundation once removes that repeated cost.

Do we need to handle unstructured claims data to get value?

For most claims and fraud use cases, yes. Industry estimates put roughly 80 percent of insurance data in unstructured form, so a model limited to structured fields misses the adjuster narratives, medical records, and photos that carry the strongest signal.

What makes telematics and third-party data risky to use?

The risk is governance, not the data itself. If consent, retention, and provenance are not documented before ingestion, any model built on it inherits a compliance exposure that surfaces later in an exam or a dispute.

This is a taste. The full library goes deeper.

Stratenity is the AI Operating System for Strategic Execution.
