Most carriers cannot deploy the AI they want because their data will not support it. Policy administration, claims, and actuarial systems grew up as separate stacks, often on different platforms per line of business, so a single customer or risk is scattered across silos with no common key. Telematics and third-party data arrive in formats that no one governs. The richest signal in a claim (adjuster notes, photos, and documents) sits as unstructured text and images. Data readiness is the real bottleneck: carriers that fix lineage and integration first get durable value, while the rest keep re-cleaning data for every project.
The data is there, but it is fragmented and mostly unstructured
A typical mid-size US carrier runs multiple policy administration systems, often one per line of business acquired over the years, plus a separate claims system and an actuarial data warehouse that reconciles to neither in real time. There is frequently no enterprise key that reliably ties a policyholder across auto, home, and umbrella, so a model that needs a full customer view has to be built on a stitched-together extract that goes stale the moment it is created.
The signal that matters most is often unstructured. Industry estimates put the unstructured share of insurance data (adjuster narratives, medical records, police reports, photos, and PDFs) at roughly 80 percent of the total. Models that see only structured fields ignore most of what the business actually knows about a risk or a claim. Getting that content into a usable, governed form is where readiness is won or lost.
Four data domains and their readiness gaps
Assess readiness by domain rather than by system, because the same fix rarely serves all four. Each row names the gap that most often blocks an AI use case in that domain.
| Domain | Primary gap | Readiness priority |
|---|---|---|
| Policy administration | No common customer or risk key across lines and legacy systems | Master data and entity resolution |
| Claims | Rich content locked in narratives, photos, and PDFs | Document and image extraction |
| Actuarial and pricing | Warehouse reconciles late; loss data lags exposure data | Timeliness and lineage |
| External (telematics, third party) | Ungoverned formats, unclear consent and retention | Ingestion standards and governance |
Fix the foundation before the next model
- Establish entity resolution to create a stable customer and risk key across policy systems, so any model can assemble a full view without a bespoke extract each time.
- Stand up a document and image extraction pipeline for claims so adjuster notes, medical records, and photos become structured features under version control.
- Define ingestion standards and a consent and retention policy for telematics and third-party data before you build on it, so a downstream model does not inherit a compliance problem.
- Instrument data lineage from source system to model input, so you can prove to an examiner exactly where a feature came from.
- Prioritize timeliness on loss and exposure data feeding pricing, because a model trained on stale loss data misprices the book.
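As an illustration of the first step above, here is a minimal entity resolution sketch using only the Python standard library. It assumes each policy system can export a name, date of birth, and postal code; the `PolicyRecord` fields, the system names, the `CUST-` key format, and the 0.85 similarity threshold are all hypothetical choices for the example, not a recommended configuration. It blocks on date of birth and postal code, compares normalized names within each block, and groups matches with a simple union-find.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class PolicyRecord:
    source_system: str  # hypothetical system name, e.g. "auto_pas"
    policy_id: str
    name: str
    dob: str            # ISO date string, e.g. "1980-04-12"
    postal_code: str


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so word order does not matter."""
    tokens = re.sub(r"[^a-z\s]", " ", name.lower()).split()
    return " ".join(sorted(tokens))


def resolve(records, threshold=0.85):
    """Map (source_system, policy_id) to a customer key shared by likely matches."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Blocking: only compare records that share date of birth and 5-digit postal code.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault((rec.dob, rec.postal_code[:5]), []).append(idx)

    for members in blocks.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                score = SequenceMatcher(
                    None,
                    normalize_name(records[i].name),
                    normalize_name(records[j].name),
                ).ratio()
                if score >= threshold:
                    parent[find(j)] = find(i)  # merge the two clusters

    # Assumption: the key is derived from the cluster root index for illustration only.
    # A real master data store would persist keys so they stay stable across runs.
    return {
        (rec.source_system, rec.policy_id): f"CUST-{find(idx):06d}"
        for idx, rec in enumerate(records)
    }
```

Blocking keeps the comparison count manageable, but it also means two records for the same person with different postal codes will never be compared here. A production approach would likely use several blocking passes and a reviewed match threshold.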
Where readiness projects go wrong
- Building a data lake and calling it readiness, when the real gap is a missing customer key and no lineage.
- Ignoring unstructured claims content, so the model sees a fraction of what the adjuster knew.
- Ingesting telematics and third-party data without consent, retention, and provenance controls, creating a governance liability.
- Cleaning data per project instead of once at the source, so every new use case pays the same cleanup cost again.
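The telematics and third-party failure above can be guarded against with a check at the point of ingestion. The sketch below is one possible shape for such a gate, assuming each external feed ships with a small manifest. The `FeedManifest` class and its field names are hypothetical, and the checks cover only whether consent, retention, provenance, and schema version are documented, not whether they are legally sufficient.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeedManifest:
    feed_name: str
    provider: Optional[str] = None        # provenance: who supplied the data
    consent_basis: Optional[str] = None   # e.g. a reference to the consent terms
    retention_days: Optional[int] = None  # how long records may be kept
    schema_version: Optional[str] = None  # which ingestion standard the feed follows


def governance_gaps(manifest: FeedManifest) -> List[str]:
    """Return the governance items that are missing for this feed."""
    gaps = []
    if not manifest.provider:
        gaps.append("provenance: provider not documented")
    if not manifest.consent_basis:
        gaps.append("consent: no documented consent basis")
    if manifest.retention_days is None or manifest.retention_days <= 0:
        gaps.append("retention: no valid retention period")
    if not manifest.schema_version:
        gaps.append("format: no declared schema version")
    return gaps


def admit(manifest: FeedManifest) -> None:
    """Refuse ingestion until every governance item is documented."""
    gaps = governance_gaps(manifest)
    if gaps:
        raise ValueError(f"{manifest.feed_name} blocked: " + "; ".join(gaps))
```

Failing closed at ingestion is a design choice: it is stricter than logging a warning, but it keeps an undocumented feed from reaching a model in the first place.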
Readiness has leading indicators
- Share of policyholders resolvable to a single customer key across all lines.
- Percentage of claims documents and images machine-extracted into structured features.
- Data lineage coverage from source to model input for production models.
- Freshness lag between loss or exposure events and their availability to actuarial models.
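These four indicators can be computed from fairly simple inventories. The following is a rough sketch under stated assumptions: the input shapes (lists of dictionaries) and the keys `customer_key`, `extracted`, `source_system`, `occurred_at`, and `available_at` are hypothetical, and a missing `source_system` is used as a crude stand-in for "no documented lineage".

```python
from datetime import datetime
from statistics import median


def share(items, predicate):
    """Fraction of items satisfying predicate; 0.0 for an empty list."""
    return sum(1 for item in items if predicate(item)) / len(items) if items else 0.0


def readiness_indicators(policyholders, claim_documents, production_features, loss_events):
    """Compute the four leading indicators from simple inventory lists."""
    lags = [(e["available_at"] - e["occurred_at"]).days for e in loss_events]
    return {
        # Share of policyholders that carry a resolved customer key.
        "customer_key_resolution": share(policyholders, lambda p: bool(p.get("customer_key"))),
        # Share of claims documents and images already machine-extracted.
        "claims_extraction": share(claim_documents, lambda d: bool(d.get("extracted"))),
        # Share of production model features with a documented source system.
        "lineage_coverage": share(production_features, lambda f: bool(f.get("source_system"))),
        # Median days between a loss event and its availability to actuarial models.
        "median_freshness_lag_days": median(lags) if lags else None,
    }
```

The median is used for the freshness lag so that a few very late events do not dominate the figure; tracking a high percentile alongside it would show the tail.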
Frequently asked questions
Why do our AI projects keep re-cleaning the same data?
Because cleanup is done inside each project instead of once at the source. Without a governed customer key, entity resolution, and lineage, every new model rebuilds the same extract, so fixing the foundation eliminates repeated cost.
Do we need to handle unstructured claims data to get value?
For most claims and fraud use cases, yes. Industry estimates put roughly 80 percent of insurance data in unstructured form, so a model limited to structured fields misses the adjuster narratives, medical records, and photos that carry the strongest signal.
What makes telematics and third-party data risky to use?
The risk is governance, not the data itself. If consent, retention, and provenance are not documented before ingestion, any model built on that data inherits a compliance exposure that surfaces later in an exam or a dispute.