Error States that Build Trust
Finance & Banking • ~7–8 min read • Updated May 19, 2025 • By OneMind Strata Team
Context
LLM systems fail differently: retrieval gaps, tool timeouts, policy refusals, vendor drift. Users don’t judge you for failing—they judge you for how you fail. Trust grows when errors preserve context, offer a safe next step, and show enough evidence that the system isn’t guessing or hiding.
What Good Looks Like
- Recognizable category: Label the failure (timeout, retrieval miss, policy refusal) so users know what happened.
- Safe fallback: Provide a partial answer, cached result, or human-escalation path—never a dead end.
- Show your work: Cite sources or tests used to make the decision; display last-eval date for the model/prompt pack.
- Preserve input & progress: Keep the user’s text and state so a retry doesn’t mean rework.
- Retry guidance: Offer a specific, low-effort adjustment (e.g., “Narrow by date range”).
- Correlation ID: Render a short error code, copyable for support and incident review (see the sketch after this list).
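As a reference point, here is a minimal sketch of the state one error surface could carry. The `AiErrorState` name and its fields are illustrative assumptions, not a prescribed schema.

```typescript
// Illustrative shape for one error surface; the full category set appears
// under Recommended Actions below.
interface AiErrorState {
  category: string;           // recognizable label, e.g. "timeout" or "retrieval_miss"
  summary: string;            // plain-language description of what happened
  evidence?: string[];        // sources consulted or checks that failed
  lastEvalDate?: string;      // last evaluation date for the model/prompt pack
  preservedInput: string;     // the user's text, kept so a retry costs nothing
  retryHint?: string;         // specific low-effort adjustment, e.g. "Narrow by date range"
  fallback?: "partial_answer" | "cached_result" | "human_escalation";
  correlationId: string;      // short code, copyable for support and incident review
}
```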
Core Patterns
- Tiered responses: Soft warning → degraded output → refusal with rationale; escalate by risk, not by guess.
- Partial answers: Return a safe subset (e.g., top 2 docs with confidence) with a note about the missing pieces.
- Fallback routes: Vector miss → keyword fallback; tool timeout → cached plan; vendor outage → small local model (see the sketch after this list).
- Policy-aware refusal UX: Clear “why” + alternatives; link to policy name/owner for credibility.
- Evidence banner: Show retrieved items or tests that failed; avoid opaque “something went wrong.”
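One way the fallback-route pattern could look in code. `vectorSearch`, `keywordSearch`, and `getCachedAnswer` are hypothetical stand-ins for whatever retrieval and cache layers you already run; this is a sketch, not a definitive implementation.

```typescript
// Minimal document type for the hypothetical retrieval layer.
type Doc = { id: string; text: string; score: number };

interface RetrievalDeps {
  vectorSearch: (query: string) => Promise<Doc[]>;
  keywordSearch: (query: string) => Promise<Doc[]>;
  getCachedAnswer: (query: string) => Promise<string | null>;
}

// Vector miss -> keyword fallback -> cached answer: never a dead end.
async function retrieveWithFallback(
  query: string,
  deps: RetrievalDeps,
): Promise<{ route: "vector" | "keyword" | "cache"; docs: Doc[]; note?: string }> {
  const vectorHits = await deps.vectorSearch(query);
  if (vectorHits.length > 0) {
    return { route: "vector", docs: vectorHits };
  }

  const keywordHits = await deps.keywordSearch(query);
  if (keywordHits.length > 0) {
    return {
      route: "keyword",
      docs: keywordHits,
      note: "Semantic search found nothing; showing keyword matches instead.",
    };
  }

  const cached = await deps.getCachedAnswer(query);
  return {
    route: "cache",
    docs: cached ? [{ id: "cached", text: cached, score: 0 }] : [],
    note: cached
      ? "Showing the last good answer while retrieval recovers."
      : "No results found. Try narrowing the date range or rephrasing the question.",
  };
}
```

Passing the search and cache functions in as dependencies keeps the chain easy to exercise in CI with golden failures.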
Recommended Actions
- Adopt an error taxonomy: {timeout, retrieval_miss, policy_refusal, tool_error, vendor_outage, quota, unknown}.
- Ship a reusable component: <AiErrorCard/> with slots for category, evidence, correlation_id, next_actions (typed sketch after this list).
- Pre-bake fallbacks: Decide the safe degraded response per use case; test it in CI with golden failures.
- Wire observability: Log {category, correlation_id, surface, model_version, prompt_pack, guardrail_scores} (see the logging sketch after this list).
- Copy system: Centralize error copy with tone, brevity, and domain examples; localize titles.
- Run failure drills: Chaos-test tool timeouts, retrieval zero-hits, and vendor 500s monthly.
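A typed sketch of the taxonomy and the <AiErrorCard/> slots from the list above; the prop names are illustrative assumptions, not a finished component API.

```typescript
// The error taxonomy from the list above, as a closed union type.
type AiErrorCategory =
  | "timeout"
  | "retrieval_miss"
  | "policy_refusal"
  | "tool_error"
  | "vendor_outage"
  | "quota"
  | "unknown";

// Hypothetical slot contract for a reusable <AiErrorCard/> component.
interface AiErrorCardProps {
  category: AiErrorCategory;
  evidence?: string[];                             // retrieved items or failed checks to show
  correlationId: string;                           // short, copyable error code
  nextActions: Array<{ label: string; onSelect: () => void }>; // e.g. "Narrow by date range"
}
```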
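And a sketch of the structured log entry the observability item describes. The fields mirror the list above; `emitAiErrorEvent` is a hypothetical wrapper around whatever logger or event pipeline you already use.

```typescript
// Structured log entry mirroring the observability fields listed above.
interface AiErrorEvent {
  category: string;                     // e.g. "retrieval_miss"
  correlationId: string;
  surface: string;                      // which screen or endpoint raised the error
  modelVersion: string;
  promptPack: string;
  guardrailScores: Record<string, number>;
  timestamp: string;                    // ISO 8601
}

// Hypothetical wrapper around your existing logger or event pipeline.
function emitAiErrorEvent(event: Omit<AiErrorEvent, "timestamp">): void {
  const entry: AiErrorEvent = { ...event, timestamp: new Date().toISOString() };
  console.log(JSON.stringify(entry)); // replace with your real log sink
}
```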
Common Pitfalls
- Generic messages: “Try again later” teaches nothing and drives abandonment.
- Input loss: Clearing the compose box after an error multiplies user friction.
- Infinite retries: Looping the same failing request without suggesting a different route.
- Opaque policy blocks: Refusing without citing the policy or offering escalation.
- Silent degradation: Returning degraded content without telling the user.
Quick Win Checklist
- Implement <AiErrorCard/> with a correlation ID and a “Copy details” button.
- Add keyword fallback for retrieval zero-hits; cache a last-good answer for timeouts.
- Expose why a request was refused, with the policy name and a contact.
- Keep user input; pre-fill the retry with suggested edits (see the sketch after this checklist).
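A minimal sketch of the "keep user input, pre-fill retry" item, assuming a hypothetical `RetryState` held by the chat surface.

```typescript
// Hypothetical state a chat surface keeps across a failed request.
interface RetryState {
  originalInput: string;      // never cleared on error
  suggestedEdit?: string;     // e.g. the original query plus a narrower date range
  correlationId: string;      // shown on the error card and copied by "Copy details"
}

// Pre-fill the compose box so a retry never starts from a blank field.
function buildRetryDraft(state: RetryState): string {
  return state.suggestedEdit ?? state.originalInput;
}
```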
Closing
Design error states as carefully as success states. With clear categories, evidence, and pre-planned fallbacks, you’ll turn inevitable failures into moments that increase trust—and keep work moving.