Versioning Prompts, Policies, and Models Together
Cross-Industry • ~7–8 min read • Updated Sep 27, 2024
Context
Most AI regressions aren’t single-file mistakes—they’re set mismatches: a prompt tweak that clashes with a new policy filter, or a routing change that invalidates cached outputs. Treating each artifact separately makes rollbacks messy and audits painful. Release sets, not parts.
Core Framework: Release Sets
- Define the set:
{model_ref, prompt_pack, policy_pack, routing_table, tool_specs, retrieval_cfg, eval_suite}
. - Single tag: Use one immutable tag (e.g.,
ai-suite@2024.09.27
) applied to every artifact and runtime switch. - SemVer the set:
MAJOR
(policy or model change),MINOR
(prompt/routing),PATCH
(typo, threshold nudge). - Environment gates: dev → staging → canary → prod with the same set tag; no “rebuilds” between envs.
- Diffs & notes: Store human-readable release notes, and machine diffs for prompts/policies & routing deltas.
Recommended Actions
- Create a Set Manifest: Versioned JSON (or lockfile) listing all artifact URIs and checksums.
- Tie CI to Evals: Staging promotion requires passing the tagged eval_suite (answerability, safety, latency, cost).
- Wire Feature Flags: Roll out by segment or tenant; keep previous set hot for instant rollback.
- Cache Discipline: Namespaced caches by set tag; invalidate on promotion to avoid mixed outputs.
- Policy Precedence: Registry that enforces system > policy > tool > app prompts per set.
Common Pitfalls
- Artifact drift: Prompt edited in-prod without bumping the set tag → unreproducible incidents.
- Env snowflakes: Rebuilding prompts or dependencies per env creates “works in staging” bugs.
- Unscoped caches: Old answers leaking into new releases; no TTL by risk tier.
- Shadow changes: Vendor model updates without model pinning & fitness checks.
Quick Win Checklist
- Introduce a set manifest and tag today’s prod as
ai-suite@YYYY.MM.DD
. - Namespace caches by set tag; add TTLs by use-case risk.
- Require eval pass for promotion; store scorecard alongside release notes.
- Pin vendor models (ID + date); alert on upstream weight changes.
Closing
Versioning AI as a set makes behavior reproducible, audits simple, and rollbacks instant. That’s how you ship faster without surprises.