Summary

When AI informs decisions that ship to the public, close borders, or ration scarce vaccines, governance is not paperwork. It is the difference between a defensible call and a catastrophic one. This page covers cross-border data sharing, privacy in population surveillance, model reliability standards for public-health decisions, equity safeguards, and alignment with World Health Organization (WHO) guidance and the International Health Regulations (IHR). It gives public-health leaders a concrete framework for approval gates, audit trails, and the reliability thresholds an AI recommendation must clear before it can move a policy lever during an outbreak.

Context

High-stakes decisions demand governed, not black-box, AI

The revised International Health Regulations, agreed by 196 states, and the pandemic agreement adopted by the World Health Assembly in May 2025 both lean on rapid data sharing and pathogen surveillance. AI now sits in the middle of that pipeline, yet most national frameworks were written before machine learning entered the surveillance stack. During COVID-19, contact-tracing apps in several countries collected location data with unclear retention rules, and public trust cratered as a result. One European app was withdrawn within months after privacy regulators objected. Governance failures do not just create legal risk. They destroy the public cooperation that surveillance depends on.

The stakes are asymmetric. An AI model that recommends closing a border, quarantining a region, or prioritizing one population for a scarce vaccine can be wrong in ways that cost lives and legitimacy at once. A 2019 study published in Science found that a widely used clinical algorithm systematically under-referred Black patients for care because it used health care cost as a proxy for health need. The same failure mode in a pandemic triage tool would ration ventilators along the lines of existing inequity. The pandemic agreement obliges parties to share pathogen data and benefits, which raises the stakes further: an AI recommendation built on shared genomic data now carries cross-border consequences, and a poorly governed model can strain the trust that data sharing depends on. Governance is how you catch all of that before it ships, not after, and how you keep the cooperation flowing when the next threat arrives.

The framework

Five control domains for public-health AI

Every AI system that informs a consequential public-health decision should be assessed against five domains before it goes live, and re-assessed on a fixed cadence thereafter. Treat the five as a gate, not a checklist to file away: a system that fails any one domain does not deploy until the gap is closed, because in public health the weakest control is the one an outbreak will find.

Control domain | Core requirement | Owner
Data sharing | Cross-border transfers meet IHR terms with documented legal basis and minimization | Data protection lead plus legal
Surveillance privacy | Purpose limitation, defined retention, aggregation or anonymization by default | Privacy officer
Model reliability | Backtested accuracy, calibration, and uncertainty bounds meet a set threshold before deployment | Chief epidemiologist
Equity | Performance audited across demographic and geographic subgroups, gaps remediated | Equity and ethics board
Approval and audit | Human sign-off gate plus a queryable log of every consequential recommendation | Response director
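
The "gate, not a checklist" rule can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the names Domain, DomainReview, and deployment_gate are hypothetical, and a real review would carry far more evidence than a single pass/fail flag. The point it shows is that a missing review blocks deployment just as a failed one does.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    """The five control domains from the table above."""
    DATA_SHARING = "data sharing"
    SURVEILLANCE_PRIVACY = "surveillance privacy"
    MODEL_RELIABILITY = "model reliability"
    EQUITY = "equity"
    APPROVAL_AND_AUDIT = "approval and audit"


@dataclass(frozen=True)
class DomainReview:
    """Outcome of one domain review (hypothetical structure)."""
    domain: Domain
    passed: bool
    owner: str


def deployment_gate(reviews: list[DomainReview]) -> tuple[bool, list[str]]:
    """Return (may_deploy, blockers). Any failed or missing domain blocks."""
    by_domain = {review.domain: review for review in reviews}
    blockers = []
    for domain in Domain:
        review = by_domain.get(domain)
        if review is None:
            blockers.append(f"{domain.value}: not reviewed")
        elif not review.passed:
            blockers.append(f"{domain.value}: failed (owner: {review.owner})")
    return (not blockers, blockers)
```

Returning the list of blockers alongside the yes/no answer gives the review board something specific to remediate rather than a bare refusal.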

Recommended actions

Build the guardrails before the outbreak, not during it

  • Stand up a standing review board with epidemiology, ethics, legal, and community representation that can approve or halt any AI system informing policy decisions.
  • Define reliability thresholds in advance: the minimum backtested accuracy and calibration, and the maximum uncertainty, that a model must meet before its output can move a policy lever.
  • Require every consequential recommendation to carry its provenance: source data, model version, assumptions, and confidence, so any decision can be reconstructed later.
  • Write data-sharing agreements now that pre-authorize cross-border transfers under IHR terms, so legal review does not become the bottleneck during a fast-moving event.
  • Mandate subgroup equity audits before launch and on a quarterly cadence, with a documented remediation path when gaps appear.
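
As one way to make the provenance requirement concrete, the sketch below attaches source data, model version, assumptions, and confidence to each recommendation and refuses to treat a record as complete if any field is empty. The class name ProvenanceRecord, its field names, and the JSON log-line format are all assumptions made for illustration; the source does not prescribe a schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record attached to one consequential recommendation."""
    recommendation: str
    source_data: tuple[str, ...]   # dataset identifiers the model consumed
    model_version: str
    assumptions: tuple[str, ...]
    confidence: str                # e.g. a stated interval or probability
    issued_at: str                 # ISO 8601 timestamp


def missing_provenance(record: ProvenanceRecord) -> list[str]:
    """Names of fields left empty; an empty list means the record is complete."""
    return [name for name, value in asdict(record).items() if not value]


def to_log_line(record: ProvenanceRecord) -> str:
    """Serialise to one JSON line so the audit log stays queryable."""
    return json.dumps(asdict(record), sort_keys=True)
```

A sign-off step could reject any recommendation for which missing_provenance returns a non-empty list, which keeps incomplete records out of the log from the start.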

Common pitfalls

How governance goes wrong under pressure

  • Suspending review during an emergency, exactly when the highest-stakes AI decisions are being made and oversight matters most.
  • Collecting surveillance data with no retention limit or purpose boundary, torching public trust and future cooperation.
  • Accepting a model recommendation with no confidence interval or explanation, leaving decision-makers unable to weigh it against other evidence.
  • Skipping equity audits and discovering only after deployment that the model underserves the populations already hit hardest.

Metrics that matter

Measure whether governance actually holds

  • Approval coverage: share of deployed AI systems that passed the full five-domain review before going live.
  • Audit completeness: percentage of consequential recommendations with full provenance recorded and reconstructable.
  • Equity gap: maximum performance difference across demographic and geographic subgroups, tracked and driven down over time.
  • Time to reconstruct: median hours to fully explain any past AI-informed decision on request from an oversight body.
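
The four metrics above are simple enough to compute directly. The sketch below is illustrative only: the dictionary keys (deployed, passed_full_review, and the provenance field names) are assumed for the example, and the choice of performance measure behind the equity gap is left to the caller.

```python
from statistics import median

# Assumed provenance fields; adapt to whatever the audit log actually stores.
REQUIRED_PROVENANCE = ("source_data", "model_version", "assumptions", "confidence")


def share(flags) -> float:
    """Fraction of True values; raises if there is nothing to measure."""
    flags = list(flags)
    if not flags:
        raise ValueError("no items to measure")
    return sum(flags) / len(flags)


def approval_coverage(systems: list[dict]) -> float:
    """Share of deployed systems that passed the full five-domain review."""
    return share(s["passed_full_review"] for s in systems if s["deployed"])


def audit_completeness(recommendations: list[dict]) -> float:
    """Share of recommendations with every provenance field filled in."""
    return share(
        all(rec.get(field) for field in REQUIRED_PROVENANCE)
        for rec in recommendations
    )


def equity_gap(performance_by_subgroup: dict[str, float]) -> float:
    """Largest difference in the chosen performance measure across subgroups."""
    values = list(performance_by_subgroup.values())
    return max(values) - min(values)


def time_to_reconstruct(hours: list[float]) -> float:
    """Median hours taken to explain past decisions on request."""
    return median(hours)
```

Raising on an empty input, rather than returning zero or one, avoids reporting a reassuring number when nothing was actually measured.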

FAQ

Frequently asked questions

Should we relax governance during an active outbreak to move faster?

No. The emergency is precisely when the highest-stakes AI decisions get made, so it is when oversight matters most. Instead of relaxing controls, build fast-path governance ahead of time: pre-authorized data agreements, pre-set reliability thresholds, and a standing board that can convene in hours. Speed and oversight are not a trade-off if you prepare the guardrails before the event.

How do we handle cross-border data sharing without violating privacy law?

Draft data-sharing agreements in advance that specify the legal basis, the minimization and retention rules, and the IHR provisions they operate under. Default to aggregated or anonymized data for cross-border transfer, keep identifiable data local, and document the purpose limitation. Pre-negotiated terms keep legal review from becoming the bottleneck when speed matters.
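
"Aggregated by default, identifiable data stays local" can be illustrated with a small sketch. It assumes case records are dictionaries with region and week keys and applies small-cell suppression with a threshold of 5. Both the record shape and the threshold are assumptions for the example, not values taken from the IHR or from any privacy law, and aggregation alone does not guarantee anonymity.

```python
from collections import Counter


def aggregate_for_transfer(case_records: list[dict], min_cell_size: int = 5) -> list[dict]:
    """Reduce line-level records to regional weekly counts for cross-border sharing.

    Only region, week, and a count leave the function; every other field
    (names, identifiers, addresses) stays local. Counts below min_cell_size
    are suppressed, because very small cells can point to individuals.
    The default of 5 is illustrative; the real threshold belongs in the
    data-sharing agreement.
    """
    counts = Counter((rec["region"], rec["week"]) for rec in case_records)
    rows = []
    for (region, week), n in sorted(counts.items()):
        suppressed = n < min_cell_size
        rows.append({
            "region": region,
            "week": week,
            "cases": None if suppressed else n,
            "suppressed": suppressed,
        })
    return rows
```

Marking a cell as suppressed, instead of dropping it, lets the receiving party see that data exists for that region and week without learning the small count.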

What reliability standard should a model meet before informing policy?

Set it in advance and in numbers: a minimum accuracy and calibration on backtests against historical outbreaks, an explicit uncertainty bound, and evidence of stable performance across subgroups. A model that cannot state its confidence, or that degrades on populations already at risk, does not meet the bar to move a policy lever, however impressive its headline accuracy.
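
A pre-set reliability bar of this kind might be checked as in the sketch below. It uses the Brier score (mean squared error of predicted probabilities, lower is better) as one calibration-sensitive accuracy measure. That choice, the ReliabilityThresholds fields, and any numbers passed in are assumptions for illustration; the thresholds themselves are for the review board to set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReliabilityThresholds:
    """Hypothetical bar, agreed before any outbreak."""
    max_brier: float           # worst acceptable overall Brier score
    max_interval_width: float  # widest acceptable uncertainty interval
    max_subgroup_gap: float    # largest acceptable Brier difference across subgroups


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def reliability_check(
    backtests: dict[str, tuple[list[float], list[int]]],
    interval_width: Optional[float],
    thresholds: ReliabilityThresholds,
) -> dict[str, bool]:
    """backtests maps subgroup name -> (predicted probabilities, observed outcomes)."""
    per_group = {g: brier_score(p, y) for g, (p, y) in backtests.items()}
    all_probs = [p for probs, _ in backtests.values() for p in probs]
    all_outcomes = [y for _, outcomes in backtests.values() for y in outcomes]
    gap = max(per_group.values()) - min(per_group.values())
    return {
        "accuracy": brier_score(all_probs, all_outcomes) <= thresholds.max_brier,
        # A model that states no uncertainty fails this check outright.
        "uncertainty": interval_width is not None
        and interval_width <= thresholds.max_interval_width,
        "subgroup_stability": gap <= thresholds.max_subgroup_gap,
    }
```

A model clears the bar only if every value in the returned dictionary is True, so a strong headline score cannot offset a missing uncertainty bound or a subgroup gap.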