Summary

A practical scoring model for CFOs to tie AI investments to unit economics, control requirements, and risk management.

Context

CFOs now own a portfolio of AI bets they did not underwrite the way they underwrite capital projects. Pilots multiply, vendors promise, and business units lobby, but few of those bets carry a defensible number for impact, feasibility, or downside. The result is a portfolio that skews toward whatever is loudest rather than whatever compounds. A scoring model is the finance function's answer: it converts enthusiasm into a ranked, fundable, auditable pipeline, and it gives the CFO a language for saying a firm no to some bets instead of "not now" to everything.

The model below is deliberately simple enough to run in a spreadsheet before your next capital review, yet structured enough to survive audit and board scrutiny. It rests on three weighted dimensions (Impact, Feasibility, and Risk), each broken into sub-metrics that reviewers can score consistently.

The Three Dimensions

Impact: will it move the P&L?

Impact measures the size and durability of the financial outcome: incremental revenue, margin expansion, cost-to-serve reduction, or working-capital release. Score each use-case on expected annual value, confidence in that value, and how directly the benefit lands in a line the CFO reports. A chatbot that deflects 12% of tickets scores differently from a pricing engine that lifts gross margin 90 basis points, and the model should make that difference visible.

Feasibility: can we actually ship it?

Feasibility captures technical readiness, data availability and quality, integration burden, and the organization's capacity to adopt the change. The most common failure is a high-impact use-case that dies in production because the data contract never existed or the frontline never changed its workflow. Score data readiness, model maturity, and change-management load separately so a single weak link is not hidden by two strong ones.
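The weak-link point can be made concrete with a small sketch. It is an illustration rather than part of the rubric: the function name `feasibility_summary` and the floor of 2 are assumptions chosen for the example, and it assumes a higher score always means "more ready" (so a lighter change-management load scores higher). It reports the average alongside the lowest sub-metric, so a use-case with strong technology but no usable data is flagged rather than averaged away.

```python
from statistics import mean

# Hypothetical threshold: any sub-metric at or below this value is flagged.
WEAK_LINK_FLOOR = 2


def feasibility_summary(data_readiness, model_maturity, change_load):
    """Summarize three 1-5 feasibility sub-scores without hiding the weakest.

    Assumption: higher always means more feasible, so a heavy
    change-management load receives a low score.
    """
    scores = {
        "data_readiness": data_readiness,
        "model_maturity": model_maturity,
        "change_management": change_load,
    }
    weakest = min(scores, key=scores.get)
    return {
        "average": round(mean(list(scores.values())), 2),
        "weakest": weakest,
        "weak_link": scores[weakest] <= WEAK_LINK_FLOOR,
    }


# Strong technology and adoption, but almost no usable data.
summary = feasibility_summary(data_readiness=1, model_maturity=5, change_load=5)
print(summary)
```

Here the average alone looks acceptable, while the `weak_link` flag points reviewers at the missing data contract.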

Risk: what could it cost us if it goes wrong?

Risk spans regulatory exposure, compliance complexity, operational fragility, and reputational sensitivity. For regulated industries this dimension often carries the heaviest weight, because a model that saves $2M and triggers a $20M enforcement action is not a saving. Score the severity and likelihood of adverse outcomes, and whether controls exist to detect and remediate them.

The Scoring Rubric

Score each sub-metric 1–5, apply the dimension weights, and produce a single composite per use-case. A worked rubric:

Dimension       | Sub-metrics (score 1–5 each)                                           | Default weight
Impact          | Annual value · Confidence in value · Directness to reported line       | 40%
Feasibility     | Data readiness · Model/tech maturity · Change-management load          | 35%
Risk (inverted) | Regulatory exposure · Operational fragility · Reputational sensitivity | 25%

Risk is scored as a penalty: a high-risk use-case subtracts from the composite, so the model rewards value that can be delivered safely rather than value alone.
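One way to turn the rubric into arithmetic is sketched below. The article does not spell out the exact formula, so this rests on assumptions: each dimension score is the mean of its three sub-metrics, and the Risk score is inverted (6 minus the score) before weighting, so a riskier use-case ends up with a lower composite. The function and variable names are illustrative, and a different penalty mapping would produce different composites.

```python
from statistics import mean

# Default weights from the rubric; they must sum to 1.0.
DEFAULT_WEIGHTS = {"impact": 0.40, "feasibility": 0.35, "risk": 0.25}


def composite_score(impact, feasibility, risk, weights=DEFAULT_WEIGHTS):
    """Return a 1-5 composite from three lists of 1-5 sub-metric scores.

    Assumption: risk is inverted (6 - score) so that a riskier use-case
    contributes less to the composite.
    """
    for scores in (impact, feasibility, risk):
        if not scores or any(not 1 <= s <= 5 for s in scores):
            raise ValueError("each sub-metric must be scored 1-5")
    inverted_risk = 6 - mean(risk)
    total = (
        weights["impact"] * mean(impact)
        + weights["feasibility"] * mean(feasibility)
        + weights["risk"] * inverted_risk
    )
    return round(total, 2)


# Hypothetical sub-metric scores for a single use-case.
score = composite_score(impact=[5, 4, 4], feasibility=[4, 4, 4], risk=[2, 2, 2])
print(score)
```

Inverting the risk score and subtracting a penalty are two views of the same idea; the inverted form keeps the composite on the same 1–5 scale as its inputs.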

A Portfolio in Practice

Applied to four candidate use-cases at a mid-market lender, the model ranks them clearly and redirects capital away from the loudest project toward the ones that compound:

Use-case                         | Impact | Feasibility | Risk (penalty) | Composite | Call
Collections prioritization model | 4.5    | 4.0         | Low            | 4.2       | Fund now
Document-intake automation       | 3.5    | 4.5         | Low            | 3.9       | Fund now
Generative customer advisor      | 4.0    | 2.5         | High           | 2.6       | Stage / de-risk
Marketing content generator      | 2.5    | 4.0         | Medium         | 2.8       | Defer

The generative advisor was the executive team's favorite going in; the model does not kill it, but it moves it behind a data-and-controls gate rather than funding it at full pace. That is the point: the rubric makes the trade-off explicit and reviewable.

Weighting by Context

The 40/35/25 default is a starting point, not doctrine. A growth-stage business chasing share may push Impact to 50%. A bank, insurer, or healthcare provider will often raise Risk to 35% or more, because the asymmetry of downside dominates. Set the weights once, at the portfolio level, with the audit committee's awareness, then hold them constant across the cycle so scores stay comparable.
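The weight profiles described here can be kept as a small validated table. Only the 40/35/25 default and the two shifted figures (Impact at 50%, Risk at 35%) come from the text; how the remaining weight is redistributed in each profile is an assumption made for illustration, as are the profile names.

```python
# Weight profiles per operating context. Only the default row and the headline
# figure in each other row come from the article; the remaining splits are assumed.
WEIGHT_PROFILES = {
    "default": {"impact": 0.40, "feasibility": 0.35, "risk": 0.25},
    "growth_stage": {"impact": 0.50, "feasibility": 0.30, "risk": 0.20},
    "regulated": {"impact": 0.35, "feasibility": 0.30, "risk": 0.35},
}


def validate_profile(weights, tolerance=1e-9):
    """Reject profiles with missing dimensions, negative weights, or a bad total."""
    expected = {"impact", "feasibility", "risk"}
    if set(weights) != expected:
        raise ValueError(f"profile must define exactly {sorted(expected)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > tolerance:
        raise ValueError("weights must sum to 1.0")
    return weights


# Validate every profile once, before the scoring cycle starts.
for name, profile in WEIGHT_PROFILES.items():
    validate_profile(profile)
```

Validating once at the start of the cycle mirrors the advice to set the weights at the portfolio level and then hold them constant.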

Governance & Cadence

  • Calibrate reviewers on three sample use-cases before scoring the live pipeline, so a "4" means the same thing across scorers.
  • Require two independent scores per use-case and reconcile gaps above one point. Subjective single-scorer rubrics are where credibility dies.
  • Refresh scores quarterly and at every capital gate; data quality, regulation, and unit costs all move.
  • Log the score, the weights, and the reviewers with each funding decision so the portfolio is auditable a year later.
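The two-scorer rule and the audit log lend themselves to a short sketch. The one-point gap comes from the list above; the record fields, the function names, and the choice to average the two scores once they agree are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

RECONCILE_GAP = 1.0  # gaps above one point must be reconciled


def needs_reconciliation(score_a, score_b, gap=RECONCILE_GAP):
    """True when two independent scores differ by more than the allowed gap."""
    return abs(score_a - score_b) > gap


@dataclass
class ScoreRecord:
    """Hypothetical audit-log entry stored with each funding decision."""
    use_case: str
    reviewers: tuple
    scores: tuple
    weights: dict
    scored_on: date = field(default_factory=date.today)

    @property
    def agreed_score(self):
        if needs_reconciliation(*self.scores):
            raise ValueError("scores must be reconciled before they are logged as agreed")
        # Assumption: the agreed score is the simple average of the two reviewers.
        return sum(self.scores) / len(self.scores)


record = ScoreRecord(
    use_case="Document-intake automation",
    reviewers=("reviewer_a", "reviewer_b"),
    scores=(4.0, 3.5),
    weights={"impact": 0.40, "feasibility": 0.35, "risk": 0.25},
)
print(record.agreed_score)
```

Storing the weights inside each record means a decision can be re-derived a year later even if the portfolio weights have since changed.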

Common Pitfalls

  • Overweighting feasibility and quietly starving high-impact but harder opportunities.
  • Single-scorer subjectivity with no calibration or reconciliation.
  • Letting scores go stale as costs fall, regulations tighten, or data lands.
  • Treating the composite as the decision rather than as the input to a decision reviewers still own.

Quick-Win Checklist

  • Publish a one-page rubric and circulate it to reviewers before the next capital review.
  • Score the top 10 pipeline use-cases and rank them by composite.
  • Map the ranked list to funding tranches and decision rights.
  • Book a quarterly recalibration into the finance calendar.
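Ranking and tranche mapping can be sketched in a few lines. The tranche labels and composite cut-offs below are hypothetical placeholders rather than recommendations, and the use-case names and scores are invented sample data; each finance team would set its own.

```python
# Hypothetical cut-offs: (minimum composite, tranche label), highest first.
TRANCHES = [(4.0, "fund now"), (3.0, "stage / de-risk"), (0.0, "defer")]


def assign_tranche(composite):
    """Return the first tranche whose minimum the composite meets."""
    for minimum, label in TRANCHES:
        if composite >= minimum:
            return label
    raise ValueError("composite must be non-negative")


def rank_pipeline(composites):
    """Sort use-cases by composite, highest first, and attach a tranche."""
    ranked = sorted(composites.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, assign_tranche(score)) for name, score in ranked]


# Invented sample data for illustration only.
pipeline = {"use_case_a": 3.4, "use_case_b": 4.3, "use_case_c": 2.1}
for row in rank_pipeline(pipeline):
    print(row)
```

The tranche is an input to the funding conversation; reviewers still own the final call and can override the mapping with a logged rationale.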

Closing

A balanced scoring model does not make AI decisions for the CFO; it makes them defensible. Capital flows to the initiatives with the best mix of return, viability, and acceptable risk; the loud-but-fragile projects get staged rather than starved; and the whole portfolio becomes something the CFO can explain to a board and an auditor in the same breath. That is disciplined innovation at scale.

Scaling the Model Across the Portfolio

A scoring rubric that works for ten use-cases has to survive being applied to a hundred, across business units that do not share vocabulary or risk appetite. The way to scale it is to keep the dimensions and weights fixed at the enterprise level while letting each unit populate the sub-metrics with its own evidence. That preserves comparability, so a score of four means the same thing in operations as it does in finance, without forcing every team into an identical template that fits none of them well.

As volume grows, the reviewer calibration step becomes the control that matters most. Rotate a small central panel through each unit's scoring sessions, so drift is caught early and the rubric stays honest. Publish the ranked portfolio openly, because transparency is what stops the scoring model from being quietly overridden by whoever lobbies hardest. A rubric that everyone can see, and that produces the same answer regardless of who runs it, is what turns a spreadsheet into governance.
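Drift between a unit's scorers and the central panel can be monitored with a simple comparison. This is a minimal sketch under assumptions: the panel re-scores a sample of each unit's use-cases, and a mean gap above a hypothetical tolerance of 0.5 points triggers recalibration. The names and sample scores are invented.

```python
from statistics import mean

# Hypothetical tolerance: mean gap, in points, that triggers recalibration.
DRIFT_TOLERANCE = 0.5


def unit_drift(unit_scores, panel_scores):
    """Mean signed gap (unit minus panel) over use-cases both groups scored."""
    shared = unit_scores.keys() & panel_scores.keys()
    if not shared:
        raise ValueError("no use-cases were scored by both groups")
    return mean([unit_scores[k] - panel_scores[k] for k in shared])


def needs_recalibration(unit_scores, panel_scores, tolerance=DRIFT_TOLERANCE):
    """True when the unit scores consistently above or below the panel."""
    return abs(unit_drift(unit_scores, panel_scores)) > tolerance


# Invented composites for three use-cases scored by both groups.
unit = {"uc1": 4.5, "uc2": 4.0, "uc3": 3.5}
panel = {"uc1": 3.5, "uc2": 3.5, "uc3": 3.0}
print(unit_drift(unit, panel), needs_recalibration(unit, panel))
```

A signed gap is used so that a unit that scores generously is distinguishable from one that scores harshly; both cases break comparability across the portfolio.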