Overview
ROI = (benefit − cost) ÷ cost, but the hard part is attribution and timing. Treat benefits and costs as dated cash flows; prove causality with baselines and controls; and show how sensitive the result is to the key assumptions. Keep one calculation path shared by operations and finance.
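The "dated cash flows" framing can be sketched as follows; the dates and dollar figures are illustrative assumptions, not data from any real project:

```python
from datetime import date

# Hypothetical dated cash flows: positive = benefit, negative = cost.
flows = [
    (date(2024, 1, 15), -40_000),  # build cost
    (date(2024, 3, 1),   12_000),  # monthly labor savings begin
    (date(2024, 4, 1),   12_000),
    (date(2024, 5, 1),   12_000),
    (date(2024, 5, 10),  -3_000),  # run/maintain cost
    (date(2024, 6, 1),   12_000),
    (date(2024, 7, 1),   12_000),
]

benefit = sum(v for _, v in flows if v > 0)
cost = -sum(v for _, v in flows if v < 0)
roi = (benefit - cost) / cost
print(f"benefit={benefit}, cost={cost}, ROI={roi:.2f}")
```

Because every entry carries a date, the same ledger later feeds NPV, IRR, and payback without re-keying numbers into slides.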
Value model (what to count)
Labor & throughput
- Hours removed, redeployed, or avoided
- Cycle time and SLA gains → more capacity
Quality
- Defect and rework reduction
- Duplicate, mismatch, or leakage prevented
Revenue & service
- Faster quotes, fewer abandons, more on-time commits
- Better CSAT/retention where speed matters
Compliance & audit
- Late approvals and failed reconciliations reduced
- Audit prep hours saved; findings avoided
Resilience
- After-hours processing and surge handling
- Fewer single points of failure
How to price benefits
- Labor: loaded rate × hours
- Errors: cost per defect × delta
- Time: value of latency reduction or capacity gain
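The first two pricing rules (labor and errors) reduce to simple multiplications; a minimal sketch, where every rate and count is an assumed example value:

```python
# Hypothetical inputs for pricing benefits; all figures are assumptions.
loaded_rate = 55.0        # fully loaded $/hour
hours_removed = 120       # hours per month removed from the process
cost_per_defect = 180.0   # rework + handling cost per defect
defect_delta = 35         # defects avoided per month vs. baseline

labor_benefit = loaded_rate * hours_removed        # loaded rate x hours
quality_benefit = cost_per_defect * defect_delta   # cost per defect x delta
monthly_benefit = labor_benefit + quality_benefit
print(f"monthly benefit = ${monthly_benefit:,.0f}")
```

The deltas (hours removed, defects avoided) must come from the measured baseline-vs-pilot comparison, not from estimates alone.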
Cost model (what to include)
Build
- Analysis, design, development, testing
- Platform setup, licenses, compliance reviews
- Change management and training
Run & maintain
- Bot/worker minutes, VMs/runners, API calls, storage
- Monitoring, on-call, incident response
- Upgrades, selector fixes, model re-training
Hidden costs
- Shadow spreadsheets and duplicate logic
- Test data creation and non-prod environments
- Unplanned downtime and rework
Measurement design
Baseline & controls
- 4–12 weeks of baseline (stable seasonality)
- Control group or holdout corridor
- Define start/stop events and operational definitions
Data & instrumentation
- Event logs: case id, activity, timestamp, actor
- Same data feeds ops and finance (one truth)
- SPC to separate signal from noise
Design tips
- Use median and p90, not only averages
- Segment by product/channel/region to avoid dilution
- Track backlog aging to expose queue effects
Attribution & causality
Methods
- Before/After with control group
- Difference-in-differences (DiD)
- SPC control charts (common vs special cause)
Confounders
- Seasonality, mix, concurrent changes
- Simpson’s paradox across segments
Proof package
- Assumptions, definitions, and data sources
- Plots: baseline vs pilot vs control
- Sensitivity to key assumptions
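The DiD method above subtracts the control group's change from the treated group's change, removing shared trends (seasonality, volume shifts) under the parallel-trends assumption. A minimal sketch with hypothetical cycle-time means:

```python
def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """DiD estimate: (treated change) minus (control change).
    Removes trends common to both groups, assuming parallel trends."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean cycle times (hours), baseline vs. pilot period
effect = diff_in_diff(treat_pre=6.0, treat_post=4.2,
                      ctrl_pre=6.1, ctrl_post=5.9)
print(f"attributable change: {effect:+.1f} h")
```

Here the naive before/after claim would be 1.8 hours, but 0.2 hours of that also happened in the control corridor, so only 1.6 hours is attributable to the automation.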
Cash flow, NPV & IRR
Cash flows
- Lay out dated inflows (benefits) and outflows (costs)
- Include run/maintain; avoid one-off “savings only” claims
Finance metrics
- NPV: discounted net cash flow
- IRR: discount rate where NPV = 0
- Payback: time to break even (undiscounted and discounted)
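All three metrics follow mechanically from the dated cash-flow ledger. A self-contained sketch (IRR via bisection, which assumes the conventional one-sign-change pattern of an upfront cost followed by net benefits); the cash-flow figures are illustrative:

```python
def npv(rate, flows):
    """NPV of period-indexed cash flows, with flows[0] at t=0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))

def irr(flows, lo=-0.99, hi=10.0, tol=1e-7):
    """IRR by bisection: the discount rate where NPV crosses zero.
    Valid when the series has one sign change (cost, then benefits)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def payback_period(flows):
    """First period where cumulative undiscounted cash turns non-negative."""
    total = 0.0
    for t, cf in enumerate(flows):
        total += cf
        if total >= 0:
            return t
    return None  # never breaks even within the horizon

# Hypothetical: $50k build, then $18k/quarter net benefit for 8 quarters
flows = [-50_000] + [18_000] * 8
print(f"NPV @10%/qtr: {npv(0.10, flows):,.0f}")
print(f"IRR per qtr:  {irr(flows):.1%}")
print(f"Payback:      quarter {payback_period(flows)}")
```

Note the run/maintain warning above: if the $18k per quarter omits ongoing run costs, every one of these metrics is overstated.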
Sensitivity & scenarios
- Vary wage rates, volume, exception rate, uptime
- Best/base/worst with probabilities
- Show tornado chart of drivers
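A tornado chart is built from one-at-a-time swings: move each driver to its low and high value while holding the others at base, then rank drivers by the width of the resulting swing. A sketch with an assumed toy value model and assumed ranges:

```python
# One-at-a-time sensitivity for a tornado chart. The value model and all
# parameter ranges below are illustrative assumptions.
base = {"hours": 1200, "rate": 55.0, "exception_rate": 0.08, "run_cost": 30_000}
ranges = {
    "hours":          (900, 1500),
    "rate":           (45.0, 65.0),
    "exception_rate": (0.04, 0.15),
    "run_cost":       (24_000, 45_000),
}

def annual_net(p):
    # Net value = labor value on the share of cases that stay automated,
    # minus annual run/maintain cost.
    return p["hours"] * p["rate"] * (1 - p["exception_rate"]) - p["run_cost"]

swings = {}
for key, (lo_v, hi_v) in ranges.items():
    lo_net = annual_net({**base, key: lo_v})
    hi_net = annual_net({**base, key: hi_v})
    swings[key] = abs(hi_net - lo_net)

# Tornado order: widest swing (biggest driver of uncertainty) first
for key, swing in sorted(swings.items(), key=lambda kv: -kv[1]):
    print(f"{key:15s} swing = ${swing:,.0f}")
```

The top one or two bars tell you which assumptions deserve the most measurement effort before you present the ROI.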
Pilot vs. scale economics
Scale curves
- Licenses amortize; monitoring and SRE add fixed cost
- Exception tails reduce incremental value
Readiness gates
- Hit the p90 cycle-time target and first-pass yield (FPY) threshold first
- Runbooks, on-call, and rollback in place
Capacity plan
- Bot/worker minutes, queues, and peak load
- Back-pressure and graceful degradation
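Peak-load sizing can lean on Little's Law (L = λ·W, as in the MIT OCW reference below): work in progress equals arrival rate times time in system. A sketch with assumed peak figures:

```python
# Little's Law (L = lambda * W): cases in flight = arrival rate x time in system.
# Hypothetical peak-load sizing for a worker pool; all figures are assumptions.
arrival_rate = 90          # cases per hour at peak
minutes_per_case = 6       # average handling time per case
wip = arrival_rate * (minutes_per_case / 60)   # cases in flight at peak

utilization_target = 0.75  # keep headroom for surges and retries
workers_needed = wip / utilization_target
print(f"WIP at peak: {wip:.0f} cases; size pool for ~{workers_needed:.1f} workers")
```

Sizing to 100% utilization leaves no headroom, which is exactly when back-pressure and graceful degradation get exercised; the 0.75 target here is an assumed policy, not a standard.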
Risk-adjusted ROI
Control posture
- Key control indicators (KCIs): late approvals, failed reconciliations, access exceptions
- Audit findings closed and time to close
Risk valuation
- Expected loss avoided (probability × impact)
- Penalty and service-credit avoidance
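Expected loss avoided is the sum of probability × impact, compared before and after the control improvement. A sketch over a hypothetical risk register; every probability and impact below is an assumption:

```python
# Expected loss = sum of (probability x impact) over the risk register.
# Hypothetical entries; probabilities and impacts are assumptions.
risks_before = [
    ("late regulatory filing", 0.10, 250_000),
    ("failed reconciliation",  0.20,  40_000),
    ("SLA service credits",    0.30,  60_000),
]
risks_after = [
    ("late regulatory filing", 0.02, 250_000),
    ("failed reconciliation",  0.05,  40_000),
    ("SLA service credits",    0.10,  60_000),
]

def expected_loss(risks):
    return sum(p * impact for _, p, impact in risks)

avoided = expected_loss(risks_before) - expected_loss(risks_after)
print(f"expected loss avoided: ${avoided:,.0f}/yr")
```

Treat the avoided amount as a risk-adjusted benefit line in the cash-flow model, with its probability assumptions documented in the proof package.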
Model/AI risks
- Override rate, safety flags, hallucination incidents
- NIST AI RMF controls; approvals for high-impact steps
Portfolio & sequencing
Prioritization
- Benefit ÷ effort with risk and readiness gates
- Marginal ROI of adding the next candidate to the portfolio
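The benefit ÷ effort ranking and marginal-ROI check can be sketched together; the candidate names and dollar figures below are illustrative assumptions:

```python
# Rank candidates by benefit / effort, then show the marginal ROI of adding
# each next candidate to the portfolio. All figures are assumptions.
candidates = [
    # (name, annual benefit $, effort/cost $)
    ("invoice matching", 120_000, 40_000),
    ("KYC refresh",       60_000, 30_000),
    ("report packaging",  25_000, 20_000),
    ("email triage",      15_000, 25_000),
]

ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)

cum_benefit = cum_cost = 0.0
for name, benefit, cost in ranked:
    marginal_roi = (benefit - cost) / cost   # ROI of this next addition alone
    cum_benefit += benefit
    cum_cost += cost
    portfolio_roi = (cum_benefit - cum_cost) / cum_cost
    print(f"{name:18s} marginal={marginal_roi:+.2f}  portfolio={portfolio_roi:+.2f}")
```

In this made-up portfolio the last candidate has a negative marginal ROI even though the portfolio total stays positive, which is the signal to stop (or to kill/pivot, per the real-options note below).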
Constraints
- Licenses, SRE capacity, change windows
- Data and API readiness per corridor
Real options
- Stage work to keep options open
- Kill or pivot low-yield pilots early
Reporting & dashboards
Ops
Cycle time (median/p90), FPY, backlog aging, exception rate.
Finance
Run-rate savings, one-time costs, NPV/IRR, payback.
Control health
KCIs, audit issues, evidence completeness, override rate (AI).
Use the same operational definitions and data sources across all dashboards. No re-calculated “slide math.”
Pitfalls
Savings without timestamps
Claimed hours without dated evidence do not count. Keep event logs and payroll/volume links.
No control group
Use a holdout group or control corridor. Show before/after against the control, not before/after alone.
Ignoring run/maintain
Include bot minutes, fixes, upgrades, monitoring, and model re-training.
90-day starter
Days 0–30
- Pick one flow; define KPIs/KCIs and operational definitions
- Collect 8–12 weeks of baseline; identify control group
Days 31–60
- Pilot automation; track cycle time, FPY, exceptions
- Draft cash-flow model; add run/maintain estimates
Days 61–90
- Publish deltas with DiD/SPC; compute NPV/IRR/payback
- Run sensitivity; set scale gates and governance
References
- NIST e-Handbook of Statistical Methods — nist.gov
- MIT OCW: Little’s Law (queueing) — ocw.mit.edu
- Lean Enterprise Institute: Value-stream mapping — lean.org
- OpenTelemetry (observability) — opentelemetry.io
- Google SRE: SLOs and error budgets — sre.google
- Forrester TEI (cost/benefit framework) — forrester.com
Prove value with dated cash flows and clean evidence.
If you want an ROI workbook (value/cost templates, DiD/SPC examples), ask for a copy.