From 200 AI Pilots to 5 That Pay: A Portfolio Triage Method
The number that should be on every executive's desk right now is not their AI investment figure. It is their AI conversion rate.
According to MIT's NANDA Initiative, which studied over 300 enterprise AI deployments and interviewed 150 executives, 95 percent of enterprise generative AI pilots deliver zero measurable P&L impact. Not low returns. Zero. A separate March 2026 survey of 650 enterprise technology leaders found that 78 percent of organizations have at least one AI agent pilot running, but only 14 percent have successfully scaled any agent to organization-wide operational use. The average sunk cost per abandoned AI initiative reached $7.2 million in 2025, according to research tracking enterprise outcomes.
The problem is not that AI does not work. The problem is that organizations have optimized for launching pilots and not for the conditions that allow pilots to become operational systems that change financial outcomes. The result is a portfolio that grows in breadth every quarter while the number of initiatives generating measurable business value stays nearly flat.
Triage is the correct response to this situation. Not more pilots. Not a new framework. A rigorous method for separating the use cases worth scaling from the ones worth stopping, applied to the portfolio you already have.
Why Pilots Proliferate and Value Does Not
Understanding why AI portfolios grow without producing proportional returns is necessary before designing a triage method, because the structural causes of the problem shape what a solution needs to address.
The first cause is that pilots are engineered for success in controlled conditions. A pilot team curates the test data, selects the evaluation criteria, chooses the users most likely to adopt, and manages the edge cases. The results look impressive because the conditions were designed to produce impressive results. When the same system encounters the full complexity of production, where data is messy, users are resistant, edge cases are frequent, and the workflow the system is supposed to augment was designed for manual work, performance degrades rapidly.
The second cause is that 73 percent of failed AI projects had no agreed definition of success before the project started, according to a 2025 MIT Sloan study. Projects with quantified success metrics defined upfront achieve a 54 percent success rate. Those without achieve 12 percent. The difference is not in the quality of the AI. It is in whether anyone established, before the work began, what would constitute a meaningful result.
The third cause is misalignment between where AI resources are being spent and where the returns actually are. MIT's research found that more than half of generative AI budgets are devoted to sales and marketing tools, yet the biggest measured ROI comes from back-office automation: eliminating manual processing, cutting external agency costs, and streamlining operational workflows. Organizations are funding pilots in the categories they find exciting rather than the categories where the evidence says value accumulates.
The fourth cause is what researchers from Harvard Business School, Microsoft, and Harvard's Digital Data and Design Institute, writing in Harvard Business Review in March 2026, describe as the last-mile problem. Individual users can demonstrate real productivity gains from AI tools without those gains aggregating to team or business-unit financial outcomes. A coordinator who cuts a task time by 40 percent creates no measurable EBIT impact if the surrounding workflow has not changed. Individual wins need process-level redesign to become financial returns. They rarely get it without deliberate design.
The Triage Framework
Portfolio triage applies four filters to every AI initiative in the current portfolio. Each filter is a binary question. An initiative that fails any filter is a candidate for stopping or restructuring before additional resources are committed. An initiative that passes all four is a candidate for scaling.
Filter One: Is There a Quantified Business Outcome?
Not an expected outcome. Not a projected outcome. A quantified, specific, measurable business outcome that a named executive owns and has agreed to be accountable for. The outcome must be expressed in terms that connect to financial performance: cost reduction in a specific function by a specific amount, cycle time reduction that translates to revenue capacity, error rate reduction that eliminates a category of rework cost, or headcount reallocation that reduces a hiring plan.
Outcomes expressed in technical terms do not pass this filter. Model accuracy, inference speed, user satisfaction scores, and adoption rates are indicators, not outcomes. They are useful for managing the system. They are not useful for justifying the investment to a CFO or a board.
Deloitte's research on AI ROI found that organizations with business-aligned success metrics achieve AI payback in under two years. Those measuring primarily technical performance take two to four years. The gap is not in the technology. It is in whether the initiative was connected to a business outcome from the beginning.
Any initiative in the current portfolio that cannot identify its quantified business outcome in one sentence should be paused until that sentence exists and is endorsed by the relevant business owner. Continuing to invest in a pilot without a defined outcome is not experimentation. It is spending without accountability.
Filter Two: Is the Data Ready?
Gartner predicts that 60 percent of AI projects unsupported by AI-ready data will be abandoned through 2026. The data readiness question is not whether data exists. It is whether the specific data the initiative requires is accessible, sufficiently complete and accurate, properly governed, and structured in a way the AI system can actually use.
In most enterprise environments the honest answer to this question is no, at least for some portion of what the initiative requires. The question is whether the gap is closeable within the initiative's timeline and budget, or whether the initiative is fundamentally blocked by a data problem that is not its own to solve.
An initiative blocked by a data infrastructure problem that requires six months and a separate team to resolve is not an AI problem. It is a data strategy problem that is consuming AI budget while producing AI pilot metrics. It should be classified as a data initiative with a future AI component, funded accordingly, and removed from the AI portfolio count until the prerequisite work is done.
Filter Three: Has the Workflow Been Redesigned?
McKinsey's 2025 research found that organizations that redesign their workflows before selecting AI tools are twice as likely to report significant financial returns. This finding challenges the typical sequencing of AI deployment, where the technology is selected and deployed first and the process is adjusted around it afterward.
The reason workflow redesign matters is precisely the last-mile problem described above. AI layered on top of a workflow designed for manual work captures a fraction of the available value. The workflow was built around the constraints of human execution: the batch sizes, the handoff points, the exception-handling procedures, the reporting rhythms. When AI removes some of those constraints, the workflow can be reconceived, not just accelerated. The organizations that reconfigure the process to take full advantage of what AI makes possible consistently outperform those that deploy AI into unchanged processes.
For each initiative in the triage, the question is: has anyone sat down with the business function this initiative serves and redesigned the workflow it will operate within? Not adjusted the workflow. Redesigned it, starting from what becomes possible when the constraint the AI addresses no longer exists. If that conversation has not happened, the initiative is at significant risk of delivering individual productivity gains that do not aggregate to business-level financial returns.
Filter Four: Is There a Named Owner in the Business?
Deloitte's 2026 State of AI report found that only 21 percent of organizations have a mature governance model for AI systems. The most common governance failure is the absence of a named business owner for each production system, someone whose job depends on the system performing well over time, who is accountable for monitoring outputs, flagging degradation, and managing the system as business conditions change.
Without a named owner, systems that work well in month one degrade quietly. The model was trained on data that is now stale. The business process changed and nobody updated the system logic. An edge case emerged that nobody anticipated and no exception-handling procedure was defined. The system continues to run. The outputs get progressively less reliable. The business function stops trusting the output and reverts to manual processes. The initiative is counted as a deployment in the portfolio metrics and a failure in the actual operations of the function it was supposed to serve.
The naming question is a proxy for organizational seriousness. An initiative for which no business leader is willing to accept ongoing ownership is an initiative that the business does not actually believe will matter. That is useful information. It suggests either that the initiative needs to be better connected to a business problem the relevant leader actually owns, or that it should not be in the portfolio.
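The four filters reduce to a short decision procedure. The sketch below is one minimal way to express it, not an implementation from any of the cited research; the Initiative structure, its field names, and the verdict strings are all hypothetical, invented here for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Initiative:
    """One AI initiative in the portfolio. Every field name is illustrative."""
    name: str
    quantified_outcome: bool       # Filter One: a named executive owns a measurable P&L outcome
    data_ready: bool               # Filter Two: required data is accessible, governed, and usable
    workflow_redesigned: bool      # Filter Three: the target workflow was redesigned, not just adjusted
    business_owner: Optional[str]  # Filter Four: a named business owner, or None

def triage(initiative: Initiative) -> str:
    """Apply the four binary filters in order; failing any one ends the evaluation."""
    if not initiative.quantified_outcome:
        return "pause: write the one-sentence business outcome and get it endorsed"
    if not initiative.data_ready:
        return "reclassify: data initiative with a future AI component"
    if not initiative.workflow_redesigned:
        return "restructure: redesign the workflow before committing more resources"
    if initiative.business_owner is None:
        return "stop or reconnect: no business leader will accept ownership"
    return "scale: concentrate investment here"
```

The ordering mirrors the filters themselves: an initiative is only evaluated on data readiness once a quantified outcome exists, which keeps the most common failure mode at the front of the queue.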
Applying the Triage: What Typically Happens
When organizations apply these four filters honestly to their current AI portfolio, a consistent pattern emerges. A significant portion of initiatives, often 40 to 60 percent, fail Filter One because they have no quantified business outcome. A further portion fails Filter Two because the data they require is not AI-ready and the gap is not closeable within the current initiative scope. The initiatives that survive both filters are typically a small fraction of the starting portfolio, and that fraction is where the scaling investment should be concentrated.
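To make that pattern concrete, here is how the sketch above might run across a portfolio. The three initiatives and every attribute assigned to them are invented for illustration, not drawn from any of the studies cited.

```python
from collections import Counter

# A hypothetical three-initiative portfolio; names and attributes are invented.
portfolio = [
    Initiative("invoice-matching", True, True, True, "VP, Finance Operations"),
    Initiative("sales-email-drafting", False, True, False, None),
    Initiative("claims-summarization", True, False, True, "Claims Director"),
]

for verdict, count in Counter(triage(i) for i in portfolio).items():
    print(f"{count} initiative(s): {verdict}")
# Only invoice-matching passes all four filters and lands in the scale bucket;
# the other two are redirected rather than cancelled, as described below.
```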
This outcome is not a failure. It is a clarification. The initiatives that survive triage are the ones with the structural conditions for value creation. Concentrating resources on them rather than spreading investment evenly across a large portfolio is consistently associated with faster time to measurable return and higher return per dollar invested.
The initiatives that fail triage are not necessarily wrong ideas. Some of them represent genuine opportunities that require prerequisite work, cleaner data, a redesigned workflow, or a clearer business outcome definition, before the AI component will deliver value. Reclassifying those as prerequisite initiatives with a future AI component is more honest than continuing to run them as AI pilots that are not delivering AI results.
What to Do With the Stopped Initiatives
Stopping an AI initiative is organizationally difficult. The team that built it is invested in it. The executive who sponsored it does not want to acknowledge that it did not work. The organization has announced its AI ambition publicly and stopping initiatives feels like retreating from that ambition.
The framing that works is not stopping. It is redirecting. The initiative is not being cancelled because AI failed. It is being redirected because the organizational conditions for success have not yet been met, and continuing to invest in a pilot that cannot become a production system is not a good use of the AI budget or the team's time. The investment is being redirected toward the initiatives where those conditions exist and toward building the prerequisite conditions for the redirected initiatives to succeed in a future cycle.
That framing is accurate and honest. It is also significantly less politically costly than presenting triage as a finding that certain initiatives were bad ideas.
The Build Before You Scale Principle
The March 2026 survey that found 78 percent of enterprises with pilots and 14 percent at production scale also found something instructive about what distinguished the scalers from the stalled organizations. It was not the size of their AI budget. Successful scalers spent proportionally more on evaluation infrastructure, monitoring tooling, and operational staffing, and proportionally less on model selection and prompt engineering. The difference was not what they were building. It was the operating model they built before they tried to scale.
Organizations that scale AI successfully treat each production system like production software: with a named owner, a monitoring process, a defined exception-handling procedure, a review cadence, and a plan for keeping the system current as business conditions change. The investment required to build that operating model is organizational, not technical. It does not require a larger AI budget. It requires redirecting a portion of the existing one from launching more pilots to making the promising ones operationally sustainable.
The organizations that do this consistently generate more value from a smaller portfolio than those that maintain a large portfolio of pilots that never mature into operational systems. The measure of AI program success is not the number of pilots running. It is the number of production systems generating measurable business returns. Triage is how you close the gap between those two numbers.
Talk to Us
ClarityArc's AI strategy practice helps organizations assess their AI portfolios, identify the use cases with real scaling potential, and build the operating model conditions that allow pilots to become production systems. If your AI portfolio is growing faster than your AI returns, we are ready to help.
Get in Touch