The AI Program That Looked Like It Was Working Until It Wasn't

RAND Corporation's analysis of more than 2,400 enterprise AI initiatives found that 80 percent fail to deliver their intended business value. That is roughly twice the failure rate of non-AI technology projects. MIT's NANDA study found that 95 percent of enterprise generative AI pilots produced zero measurable return on the P&L. In 2025, global enterprises invested $684 billion in AI. By year-end, more than $547 billion of that investment had produced no measurable results.

Most enterprise AI leaders know these statistics. What most do not have is a reliable method for identifying, early enough to intervene, whether their specific program is on the path toward the 80 percent or the 20 percent. The challenge is that AI programs in their early stages look similar regardless of which path they are on. Pilots produce impressive demos. Early adoption metrics look strong. Executive enthusiasm is high. The warning signs that predict eventual failure are already present at this stage, but they are easy to misread as temporary implementation challenges rather than structural problems.

This post is about the warning signs that appear 6 to 18 months into deployment, before failure becomes undeniable and before the sunk cost has compounded to the point where honest assessment is politically difficult. Folio3's analysis of 140 enterprise AI implementations found that only 23 percent of failures were caused by model performance or integration complexity. The remaining 77 percent came from strategy, governance, and change management. Those are problems that show detectable signals well before the failure is complete, and those signals are what this post addresses.

Warning Sign One: The Metrics Are Activity, Not Outcomes

The most reliable early indicator of an AI program that is not on track to deliver business value is a measurement framework built around activity metrics rather than outcome metrics. User adoption rates, queries processed, tasks completed, satisfaction scores, and training completion rates are all activity metrics. They measure whether the AI is being used. They do not measure whether the AI is producing the business outcomes the investment was supposed to generate.

Programs that measure activity instead of outcomes are not measuring activity by accident. They are measuring activity because outcomes are harder to measure, take longer to materialize, and require connection to financial data that the AI team does not control. Activity metrics are available immediately, they trend upward as deployment scales, and they look like progress in executive updates. Outcome metrics require baseline measurement before deployment, attribution methodology to connect AI activity to financial results, and patience through the lag between AI adoption and measurable business impact.

The diagnostic question is specific: can the team name the P&L line, the cost centre, or the revenue stream where the AI program's impact will appear, and have they established the pre-deployment baseline that would allow them to measure the change? If the answer is no to either part of that question, the program is measuring activity rather than outcomes by design, and the executive update cadence is producing confidence in a program whose value is unverifiable.
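One way to make that diagnostic concrete is to record, for every metric in the executive update, the P&L line it is meant to move and the baseline measured before deployment. The sketch below is illustrative only: the `Metric` fields, the helper functions, and the sample figures are hypothetical and are not taken from any framework cited in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    name: str
    pnl_line: Optional[str] = None    # P&L line, cost centre, or revenue stream (hypothetical field)
    baseline: Optional[float] = None  # value measured before deployment
    current: Optional[float] = None   # latest measured value


def is_outcome_ready(metric: Metric) -> bool:
    """True only if both parts of the diagnostic question are answered."""
    return metric.pnl_line is not None and metric.baseline is not None


def change_vs_baseline(metric: Metric) -> Optional[float]:
    """Relative change against the baseline, or None if it cannot be computed."""
    if not is_outcome_ready(metric) or metric.current is None or metric.baseline == 0:
        return None
    return (metric.current - metric.baseline) / metric.baseline


# Illustrative values only.
metrics = [
    Metric("weekly active users", current=1200.0),
    Metric("cost per claim processed", pnl_line="Claims operations",
           baseline=40.0, current=34.0),
]

# Metrics that cannot be tied to a financial line and a baseline are activity metrics.
activity_only = [m.name for m in metrics if not is_outcome_ready(m)]
```

A metric that lands in `activity_only` can still be worth tracking, but under this sketch it would not count as evidence of business value until it has a named P&L line and a baseline.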

Gartner's April 2026 survey of 782 I&O leaders found that 57 percent of organizations that experienced AI failure attributed it to expecting too much, too fast. That expectation gap almost always produces pressure to demonstrate progress through activity metrics when outcome metrics are not yet available, and once the program is being measured on activity, it is very difficult to shift the measurement framework before the program is already in trouble. The AI ROI measurement framework described in this series addresses this directly: the measurement infrastructure needs to be designed before deployment, not built retroactively when the board asks for evidence.

Warning Sign Two: The Sponsor Is No Longer in the Room

Executive sponsorship evaporates within six months in 56 percent of failed AI initiatives, according to Folio3's analysis. This pattern is so consistent that it is worth treating as a structural prediction: when the executive sponsor's visible engagement with the AI program drops, the program's probability of success drops with it.

The mechanism is straightforward. AI programs require organizational cooperation from functions and teams that have no direct accountability for the AI program's success. The data governance changes that the AI program requires are owned by the data governance function. The workflow changes that make the AI's outputs actionable are owned by the business function. The security reviews that unlock production deployment are owned by the security team. Each of these teams has its own priorities, its own backlog, and its own resource constraints. Their cooperation with the AI program is a discretionary contribution that they make when it is sponsored visibly by leadership they respect, and that they defer when it is not.

When the executive sponsor moves on to the next priority, the AI program continues in the governance documents and the project plans, but the organizational cooperation that enables progress quietly reduces. The data governance changes get deprioritized. The workflow redesign gets deferred. The security review takes longer than expected. The program appears to be progressing because nobody has formally declared it stalled, but the rate of meaningful progress has slowed to the point where the original timeline is no longer realistic.

The intervention point for this warning sign is the moment when the sponsor's calendar availability decreases, not after the program has visibly stalled. A direct conversation about whether the strategic priority has shifted, and what the program needs from sponsorship to maintain momentum, is organizationally uncomfortable but far less costly than allowing the program to drift for six months before anyone acknowledges that the organizational energy behind it has dissipated.

Warning Sign Three: The Business Process Has Not Changed

McKinsey's 2025 AI survey found that organizations reporting significant financial returns were twice as likely to have redesigned end-to-end workflows before selecting modelling techniques. The business process came first. The model came second. This finding, consistent across multiple research sources, identifies the most common structural failure mode in enterprise AI programs: the AI is deployed to assist with a process that was designed for human execution, without redesigning the process to take advantage of what the AI makes possible.

When a process has not been redesigned around the AI's capabilities, the AI produces outputs that employees verify manually, override frequently, and route around when the friction exceeds the benefit. The AI adoption rate is high because employees are using the tool. The productivity improvement is low because the process is not designed to eliminate the manual steps that the AI was supposed to make unnecessary.

The diagnostic question is whether the process map for the workflow the AI supports changed when the AI was deployed. If the process map looks the same as before deployment, except with an AI tool inserted at a specific step, the workflow redesign that converts AI adoption into productivity improvement has not happened. The AI is being used as a faster input to an unchanged process rather than as the basis for a fundamentally different process design.

Six months into production deployment is the right time to conduct this assessment because it is late enough that real usage patterns are visible but early enough that a workflow redesign is still practically achievable. Leaders of AI programs that reached this assessment point twelve months late consistently report that redesigning a workflow employees have already adapted to is politically and organizationally far harder than redesigning it before those adaptations have calcified into the new normal.

Warning Sign Four: The Program Has Become Infrastructure

Enterprise AI programs that survive their early stage often transition, without anyone explicitly deciding to make this transition, from a business transformation program to an infrastructure program. The KPIs shift from business outcomes to system reliability. The team's energy moves from measuring impact to managing uptime. The executive conversations move from return on investment to operational performance. The program is still running. It is no longer progressing.

This transition happens when the initial business case has been partially validated, when the production system is stable enough that operational management has become the primary concern, and when the organizational energy required to drive the workflow redesign and adoption work has not been sustained. The infrastructure framing is comfortable because it positions the AI system as a success: it is live, it is stable, it is being used. The business value framing is uncomfortable because it requires confronting the gap between what the program was supposed to deliver and what it has actually delivered.

Programs that have made this transition without having closed the business value gap require a specific intervention: a re-scoping exercise that returns the program to a business transformation frame, with specific outcome commitments for the next 90 days and explicit accountability for the workflow changes and adoption work required to meet those commitments. The alternative is allowing the program to continue as infrastructure, at ongoing operational cost, without ever delivering on the business case that justified the investment. That outcome is the most common form of the 80 percent failure: not dramatic project cancellation, but the quiet persistence of a program that is no longer trying to deliver what it was funded to deliver.

Warning Sign Five: The Governance Is Documentation, Not Control

Virtana's March 2026 report found that while 59 percent of executives believe their organizations are prepared for AI-scale operations, 62 percent of practitioners report fragmented systems and persistent visibility gaps. That gap between executive confidence and practitioner experience is one of the most reliable indicators of governance that exists as documentation rather than as operational control.

Governance documentation describes the policies, the approval processes, the audit requirements, and the risk classifications that the organization has committed to applying to its AI systems. Operational governance is the actual application of those policies to actual systems in production. The two are not the same, and the gap between them is where the organizational and reputational risk of an AI program concentrates.

Programs with documentation governance rather than operational governance are performing Algorithmic Impact Assessments that are filed but not acted on. They are classifying AI systems as high-risk in policy documents but not applying the human oversight and audit trail requirements that high-risk classification is supposed to trigger. They are publishing AI ethics principles that are not connected to the design decisions of any specific system. The governance infrastructure looks complete in the policy documents and looks absent in the production systems.

The intervention at this warning sign is an operational governance audit: not a review of the policy documentation, but a review of whether each specific policy requirement is actually implemented in each specific production system. That audit will consistently reveal gaps between what the policy requires and what the system does. Those gaps are the risk exposure that the governance documentation was supposed to prevent, and they are most efficiently closed before a regulatory review, a customer complaint, or a system failure makes them visible to audiences outside the program team.
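An operational governance audit of this kind can be thought of as a gap matrix: policy requirements on one axis, production systems on the other. The following is a minimal sketch under stated assumptions. The control names, risk tiers, and system names are hypothetical placeholders, and a real audit would verify each control through evidence from the running system rather than a self-reported set.

```python
# Hypothetical policy: controls each risk tier is supposed to trigger.
REQUIRED_CONTROLS = {
    "high": {"human_oversight", "audit_trail", "impact_assessment_actions_tracked"},
    "low": {"audit_trail"},
}

# Hypothetical inventory: what is actually implemented in production.
systems = [
    {"name": "claims-triage", "risk": "high", "implemented": {"audit_trail"}},
    {"name": "faq-assistant", "risk": "low", "implemented": {"audit_trail"}},
]


def audit_gaps(systems, required_controls):
    """Return {system name: sorted list of required controls missing in production}."""
    gaps = {}
    for system in systems:
        missing = required_controls[system["risk"]] - system["implemented"]
        if missing:
            gaps[system["name"]] = sorted(missing)
    return gaps


gaps = audit_gaps(systems, REQUIRED_CONTROLS)
for name, missing in gaps.items():
    print(f"{name}: missing {', '.join(missing)}")
```

The point of the structure is that the audit starts from the production inventory, not from the policy document, so a system that is classified as high-risk on paper but lacks the corresponding controls shows up as an explicit gap.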

What Programs That Survive Look Like

The 19.7 percent of enterprise AI initiatives that deliver intended business value share identifiable characteristics that are visible in program design, not just in retrospective analysis. Pertama Partners' synthesis of RAND, MIT, McKinsey, Deloitte, and Gartner data identifies the design factors that predict success: clear pre-approval metrics produce 54 percent success versus 12 percent without. Sustained sponsorship produces 68 percent success versus 11 percent without. Programs treated as business transformation rather than technology projects produce 61 percent success versus 18 percent for technology-framed programs.
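To put those pairs on a common scale, the short calculation below computes how many times higher the quoted success rate is with each design factor than without it. It uses only the percentages quoted above. The ratio is plain arithmetic on reported figures, not a causal estimate, and the variable names are illustrative.

```python
# Success rates (percent) as quoted in the paragraph above: (with factor, without factor).
success_rates = {
    "clear pre-approval metrics": (54, 12),
    "sustained sponsorship": (68, 11),
    "business transformation framing": (61, 18),
}

# Ratio of the success rate with the factor to the success rate without it.
ratios = {
    factor: with_factor / without_factor
    for factor, (with_factor, without_factor) in success_rates.items()
}

for factor, ratio in ratios.items():
    print(f"{factor}: {ratio:.1f}x")
```

On these figures, sustained sponsorship shows the largest ratio (about 6.2x), followed by clear pre-approval metrics (4.5x) and business transformation framing (about 3.4x).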

These are not complicated insights. They are the same insights that have been available to enterprise AI practitioners for three years. The reason they have not eliminated the 80 percent failure rate is not that organizations are unaware of them. It is that the organizational conditions required to act on them are genuinely difficult to maintain. Those conditions are the willingness to delay deployment until baselines are established, sustained executive engagement through the difficult middle phase of a transformation program, and the discipline to treat workflow redesign as a prerequisite rather than a follow-on. All three have to hold against the competitive pressure to deploy AI quickly and the organizational pressure to demonstrate progress through activity rather than outcomes.

The AI pilot-to-production framework described in this series provides the structural mechanism for maintaining these conditions through the program lifecycle. The AI portfolio triage framework provides the mechanism for identifying which programs are on the wrong path before the sunk cost makes honest assessment politically difficult. Together they represent the program management discipline that separates the 20 percent from the 80 percent: not a guarantee, but the organizational infrastructure that makes the right outcomes more likely than the wrong ones.

Talk to Us

ClarityArc helps organizations assess whether their AI programs are producing the business outcomes they were designed to deliver, identify the specific warning signs that predict failure before it becomes undeniable, and redesign program governance and measurement frameworks to close the gap between AI activity and AI value. If your AI program has been running for more than six months and you are not certain whether it is on the right track, we are ready to help you find out.
