AI in the enterprise is stuck. Not in some distant, speculative future, but right here, right now.

For all the promises and potential of AI, the reality for enterprises today is far less inspiring. AI remains fundamentally unreliable, fragile, and incapable of delivering on its boldest promises. The reliability and trust deficits of modern AI have left enterprises frustrated and disillusioned. AI pilots, full of promise in their early stages, often fail to make it to production. Systems touted as revolutionary falter when exposed to the messy, unpredictable realities of live data. The stakes are highest in industries like healthcare, banking, and manufacturing—fields where accuracy isn’t a luxury, but a mandate. 

At the core of the AI adoption gap is a harsh reality: builders in the enterprise lack a scalable way to build trustworthy and reliable AI systems. Large language models (LLMs) and Large Reasoning Models (LRMs) are unreliable in their end-to-end performance. And making them perform reliably for a given task requires intensive hard-coding that results in rigid, narrow solutions. But what if the enterprise AI narrative could be rewritten? What if we could stop chasing potential and start guaranteeing performance?

Ori Goshen and Yoav Shoham - AI21's co-founders

The Problem with Current Technology

Enterprise builders struggle to build reliable solutions at scale, especially for complex, multi-step, multi-tool workflows. Builders of these workflows face two deeply flawed options:

  • “Prompt & Pray”
    This approach relies on LLMs and LRMs to perform as instructed, throwing open-ended tasks at them and hoping for the best. While flexible, this method lacks control, reliability, and accountability. It is vulnerable to the errors of probabilistic models, which intensify in large, unique action spaces and compound with the number of steps. It’s essentially rolling the dice in mission-critical scenarios—a risk no serious organization can afford.
  • Hard-Coded Chains
    To escape “prompt & pray”, developers build static programs that dictate every step of a multi-step process and implement validation and error-recovery techniques. While this approach offers more predictability, it’s rigid, narrow, and labor-intensive. It also becomes brittle under changing conditions, forcing developers into an endless cycle of re-engineering.
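The compounding-error problem behind “prompt & pray” is easy to make concrete. Here is a back-of-the-envelope sketch, assuming independent steps and hypothetical per-step success rates (the numbers are illustrative, not benchmarks):

```python
def end_to_end_success(per_step_success: float, num_steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_success ** num_steps

# A model that is right 95% of the time on a single step...
single = 0.95

# ...succeeds end to end far less often as the workflow grows.
for steps in (1, 5, 10, 20):
    print(f"{steps:2d} steps -> {end_to_end_success(single, steps):.1%}")
```

Under these assumptions, a 95%-reliable step yields only about a 36% end-to-end success rate over 20 steps—which is why per-step reliability alone cannot rescue long workflows.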

Recent advancements in AI don’t improve the situation. Large Reasoning Models have shown improved performance on multi-step reasoning tasks through Chain-of-Thought (CoT) reasoning trained via Reinforcement Learning (RL). But early enterprise experimentation with LRMs already shows that they fail to escape the realm of “prompt & pray”. Their performance is inconsistent, yielding different results on different attempts; they fail to reliably adhere to instructions; and they struggle to operate tools. These results are not surprising: sequential generation of thinking tokens is a poor way to search the space of alternative solutions, especially when the action space is large and environment-specific.

Existing agentic frameworks don’t solve these issues. Instead, they provide a thin engineering layer that merely stitches together LLMs, business logic, and data, leaving developers to choose between two inadequate solutions: unpredictable AI or rigid, narrow systems.

Here’s the hard truth: Current AI systems are fundamentally broken. “Prompt and pray” isn’t a strategy—it’s a gamble. Static chaining isn’t a robust solution—it’s a workaround. Neither approach addresses the core issue: the inherent unreliability of LLMs and LRMs in dynamic, high-stakes workflows.

A New Contract with AI: Guaranteed AI Performance

What if AI didn’t just work sometimes? What if it worked every time?

Imagine specifying exactly what you need from an AI system—instructions, constraints, cost limits—and knowing, with absolute certainty, that it will deliver.

This isn’t wishful thinking. It’s the reality we’re building with a singular focus: Guaranteed AI Performance—a new paradigm where AI systems deliver consistent, reliable outcomes.

At the core of this transformation is the concept of Quality Level Agreements (QLAs)—a new contract with AI. Instead of prompt-engineering LLMs, builders will be able to explicitly specify the instructions and constraints they care about and enjoy AI performance that meets every requirement, with the following guarantees:

  • Predictability—know exactly what you’re getting from your AI systems, every time.
  • Transparency—full visibility into how decisions are made, building trust in AI outcomes.
  • Cost Control—performance is delivered within defined budget constraints, eliminating financial uncertainty.

With QLAs, AI is no longer a black box—it’s a trusted machine that operates with the same rigor and accountability as any other critical infrastructure.
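To make the idea tangible, here is a hypothetical sketch of what a QLA might look like as a declarative spec. Every name here—the `QLA` class, its fields, the `admits` check—is our illustrative assumption for this post, not a real API:

```python
from dataclasses import dataclass

@dataclass
class QLA:
    """Hypothetical Quality Level Agreement: a declarative contract with an AI system."""
    task: str
    instructions: list[str]
    max_cost_usd: float   # hard budget ceiling (cost control)
    max_latency_s: float  # hard latency ceiling
    min_accuracy: float   # required quality level (predictability)
    audit_trail: bool = True  # record how decisions were made (transparency)

    def admits(self, cost_usd: float, latency_s: float, accuracy: float) -> bool:
        """Would a given run satisfy this agreement?"""
        return (cost_usd <= self.max_cost_usd
                and latency_s <= self.max_latency_s
                and accuracy >= self.min_accuracy)

qla = QLA(
    task="summarize contract",
    instructions=["cite every clause", "flag ambiguous terms"],
    max_cost_usd=0.50,
    max_latency_s=30.0,
    min_accuracy=0.99,
)
print(qla.admits(cost_usd=0.32, latency_s=12.0, accuracy=0.995))  # True
```

The point of the shape is that requirements become explicit, checkable inputs to the system rather than hints buried in a prompt.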

Planning Technology: The Next Frontier

To move beyond unreliable AI, we need a fundamental shift. AI must do more than merely ‘think’—it must plan and execute.

What is planning?
Planning is the ability to model an action space, assess the expected costs and value of alternative paths to an outcome, and choose the best paths to pursue under real-world constraints like cost and time. It happens at two levels:

  • Macro-planning – Breaking down a task into multiple dependent and independent steps and defining their required outcomes. 
  • Micro-planning – Ensuring each step’s result is accurate and reliably satisfies requirements by dynamically choosing the right models and tools, scaling inference-time compute, and running validations. Without micro-planning, errors compound across steps and break even the best macro plans.
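The macro/micro split above can be sketched in a toy form. Everything here—the step names, the tool options, their costs and reliability estimates—is invented for illustration; the real machinery would model a far richer action space. Macro-planning decomposes the task into steps; micro-planning picks, per step, the cheapest option whose estimated reliability still meets that step’s bar:

```python
# Hypothetical per-step options: (tool name, cost in $, estimated reliability).
OPTIONS = {
    "extract":   [("small-model", 0.01, 0.90), ("large-model", 0.10, 0.99)],
    "validate":  [("rule-check",  0.00, 0.95), ("llm-judge",   0.05, 0.98)],
    "summarize": [("small-model", 0.01, 0.92), ("large-model", 0.08, 0.99)],
}

def micro_plan(step: str, min_reliability: float) -> tuple[str, float, float]:
    """Cheapest option for a step that still meets the reliability bar."""
    viable = [opt for opt in OPTIONS[step] if opt[2] >= min_reliability]
    if not viable:
        raise ValueError(f"no option for {step!r} meets {min_reliability}")
    return min(viable, key=lambda opt: opt[1])

def macro_plan(steps: list[str], min_reliability: float) -> list[tuple[str, str, float, float]]:
    """Decompose the task into steps, then micro-plan each one."""
    return [(step, *micro_plan(step, min_reliability)) for step in steps]

plan = macro_plan(["extract", "validate", "summarize"], min_reliability=0.95)
for step, tool, cost, reliability in plan:
    print(f"{step}: {tool} (cost ${cost:.2f}, reliability {reliability:.0%})")
```

Even in this toy, the planner spends the larger model only where the cheap option can’t meet the bar—which is exactly the cost/quality trade-off micro-planning exists to make.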

We’ll have more to share about this soon. Stay tuned!

Redefining What’s Possible

The emergence of this technology marks a new chapter in AI. It’s the first step in a larger mission to rebuild the foundation of AI itself, enabling enterprises to not just use AI, but to be powered by it.

Imagine a world where businesses use AI to generate accurate, context-aware contracts, researchers trust AI to summarize complex studies without errors, and customer support chatbots provide accurate, nuanced responses—automatically, reliably, and at scale.

In order for companies to be truly powered by AI, there has to be a fundamental shift in how AI is designed, deployed, and trusted. It requires moving beyond today’s limitations to create systems that are as reliable as they are powerful, as precise as they are scalable.

It requires moving beyond the flawed approaches of the past and embracing a new paradigm: accuracy at scale.

The journey to revolutionize AI starts now. Let’s build it together.

Ori & Yoav