TL;DR:
- Enterprise AI teams are finding that purely agentic approaches (dynamically chaining LLM calls) don’t deliver the reliability needed for production systems.
- The prompt-and-pray model, where business logic lives entirely in prompts, creates systems that are unreliable, inefficient, and impossible to maintain at scale.
- A shift toward structured automation, which separates conversational ability from business logic execution, is needed for enterprise-grade reliability.
- This approach delivers substantial benefits: consistent execution, lower costs, better security, and systems that can be maintained like traditional software.
Picture this: The current state of conversational AI is like a scene from Hieronymus Bosch’s Garden of Earthly Delights. At first glance, it’s mesmerizing: a paradise of potential. AI systems promise seamless conversations, intelligent agents, and effortless integration. But look closely and chaos emerges: a false paradise all along.
Your company’s AI assistant confidently tells a customer it’s processed their urgent withdrawal request, except it hasn’t, because it misinterpreted the API documentation. Or perhaps it cheerfully informs your CEO it’s archived those sensitive board documents, into entirely the wrong folder. These aren’t hypothetical scenarios; they’re the daily reality for organizations betting their operations on the prompt-and-pray approach to AI implementation.
The Evolution of Expectations
For years, the AI world was driven by scaling laws: the empirical observation that larger models and bigger datasets led to proportionally better performance. This fueled a belief that simply making models bigger would solve deeper issues like accuracy, understanding, and reasoning. However, there’s growing consensus that the era of scaling laws is coming to an end. Incremental gains are harder to achieve, and organizations betting on ever-more-powerful LLMs are beginning to see diminishing returns.
Against this backdrop, expectations for conversational AI have skyrocketed. Remember the simple chatbots of yesterday? They handled basic FAQs with preprogrammed responses. Today’s enterprises want AI systems that can:
- Navigate complex workflows across multiple departments
- Interface with hundreds of internal APIs and services
- Handle sensitive operations with security and compliance in mind
- Scale reliably across thousands of users and millions of interactions
However, it’s important to carve out what these systems are, and are not. When we talk about conversational AI, we’re referring to systems designed to hold a conversation, orchestrate workflows, and make decisions in real time. These are systems that engage in conversations and integrate with APIs but don’t create stand-alone content like emails, presentations, or documents. Use cases like “write this email for me” and “create a deck for me” fall under content generation, which lies outside this scope. This distinction is critical because the challenges and solutions for conversational AI are unique to systems that operate in an interactive, real-time environment.
We’ve been told 2025 will be the Year of Agents, but at the same time there’s a growing consensus from the likes of Anthropic, Hugging Face, and other leading voices that complex workflows require more control than simply trusting an LLM to figure everything out.
The Prompt-and-Pray Problem
The standard playbook for many conversational AI implementations today looks something like this:
- Collect relevant context and documentation
- Craft a prompt explaining the task
- Ask the LLM to generate a plan or response
- Trust that it works as intended
This approach, which we call prompt and pray, seems attractive at first. It’s quick to implement and demos well. But it harbors serious issues that become apparent at scale:
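As a concrete illustration, here is a minimal sketch of the pattern in code. The `call_llm` stub and all names here are invented for illustration; the stub stands in for any chat-completion API.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real chat-completion API call."""
    return f"[model reply based on {len(prompt)} chars of prompt]"

def handle_request(user_message: str, api_docs: str) -> str:
    # Steps 1-2: gather context and craft one big prompt. Note that the
    # business rules exist only as prose inside this string.
    prompt = (
        "You are a support agent. Using the API docs below, decide what "
        "to do and reply to the customer.\n"
        f"API docs:\n{api_docs}\n"
        f"Customer: {user_message}"
    )
    # Step 3: ask the LLM for a plan or response.
    # Step 4: trust it. Nothing validates, versions, or tests the logic.
    return call_llm(prompt)
```

Everything that matters, routing, policy, and error handling, is implicit in the prompt text, which is exactly why the approach is hard to test or maintain.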
Unreliability
Every interaction becomes a new opportunity for error. The same query can yield different results depending on how the model interprets the context that day. When dealing with enterprise workflows, this variability is unacceptable.
To get a sense of the unreliable nature of the prompt-and-pray approach, consider that Hugging Face reports the state of the art on function calling is well under 90% accurate. 90% accuracy will often be a deal-breaker for software, but the promise of agents rests on the ability to chain them together: Even five calls in a row will fail over 40% of the time!
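The arithmetic behind that claim is easy to check: assuming independent calls with per-call accuracy p, a chain of n calls succeeds with probability p**n.

```python
def chain_success(p: float, n: int) -> float:
    """Probability that n chained calls all succeed at per-call accuracy p."""
    return p ** n

success = chain_success(0.9, 5)   # 0.9**5 = 0.59049
failure = 1 - success             # the chain fails over 40% of the time
print(round(failure, 3))          # prints 0.41
```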
Inefficiency
Dynamic generation of responses and plans is computationally expensive. Each interaction requires multiple API calls, token processing, and runtime decision-making. This translates to higher costs and slower response times.
Complexity
Debugging these systems is a nightmare. When an LLM doesn’t do what you want, your main recourse is to change the input. But the only way to know the impact of your change is trial and error. When your application involves many steps, each of which uses the output of one LLM call as input to another, you are left sifting through chains of LLM reasoning, trying to understand why the model made certain decisions. Development velocity grinds to a halt.
Security
Letting LLMs make runtime decisions about business logic creates unnecessary risk. The OWASP AI Security & Privacy Guide specifically warns against “excessive agency”: giving AI systems too much autonomous decision-making power. Yet many current implementations do exactly that, exposing organizations to potential breaches and unintended consequences.
A Better Way Forward: Structured Automation
The alternative isn’t to abandon AI’s capabilities but to harness them more intelligently through structured automation. Structured automation is a development approach that separates conversational AI’s natural language understanding from deterministic workflow execution. This means using LLMs to interpret user input and clarify what they want, while relying on predefined, testable workflows for critical operations. By separating these concerns, structured automation ensures that AI-powered systems are reliable, efficient, and maintainable.
This approach separates concerns that are often muddled in prompt-and-pray systems:
- Understanding what the user wants: Use LLMs for their strength in understanding, manipulating, and generating natural language
- Business logic execution: Rely on predefined, tested workflows for critical operations
- State management: Maintain clear control over system state and transitions
The key principle is simple: Generate once, run reliably forever. Instead of having LLMs make runtime decisions about business logic, use them to help create robust, reusable workflows that can be tested, versioned, and maintained like traditional software.
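As a minimal sketch, with invented rules and names, here is what that principle can look like: the workflow may have been drafted with LLM assistance at design time, but what ships is ordinary, deterministic code with its own tests.

```python
# Design time: this workflow may have been drafted with LLM help, but it
# is committed, reviewed, and versioned like any other code.
WORKFLOW_VERSION = "1.2.0"

def refund_workflow(order_total: float, days_since_delivery: int) -> str:
    """Deterministic business rule: same inputs, same decision, every time."""
    if days_since_delivery > 30:
        return "deny"              # outside the return window
    if order_total <= 50:
        return "auto_refund"       # low-value orders refund automatically
    return "escalate_to_human"     # high-value orders get human review

# Runtime involves no LLM: the decision logic is fully testable up front.
assert refund_workflow(20.0, 5) == "auto_refund"
assert refund_workflow(500.0, 5) == "escalate_to_human"
assert refund_workflow(20.0, 45) == "deny"
```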
By keeping the business logic separate from conversational capabilities, structured automation ensures that systems remain reliable, efficient, and secure. This approach also reinforces the boundary between generative conversational tasks (where the LLM thrives) and operational decision-making (which is best handled by deterministic, software-like processes).
By “predefined, tested workflows,” we mean creating workflows during the design phase, using AI to assist with ideas and patterns. These workflows are then implemented as traditional software, which can be tested, versioned, and maintained. This approach is well understood in software engineering and contrasts sharply with building agents that rely on runtime decisions, an inherently less reliable and harder-to-maintain model.
Alex Strick van Linschoten and the team at ZenML have recently compiled a database of 400+ (and growing!) LLM deployments in the enterprise. Not surprisingly, they found that structured automation delivers significantly more value across the board than the prompt-and-pray approach:
There’s a striking disconnect between the promise of fully autonomous agents and their presence in customer-facing deployments. This gap isn’t surprising when we examine the complexities involved. The reality is that successful deployments tend to favor a more constrained approach, and the reasons are illuminating…
Take Lindy.ai’s journey: They began with open-ended prompts, dreaming of fully autonomous agents. However, they discovered that reliability improved dramatically when they shifted to structured workflows. Similarly, Rexera found success by implementing decision trees for quality control, effectively constraining their agents’ decision space to improve predictability and reliability.
The prompt-and-pray approach is tempting because it demos well and feels fast. But beneath the surface, it’s a patchwork of brittle improvisation and runaway costs. The antidote isn’t abandoning the promise of AI; it’s designing systems with a clear separation of concerns: conversational fluency handled by LLMs, business logic powered by structured workflows.
What Does Structured Automation Look Like in Practice?
Consider a typical customer support scenario: A customer messages your AI assistant saying, “Hey, you messed up my order!”
- The LLM interprets the user’s message, asking clarifying questions like “What’s missing from your order?”
- Having obtained the relevant details, the structured workflow queries backend data to determine the issue: Were items shipped separately? Are they still in transit? Were they out of stock?
- Based on this information, the structured workflow determines the appropriate options: a refund, reshipment, or another resolution. If needed, it requests more information from the customer, leveraging the LLM to handle the conversation.
Here, the LLM excels at navigating the complexities of human language and dialogue. But the critical business logic (querying databases, checking stock, determining resolutions) lives in predefined workflows.
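A sketch of this division of labor, using a hypothetical schema and status values: the LLM’s only job is to turn the conversation into a structured object; everything after that is deterministic.

```python
from dataclasses import dataclass

@dataclass
class OrderIssue:
    """Structured output the LLM is asked to produce (illustrative schema)."""
    order_id: str
    missing_items: list

def diagnose(issue: OrderIssue, shipments: dict) -> str:
    """Deterministic workflow: backend lookups decide the resolution."""
    for item in issue.missing_items:
        status = shipments.get(item, "unknown")
        if status == "in_transit":
            return "inform_in_transit"
        if status == "out_of_stock":
            return "offer_refund_or_substitute"
        if status == "shipped_separately":
            return "share_tracking_links"
    return "escalate_to_human"   # anything unrecognized goes to a person

# The LLM turned "you messed up my order!" plus a few clarifying
# answers into this structured object; the workflow does the rest.
issue = OrderIssue(order_id="A-1001", missing_items=["headphones"])
print(diagnose(issue, {"headphones": "in_transit"}))  # inform_in_transit
```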
This approach ensures:
- Reliability: The same logic applies consistently across all users.
- Security: Sensitive operations are tightly controlled.
- Efficiency: Developers can test, version, and improve workflows like traditional software.
Structured automation bridges the best of both worlds: conversational fluency powered by LLMs and dependable execution handled by workflows.
What About the Long Tail?
A common objection to structured automation is that it doesn’t scale to handle the “long tail” of tasks: those rare, unpredictable scenarios that seem impossible to predefine. But the truth is that structured automation simplifies edge-case management by making LLM improvisation safe and measurable.
Here’s how it works: Low-risk or rare tasks can be handled flexibly by LLMs in the short term. Each interaction is logged, patterns are analyzed, and workflows are created for tasks that become frequent or critical. Today’s LLMs are quite capable of generating the code for a structured workflow given examples of successful conversations. This iterative approach turns the long tail into a manageable pipeline of new functionality, with the knowledge that by promoting these tasks into structured workflows we gain reliability, explainability, and efficiency.
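A toy sketch of that promotion pipeline (the task labels and threshold here are invented): log what the LLM improvises on, then flag anything frequent enough to deserve its own structured workflow.

```python
from collections import Counter

# Hypothetical log of tasks the LLM handled ad hoc (no workflow existed yet).
improvised_log = [
    "change_delivery_address", "gift_wrap_request",
    "change_delivery_address", "change_delivery_address",
    "invoice_copy_request", "change_delivery_address",
]

PROMOTION_THRESHOLD = 3  # assumed cutoff; tune per deployment

def tasks_to_promote(log, threshold=PROMOTION_THRESHOLD):
    """Return tasks frequent enough to be promoted to structured workflows."""
    return [task for task, n in Counter(log).items() if n >= threshold]

print(tasks_to_promote(improvised_log))  # ['change_delivery_address']
```

In a real deployment the log would come from production traces, and promotion would mean writing, testing, and versioning a new workflow, possibly with LLM assistance at design time.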
From Runtime to Design Time
Let’s revisit the earlier example: A customer messages your AI assistant saying, “Hey, you messed up my order!”
The Prompt-and-Pray Approach
- Dynamically interprets messages and generates responses
- Makes real-time API calls to execute operations
- Relies on improvisation to resolve issues
This approach leads to unpredictable outcomes, security risks, and high debugging costs.
A Structured Automation Approach
- Uses LLMs to interpret user input and gather details
- Executes critical tasks through tested, versioned workflows
- Relies on structured systems for consistent results
The result:
- Predictable execution: Workflows behave consistently every time.
- Lower costs: Reduced token usage and processing overhead.
- Better security: Clear boundaries around sensitive operations.
- Easier maintenance: Standard software development practices apply.
The Role of Humans
For edge cases, the system escalates to a human with full context, ensuring sensitive scenarios are handled with care. This human-in-the-loop model combines AI efficiency with human oversight for a reliable and collaborative experience.
This method extends beyond customer support to other domains like IT ticketing, internal HR workflows, and expense reporting: anywhere conversational AI needs to reliably integrate with backend systems.
Building for Scale
The future of enterprise conversational AI isn’t in giving models more runtime autonomy; it’s in using their capabilities more intelligently to create reliable, maintainable systems. This means:
- Treating AI-powered systems with the same engineering rigor as traditional software
- Using LLMs as tools for generation and understanding, not as runtime decision engines
- Building systems that can be understood, maintained, and improved by normal engineering teams
The question isn’t how to automate everything at once but how to do so in a way that scales, works reliably, and delivers consistent value.
Taking Action
For technical leaders and decision makers, the path forward is clear:
1. Audit current implementations:
- Identify areas where prompt-and-pray approaches create risk
- Measure the cost and reliability impact of current systems
- Look for opportunities to implement structured automation
2. Start small but think big:
- Begin with pilot projects in well-understood domains
- Build reusable components and patterns
- Document successes and lessons learned
3. Invest in the right tools and practices:
- Look for platforms that support structured automation
- Build expertise in both LLM capabilities and traditional software engineering
- Develop clear guidelines for when to use different approaches
The era of prompt and pray may be just beginning, but you can do better. As enterprises mature in their AI implementations, the focus must shift from impressive demos to reliable, scalable systems. Structured automation offers the framework for this transition, combining the power of AI with the reliability of traditional software engineering.
The future of enterprise AI isn’t just about having the latest models; it’s about using them wisely to build systems that work consistently, scale effectively, and deliver real value. The time to make this transition is now.