Enterprise AI Agents Need Middleware, Not Just Models

Manav Gupta
4 min read
Note: This article was generated from the transcript of the original podcast episode. It has been edited for clarity and structure.

The uncomfortable truth about enterprise AI adoption is that large language models alone deliver very limited productivity for real business use cases. After four years of working with customers and applying AI internally, I’ve come to a simple realization: LLMs have been trained on public data from last year. Your enterprise data, unless something went seriously wrong, is not public. And you probably want to make business decisions based on data that’s current and relevant.

This disconnect explains why virtually every successful AI implementation I’ve seen relies on agentic AI: systems that have the right tools to access enterprise data, internal systems, and up-to-date information. But getting there requires solving problems that no off-the-shelf solution addresses. That’s why I built Context Forge.

The Problem With Current AI Protocols

When Anthropic released the Model Context Protocol (MCP) in late 2024, I was immediately interested. But as I implemented it, I quickly understood that the protocol itself was insufficient and incomplete, and that the implementation lacked enterprise features: security, authentication, authorization, observability, monitoring, reliability, and high availability.

The protocol couldn’t handle routing when tools became unavailable. It had no self-healing or remediation capabilities. It couldn’t identify when a tool call failed and try something else. And it had a hard ceiling on tools—128 maximum, with 10 being realistic for actual performance.

“Once you’ve launched something in the world, you have to maintain it, evolve it. It’s hard to deprecate specifications. It’s hard to change the implementation.”

Here’s the real issue: even if you have access to the strongest AI model in the world, you still need to know what to ask. Without enterprise experience, direct customer experience, or an understanding of integration needs, you’re not going to know what to build. Models have severe limitations. They’ve been trained on public data, don’t have access to enterprise requirements, and treat all data the same way. But if you’re a bank or financial institution, your requirements differ from the rest of the world.

What Agentic Middleware Actually Does

Context Forge is a piece of agentic middleware that sits between your AI applications and whatever they talk to, whether that is an application talking to an LLM, an agentic framework talking to its tools, or an agent talking to another agent.

Think of it as a point of enforcement. It’s not about defining policy; it’s about enforcing policy. We have plugins for Cedar and OPA where you can write rules like: prevent Manav from accessing the GitHub tool on Tuesday afternoon without prior approval, and only when the output contains specific criteria.
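To make the enforcement idea concrete, here is a minimal sketch of the pre-call half of that example rule in plain Python. It is not Cedar or OPA syntax, and it is not taken from Context Forge: `ToolRequest`, `is_allowed`, the user and tool names, and the choice of 12:00 to 18:00 as "afternoon" are all assumptions made for illustration. The output-based condition from the rule is left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ToolRequest:
    """Hypothetical description of one tool call arriving at the gateway."""
    user: str
    tool: str
    when: datetime
    approved: bool = False


def is_allowed(request: ToolRequest) -> bool:
    """Deny 'manav' the 'github' tool on Tuesday afternoons unless approved."""
    # datetime.weekday() returns 0 for Monday, so Tuesday is 1.
    # "Afternoon" is assumed to mean 12:00 up to (not including) 18:00.
    is_tuesday_afternoon = (
        request.when.weekday() == 1 and 12 <= request.when.hour < 18
    )
    restricted = (
        request.user == "manav"
        and request.tool == "github"
        and is_tuesday_afternoon
    )
    # Prior approval lifts the restriction; everything else passes through.
    return request.approved or not restricted
```

The point of the sketch is where the check runs, not how the rule is written: the policy itself would live in a policy engine, and the gateway only asks the question on every call.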

“If network traffic doesn’t go through your firewall, you don’t have the ability to enforce the rules. It’s that centralized approach, as opposed to having a distributed model for agents and tools within your organization.”

The gateway transforms communication. It can change authentication, change authorization, change protocol specifications, convert A2A to MCP, MCP to A2A, REST to MCP. It handles routing—if one server goes down, it connects to another. Everything goes through a plugins framework with pre-hooks and post-hooks.
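The hook-and-failover flow described above can be sketched in a few lines. This is an illustration only, not Context Forge's plugin API: the `Gateway` class, the hook signatures, and the backend functions are hypothetical, and a backend "going down" is modeled as a `ConnectionError`.

```python
from typing import Callable

Payload = dict
Hook = Callable[[Payload], Payload]
Backend = Callable[[Payload], Payload]


class Gateway:
    """Hypothetical gateway: pre-hooks, failover routing, then post-hooks."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends
        self.pre_hooks: list[Hook] = []
        self.post_hooks: list[Hook] = []

    def call(self, request: Payload) -> Payload:
        # Pre-hooks run in order and may rewrite the request.
        for hook in self.pre_hooks:
            request = hook(request)
        response = self._route(request)
        # Post-hooks run in order and may rewrite the response.
        for hook in self.post_hooks:
            response = hook(response)
        return response

    def _route(self, request: Payload) -> Payload:
        # Try each backend in turn; move on when one is unreachable.
        last_error = None
        for backend in self.backends:
            try:
                return backend(request)
            except ConnectionError as exc:
                last_error = exc
        raise RuntimeError("no backend available") from last_error


def primary(request: Payload) -> Payload:
    raise ConnectionError("primary is down")


def secondary(request: Payload) -> Payload:
    return {"result": f"handled {request['tool']} for {request['user']}"}


def normalize_user(request: Payload) -> Payload:
    return {**request, "user": request["user"].lower()}


def mark_audited(response: Payload) -> Payload:
    return {**response, "audited": True}


gateway = Gateway([primary, secondary])
gateway.pre_hooks.append(normalize_user)
gateway.post_hooks.append(mark_audited)
result = gateway.call({"user": "Manav", "tool": "github"})
# result == {"result": "handled github for manav", "audited": True}
```

Because every call passes through `call`, the caller never learns that the primary backend failed, and every hook sees every request.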

For example, we use guardrails like PII filters. If you submit your social security number, before it hits the log file, before it hits the LLM, before it hits your tool, that gets stripped out or blocked or audited. On the way back, if your MCP tool retrieves a million tokens because you requested a file, we can compress or truncate that output so your agent doesn’t break.
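As a rough sketch of those two guardrails, the functions below redact anything shaped like a US social security number on the way in and truncate oversized output on the way back. The function names, the regular expression, and the character limit are assumptions for illustration, not Context Forge's actual filters. A real guardrail would count tokens and cover far more PII formats; characters stand in for tokens here.

```python
import re

# Matches the common NNN-NN-NNNN layout only; an assumption for illustration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn(text: str) -> str:
    """Pre-hook style filter: strip SSN-shaped strings before anything logs them."""
    return SSN_PATTERN.sub("[REDACTED-SSN]", text)


def truncate_output(text: str, max_chars: int = 2000) -> str:
    """Post-hook style filter: cap tool output, noting how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n[truncated {omitted} characters]"
```

For example, `redact_ssn("My SSN is 123-45-6789.")` returns `"My SSN is [REDACTED-SSN]."`, and `truncate_output("a" * 10, max_chars=4)` returns the first four characters followed by a note that six were dropped.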

Why Visual Agent Tools Don’t Scale

I’ve compared visual agent programming, with tools like n8n and Langflow, to the old UML promise. These tools are attractive, but they don’t scale to production.

You need two speeds. One speed for rapid prototyping where visual tools help the general population quickly build an agent and understand concepts. But once you need to build at scale, it all goes back to code.

Here’s the good news: previously we relied on visual aids because they made the process easier for humans. It turns out they’re more difficult for LLMs. Whatever visual representation exists is probably some horrifying JSON or XML behind the scenes. LLMs natively deal better with language and code. So if you’re using agents to write your agents, the programmatic approach scales better.

Enterprises need solutions for both rapid prototyping and coded agents, preferably with a reproducible runtime where you give it a configuration and it applies that configuration. Behind the scenes, whether you use a visual no-code approach or a coded one, everything goes back to the same agent library, same tools library, same protocols, same observability.
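One way to picture that shared runtime is a single specification type that both paths resolve to. In the sketch below, `AgentSpec`, `spec_from_json`, `build_agent`, and the field names are hypothetical, and the JSON string stands in for whatever a visual tool might export. The only point is that both routes end at the same build function.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical configuration that the runtime applies as given."""
    name: str
    model: str
    tools: tuple[str, ...]


def spec_from_json(document: str) -> AgentSpec:
    """Path 1: a visual tool exports JSON, which is parsed into the shared spec."""
    raw = json.loads(document)
    return AgentSpec(name=raw["name"], model=raw["model"], tools=tuple(raw["tools"]))


def build_agent(spec: AgentSpec) -> dict:
    """Stand-in for the shared runtime; returns a plain description of the agent."""
    # Sorting the tools makes the result independent of declaration order.
    return {"name": spec.name, "model": spec.model, "tools": sorted(spec.tools)}


# Path 1: exported from a (hypothetical) visual builder.
visual_export = '{"name": "triage", "model": "example-model", "tools": ["search", "tickets"]}'

# Path 2: written directly in code.
coded_spec = AgentSpec(name="triage", model="example-model", tools=("tickets", "search"))

from_visual = build_agent(spec_from_json(visual_export))
from_code = build_agent(coded_spec)
```

Here `from_visual` and `from_code` are equal, which is the reproducibility property: the same configuration yields the same agent regardless of how the configuration was authored.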

The Uncomfortable Truth About AI Speed

AI doesn’t necessarily make things faster. There are still the same bottlenecks that existed before. It makes work more efficient—if you previously needed 10 people, now you need four. But you still need experts.

Even with AI software development, I’ve found it’s sometimes just as slow as traditional software development. Behind the scenes: do you need to compile code? Add half an hour. Do you need to run Sonar, linters, static analysis? Add that time. Build a container? Deploy the container? Run Playwright for UI testing?

“If previously you had a GitHub repo with 100 developers, how many people are doing the code merges? Probably very few. You still have those bottlenecks. AI doesn’t eliminate them.”

When I built Context Forge, I applied the most stringent process for software development. There are more than 1,000 make commands for static analysis, linting, deployment, and testing automation. A GitHub Actions pipeline with more than 60 checkers. All those lessons from building and deploying enterprise software at scale go into making AI-generated code actually work.

What Enterprise CTOs Should Actually Do

If an enterprise wants to be successful in adopting AI, they need a clear strategy for agents, tools, and getting their data plus the model to deliver business value—and preferably a way to evaluate that everything has been done correctly.

The challenge I’ve seen: CISOs and CIO offices say “you cannot connect Gemini or Opus or whatever to SAP.” And honestly, I respect that. It makes sense. I wouldn’t give direct access to these data sources without governance, observability, and management layers in between.

For specific use cases, it’s easier to enforce the right governance and guardrails. But the moment you say “AI has access to everything”—how are you managing context? How good is the output? Does it do anything it shouldn’t? Can you explain the output?

The problem is you need experts. That’s the challenge for new folks joining the field. You still need the seniors with experience in troubleshooting, debugging, security. People need to become more end-to-end in their skillset instead of being overly specialized.

I don’t know what this is going to do to the workforce in the next five years.


For more on building production-grade AI systems, check out our episodes on AI infrastructure scaling and enterprise agent architectures.

Related Episodes

Dive deeper into these topics in the podcast.

The Man Behind IBM's AI Agent Gateway
Guest Conversation

Mar 4, 2026 54 min

Mihai Criveti, Distinguished Engineer at IBM and creator of Context Forge, on why AI agents need agentic middleware, MCP's enterprise gaps, and what production-grade agent architecture actually loo...

Ship AI is a video podcast covering the trends, tools, and strategies driving enterprise AI. New episodes every two weeks.