On a recent episode of F5’s Pop Goes the Stack, host Lori MacVittie asked me the question I keep getting asked.
“AI is probabilistic. Production is deterministic. How do you bridge that?”
It’s a great question. And it’s the right question. Because if you’re just bolting an LLM on top of your infrastructure and hoping for the best, you’re not doing agentic operations. You’re doing a proof of concept with production credentials. Those are very different things.
Let me tell you what actually happens when you get this wrong.
The 2% Problem
A customer said something to me recently that I haven’t stopped thinking about.
They said: “If we run a million activities a day on the network agentically, even a 2% failure rate is a lot of failures.”
They’re right. Do the math. At that scale, 2% is 20,000 failures. Per day. That’s not a rounding error. That’s an incident queue that never empties.
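To make that arithmetic concrete, here is a minimal sketch in Python. The function name and the two lower failure rates are illustrative assumptions, not figures from the customer; only the million activities and the 2% come from the conversation above.

```python
def daily_failures(activities_per_day: int, failure_rate: float) -> int:
    """Expected number of failed activities per day at a given failure rate."""
    return round(activities_per_day * failure_rate)


# 2% is the customer's number; the other rates are hypothetical comparisons.
for rate in (0.02, 0.001, 0.0001):
    failures = daily_failures(1_000_000, rate)
    print(f"{rate:.2%} of 1,000,000 activities -> {failures:,} failures per day")
```

Even at a hundredth of a percent, the count is not zero, which is why the rest of this piece is about architecture rather than waiting for a better model.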
And here’s the thing – that failure rate isn’t a model problem. It’s an architecture problem.
The models are good. They’re getting better fast.
But no matter how capable the reasoning layer gets, if you’re asking it to freeform its way through production infrastructure without guardrails, deterministic execution, and a governed path – you’re going to accumulate failures. Every. Single. Day.
What “Guardrails” Actually Means
I think people hear “guardrails” and picture a content filter. A list of words the AI can’t say. That’s not what I’m talking about.
I’m talking about skills.
A skill is a structured definition of what an agent knows how to do. It’s written in natural language – markdown, not JSON schemas – and it describes capability, scope, and constraints in the same way you’d brief a human on what they’re allowed to touch.
Here’s a concrete example. When I attached an open-source agent to the VibeOps Forum – nearly 600 engineers from the community – they immediately red-teamed it. Because that’s what network engineers do.
One person asked it to change a boot variable and reload a router. Smart. That router would’ve gone offline until someone physically went in to fix it.
But the agent didn’t do it. Because the skill definition was explicit: do not change the management interface, do not alter the default route, do not add ACLs that would deny access, do not change any passwords.
Not in code. Not in a YAML file. In plain language, describing exactly what a responsible operator would never do.
That’s a guardrail. That’s the architecture working.
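For a sense of what that looks like on disk, here is a rough sketch of a skill and the small amount of plumbing that hands it to the model. This is not the actual skill from the VibeOps agent: the skill name, the headings, and the scope wording are hypothetical, and the helper functions are my own illustration. Only the four constraints are the ones quoted above.

```python
# Hypothetical skill text. The four "Never" items are the constraints quoted
# above; the name, headings, and scope sentence are illustrative assumptions.
SKILL_MD = """\
# Skill: router-config-assistant

## Scope
Answer questions about router configuration and propose changes for review.

## Never
- Do not change the management interface.
- Do not alter the default route.
- Do not add ACLs that would deny access.
- Do not change any passwords.
"""


def constraints(skill_md: str) -> list[str]:
    """Return the bullet items listed under the '## Never' heading."""
    items, in_never = [], False
    for line in skill_md.splitlines():
        if line.startswith("## "):
            in_never = line.strip() == "## Never"
        elif in_never and line.startswith("- "):
            items.append(line[2:].strip())
    return items


def build_system_prompt(skill_md: str) -> str:
    """Prepend the skill, verbatim, to the instructions given to the model."""
    return "You operate under the following skill definition.\n\n" + skill_md


print(constraints(SKILL_MD))
```

Note that the code does not enforce anything. The guardrail itself stays in plain language; the Python only delivers it to the agent and lets a reviewer list the constraints at a glance.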
The Part Everyone Skips: Onboarding Your Agent
Here’s the approach I advocate for, and it maps directly to how you’d bring on any new team member.
Don’t give an agent root access on day one.
I know that sounds obvious. But I watch organizations spin up agents and immediately point them at production because the demo worked. The demo is not production. The demo has no blast radius.
The sequence matters:
Start Read-Only
Have the agent fully document your network, run compliance checks, audit configs, validate state. This is not “starting small.” A well-instrumented read-only agent surfacing drift and anomalies across your fleet is genuinely valuable – and it’s how you build trust.
Human in the Loop
Before any write action, the agent checks in. It opens a ServiceNow ticket. It sends a Slack notification. It says, “Here’s what I’m about to do. Approve or deny.” You get a record. You get a checkpoint. You stay in control.
Human on the Loop
Now the agent has autonomy within defined bounds. It acts, it notifies, you’re watching. You trust it because it’s earned that trust through a track record you’ve actually observed.
Supervised Autonomy
The endgame – where the agent is operating as a genuine digital coworker, handling the high-volume, repeatable, context-aware work that was always eating your best engineers alive.
This arc isn’t cautious. It’s correct. And it’s exactly how you get to production without accumulating a backlog of failures you can’t explain.
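The four stages can be sketched as a single gate that every proposed action passes through. This is a simplified illustration under stated assumptions: the `Stage` enum, the `gate` function, and the `approve` and `notify` callbacks are hypothetical names, the callbacks stand in for whatever ticketing or chat integration you use, and treating the last two stages identically (act, then notify) is a simplification.

```python
from enum import IntEnum
from typing import Callable


class Stage(IntEnum):
    READ_ONLY = 1
    HUMAN_IN_THE_LOOP = 2
    HUMAN_ON_THE_LOOP = 3
    SUPERVISED_AUTONOMY = 4


def gate(stage: Stage, is_write: bool, summary: str,
         approve: Callable[[str], bool],
         notify: Callable[[str], None]) -> bool:
    """Return True if the proposed action may proceed at this stage."""
    if not is_write:
        return True                  # reads are allowed at every stage
    if stage == Stage.READ_ONLY:
        return False                 # no writes at all
    if stage == Stage.HUMAN_IN_THE_LOOP:
        return approve(summary)      # wait for an explicit approve or deny
    notify(summary)                  # later stages: act, but tell a human
    return True


# Hypothetical usage: the lambda stands in for a human denying the request.
sent: list[str] = []
change = "shut down interface on edge-router-1"
print(gate(Stage.READ_ONLY, True, change, lambda s: True, sent.append))
print(gate(Stage.HUMAN_IN_THE_LOOP, True, change, lambda s: False, sent.append))
print(gate(Stage.HUMAN_ON_THE_LOOP, True, change, lambda s: True, sent.append))
print(sent)
```

Promoting an agent then becomes a one-line configuration change backed by a track record, rather than a rewrite.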
Why the Execution Layer Is Not Optional
This is the architectural point I want to land.
AI reasoning is brilliant at interpretation, intent translation, and dynamic decision-making. It is not what you want executing a 1,400-line YAML pipeline in a change window at 2am.
Deterministic workflows exist for a reason. They’re repeatable. They’re auditable. They’re testable.
They are the reason production infrastructure doesn’t fall over every time someone makes a change.
The right model does not replace deterministic execution with AI reasoning. It connects AI reasoning to deterministic execution.
The agent reasons about what to do. The workflow does the doing. The governance layer ensures the whole thing is traceable.
That’s not a constraint on what’s possible. That’s what makes it possible in production at scale.
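Here is a minimal sketch of that separation, assuming the agent emits a structured proposal instead of running commands itself. The workflow registry, the `execute` function, the audit log format, and the device name are all hypothetical; the placeholder workflow stands in for whatever tested automation you already run.

```python
import json
import time
from typing import Callable


def check_interface_health(device: str) -> dict:
    """Placeholder for an existing, tested, deterministic workflow."""
    return {"device": device, "status": "checked"}


# Only workflows registered here can ever be executed.
WORKFLOWS: dict[str, Callable[..., dict]] = {
    "check_interface_health": check_interface_health,
}

AUDIT_LOG: list[str] = []  # governance: one JSON record per attempt


def execute(proposal: dict) -> dict:
    """Run the workflow the agent proposed, or refuse if it is not registered."""
    name, args = proposal.get("workflow"), proposal.get("args", {})
    record = {"ts": time.time(), "workflow": name, "args": args}
    if name not in WORKFLOWS:
        record["outcome"] = "rejected"
        AUDIT_LOG.append(json.dumps(record))
        raise ValueError(f"unknown workflow: {name!r}")
    result = WORKFLOWS[name](**args)
    record["outcome"] = "ok"
    AUDIT_LOG.append(json.dumps(record))
    return result


# The agent reasons its way to a proposal; the workflow does the doing.
proposal = {"workflow": "check_interface_health",
            "args": {"device": "edge-router-1"}}
print(execute(proposal))
print(len(AUDIT_LOG), "audit record(s)")
```

The design choice is that the model's output is data, not action. Anything it proposes outside the registry is rejected and still leaves a trace.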
When I built a network interface health agent in two minutes using FlowAI – not a proof of concept, a production-ready agent – it worked because the MCP tools it called were deterministic. The agent’s intelligence was in knowing when and why to call them. The reliability was in what happened after the call.
Those are two different jobs. They need two different systems working together.
The Skills Divide Is Already Structural
I’ll say something that might be uncomfortable.
The engineers who spent the last decade building automation foundations – who learned Ansible, who wrote Python, who built REST integrations, who treated their network as code – those engineers are now pulling dramatically ahead.
Not because their old skills transfer one-to-one. They don’t.
But because they understand what deterministic execution is, why it matters, and how to wire an AI reasoning layer on top of it without losing the properties that make it trustworthy.
The engineers who skipped that step are finding that natural language interfaces don’t fix a lack of operational foundation. The agent is only as reliable as what it’s connected to.
MCP is about a year old. RAG has been a mainstream enterprise pattern for roughly two years. The enterprise’s patience for “we’re still evaluating” is running out fast.
If you’re reading this and thinking “I need to get started” – you’re right. And you’re not as far behind as you think. The tools are accessible, the frameworks are open, and the path from local experimentation to production deployment has never been shorter.
But the path still requires a foundation. Start building it today.
What Agent Would You Build?
If you’re in infrastructure and you’re not sure where to begin – start with read-only. Pick one operational problem that consumes hours of human time. Something repeatable, high-frequency, and well-defined. Document your network. Run a compliance check. Validate interface health across a set of devices.
Build the skill. Define the guardrails. Give the agent a persona and a clear scope. Watch it work.
That first agent isn’t just a demo. It’s a proof point – for your team, your security stakeholders, and your own confidence that this is real and it’s ready.
The agentic era of infrastructure operations isn’t coming. It’s here. The organizations that operationalize it safely, with governed execution and deterministic rails, are the ones that are going to define what this industry looks like in three years.
Be one of them.
