
The New Reality of AI in Infrastructure: From Skepticism to Governed Agentic Operations

William Collins

Director of Technical Evangelism ‐ Itential


April 7, 2026

Quick Summary

AI in infrastructure has crossed a threshold, but what makes it production-ready isn’t the model. It’s the governance layer around it. When AI runs inside a platform with RBAC, audit trails, and deterministic execution, the conversation shifts from fear to practicality. That’s the foundation of agentic operations.

There are moments in this industry when you feel a shift before you can name it. You hear the same questions surface across customers, communities, and hallways. And every now and then, you walk into a room where the energy tells you something bigger is happening.

That was AutoCon 4.

Every conversation had a common thread. Engineers comparing notes about AI experiments in their labs. Operators debating how far they’d trust an agent to touch production. Leaders asking how to create guardrails before giving models any real freedom. The whole community was wrestling with the same tension: the potential of AI is real, but nobody wants to hand over control without a safety net.

That energy was exactly what Eyvonne Sharp and I wanted to dig into on the Cloud Gambit podcast. We recorded an episode right in the middle of it all, surrounded by practitioners who were living this question in real time. And what came out of that conversation crystallized something I’ve been watching play out for a while now: the industry’s relationship with AI in infrastructure isn’t changing because the technology got perfect. It’s changing because we finally understand how to pair intelligence with governance.

Once you can place AI inside a disciplined framework – with strict boundaries, platform-level controls, and complete transparency – the conversation shifts from fear to practicality. From “no way” to “we can sleep at night.”

That shift is the foundation of FlowAI and the broader move toward agentic operations. Here’s my attempt to lay out why AI matters now, what actually changed, and the operating model infrastructure teams are moving toward.

Why AI Became Impossible to Ignore

Anyone who manages infrastructure knows the pressure. Hybrid networks are larger, more dynamic, and more interdependent than anything we saw a decade ago. Cloud adoption, edge proliferation, AI stacks, remote work, and constant change have created new levels of operational friction.

Meanwhile, automation exists in pockets. A Python script here. A workflow there. Tooling that solves a specific problem but doesn’t scale across domains or vendors. Even when automation succeeds, it stays brittle. A workflow works until the day it doesn’t – often because a vendor changes a schema, a new device shows up, or a topology evolves.

This is where AI forces its way into the conversation. Not as a replacement for automation, but as a complement to it. AI can interpret context, reason about intent, and make decisions that bridge the gaps between rigid logic and dynamic infrastructure. It can read device outputs, understand protocol state, and help teams think in terms of outcomes instead of low-level tasks.

But potential is not the same as production readiness. Infrastructure teams operate with one prevailing truth: you do not break the network. Without guardrails, AI introduces too much uncertainty and too much unpredictability.

The real question became: how do you take advantage of what AI offers while keeping everything deterministic, safe, and fully governed?

Why Governance Matters More Than the Model

AI is not the product. AI is the tool. The platform that governs how AI interacts with infrastructure is what matters.

This is where the industry’s view of AI often goes sideways. Organizations focus on choosing a model – public or private, open source or commercial, large or small. What they should be asking first is: how will that model be controlled? How will its output be validated? How will its actions be audited? How will its blast radius be contained?

Most engineers don’t trust automation deeply enough to hand control to an untethered model. And they’re right not to. Without strong governance, AI is unpredictable. Brilliant one moment, dangerous the next.

The Itential Platform has spent more than a decade solving this problem for deterministic workflow automation: RBAC, secrets management, policy enforcement, traceability, audit history, a consistent framework for the execution of change. FlowAI is designed to inherit all of that.

By sitting inside the platform, AI runs within the same operational guardrails that already protect the organization.

This is what makes AI usable in infrastructure. Not intelligence. Not creativity. Predictability.

When every action runs through a governed execution layer, AI becomes safe to empower. You know what it can touch. You know what it can’t. You know how far it can go. And you know who approved it, when it executed, and what it changed.
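To make that concrete, here is a minimal sketch of a governed execution layer: an RBAC check gates every action, and every attempt, allowed or denied, lands in an audit trail. All names here (roles, actions, the workflow) are hypothetical illustrations, not the Itential Platform's actual API.

```python
from datetime import datetime, timezone

# Hypothetical role grants: which roles may invoke which governed actions.
ROLE_GRANTS = {
    "netops-engineer": {"update_acl", "collect_state"},
    "read-only": {"collect_state"},
}

AUDIT_LOG = []  # in a real platform this would be durable, append-only storage


def governed_execute(user, role, action, params, workflow):
    """Run a deterministic workflow only if RBAC allows it, and audit the attempt."""
    allowed = action in ROLE_GRANTS.get(role, set())
    AUDIT_LOG.append({
        "user": user,
        "role": role,
        "action": action,
        "params": params,
        "allowed": allowed,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })  # every attempt is recorded, allowed or not
    if not allowed:
        raise PermissionError(f"{role} may not perform {action}")
    return workflow(**params)


# A deterministic workflow the platform already trusts.
def collect_state(device):
    return f"state collected from {device}"


result = governed_execute("wcollins", "netops-engineer",
                          "collect_state", {"device": "core-sw1"}, collect_state)
```

The point of the sketch is the ordering: the audit record is written before the permission decision is enforced, so even a denied action leaves evidence of who tried what, and when.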

That’s the difference between AI as a science experiment and AI as a production capability.

How MCP Changes AI Governance in Infrastructure

The breakthrough came with a clearer understanding of what the Model Context Protocol (MCP) could actually do.

Most of the industry uses MCP wrong. They’re taking existing APIs and simply exposing them through the protocol – which recreates the same problems APIs already had, just shifted to a new interface. MCP isn’t supposed to be a wrapper for APIs. It’s supposed to give the model structured capabilities that map to real operational actions.

Watch: Is MCP Just an API Abstraction? | AutoCon 4 →

With MCP, you can attach curated tools to an agent. Not the entire API surface. Not every command. Only the specific, governed actions that are safe for it to perform.

This gives teams something that didn’t exist before: control over the blast radius.

Even if a model hallucinates, the worst it can do is call an approved tool with an approved shape. The deterministic automation behind that tool guarantees a safe action. Combine that with platform-level RBAC and audit trails, and you have a model that can reason freely while acting inside strict boundaries.
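A minimal sketch of that idea follows: the agent sees only a curated registry, and every call is checked against the tool's approved parameter shape before the deterministic automation behind it runs. The tool name and checking logic are illustrative assumptions, not the MCP SDK itself.

```python
# Hypothetical curated tool registry: the agent sees only these tools,
# and each call is shape-checked before deterministic automation runs.
TOOLS = {
    "restart_bgp_session": {
        "params": {"device": str, "peer_ip": str},  # the approved shape
        "run": lambda device, peer_ip: f"cleared BGP peer {peer_ip} on {device}",
    },
}


def call_tool(name, **kwargs):
    """Dispatch an agent's tool call only if it matches an approved tool and shape."""
    tool = TOOLS.get(name)
    if tool is None:
        raise ValueError(f"tool {name!r} is not exposed to the agent")
    schema = tool["params"]
    if set(kwargs) != set(schema):
        raise ValueError("call does not match the approved parameter shape")
    for key, expected in schema.items():
        if not isinstance(kwargs[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return tool["run"](**kwargs)


# Even a hallucinated call can only land on an approved tool with an approved shape.
out = call_tool("restart_bgp_session", device="edge-r2", peer_ip="203.0.113.9")
```

A call to any tool name outside the registry, or with extra or mistyped parameters, fails before it ever touches the network: that is the blast-radius control in miniature.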

MCP links the reasoning world of AI to the deterministic execution world of automation. That’s the foundation of agentic operations.

From Brittle Workflows to Adaptive Agentic Patterns

The classic automation pattern most teams use today is the parent workflow. It works for a while, until the network changes. A new vendor is introduced, an acquired business adds unfamiliar devices, and the parent workflow has to be rewritten. And rewritten. And rewritten again.

That’s toil. And it’s one of the biggest barriers to automation at scale.

AI changes the pattern. Instead of baking every branch, exception, and vendor variation into a single fragile workflow, you separate the logic. You build small, vendor-specific workflows that do one thing well. The agent reasons through context, identifies the right workflow, and calls it when needed.

It’s the old Unix philosophy applied to network operations: do one thing, do it well, and let something smarter decide when it should be used.
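The pattern above can be sketched as a simple dispatch table: small vendor-specific workflows registered under a (vendor, task) key, with the agent's reasoning reduced here to a lookup. Vendor names and workflow bodies are placeholder assumptions for illustration.

```python
# Small, vendor-specific workflows that each do one thing well (hypothetical examples).
def cisco_set_vlan(device, vlan):
    return f"{device}: vlan {vlan} (IOS syntax)"


def arista_set_vlan(device, vlan):
    return f"{device}: vlan {vlan} (EOS syntax)"


WORKFLOWS = {
    ("cisco", "set_vlan"): cisco_set_vlan,
    ("arista", "set_vlan"): arista_set_vlan,
}


def dispatch(vendor, task, **params):
    """The agent's reasoning selects the workflow; execution stays deterministic."""
    workflow = WORKFLOWS.get((vendor, task))
    if workflow is None:
        raise LookupError(f"no workflow registered for {vendor}/{task}")
    return workflow(**params)


# Supporting a new vendor means registering one small workflow,
# not rewriting a monolithic parent workflow.
result = dispatch("arista", "set_vlan", device="leaf-3", vlan=120)
```

The contrast with the parent-workflow pattern is the change surface: an acquisition that brings in unfamiliar devices adds entries to the registry instead of forcing a rewrite of one fragile branch-heavy workflow.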

The result is a system that evolves with the network naturally.

The agent adapts. The workflows stay modular. Engineers spend less time patching automation and more time designing networks.

AI Beyond Device Updates

Most people think of AI in terms of config generation or device updates, but those are surface-level applications. The deeper value comes from the model’s ability to interpret operational state.

Feed an OSPF database into a model and ask it to identify hotspots, misconfigurations, or likely root causes. Ask it to plot traffic flows that correlate with performance drops. Let it reason through topology changes and propose safer paths.

AI can sift through telemetry and routing data in ways that humans simply don’t have time for. It can detect patterns that are invisible at scale. It can help generate design insights that improve resiliency before an outage ever occurs.
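One practical shape this takes is pre-digesting raw state before it ever reaches a model: compress a neighbor table down to the anomalies, then hand the model a focused summary rather than a wall of CLI output. The neighbor data below is a fabricated example of what parsed "show ip ospf neighbor" output might look like.

```python
# Hypothetical OSPF neighbor snapshot, e.g. parsed from "show ip ospf neighbor".
neighbors = [
    {"router_id": "10.0.0.1", "interface": "Gi0/1", "state": "Full"},
    {"router_id": "10.0.0.2", "interface": "Gi0/2", "state": "ExStart"},
    {"router_id": "10.0.0.3", "interface": "Gi0/3", "state": "Full"},
]


def summarize_ospf(neighbors):
    """Compress raw adjacency state into the anomalies worth a model's attention."""
    suspect = [n for n in neighbors if n["state"] != "Full"]
    lines = [f"{len(neighbors)} neighbors, {len(suspect)} not Full"]
    lines += [f"- {n['router_id']} on {n['interface']} stuck in {n['state']}"
              for n in suspect]
    return "\n".join(lines)


summary = summarize_ospf(neighbors)
prompt = ("You are assisting a network engineer. Given this OSPF neighbor summary, "
          "suggest likely root causes:\n" + summary)
```

A neighbor stuck in ExStart, for instance, often points at an MTU mismatch; the summarization step gives the model exactly the signal it needs to surface that kind of hypothesis.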

These aren’t future ideas. They’re real capabilities that become possible when the right governance and execution model are in place.

Why Model Flexibility Matters

The model landscape changes monthly. One week it’s Gemini. Next week it’s Claude. Then an open-source model arrives that outperforms both for a specific task.

Organizations can’t lock themselves into a single model. The Itential Platform is model-agnostic by design. Customers can connect as many models as they want – some public, some private, some for reasoning, some for classification, some for translation.

The platform handles access control, assigns which agents can use which models, and maintains a consistent execution path so operations don’t have to be rewritten every time a new model shows up.
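A rough sketch of that arrangement: a model registry, a grants table mapping agents to the models they may use, and one invocation path that stays the same no matter which model sits behind it. Model and agent names here are invented for illustration.

```python
# Hypothetical model registry; each entry stands in for a real model endpoint.
MODELS = {
    "private-reasoner": lambda prompt: f"[private-reasoner] {prompt[:20]}...",
    "public-classifier": lambda prompt: f"[public-classifier] {prompt[:20]}...",
}

# Which agents are permitted to use which models.
AGENT_MODEL_GRANTS = {
    "change-planner": {"private-reasoner"},
    "ticket-triage": {"public-classifier", "private-reasoner"},
}


def invoke_model(agent, model_name, prompt):
    """One consistent execution path, regardless of which model is behind it."""
    if model_name not in AGENT_MODEL_GRANTS.get(agent, set()):
        raise PermissionError(f"{agent} is not granted access to {model_name}")
    return MODELS[model_name](prompt)


# Swapping in next month's model is a registry change, not an operations rewrite.
reply = invoke_model("ticket-triage", "public-classifier", "Classify this alert: ...")
```

Because callers only ever go through `invoke_model`, adding or retiring a model touches the registry and the grants table, never the agents or workflows built on top.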

This is one of the strongest reasons FlowAI sits inside the platform rather than beside it.

Scaling AI Without Compromising Stability

Innovation in AI is moving at a speed the industry hasn’t seen before. But infrastructure can’t move at that speed. It must remain stable, predictable, and conservative in how it evolves. The challenge is letting the AI side iterate quickly without destabilizing the systems that execute real change.

The separation between reasoning and execution is what makes this possible. Agents can evolve fast – they can reason, interpret, adapt, improve. But the execution layer remains governed, stable, and built on proven automation.

That balance is essential for enterprise-scale adoption of AI: innovation without sacrificing trust.

What Is the Agentic Operating Model for Infrastructure?

We’re entering a new operating model for AI infrastructure. One where:

  • AI interprets context
  • AI reasons about intent
  • AI selects the right workflow
  • The platform enforces boundaries
  • Deterministic workflows execute the change
  • Humans stay in control throughout

This isn’t a far-off future.

Teams won’t adopt AI because it’s clever. They’ll adopt it because it reduces toil, increases safety, speeds up operations, and gives engineers time to focus on strategic work instead of maintenance.

The organizations that get this right are the ones that treat AI as a reasoning layer inside a broader automation fabric. That work with open protocols like MCP rather than locked-down vendor ecosystems. That view their automation as a platform, not a project.

Where We Go From Here

The real work of the next few years isn’t model improvement. It’s operational improvement. The networks we manage are too large, too dynamic, and too essential to run without intelligent support. AI gives us that support. Governance makes it safe. The platform makes it real.

One of the most practical advances I’ve seen in making that real is spec-driven development (SDD): the practice of starting with a structured, machine-readable specification of intent before any automation is built. Instead of writing workflows from scratch and hoping they cover the edge cases, you define what the outcome should look like, validate that spec, and let the platform generate the execution path from it. It’s a fundamentally different starting point: design first, automate second. And it’s what makes AI-generated automation trustworthy enough to run in production, because the model is working from a spec you approved, not improvising from a prompt.
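As a rough illustration of design-first, automate-second, here is a minimal sketch of validating a structured spec of intent before any execution steps are generated from it. The spec fields, validation rules, and generation stub are all assumptions for the example, not a real SDD schema.

```python
# Hypothetical machine-readable spec of intent, approved before automation exists.
spec = {
    "outcome": "vlan_present",
    "targets": ["leaf-1", "leaf-2"],
    "vlan_id": 120,
}

REQUIRED_FIELDS = {"outcome": str, "targets": list, "vlan_id": int}


def validate_spec(spec):
    """Reject malformed specs up front; generation only works from approved intent."""
    for field, expected in REQUIRED_FIELDS.items():
        if field not in spec:
            raise ValueError(f"spec missing required field {field!r}")
        if not isinstance(spec[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    if not 1 <= spec["vlan_id"] <= 4094:
        raise ValueError("vlan_id out of range")
    return True


def generate_steps(spec):
    """Stand-in for platform-side generation: validated spec in, deterministic steps out."""
    validate_spec(spec)
    return [f"{device}: ensure vlan {spec['vlan_id']} exists"
            for device in spec["targets"]]


steps = generate_steps(spec)
```

The ordering is the point: validation gates generation, so anything the model produces is traceable back to a spec a human approved rather than to a free-form prompt.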

AI won’t replace network engineers. It will amplify them, removing toil, reducing errors, and giving teams the breathing room they’ve needed. Spec-driven development is part of how we get there: fewer brittle workflows built from intuition, more automation built from validated intent.

That’s what we’re building at Itential with FlowAI: a future where intelligence and automation work together inside a governed framework that respects the realities of production networks.

I’ll be at AutoCon 5 later this year, and honestly, I can’t wait to see where this community has taken things. The conversations at AutoCon 4 made it clear that the shift is already underway: practitioners aren’t just experimenting anymore, they’re building for production. I’m looking forward to those hallway conversations all over again.

Take a Deeper Dive

Listen to the full episode of The Cloud Gambit podcast at AutoCon 4, or watch on-demand below. And to go deeper on FlowAI and how it brings together AI reasoning, orchestration, and governed execution, start here.

William Collins

Director of Technical Evangelism ‐ Itential

William Collins is a strategic thinker and a catalyst for innovation, adept at navigating the complexities of both startups and large enterprises. With a career centered on scalable infrastructure design, he serves as Itential’s Director of Technical Evangelism. Here, he leads the charge in network automation, leveraging his deep roots in cloud architecture and network engineering. William hosts The Cloud Gambit Podcast, diving into cloud computing strategy, markets, and emerging trends with industry experts. Outside of transforming networks, you can find him enjoying time with family, playing ice hockey, and strumming guitar.
