Table of Contents
- Quick Summary
- We Outgrew the Mental Model Without Noticing
- The Reframe That Changes Everything
- A Pretty Critical Moment
- What AI Actually Is & Isn’t
- The Skeptics Aren’t Wrong, But They’re Asking the Wrong Question
- Human in the Loop, On the Loop, At the Loop
- The Bigger Shift Coming
- The Identity Crisis Is the Point
Quick Summary
NANOG 96 sparked conversations about network automation’s identity crisis, validating that the shift from config-pusher to service delivery platform to agentic operations isn’t a problem. It’s evidence of an industry growing into something bigger, and a preview of what comes next when AI reasoning meets network automation.
I’m still processing the conversations from NANOG 96 in San Francisco. Not just the ones that happened on stage, but the ones in the hallways, over coffee, between sessions. That energy is always honest. Practitioners talking to practitioners. No polish, no positioning. Just people who run real networks at real scale trying to figure out what comes next.
I joined a panel with Justin Ryburn from Kentik and Bill Lapcevic from NetBox Labs, moderated by Ethan Banks of Packet Pushers. Ethan set the premise before we even got started:
“Device configuration is incidental to what network automation is really all about.”
That framing, from the moderator and not the vendors, tells you something about where our industry’s head is at right now.
Network automation doesn’t quite know what it is anymore. It started as one thing, became something bigger, and now, with agentic AI entering the picture, it’s becoming something different again.
That identity crisis isn’t a problem. It’s a sign that we’re growing up.
We Outgrew the Mental Model Without Noticing
Ask most network engineers what automation means and the first answer is some version of: pushing configs to a lot of devices faster than you could do it manually. And that was a real problem worth solving. We were writing Perl scripts to update SNMP community strings in the late 90s. That was automation. It mattered.
But here’s what I keep coming back to: we’ve been outgrowing that mental model for years without fully updating the framing. Automation evolved from scripts to integrations, from integrations to orchestration, from orchestration to full service lifecycle management. The job description changed. The label didn’t.
What we’ve always been building toward, even when we called it “network automation,” is the ability to deliver services at scale.
Device configuration is one step in that process. It is not the process.
The Reframe That Changes Everything
The network doesn’t deliver configs. It delivers services. The moment you internalize that, everything about how you design, measure, and talk about your work changes.
Services traverse multiple domains across infrastructure: data center, WAN, cloud, security. A service has a lifecycle: you create it, you modify it, eventually you delete it. And critically, you’re usually building it for someone else. An application team, an external customer, a line of business that just needs the thing to work without caring about the details underneath. The job of automation is to make the intent easy to express and the execution reliable.
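As a rough sketch of what "intent easy to express" might look like, here is a hypothetical service model expressed in terms the consumer cares about rather than device primitives. All field names and values are invented for illustration; they don't come from any real product.

```python
from dataclasses import dataclass

# Hypothetical sketch: a service expressed as intent, not device config.
@dataclass
class ServiceIntent:
    name: str        # e.g. "checkout-flow" -- the outcome, not the VLAN
    consumer: str    # the team the service is delivered to
    endpoints: list  # domains the service must span (DC, WAN, cloud)
    sla_ms: int      # what the consumer actually measures

def lifecycle_demo():
    # Create -> modify -> (eventually) delete: the lifecycle in the text.
    svc = ServiceIntent("checkout-flow", "payments-team",
                        ["dc-east", "aws-us-east-1"], sla_ms=50)
    svc.endpoints.append("dc-west")  # modify: extend the service footprint
    return svc

svc = lifecycle_demo()
print(svc.name, len(svc.endpoints))
```

The point of the abstraction is that the rendering of this intent into per-device configuration happens below this layer, invisibly to the consumer.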
Justin pushed this further in our follow-up conversation: if you ask business stakeholders what a “service” is, they’re not thinking L3 VPN or EVPN fabric. They’re thinking checkout flow, application delivery, EC2 instance. We think in network primitives. They think in outcomes. The gap between those two things is where most automation initiatives stall. Not because the technology isn’t there, but because we’re solving at the wrong layer.
The hyperscalers figured this out. There’s real routing and switching underneath AWS. Real cabling, real devices, real configuration. But the developer spinning up an EC2 instance doesn’t know that, and shouldn’t have to.
That abstraction, network as invisible and reliable substrate, is the destination. We’re building toward it whether we frame it that way or not.
A Pretty Critical Moment
We have new network automation tools at our disposal that we didn’t have before. And a lot of the conversations we’ve been having about automation and orchestration are still constrained by how we’ve been operating for the past five to ten years. That’s worth naming.
The software industry is rethinking how it builds pipelines, how it structures services, how teams interface with each other and with infrastructure.
Infrastructure as code. Programmable interfaces. Loosely coupled systems that can move independently and still compose into something larger. These aren’t new ideas, but their applicability to network operations is newer than we’ve treated it.
Most people in the room at NANOG have already done this to some degree for their most critical services. The technical debt, the complexity, the need to get the orchestration right – those things have historically reserved this kind of work for the highest-value use cases. That’s changing. And what changes it is what’s on the horizon.
What AI Actually Is & Isn’t
Before we get to agentic operations, I think we need to be honest about what we’re actually talking about when we say AI.
Think of an AI agent as a software process. It has a library, just like we use a library to connect to a database. That library connects to an LLM. The LLM handles the reasoning. You give it a prompt, a set of tools it’s allowed to use, and a goal. It works through the steps to achieve that goal, calling out to whatever tools you’ve given it access to along the way.
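The loop described above can be sketched in a few lines. The `fake_llm` function here is a stand-in for the real model call, and the tool names are hypothetical; the structure (prompt in, tool calls out, bounded iteration toward a goal) is the point.

```python
def fake_llm(goal, observations, tools):
    """Stand-in for the reasoning model: decides the next tool call."""
    if "vlan 200 exists" in observations:
        return ("done", None)
    return ("create_vlan", 200)

def create_vlan(vlan_id):
    # Stub for a real provisioning action.
    return f"vlan {vlan_id} exists"

TOOLS = {"create_vlan": create_vlan}  # the only actions the agent may take

def run_agent(goal):
    observations = ""
    for _ in range(5):  # bounded loop: no runaway agent
        action, arg = fake_llm(goal, observations, list(TOOLS))
        if action == "done":
            return observations
        observations += TOOLS[action](arg)  # execute an allowed tool
    return observations

print(run_agent("provision VLAN 200"))
```

The library the text mentions is exactly the glue here: it carries the prompt and tool descriptions to the LLM and dispatches the tool calls the LLM chooses.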
What foundational models are genuinely good at is reasoning high in the stack. They understand what it means to provision a VLAN. They understand what a software upgrade involves. They understand EVPN. What they are not good at is the version-level specifics of a mixed Juniper and Arista fabric. The implementation details. The edge cases in your particular environment. That’s where people get burned. They expect the model to know things it was never trained on, and then they conclude the whole technology is overhyped.
The right question isn’t whether AI is good enough. It’s what AI is good at, and how we build around that honestly.
Those of us who write pipelines know the problem well: every time a different error condition shows up, a different payload comes back from an API call, you end up with technical debt in the pipeline to handle it. The question I keep asking is whether we can back off some of that rigidity and let reasoning handle three varieties of an error message without having to explicitly code three exact error conditions. That’s not magic. That’s just using the right tool for a specific part of the problem.
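To make the contrast concrete, here is a toy sketch (with invented error strings) of the rigid approach versus the reasoning approach. The `classify` function stands in for an LLM call; in a real pipeline that's where the model's reasoning would absorb the variants the lookup table never anticipated.

```python
# Rigid pipeline: one explicit branch per exact error string.
# Every new variant means new code -- that's the technical debt.
RIGID_HANDLERS = {
    "Error: auth failed": "reauth",
    "401 Unauthorized": "reauth",
    "token expired": "reauth",
}

def classify(message):
    """Stand-in for LLM reasoning: map free-form errors to an intent."""
    auth_words = ("auth", "401", "token", "credential")
    if any(w in message.lower() for w in auth_words):
        return "reauth"
    return "escalate"

# A variant the rigid table has never seen:
msg = "credentials rejected by peer"
print(RIGID_HANDLERS.get(msg, "unhandled"))  # the rigid path falls through
print(classify(msg))                         # the reasoning path recovers
```

That's the "right tool for a specific part of the problem" argument: deterministic code where exactness matters, reasoning where variety does.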
The Skeptics Aren’t Wrong, But They’re Asking the Wrong Question
At NANOG, Ethan polled the room on agentic AI. Roughly 50/50 split between people who felt it was ready for production networks and people who didn’t trust it yet. That split is healthy. The skepticism is earned.
The concern I hear most often is about determinism. Network engineers need predictable, reliable outcomes. An LLM by nature produces probabilistic answers. Those two things feel fundamentally incompatible.
Here’s how we think about it: put the determinism between the agent and the network. The agent can only do the things you allow it to do. If you’ve given it the ability to run a Jinja template, open a ServiceNow ticket, and send a Slack message – those are the only actions it can take. It’s not jailbreaking your infrastructure. You get a full audit trail of the agent’s reasoning: every step, every decision. You can watch it the same way you watched your first automation scripts run before you trusted them enough to let them run unattended.
That’s the part I think a lot of engineers haven’t fully worked through yet. An LLM can produce a non-deterministic answer and still be constrained to a deterministic set of actions. The reasoning doesn’t have to be rigidly predictable for the outcomes to be controlled.
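A minimal sketch of that constraint, using the three example actions from the text. The implementations are stubs and the audit format is invented, but the shape is the argument: the agent can request anything, only allowlisted actions execute, and every decision (including refusals) lands in the audit trail.

```python
import time

# Determinism between the agent and the network: an action allowlist.
ALLOWED = {
    "render_jinja_template": lambda arg: f"rendered {arg}",
    "open_servicenow_ticket": lambda arg: f"ticket opened for {arg}",
    "send_slack_message": lambda arg: f"sent: {arg}",
}
audit_trail = []  # every step, every decision

def execute(action, arg, reasoning):
    entry = {"ts": time.time(), "action": action,
             "arg": arg, "reasoning": reasoning}
    if action not in ALLOWED:
        entry["result"] = "DENIED"  # not allowlisted: refused, but logged
    else:
        entry["result"] = ALLOWED[action](arg)
    audit_trail.append(entry)
    return entry["result"]

print(execute("send_slack_message", "link flap on edge-1",
              "notify on-call before any change"))
print(execute("reboot_router", "edge-1", "try a reboot"))  # DENIED
```

The reasoning that chose `send_slack_message` may be probabilistic; the set of things that can actually happen to the network is not.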
Human in the Loop, On the Loop, At the Loop
This mirrors automation adoption exactly. Before we ever let automation run a thousand times overnight, we did testing, validation, humans approving things for a period of time. We built confidence through repetition. We started read-only. We watched every diff. We graduated to autonomy in bounded use cases first.
Agentic AI follows the same arc. One of our joint customers with Kentik has fully automated DDoS detection and mitigation. No human in the loop anymore. But they didn’t start there. They started with a human saying yes every single time, built confidence over hundreds of events, and eventually reached the point where they trusted the system enough to take themselves out of the loop. That’s the maturity model. It isn’t fast, and it isn’t supposed to be.
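The maturity arc above can be caricatured as an approval gate: an action that requires a human yes until confidence is earned through repetition. The threshold and counters here are invented for illustration, not a description of how that customer's system works.

```python
class MitigationGate:
    """Human in the loop until enough reviewed runs build confidence."""
    def __init__(self, autonomy_threshold=200):
        self.approved_runs = 0
        self.threshold = autonomy_threshold

    def needs_human(self):
        return self.approved_runs < self.threshold

    def record_approval(self):
        # A human reviewed the proposed mitigation and said yes.
        self.approved_runs += 1

gate = MitigationGate(autonomy_threshold=3)
while gate.needs_human():
    gate.record_approval()  # a human says yes every single time

print(gate.needs_human())  # the human steps out of the loop
```

In practice the threshold would be a judgment call per use case, not a single number, and the most critical services would never cross it.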
The goal isn’t autonomy for its own sake. The goal is using a reasoning tool we didn’t have before to do things we couldn’t do before. For the most critical services, full guardrails still apply. Deterministic execution. Human review.
But for the volume of operational tasks that never reached the ROI threshold for full automation, reasoning changes the calculus. We can handle more, do it faster, and scale things that used to require a human to babysit every step.
The Bigger Shift Coming
Bill said something in our follow-up conversation that I keep coming back to. The DevOps movement bridged app developers and operations teams. They found common ground across what had been an intentional divide, and the application world accelerated. We’re at the start of that same thing happening with networks and the rest of the IT stack.
We designed the network and the application layer to be decoupled from each other, and for good reasons. But that decoupling also meant that when something went wrong, everyone pointed at each other. “It’s the network.” “No, it’s the app.” Nobody could see past the wall.
AI changes that. Not because it suddenly understands both layers perfectly, but because it can correlate across data streams fast enough to find patterns that humans would take hours to uncover. Syslog over here, config change over there, traffic spike at the same timestamp. That’s an agent’s strength. And when you can ask “why is my application in US East slow” in natural language and get an answer that draws from network telemetry, flow data, and recent config changes, without logging into three different tools, the silos start to break down in a way that actually matters.
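The "same timestamp" correlation described above reduces to a simple mechanical core, sketched here with invented events. A real system would normalize far messier data from far more sources; the time-window clustering is the recoverable idea.

```python
from datetime import datetime, timedelta

# Invented events from three streams: syslog, config, flow telemetry.
events = [
    ("syslog", datetime(2025, 2, 3, 14, 0, 12), "BGP neighbor down"),
    ("config", datetime(2025, 2, 3, 14, 0, 5),  "policy change on edge-2"),
    ("flow",   datetime(2025, 2, 3, 14, 0, 20), "traffic spike us-east"),
    ("syslog", datetime(2025, 2, 3, 9, 30, 0),  "fan speed warning"),
]

def correlate(events, window=timedelta(seconds=30)):
    """Cluster events whose timestamps fall within one window."""
    events = sorted(events, key=lambda e: e[1])
    clusters, current = [], [events[0]]
    for ev in events[1:]:
        if ev[1] - current[-1][1] <= window:
            current.append(ev)
        else:
            clusters.append(current)
            current = [ev]
    clusters.append(current)
    return clusters

for cluster in correlate(events):
    if len(cluster) > 1:  # only multi-source clusters are interesting
        print([src for src, _, _ in cluster])
```

What the agent adds on top of this mechanical step is the natural-language interface and the judgment about which correlations plausibly explain the symptom.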
That’s the transition we’re in the middle of. Not from manual to automated. From siloed to integrated. From rigid to adaptable. From activity-focused to outcome-oriented.
The Identity Crisis Is the Point
NANOG reminded me that the discomfort our industry is feeling right now isn’t a problem to solve. It’s evidence that we’re in the middle of a real transition.
We spent two decades getting very good at automating tasks. Now we have to get good at something harder: thinking in systems, designing for outcomes, and operating infrastructure at a scale and speed that wasn’t possible before.
AI is a reasoning tool. One we didn’t have before. The goal hasn’t changed; we just have a new capability to pursue it with.
The engineers who figure out how to use it honestly, who understand where it’s strong and where it isn’t, and who bring it in early rather than treating it as the finish line, are the ones who will define what this industry looks like on the other side.
The identity crisis is real. So is the opportunity.
Watch the Panel from NANOG 96
Take a deeper dive into the conversations and insights from NANOG 96, or watch the full panel on-demand.