

From Device Configuration to Agentic Operations:
At NANOG96, Itential joined Packet Pushers, Kentik, and NetBox Labs to explore how network automation is evolving – from configuring individual devices to orchestrating autonomous, AI-driven operations at scale.

Watch the Full NANOG96 Discussion
Network engineers have spent decades configuring devices. But the real goal was never the config – it was the service, the outcome, the business value on the other end. This Packet Pushers discussion and NANOG panel, both moderated by host Ethan Banks, pull together three perspectives from across the ecosystem to explore what network automation actually is, why most teams are still stuck at the device level, and how agentic AI is creating a genuine opportunity to rethink how we operate infrastructure at scale.
Episode Notes
(So you can skip ahead, if you want.)
00:00 Introduction and Panel Overview
01:15 Network Automation Evolution
04:39 Systems Thinking vs Devices
08:04 Defining Network Services
13:04 Getting Started with Automation
22:39 Agentic AI for Networks
27:34 Business Case and ROI
33:13 Cross-Team Integration and Collaboration
39:08 Telemetry and Service Health
41:43 Wrap Up & Success Metrics
Ethan Banks • 00:05
Welcome to our panel discussion. Our premise today is that network automation isn’t just about configuring network devices. And in fact, it could be argued that device configuration is incidental to what network automation is really all about. That might be my opinion, and I don’t want to speak for our fine panelists. I am Ethan Banks, your moderator. You likely know me as a recovering CCIE and host of the Heavy Networking and Management Networking podcast on the Packet Pushers Podcast Network. Justin Ryburn, Justin, raise your hand.
Ethan Banks • 00:35
Justin is the field CTO at network intelligence company Kentik. Chris Wade, Chris, raise your hand. Chief Technology Officer and co-founder of Itential, where he leads the company’s technology strategy and product innovation. And you might have been expecting Kris Beevers if you were looking at your notes, but he had life intrude at the very last minute and was not able to make it to NANOG. So Bill Lapcevic has ably stepped in for Kris. Bill is the co-founder and chief revenue officer at NetBox Labs. Okay, guys, our panel is entitled Network Automation: Not Just About Configuring Network Devices.
Ethan Banks • 01:09
Okay, what is network automation about? Chris Wade, kick us off.
Chris Wade • 01:15
All right, so I mean, we’ve been talking about configuring devices for a long time, watching NANOG presentations on NETCONF, gNMI, all these types of technologies. You know, in the hallway track, we’ve really talked about how we integrate systems together. And, you know, I’ve spent a lot of time talking about automation and orchestration and how we automate tasks and tie them together with orchestration. So I think automation started as pushing configs, then it increased to integrating systems together. And now we’re talking about how we tie systems together for larger outcomes. Justin?
Justin Ryburn • 01:55
Yeah, I mean, I got to echo a lot of what Chris said. I mean, we’ve been writing Perl scripts to do minor configuration changes back from when I ran a network.
Ethan Banks • 02:03
You’ve dated yourself.
Justin Ryburn • 02:05
I did date myself. For anyone who’s not familiar with that, that was a programming language that existed before Go, apparently. But yeah, I mean, to Chris’s point, we’ve been doing small minor changes. Maybe it was changing an SNMP community in the configuration on a device or updating an access list on a router and wanting to automate that task, for lack of a better term, across all the devices on the network, right? Instead of having to log into each one and do it manually. I mean, we were doing this back in the late 90s when I ran a network.
Ethan Banks • 02:32
Change the NTP server on all of them.
Justin Ryburn • 02:35
Yeah, on all of them. Yeah. So, you know, I think to Chris’s point, it’s kind of scaled out since then, right? If you think about a network that’s operating at the scale the previous presenter was just showing, where you’ve got, you know, thousands and thousands and thousands of devices in these fabrics, you can’t log into those and manually configure them, right? You’ve got to automate that. And you’ve got to have a good system in place, to Chris’s point, to be able to do that automation, right?
Justin Ryburn • 03:00
It’s not just about point scripts, it’s about really thinking holistically about the system and what you’re trying to accomplish.
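The task-level pattern Justin and Ethan are describing (compute one small change once, apply it everywhere) can be sketched in a few lines of Python. This is an illustrative sketch with invented device names; a real push would typically go over SSH or NETCONF using a library such as Netmiko or ncclient.

```python
# Illustrative sketch: render one config snippet and fan it out to a device
# list, instead of logging into each box. Device names are made up.

NTP_SERVERS = ["10.0.0.1", "10.0.0.2"]
DEVICES = ["edge-rtr-01", "edge-rtr-02", "agg-sw-01"]

def render_ntp_config(servers):
    """Build the config lines once; every device gets the same snippet."""
    return [f"ntp server {s}" for s in servers]

def change_set(devices, servers):
    """Map each device to the snippet it should receive."""
    snippet = render_ntp_config(servers)
    return {device: list(snippet) for device in devices}

if __name__ == "__main__":
    for device, lines in change_set(DEVICES, NTP_SERVERS).items():
        print(device, "->", lines)
```

The point of the sketch is the shape of the workflow: the change is computed and reviewed once, and the per-device repetition is purely mechanical.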
Ethan Banks
Bill, you want to add?
Bill Lapcevic • 03:06
Sure. I really like the idea of thinking holistically about the network as a system. I think when you are looking at just configuring devices, that’s good for getting things done, maybe moving a little bit more quickly. But I think the network is becoming a competitive advantage for most enterprises these days. Speed is of the essence, and so is understanding how each device actually fits in the grander context: what services, what business drivers are being affected every time a device is either configured correctly or misconfigured. So automation becomes the way that you gain control over all these disparate devices in your network and start to provide a competitive advantage to your enterprise.
Ethan Banks • 03:53
Okay, you guys said holistic, you said system, but yet network engineering for a lot of years has been device by device. We go into a device, we configure it to do the thing. The systems thinking that might happen happens in our heads as a network engineer. That is, we know we have to configure these six or ten or whatever devices to accomplish some task or to deliver some service. And we understand that as engineers, how to deliver that, but we still do it device by device. So how do we change our thinking so that we’re using automation platforms to deliver a service? Yeah, there’s still individual device configurations that are going to happen, but we’re doing that holistic delivery of those configurations.
Ethan Banks • 04:39
Something’s got to change, right?
Justin Ryburn • 04:42
I mean, maybe I challenge your assumption slightly, Ethan. I mean, there are systems, like, you know, if you think of like a data center fabric orchestration that some of the vendors offer, right, where they think of like an entire data center or at least an entire rack as a system. And they view that as a system that works well together and is orchestrated by a controller and so forth. So we have seen examples, I think, as an industry of that, but I get your point that as a, you know, in a broader sense, we still tend to think of individual devices coming together and built together by the network engineer as a system. But yeah, I think it’s a mind shift. I think we need to view the entire end-to-end network as part of the system that’s delivering the experience, whether that’s applications, users, content, whatever it may be, right?
Ethan Banks • 05:30
What does that mean for our automation systems, though? Like we were saying at the beginning, automation initially was a script we just ran 100 times across 100 different devices to accomplish the same task repeatedly. We’re not talking about that at this point. How would you describe it?
Justin Ryburn • 05:50
I mean, you know, I’ve never been a big fan of, let’s say, building an entire nationwide network where you have a single control plane. I think your failure domain is way too large for that. So I still think we want our devices, at least this has always been my experience, to each think for themselves, right? To have their own individual control planes. You want them orchestrated through routing protocols and such, but having them all operate as one, I don’t know, I think is a bad idea. That doesn’t mean you can’t do configuration automation as a holistic thing and view your entire network as a system and orchestrate things like that.
Ethan Banks
Chris?
Chris Wade • 06:27
Yeah, it might take us in a different direction in the sense that, you know, I think most people in the room have spent significant time automating and orchestrating services. A lot of data models involved. And we’ve really been adopting infrastructure as code over the last couple of years to tie a lot of these systems together. There have been a lot of AI presentations this week, so maybe throw that in here, in the sense that as a software company, we’re rethinking how we build software. We’re rethinking how we do pipelines.
Chris Wade • 07:03
And there have been a couple of comments about learning from external verticals and such. So I think we have an opportunity to kind of rethink how we build services and how we build these pipelines with a lot of the concepts that are going on in software development today. So I think we really do have an opportunity to rethink how services are offered, what products we offer. And we have new tools at our disposal that we didn’t have before. So a lot of the discussions around automation and orchestration are fairly constrained by how we’ve been operating for the past five or ten years. But I do think we have a pretty critical moment in our industry to rethink how we operate infrastructure at large.
Ethan Banks • 07:49
I think we need to define service then, because service could mean L3 VPN or it could mean something quite a bit smaller, depending on your point of view. Bill, maybe you’ve got a take on this. You can model services in NetBox, for example.
Bill Lapcevic • 08:04
Sure. I mean, you can start with devices. Devices make up services. You’re kind of going up a level of abstraction, and then maybe even another level of abstraction. You know, a router can be part of a checkout system, for example. Or it could be part of multiple systems. It could be part of the checkout system.
Bill Lapcevic • 08:23
It could be part of your web access, for example. The ability to look at things not with a device-by-device method, but to understand what business impact these are having, is very, very important, especially as you’re looking to sell network automation within your own organization and show the impact that your network engineering actually has on the business. You need to start thinking in terms of the things that the executives are thinking about, like speed, like risk reduction, like rolling out new business processes faster to gain a competitive advantage. And only by looking at groups of devices as services are you really able to get to that level of understanding that an executive will resonate with.
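Bill’s example (one router serving both the checkout system and web access) is a many-to-many mapping, and the business-impact question is a reverse lookup across it. A hypothetical Python sketch with invented service and device names:

```python
# Hypothetical mapping of business services to the devices underneath them.
# One device can appear under several services, as in Bill's router example.
SERVICES = {
    "checkout": ["core-rtr-01", "dc-sw-03", "fw-edge-01"],
    "web-access": ["core-rtr-01", "dc-sw-07"],
    "warehouse-wifi": ["branch-ap-12", "branch-sw-02"],
}

def impacted_services(device, services=SERVICES):
    """Which business services are touched if this device is misconfigured?"""
    return sorted(name for name, devices in services.items() if device in devices)
```

A query like `impacted_services("core-rtr-01")` answers the executive-level question directly: a bad change on that one router touches both checkout and web access.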
Ethan Banks • 09:14
Justin, you’ve got some ideas about defining a service. Give us some examples of what you think of as a network service.
Justin Ryburn • 09:21
Well, I think if you were to ask the business stakeholders what they define as a service, I’m not sure network service is what they actually think of. I think maybe this is the point that Bill was making, right? We think in terms of network service, like are we delivering an L3 VPN? Are we delivering an EVPN, a data center fabric overlay? Like, those are the type of services we think in because we’re network nerds, right? But I think if you were to go and ask the business stakeholders when they think of service, they think again of like the applications that we’re delivering to our users, the cart, the checkout, like that’s the way they think. And I think, you know, kind of taking this back to network automation for a second, if we’re building automation frameworks that allow the other parts of the business to sort of self-serve and build their services on top of the network without needing to understand the network and how it works and how we automate it, how we configure it, how it all comes together, that’s really what the business is after, right?
Justin Ryburn • 10:14
That’s what they’re trying to accomplish. I mean, if we look at the way that the hyperscalers have delivered quote-unquote services to the application teams that are delivering it, they’ve abstracted away the network. Now, we all know there is a network under there. There are real routers, switches, firewalls, cabling that exist to make all that stuff work. But to the developers who are consuming an EC2 instance and connecting it to an S3 bucket and delivering an application, they don’t need to know that any of that exists, right? Because it’s all abstracted and automated, for lack of a better term, for them.
Ethan Banks • 10:47
Okay, so if the network engineer is delivering a network service, are we saying that we are not configuring network devices? I mean, we are, because it has to be done. But are we supposed to be thinking at the abstracted level, or about what’s happening on the network devices, or both?
Justin Ryburn • 11:04
I would say both.
Ethan Banks • 11:05
Okay.
Chris Wade • 11:07
Yeah, I mean, services are stacked on services, right? So you brought up the AWS example. So if you look at Agent Core running on S3 and Lambda underneath, you know, it depends on where you are in the stack. But the one other thing I would add in the services discussion is historically we’ve thought about services as stuff we deliver on our assets. You know, you brought up eVPN and Layer 3 VPN style services. But increasingly we’re, I mean, we just saw the presentation. We’re talking about global provisioning of services.
Chris Wade • 11:40
A lot of times it’s on other people’s stuff too, right? So we might be delivering a service on our own infrastructure, but increasingly we’re delivering services on other people’s stuff, especially when you start bringing up the cloud examples.
Ethan Banks • 11:56
Show of hands. Those of you who are working with network automation, are you thinking about it the way we talked earlier in the panel? I’m using network automation to deliver the same small thing to 100 devices or 1,000 devices; that’s your primary use case for automation. Got a few people like that. What about what we’re talking about now, where it is more about services? You’re thinking in terms of services.
Ethan Banks • 12:26
Raise your hand. How many of you don’t use automation? Okay, some of that too. All right, so if my job is to deliver a network service and it’s not configuring network devices, I might be configuring network devices along the way, that’s part of it, right? But my primary job, the way I should be thinking about this, is I’m delivering a network service. How do I get started doing that? Because I would argue a lot of network engineering shops are not set up for that way of thinking, especially not for like self-service and service catalogs, which I think, Chris, you mentioned that earlier on.
Ethan Banks • 13:04
So how do I get started with building a network service as a way of thinking and then building a practice around that in my network operations?
Justin Ryburn • 13:14
I know you have lots of opinions on this, so I want to hear you first.
Chris Wade • 13:18
Yeah, I mean, there’s some chicken and egg involved. I think that the first thing we need to do is look at the programmability of the infrastructure. And where it doesn’t exist, we use tooling and instrumentation to do so. I think that’s a critical component of delivering a service if we’re going to talk about automation and orchestration.
Ethan Banks • 13:39
Programmability, as in my infrastructure can live as code, and then I can use programmability to deliver that infrastructure as code to a device?
Chris Wade • 13:47
Yeah, maybe not as code, but it needs to have the ability for a machine to interface with it in some meaningful way. So, you know, step number one. We’ll talk about modeling and other things, I’m sure, from Bill. But I think at the end of the day, we need to understand the product that we’re offering our customer. It could be internal or external. We need to understand the components of that. We need to have a data structure to support it, and we need to be able to program it at scale. And the other part I’ll add that we really haven’t talked about is the entire ecosystem of the customer, the product, the service, and the inventory that might be associated with that, and keeping that all in sync.
Chris Wade • 14:28
Because most of these services lead to concepts of life cycle. We start off by talking about pushing config, but if it’s fire and forget, we never end up understanding whether the network is in the state we want it in. So immediately when we start talking about services and tying systems together, I think you lend yourself to starting to think about the life cycle of these services. Because ultimately, there’s a customer on the other end of that, so we can understand how we’re ultimately delivering value to whoever that is, internal or external.
Ethan Banks • 14:59
You brought up models. And Bill, let’s talk about that. Well, can you get into an example of maybe how we model a service, what that looks like?
Bill Lapcevic • 15:08
Yeah, so I actually wanted to just pick up on one thing that Chris was talking about, which is the planning. Planning can’t start with how am I going to configure that device. Planning actually has to start at the other side, right? What is the service I’m going to bring? What is the underlying infrastructure that needs to support that service? And therefore, what devices need to be there and how do they need to be configured?
Ethan Banks
You said start on the other side, as in talking to the consumers, the business side?
Bill Lapcevic • 15:38
Or from the top down, right? Start at imagining what the service is rather than imagining how that device is going to be configured. And so the plan is really important, but in order to even execute on a plan, then you actually need to know what should be. We talk at Netbox Labs, we talk a lot about what is versus what should be and what the difference is there. And this all comes down to how do you model all these devices? How do you then consolidate it into a place where you can do the planning, where you can see the interconnectivity of all these devices, where you can understand how they have been configured or how they will be configured, and then compare it to what the actual state of play is in the actual network. Because until you’re seeing what is and comparing it to what should be, your intent, then you don’t know if what you’ve actually planned is going to be successful.
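The “what is versus what should be” comparison Bill describes can be sketched as a diff between intended and discovered state. A minimal illustration, assuming both sides are plain dicts; in practice the intended side might come from a source of truth such as NetBox and the actual side from the devices themselves:

```python
def config_drift(intended, actual):
    """Compare intent ("what should be") against observed state ("what is").

    Both arguments map device name -> {setting: value}. Returns, per device,
    the settings that differ as {setting: (intended_value, actual_value)}.
    """
    drift = {}
    for device, should_be in intended.items():
        is_now = actual.get(device, {})
        diffs = {
            key: (value, is_now.get(key))
            for key, value in should_be.items()
            if is_now.get(key) != value
        }
        if diffs:
            drift[device] = diffs
    return drift
```

An empty result means the network matches intent; anything else is the gap the plan has to close before it can be called successful.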
Ethan Banks • 16:33
Okay, you said a lot of things there. One is the visualization of the service. So that means going back to modeling, we need to be able to model what that service is to have that point of comparison. For sure. What would that look like? Any specific examples? Am I asking an unfair question?
Bill Lapcevic • 16:49
Well, I don’t know. I think it’s different for everybody. And there are lots and lots of different visualization tools out there that you can use to look at how things are interconnected. You know, we see a lot of people looking at, for example, NetBox data, where they keep all the information that they have about the devices, the IP addresses, the interconnectivity between them, the different components of their network and how they’re all interrelated. And then as they’re making changes to that, they’re doing that within that data model. And then they’re having that reviewed by various different people in the organization to make sure that they’re not making breaking changes, that they’re making the right changes for their intent. This is getting all the device information into a single system so that you have full visibility.
Bill Lapcevic • 17:49
If you have some device information in a spreadsheet and some device information in a Visio diagram and some in Infoblox and some in some other tool, it’s very, very hard to get that holistic view that you need to understand how these services all mesh together.
Justin Ryburn • 18:06
I think the original question was: where do you start, right? And I think at the end there, Bill came to something that’s really important: where you get started is figuring out what you have, right? Inventorying what you have in the network. A lot of customers that I talk with, and I’m sure, Bill, you have the same experience at NetBox Labs, are in a brownfield environment, right? They came in, they inherited, they may not even know, they may not have the spreadsheet, or the spreadsheet may not be up to date on: what devices do I have, what version of code are they running, what IP addresses do they have, where are they physically located? Like, getting all of that data collected and sort of discovering what’s in the network, getting that all, to your point, in one place.
Justin Ryburn • 18:38
So now I know what I have, you know, physical assets I have, and how they’re currently configured, is where you have to start. From there, you can start figuring out: okay, now I know what I have, I know where I want to go, I know what services are that I want to deliver for the organization, and I can then stitch together how I get from here to there. But I find most customers that I interact with, they’re stuck even just figuring out what they’ve got and making sure that it’s up to date and organized. The documentation is up to date.
Bill Lapcevic • 19:07
That can actually be a really challenging problem. There are discovery tools. We have a discovery tool. No discovery tool actually discovers everything.
Justin Ryburn • 19:15
Nope.
Bill Lapcevic • 19:16
You know, we have some clients that are in OT environments too. There are certain things in your network, if you have an operational technology environment, maybe in manufacturing or something like that, where you actually can’t scan those devices, because those devices might be a chocolate extruder from 1980, right? And if you actively scan it, you’re going to break it. And maybe the company that built it is not even in existence anymore, so there’s no one to fix it. And in those cases, you actually have to send people out to those devices and manually figure out, you know, not only what the serial number is on that device, but how it’s configured, what software is running on it, and then manually put that into whatever system you’re using as a system of record so that you know what’s out there. And so that is a huge and daunting task, especially in a brownfield environment.
Bill Lapcevic • 20:08
It’s why we see a lot of companies pick a greenfield project and start there, because then they can build sort of the collective understanding of how to go about network automation in a greenfield environment where there is no wrong answer because everything you put out there is what you intend, right? And then you roll that slowly into the brownfield environments.
Ethan Banks • 20:28
Nice when you can get it, but Chris, how many greenfield environments do you get to work with?
Chris Wade • 20:35
Not very many. But I do want to recognize, I’m sure everybody in the room is thinking through a certain amount of complexity that we’re discussing. So when you say greenfield environments, I think the easiest metaphor for that is looking at provisioning some services in a cloud via Terraform CDK, your choice of tooling and infrastructure. When we start talking about automating and orchestrating the types of networks that the people in this room represent, you know, with end-to-end service orchestration there’s a certain amount of technical debt that comes along with it, independent of tools. And it’s usually reserved for our most critical services that we offer our customer, whatever line of business we’re in, just because of, you know, we talk about getting the data right, getting the orchestration right, hooking the tools together.
Chris Wade • 21:28
So I believe it’s reserved for kind of like our most critical services. When we think about what’s on the horizon with agentic and otherwise, I brought up infrastructure as code earlier. We can imagine rethinking how we run these things agentically first. And I do think it’s going to scale out the types and volume of services and operations that we can deal with. So I think we know how to do this. I think most people in the room have done this to a certain degree for those critical services. Things that don’t reach the business value or the ROI or the investment level, we tend to run scripts, we tend to piece together manually.
Chris Wade • 22:09
But when we start thinking about scaling this out, I mean, with all the discussion this week on LLMs in the data center, I’m desperate to use it for our own needs instead of just everybody else’s. And I think there’s great promise that we’re going to be able to leverage this technology to simplify the active orchestration, sometimes use the network as a source of truth, and let us not just deliver services better and faster, but scale the volume and types of things that we can deliver.
Ethan Banks • 22:39
So we jumped from how we get started to Agentic AI. That happened fast. To your point, we started with programmability. You brought that up. We brought up the need for discoverability, being able to discover the entirety of the network, so know what we’ve got. And now we’re talking about now that we know those things and we have those capabilities, we need to identify a service, have a model for what that service looks like. And now when we get to Agentic AI, we’re talking about the means by which we would deliver that service, giving us a new ability rather than running scripts, we’re asking agents to deploy bits and pieces and parts of it to bring us that whole service.
Chris Wade • 23:19
For sure. And I would say that a lot of people in discussions think of agentic as the final step: we automate, we orchestrate, then we go agentic. But if we look at how it’s been adopted in other industries, it’s with us from the beginning. As I walk around, I see ChatGPT and Claude on screens, which is exciting. But if people are using this at the beginning of their journey, they might use it to write their Python script. They might have an agent that just runs two or three scripts.
Chris Wade • 23:49
They might have an agent that updates ServiceNow or Slacks people while they’re operating. So I would challenge everybody to try to adopt it early in the cycle, not necessarily as a destination. And it’s going to be very similar to how we adopted automation, in the sense that I’m going to start with some read-only stuff. I’m going to start with diff sets. I’m going to start with human in the loop. But yeah, I think we should bring it forward and think about it from the beginning. And it helps us reimagine how we’re going to do services.
Chris Wade • 24:17
The most critical services, I would still argue, we’re going to do the way we do today: full guardrails, full modeling, full execution. But when we talk about scaling out operations, I think we can rethink a few things, including config pushes that might not be related to those critical services.
Ethan Banks • 24:35
Okay, quick interaction with all of you out here. Agentic AI, that is, if we dispatch an agent to perform some network task, from what you understand of agentic AI as a technology currently, what is your trust level? Are you comfortable presuming guardrails and so on, like Chris was just talking about, that agentic AI will, A, do the thing that you want it to do and not cause you headaches, so that you want to turn this thing on to make you more efficient? Yes, I’m interested in agentic AI. I think it’s there, or it’s going to be there soon. We have a few true believers. And I saw some heads like this.
Ethan Banks • 25:13
And then the opposite, your opinion of AI and agentic AI and actually using it in network operations is more like: no, it’s immature, I don’t trust it, or I’m not sure yet. Okay, we got about a 50-50 split, I would say, between the two. All right.
Bill Lapcevic • 25:29
There’s probably a middle ground, too, right? Use agentic AI to understand your network first, right? To see what actually is out there. I mean, simple things that we see people doing are, you know, tell me about my IP address allocation in data center A, right? Are there any overlapping IP address ranges in these three locations? Simple things like that that used to take a lot of time, clicking around through spreadsheets or databases or whatever tooling you have. That’s a first step towards using it to increase the speed and efficiency of your network automation team.
Bill Lapcevic • 26:05
And then later, maybe we get to the point where it’s actually acting on the network. It’s changing things. It’s driving things. And I like what Chris said, for the critical systems, you’re still going to have a human in the loop there.
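The overlapping-ranges question Bill mentions becomes mechanical once the prefixes are consolidated in one place; Python’s standard `ipaddress` module answers it directly. A small sketch:

```python
import ipaddress

def overlapping_prefixes(prefixes):
    """Return every pair of prefixes that overlap.

    Stands in for the kind of question Bill describes: "are there any
    overlapping IP address ranges across these locations?"
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [
        (str(a), str(b))
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
        if a.overlaps(b)
    ]
```

The hard part in practice is not this check but getting every site’s allocations out of spreadsheets and into one system of record first.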
Justin Ryburn • 26:17
Yeah. Yeah, and I mean, most organizations, I think, have a change control process, right? Where you’re not making changes to the production network without it being either reviewed or done in a maintenance window. Like, they have processes and procedures around that that they’ve learned from battle scars over the years on what needs to be done for their organization, right? And I kind of see agentic that way, too. Like, you could have the agent generate the config that you’re thinking about pushing, the change that you’re thinking about making to the network, create a, however you do it, ServiceNow ticket, request the change window, have it reviewed by someone in the organization to say, yeah, that actually passes the sniff test. That looks like a legitimate change that’s going to do what we want it to do.
Justin Ryburn • 26:54
Or no, that’s absolutely crazy, there’s no way we want to push that to production. Then schedule it in a maintenance window. You can still do a lot of the processes and procedures that we know are best practices in the industry, and yet have agentic AI helping us with the config generation and some of the more manual tasks, I think.
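The review gate Justin describes (an agent drafts the change, a human approves it before anything reaches production) can be sketched as a small state machine. This is an illustrative sketch, not any particular product’s workflow; the ticketing and maintenance-window steps are elided:

```python
from dataclasses import dataclass

@dataclass
class ProposedChange:
    """A change an agent drafted; nothing touches the network until approved."""
    device: str
    config_lines: list
    status: str = "pending_review"
    reviewer: str = ""

def review(change, reviewer, approved):
    """The human-in-the-loop gate: a person decides before any push."""
    change.reviewer = reviewer
    change.status = "approved" if approved else "rejected"
    return change

def push(change):
    """Refuse to apply anything that has not passed review."""
    if change.status != "approved":
        raise PermissionError(f"{change.device}: change is {change.status}, not pushing")
    # A real implementation would apply this over SSH/NETCONF inside the window.
    return f"pushed {len(change.config_lines)} line(s) to {change.device}"
```

The design choice is that `push` enforces the gate itself, so an agent wired to these functions cannot skip the review step even by accident.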
Ethan Banks • 27:13
There’s a cost associated with all of this. There’s a financial cost. There’s a time cost. There’s an operational impact because we’re changing our processes if we move in this direction and how we’re leveraging network automation. I need management support for this. I might need to spend. I might need time and some flexibility and just buy-in from the business side to make this happen.
Ethan Banks • 27:34
How do I explain the business benefit of going this route, retooling my network operations to go in this direction?
Bill Lapcevic • 27:42
Yeah, I don’t think you do it by telling how many devices you can reconfigure. I think ultimately what we see is the most successful network automation teams, the most successful projects that we encounter, are tying what they’re doing to, again, to an executive level business value, whether that’s a reduction of risk, whether that’s speed to deploy. I mean, especially in this world of the AI data centers that are getting deployed all the time, we hear things like, we’re moving so fast that we have turbines in the parking lot because we can’t get enough power to our data centers, and we need to move faster and faster. Automation is critical to that. Even just rolling out these new data centers, you can’t be manually configuring all those devices. You have to have a plan, and you have to be automating it. So again, I think tying to speed or risk reduction or even the cohesion between your audit and your compliance people, your security folks, and helping them do their jobs better by providing them access to all this information about your network, I think that’s all part and parcel of getting your project funded.
Bill Lapcevic • 28:52
Go ahead.
Justin Ryburn • 28:52
Oh, I was going to say, there were a couple of really great talks on this topic at the AutoCon 4 last fall that I think a lot of people in this room were at. One done by Jeff Gray and one done by Alex Hinthorne Awane. And what I really liked about both of those talks is to Bill’s point, they were trying to get us as engineers to think in the business terms that the CXO, the CFO, who’s going to green light these projects, are going to think, right?
Ethan Banks • 29:18
Jeff got right into the weeds about how to do accounting style discussions and how to do those computations and give a believable financial report and set expectations so that folks can go, yeah, we’re going to fund this because at the end of this, we should get an ROI.
Justin Ryburn • 29:34
Right. Here’s the actual dollars-and-cents reason why, which gets to your point: an ROI, a return on investment. Having been a network practitioner most of my career, that’s not the first thing I think about. It’s probably the last thing I would think about.
Ethan Banks • 29:47
It’s like, this is the right thing to do.
Justin Ryburn • 29:49
Pay for it. Why wouldn’t it make sense, right? But if you put yourself, I think, and this was the point that Jeff was making in that presentation, if you put yourself in the shoes of the CEO or the CFO, whoever’s budget this is, what they’re thinking about, they don’t care about that, right? They have a business to run. They have a budget to look after. They need it presented to them in a way that they can understand what it is you’re trying to accomplish in the terms that they think about it. And if you can do that successfully, and I’ve had great leaders through my career that I’ve watched do this, and like, to your point, I thought I presented a really great case on why we should do this because it just made sense.
Justin Ryburn • 30:25
It’s the right thing, that’s where the industry is going, and we need to be on board. And I thought I had done a nice job of laying it out. And then someone else would come in and be able to do what Jeff was describing, where they put it in a nice spreadsheet: okay, here’s the time we’re spending now, here’s the time we’ll save, this is the average hourly rate we’re paying our engineers, and present it in dollars and cents. And leadership was like, yes, we absolutely need to do that. And it got green-lit.
Justin Ryburn • 30:47
And I was like, man.
Chris Wade • 30:48
Yeah, I think it depends. We think deeply about technology; we tend not to think as deeply on the business side. So it depends on what we’re doing. If we’re doing automation, like the config pushing we talked about, it turns into how-many-times-I-do-this times how-long-it-takes type of logic. Then we talked about service delivery.
Chris Wade • 31:16
If I have internal or external customers, it’s the time to value on that service. How many days do I have to wait to get it? That type of thing. When we start thinking about larger concepts that we’re getting to, it’s really how we transform how we operate infrastructure. And I think it’s a little bit business specific and the type of network specific. But I think we can think deeply about what it means to operate infrastructure at scale more effectively. I mean, we’ve seen some presentations in the last day and a half where there’s not a lot of people running around doing things.
Chris Wade • 31:49
There’s a lot of investment in automation and orchestration because it drives the business. And I think understanding that impact is key.
Ethan Banks • 31:58
Okay, you said transforming infrastructure. Do we speak about that to business stakeholders in terms of, look, these are things we could do if we had this infrastructure and this ability in place to automate service delivery. We do these things today, we can do it more efficiently, but in the future, we could do these other things too.
Chris Wade • 32:19
For sure. A lot of times it’s about how we’re operating infrastructure versus our ultimate customers. And if we can ever tie it to the external value, I think it changes the conversation for sure. A lot of times with automation and orchestration, it’s a little bit more about the cost to deliver those services, the time to value on those services. But ultimately, when we transform how we operate, with things like self-service, or maybe I win in the market because I have self-service, or the time I can turn it around, or, from the previous presentation, how quickly adds come out, the product owners and business stakeholders are going to be very interested in that. And I think without automation and orchestration, it’s very hard for us to transform how we operate, especially at the speed of technology today.
Ethan Banks • 33:13
One of the things we haven’t talked about yet is integrating what we are delivering as network operations with other teams that are delivering security. So, SecOps teams might care about what we’re doing. DevOps teams might care about what we’re doing. As we move our network automation towards service delivery, holistic view, do we synchronize with those other teams? Is there a win there? You’re nodding your head. You have to say words now.
Bill Lapcevic • 33:42
I’m happy to say words. Yeah, there’s a big win there because, you know, I mean, early on, I used to hear from network automation teams that the security guys would knock on their door and say, I need to know what’s in the network and how it’s all interconnected. And they’d hand them a big spreadsheet, and the security guys would have to go off and spend countless hours trying to figure out from an Excel spreadsheet what was connected to what, right? Because that’s really what they want to know: is if something happens to this, what’s the risk for the rest of the business and what’s the risk for the rest of the network? And so we call it cohesion, right? Once you’ve gotten coverage, you understand what’s in your network, and then you have confidence in that data that what is is what should be. Now, you want to share that information with, again, security teams, or compliance teams, or audit teams, or financial teams, or the ServiceNow team, as an example, right?
Bill Lapcevic • 34:41
Because they’re the ones that are doing all the ITSM workflows and the ticketing. So, why not help them have richer context to what they’re providing? Why not help them by giving them the information about the network so that they can use it for their purposes, even though you remain in control of what actually is in the network?
Ethan Banks • 35:00
Well, the question becomes: at what level of detail? We have different technical expertise, different knowledge domains that we occupy. So how much information are we sharing, or maybe in what way are we sharing information about the network with these other teams?
Bill Lapcevic • 35:16
Yeah, I’ll give you a business school answer: it depends. It really just depends on who you’re talking to, what that other team does, what they need. And I think the right way to look at it is to find the right altitude of information, the right level of abstraction. You’re probably not going to give the finance guy the configuration data for a router. They don’t care about that, but they want to know other things about that router and what it’s connected to, or they want to know end-of-life information.
Ethan Banks • 35:45
SecOps team might care about that.
Bill Lapcevic • 35:46
That’s right. So you should be able to give different altitudes of information, different types of information, to different teams, depending on what their needs are. But then you’re a real positive player in the overall business ecosystem for your company. And that, too, goes back to your other point: how do you get management to want this way of doing things, these network automation practices? By helping all these other teams do their jobs, you become a really important cog in the overall enterprise wheel.
Ethan Banks • 36:20
Chris, a question for you related to this. I’m building a network operations related workflow to deliver some kind of a service, but that service doesn’t exist in a vacuum. It’s going to integrate with security and other IT technology silos, perhaps. Do we get to a point where we are delivering a service that is integrated with the delivery of security as a part of that service? That is, I’m weaving in things that traditionally I might not have been worried about as a network engineer, but now I can be, because I’ve got more programmatic flexibility and workflow automation and so on.
Chris Wade • 36:57
So integration, yes. Tight coupling, no, if I can say it that way. I think when a lot of people hear integration, we start thinking about tying a lot of things together, which slows down the whole mechanism. So I think we have to understand how we work with teams in the abstract: are we producers or consumers? Are you making your infrastructure programmable? Am I making my infrastructure programmable? Do I call you?
Chris Wade • 37:24
Do you call me? I think those details are critically important, versus one team programming somebody else’s stuff. A lot of my comments come back to how we operate, and I think fundamentally we don’t think through that as much as we think, I have a project to deliver a service, or we’re moving to SD-WAN, or I have a new POP, and we’re rushing to do that. But it’s really: how do we operate at scale with appropriate decoupling so we can all move fast? I just think velocity, maybe behind security, velocity and how we operate are probably the most important factors when we think about how these teams work together.
Chris Wade • 38:14
Because we’ve always been isolated and siloed for probably good reasons. But as we think about, you know, we start off with talking about config. Config is task-related. Config is typically domain-focused. When we talk about services and orchestration and operations, that’s typically, we start talking horizontally. So we have to have these teams work together. But we have to figure out how to do it efficiently.
Chris Wade • 38:36
And I think we’ve all seen examples where it works wonderfully in others where those silos kind of are recreated at the service layer, which, you know, I open up five tickets to get my service delivered, which was not the intent.
Ethan Banks • 38:48
Okay, two more topics, guys, and we’re running short on time, but these are important. Telemetry and metrics. Let’s start with the telemetry. So I need to know from the network that the service I’ve delivered is working, is working well. We talked about intent. Is my service existing as I intended? But then there’s also just the general health of the service and so on.
Ethan Banks • 39:08
The way I’ve been doing it historically, I’m using SNMP polling and so on to see network health. I see circuits and devices and interfaces and so on. But how do I change how I’m consuming telemetry to understand the health of an overall service?
Justin Ryburn • 39:26
I think you have to take a holistic view of it, right? I mean, SNMP polling is going to give you some of the information you mentioned, but if you really want to get down to the layer of detail of what the traffic composition on that link is, you’re going to need some sort of flow: sFlow, NetFlow, IPFIX, some protocol like that that’s going to get you the next layer of detail. You’re going to need logs, right? You’re going to need to know what went wrong. You need all of this, to some extent. I know that seems like a little bit of a cop-out answer, but you can’t just rely on one protocol to understand how the network is operating and how it’s performing and whether the service that you’re trying to automate has actually done what you thought it did.
Justin Ryburn • 40:07
And to the conversation Chris was just having about siloed teams, you can’t have siloed collection of that data either. Those teams either have to work together so there’s a programmable interface between them and you can correlate the data, or you need it all in the same, for lack of a better term, data lake. Or, these days, you run AI on top of it that can look at logs in your log collection platform, metrics in your metrics collection platform, and flow in your flow collection platform, and see that at the same time this traffic pattern changed, I had a spike in CPU and a log message told me my BGP session went down, right? So you can more quickly get to root cause. You’re going to have to have all that data at least able to be collected, whether it’s AI doing the front end of it or a human doing it.
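The cross-platform correlation Justin describes, lining up a traffic shift, a CPU spike, and a BGP-down log in the same time window, can be sketched as a simple time-bucket join. This is an illustrative toy, not any vendor's pipeline; the event names and data shapes are hypothetical, assuming each telemetry source has already been normalized to (timestamp, signal) records:

```python
from collections import defaultdict

# Hypothetical normalized records: each telemetry source emits
# (timestamp_seconds, signal_name) tuples after its own thresholding.
flow_events = [(100, "traffic_shift"), (161, "traffic_shift")]
metric_events = [(102, "cpu_spike")]
log_events = [(104, "bgp_down")]

def correlate(sources, window=30):
    """Bucket events from all sources into fixed time windows and return
    the windows where more than one source fired, a crude stand-in for
    the correlation an AIOps layer would do across platforms."""
    buckets = defaultdict(set)
    for name, events in sources.items():
        for ts, signal in events:
            buckets[ts // window].add((name, signal))
    # Keep only windows where at least two distinct sources agree.
    return {
        w * window: sorted(sigs)
        for w, sigs in buckets.items()
        if len({src for src, _ in sigs}) >= 2
    }

hits = correlate({"flow": flow_events, "metrics": metric_events, "logs": log_events})
for start, signals in sorted(hits.items()):
    print(f"window starting t={start}s: {signals}")
```

In this sketch the three signals at t=100-104s land in the same 30-second bucket and surface together, while the lone flow event at t=161s is discarded; a real system would add per-source schemas, device keys, and smarter windowing.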
Ethan Banks • 40:56
Well, one big thing is we need to be able to deal with a larger volume of telemetry, perhaps.
Justin Ryburn • 41:00
And it’s only getting bigger.
Ethan Banks • 41:01
Yeah. Yeah. Yeah. Bill, anything to add to that?
Bill Lapcevic • 41:06
Yeah. I mean, I think I’ve called out what is and what should be. That’s kind of static, right? I think what the telemetry does is it lets you know how well your plan is operating in the network so that you can make adjustments to your intended state of the network. So I don’t have a lot of things to say about the type of telemetry, but seeing real-world actionable data coming in, how is your network performing, informs how you then plan for the next iteration of your network.
Ethan Banks • 41:43
All right, closing question, guys. How do I measure success? That’s the metric, the key metric, I think, for all of this. How do I know that my network service automation is working well? Chris, you got thoughts on this? Tough one.
Chris Wade • 41:59
Yeah, that’s the easy one, right? I mean, if we’re thinking about changing how we operate, it really comes down to how many changes in our infrastructure are on purpose and driven through the processes we’re talking about. We talked about the business outcomes, but as far as OKRs and KPIs within our teams, I think it’s how much stuff we do, how fast we do it, and how efficiently we do it. And that’s mostly what it’s going to translate into.
Ethan Banks • 42:35
Well, our ability to deliver services is a metric.
Chris Wade • 42:39
For sure. If you think about your network, especially for the people in this room, as one of the most important assets of your business, then every time we change our infrastructure, there’s business value created, or we wouldn’t do it, whether it’s maintenance, software upgrades, or service provisioning. So I’d argue the more times we change our infrastructure, the more value we’re creating for the business, because we don’t do these things for no reason. I think measuring what we do, how we do it, and how effectively we do it is the core value that can be translated to everybody. Some of it is very visible, like a customer getting self-serve provisioning; other times we’re doing maintenance or implementing security so we run a lower-risk, more tolerant network. But every change is important.
Chris Wade • 43:29
And I think measuring change, change effectiveness, and time to value are the critical components.
Ethan Banks • 43:35
Justin?
Justin Ryburn • 43:36
I would say it has to tie back to the business outcomes though, right? I agree with everything Chris said, but those whatever KPIs we as a networking team are going to measure our changes on, how effective they were, how many we did, needs to tie back to the key business KPIs. And I’m going to assume for most businesses that is SLAs, network uptime, service uptime, those type of things, right? That is really the ultimate outcome of any type of automating or orchestrating or operating a network.
Ethan Banks • 44:05
Parting thought, Bill?
Bill Lapcevic • 44:05
Yeah, there are two levels. And I think we’ve talked a lot about the levels that the network automation team needs to measure for itself. But I want to bring it back to also tying it into the executive-level business value. How much faster are you rolling out that new project? How much faster are you able to get the new data center online? How much more effective?
Bill Lapcevic • 44:28
How much more cost efficient? What’s the, you know, how many manual changes are being made? And tie that into the amount of risk that the business is taking when you make a field level change that wasn’t immediately approved and don’t have a way of remediating it. So I think if you can tie it again to that executive level business value, that’s equally as important as all the day-to-day metrics that you need to have as KPIs for your team.
Ethan Banks • 44:56
Okay, let’s give our panelists a hand. And NANOG, whoever our handler is, do we have time for questions? We do not have time for questions. Okay, we’ll wrap there. Yep.
What the Panel Explored
- Why configuring devices is the means, not the end – and how the shift to service-level thinking changes how automation programs are built and justified.
- The “what is vs. what should be” problem – how inventory, discovery, and modeling are prerequisites to everything else, and why brownfield environments make this harder than most teams anticipate.
- How to talk to the business – translating network automation value into the ROI, risk reduction, and speed metrics that actually get projects funded.
- Agentic AI as an early-cycle tool, not a destination – why teams should be adopting it now in constrained, read-only, augmentation-first ways rather than treating it as the finish line.
- Cross-team integration without tight coupling – how network automation intersects with SecOps, DevOps, ITSM, and compliance, and why avoiding tight coupling matters as much as the integration itself.
- How to measure success – the KPIs that matter at the team level and the executive level, and how to connect the two.

If you can’t log into every device and manually configure them, you’ve got to automate. But it’s not just about point scripts – it’s about thinking holistically about the system and what you’re trying to accomplish.
— Justin Ryburn, Field CTO, Kentik

Every change is important. Measuring change, change effectiveness, and time to value are the critical components.
— Chris Wade, CTO & Co-Founder, Itential
From Device Config to Service Delivery: A Mind Shift
Network automation has always been about more than the config push. The config is just the mechanism. The outcome is a service – whether that’s an L3 VPN, a data center fabric, or the checkout system that a business executive actually cares about. The gap between how network engineers describe their work and how business stakeholders think about it is where most automation programs stall.
The panel explores how teams can bridge that gap: not by dumbing down the technical work, but by learning to speak in the abstractions that executives, CFOs, and application owners actually think in. Speed to deliver. Risk reduction. Competitive advantage. Self-service. These are the terms that get projects green-lit – and they’re the same terms that the best automation teams use to measure their own success.
Agentic AI: Adopt It Early, Not as the Finale
There’s a tendency to treat agentic AI as the destination of the automation journey – the thing you graduate to once you’ve mastered scripting, platforms, and orchestration. The panel pushes back on that.
Chris Wade argues that agents should be present from the beginning of any automation initiative: writing scripts, updating tickets, handling read-only queries, surfacing exceptions. The parallels to how teams first adopted automation are hard to miss. You start read-only. You add human-in-the-loop controls. You expand from there as confidence builds.
The audience at NANOG was roughly split 50/50 on their trust in agentic AI for production use today – which is probably the right level of healthy skepticism. The question isn’t whether to adopt it. It’s how to adopt it in a way that doesn’t sacrifice the guardrails, governance, and change control discipline that infrastructure teams have earned from years of battle scars.
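The read-only, human-in-the-loop adoption path described above can be sketched as a tiny dispatch guardrail: an agent's proposed actions run automatically only when they match a read-only allow-list, and anything mutating waits for explicit approval. The command verbs, function names, and executor here are all hypothetical, a minimal illustration of the pattern rather than any product's API:

```python
# Read-only-first guardrail sketch: auto-execute only verbs on the
# allow-list; queue every mutating action for human review unless an
# approver is recorded. All names here are illustrative.

READ_ONLY = {"show", "ping", "traceroute", "get"}

def dispatch(action, approved_by=None):
    """Route an agent-proposed action through the guardrail."""
    verb = action.split()[0].lower()
    if verb in READ_ONLY:
        return f"EXECUTED (auto): {action}"
    if approved_by:
        return f"EXECUTED (approved by {approved_by}): {action}"
    return f"QUEUED for human review: {action}"

print(dispatch("show bgp summary"))
print(dispatch("configure interface xe-0/0/0 disable"))
print(dispatch("configure interface xe-0/0/0 disable", approved_by="noc-lead"))
```

Expanding from here as confidence builds might mean widening the allow-list, or auto-approving changes that pass pre-checks, which mirrors how teams first rolled out automation itself.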

The most critical services, I would still argue, we’re going to do how we do today – full guardrails, full modeling, full execution. But when we start thinking about scaling out operations, we can rethink a few things.
— Chris Wade, CTO & Co-Founder, Itential
Take a Deeper Dive into the Discussion
Episode Notes
(So you can skip ahead, if you want.)
00:00 Introduction
01:23 How to Model a Service
06:39 Updating a Service Model
09:59 Overview of AI Agents
13:35 How Agents Acquire Task Knowledge
18:04 Builder vs. Operator Personas
26:03 Building Guardrails for Agentic AI
33:05 Quantifying AI Reasoning
43:19 Advancements in Data and Telemetry
52:51 Final ThoughtsView Transcript
Ethan Banks • 00:00
All right, so with me is Chris Wade. Chris, what is your title at Itential?
Chris Wade • 00:04
I’m the CTO of Itential. I focus on the product portfolio mostly.
Ethan Banks • 00:08
Yeah, right, right. And I’ve got Justin Ryburn with Kentik. You are the CTO. Field CTO. Does that mean you’re the CTO in the field?
Justin Ryburn • 00:15
Yeah, exactly. But the difference from what Chris does is I do a lot of public speaking engagements. I’m kind of an ambassador for the brand out in the field with the sales teams. Right, right. Rather than running the engineering and product organization. Gotcha. All right.
Ethan Banks • 00:27
And then Bill. Bill, I don’t even know if I pronounce your last name right. Lapcevic. Lapcevic.
Bill Lapcevic • 00:30
Okay. And you are? I’m the co-founder and chief revenue officer at NetBox Labs.
Ethan Banks • 00:36
NetBox Labs. Okay, so I’ve got three guys with me here who are all in the automation space, really automation, orchestration, or related in some way or another. Kentik’s telemetry. You guys are doing the orchestration stuff. You’re a single source of truth. How would you define it? You know, DCIM, all that kind of stuff.
Bill Lapcevic • 00:52
It touches on a lot of that.
Ethan Banks • 00:53
Yeah, yeah, a lot of that kind of stuff. Okay, so the context of our conversation, guys: we had a discussion from the stage at NANOG96 here in San Francisco, and we wanted to follow up on that discussion. Our premise was network automation isn’t just about configuring network devices. And so from the stage, we talked about a lot of what that means. This is a follow-up to that discussion. So if you’re out there watching us, you need to go watch that conversation first. And then we’re filling in some blanks that we had to leave for the sake of time on the stage.
Ethan Banks • 01:23
And the first thing I want to get into, guys, is let’s describe how to model a service. We were making the point that, okay, we’re not just configuring devices. We are building a service that happens to involve a lot of devices. And there’s going to be a model that helps us describe what that service is. Can we come up with a simple example of what a service might be and then what that model looks like? Are we talking about defining it as YANG with a structure, or a JSON blob, or something? Or are there models out there that I can use?
Ethan Banks • 01:54
Help a network engineer understand what we’re getting at when we say modeling a service.
Chris Wade • 02:00
Me first? Chris goes first.
Ethan Banks • 02:01
Okay, go ahead, Chris.
Chris Wade • 02:03
So historically, we’ve used a lot of data models, like you said, with YANG and other things to model a service. But ultimately, it depends on the type of networking. What we talked about on stage was the fact that each domain kind of has its own level of automation, and typically services are traversing multiple domains. So how you’re going to model it is kind of dependent upon what you’re using.
Ethan Banks • 02:22
So, define domain in this context.
Chris Wade • 02:24
So, a domain could be data center, it could be WAN, it could be cloud, it could be security. And with your services, even with, you know, EVPN within a data center, you could be traversing multiple aspects within a domain. But typically, within a single domain, there are models, whether it’s MEF services or otherwise, that have defined service models for those things.
Ethan Banks • 02:46
Things like interfaces have a model, and then BGP has a model, and so on. So, there’s a lot of little teeny or very narrowly scoped models, but we would be using those models to build a bigger service.
Chris Wade • 02:57
100%. And then, really, when you start to get into the service, the reason we talk about it this way is typically you care about managing the life cycle of such a service because you build a service for some end user, whether it’s internal or external. So, to me, a service typically is going to traverse multiple devices like we talked about, and it’s going to have some process and life cycle to it.
Ethan Banks • 03:17
Because if we’re going to lifecycle it, I’ve got to build the thing, it’s going to live, and at some point I’m going to delete that service when I don’t need it anymore.
Chris Wade • 03:23
Correct. Create it, modify it, delete it. For the purpose of us discussing a service, typically it means that it has some life to it, whether it’s short-term or long-term.
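The create/modify/delete lifecycle Chris describes can be sketched as a tiny service manager that holds intent per service instance and re-renders it on every change. Everything here is illustrative: the class, the model fields, and the render step are hypothetical stand-ins for whatever actually expands an abstract service into device config:

```python
# Toy lifecycle manager for service instances, following the
# create / modify / delete pattern discussed above. The "_render"
# step stands in for whatever pushes per-device configuration;
# names are illustrative, not any particular product's API.

class ServiceLifecycle:
    def __init__(self):
        self.instances = {}          # service_id -> model (intent)

    def create(self, service_id, model):
        if service_id in self.instances:
            raise ValueError(f"{service_id} already exists")
        self.instances[service_id] = dict(model)
        return self._render(service_id)

    def modify(self, service_id, **changes):
        self.instances[service_id].update(changes)
        return self._render(service_id)

    def delete(self, service_id):
        self.instances.pop(service_id)
        return f"deprovisioned {service_id}"

    def _render(self, service_id):
        m = self.instances[service_id]
        # In reality this would expand the abstract model into
        # vendor-specific config for every device the service touches.
        return f"{service_id}: {m['type']} across {len(m['endpoints'])} endpoints"

lc = ServiceLifecycle()
print(lc.create("svc-42", {"type": "l3vpn", "endpoints": ["pe1", "pe2"]}))
print(lc.modify("svc-42", endpoints=["pe1", "pe2", "pe3"]))
print(lc.delete("svc-42"))
```

The point of the sketch is that create, modify, and delete all flow through the same render path, which is what makes the service's whole life manageable rather than just its initial deployment.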
Ethan Banks • 03:31
Okay. Okay. So, then a model is going to be made up of a bunch of smaller, narrowly scoped models that help bring that service to life in our network devices. So, there are a couple of things that I have seen happen with models. One is there are a lot of pre-existing models. They could be OpenConfig, they could be YANG models, they could be some other structures that I can just leverage and use.
Ethan Banks • 04:00
Oftentimes, those models don’t map to exactly what I’m looking for within my company. And so, I got to do some customized kind of things, which actually Netbox Labs caters to.
Bill Lapcevic • 04:10
Yeah, no, that’s true. One way we think about services is, for example, if you have a certain type of branch office that always gets deployed in a similar way, that’s a repeatability thing, right? It’s a repeatability service. And you can model that in NetBox. You can use some of the design capabilities and the planning capabilities within NetBox itself to pre-configure all the interconnections and the configurations that you need. And then you can roll it out much more quickly across lots of different branches that are largely the same, and make tweaks to that. But for us, it’s a repeatability thing.
Bill Lapcevic • 04:53
It’s a speed of deployment thing. It’s been pre-templated. Is that pre-templated? Yeah. You can template devices, but this is actually a higher level of abstraction where you string a bunch of different devices that are templated all together to create a service that you can then roll out. And if it’s, I’m going to use an example just off the top of my head. Like if you were Starbucks, you have a pattern that you always deploy for your network at a given Starbucks restaurant, Starbucks coffee shop.
Bill Lapcevic • 05:23
And you don’t want to have to go and manually configure and manually string everything together each and every time. So you create that layer of abstraction and roll it out that way.
Ethan Banks • 05:31
And that layer of abstraction means you can generically define a service that can work on multiple different kinds of hardware. So you don’t necessarily need to know the specifics of how an interface comes to life in a Juniper box versus a Cisco box because you could be using a model and then there’s going to be a layer that can interpret that and then push the specific changes down into those devices. Is that what we’re talking about with models as well?
Chris Wade • 05:56
Correct. And one other aspect of service is typically you’re doing it for somebody else. So, you know, there might be an application team. It might be an external customer, just depending upon what type of business you’re in. But you’re typically modeling these services with the lifecycle we talked about so somebody else can either request it, modify it, or ultimately delete it. Because a lot of times with automation, we’re focused on like the activities we’re performing, right? And generally, we’re doing this as an abstraction.
Chris Wade • 06:21
So it’s easier for somebody to request the intent of what they’re trying to achieve without the details of sometimes what we have to deal with.
Ethan Banks • 06:28
So I will have built a network service within my organization and created a model for it so that I can deploy that service in a repeatable, predictable way. At some point, I’m going to need to update that model. I’m going to need to change. There’s going to be some attribute that’s new or important, and I got to add something. What happens to the services that I’ve already deployed on the old model? Is there a process for that?
Justin Ryburn • 06:51
Yeah, I mean, that’s where it gets really complicated, right? Because you’re either going to have to go back and reconfigure everything to the new model, or, what some customers will actually do is keep an old architecture and a new architecture, right? So in the existing data centers, they don’t change the service model. In the new data center, or when they build the new rack, they adopt the new model. And as the application services riding above that migrate to the new data center or the new rack, they just deprecate the old, right? But one way or the other, if you’re going to change the template, if you’re going to change the service, you’re going to have to go back and change the network configuration to match, which is part of the advantage of automation, right? If you did a nice job of automating it from the beginning, when you change the service, in theory it shouldn’t be that hard to go back and change the actual configuration that’s running in the network.
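The migration pattern Justin describes, where existing instances stay pinned to the model version they were deployed with and only move deliberately, can be sketched in a few lines. The model versions, attribute, and function names here are hypothetical; the point is that migrating is just re-running the same deploy path against the new version:

```python
# Sketch of service-model versioning: instances stay on the model
# version they were rendered with; greenfield gets the new version,
# brownfield migrates deliberately. All structures are illustrative.

MODELS = {
    "v1": {"mtu": 1500},
    "v2": {"mtu": 9000},   # updated attribute in the new model version
}

instances = {
    "dc1-rack1": "v1",
    "dc2-rack1": "v1",
}

def deploy(name, version):
    """Render (or re-render) an instance against a model version."""
    instances[name] = version
    return f"{name} rendered with model {version} (mtu={MODELS[version]['mtu']})"

def migrate(name, version):
    # With good automation this is the same code path as the original
    # deploy, which is exactly why changing the service later is cheap.
    return deploy(name, version)

print(deploy("dc3-rack1", "v2"))   # new rack adopts the new model
print(migrate("dc1-rack1", "v2"))  # existing rack migrates on purpose
print(instances)                   # dc2-rack1 stays on v1 until deprecated
```

The alternative Justin mentions, running old and new architectures side by side and deprecating the old, is just the case where `migrate` is never called and the v1 instances are eventually deleted.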
Ethan Banks • 07:42
I mentioned YANG models and OpenConfig models. Are there other models that people could go out and search for to have at least a starting point for building some of their own service models? Any recommendations you guys have, or what you see your customers doing? You’re not allowed to look blankly at me. Come on.
Chris Wade • 08:01
I mean, you know, I think when you bring up YANG specifically, what I’ve seen over the past couple of years is that it’s really a controller-to-network interface for infrastructure. When we start talking about modeling in a more abstract way, if you’re talking to cloud folks, it might be HCL or Terraform. It could be CDK, it could be some of these other frameworks. So I think the closer you get to traditional networking technologies, YANG has the benefit of being tailor-made for what we do. In the more abstract, and especially as we get into cloud and other stuff, you’ll see JSON schemas, you’ll see HCL, you’ll see all these different technologies. So if we’re modeling API calls, as an example, API calls have JSON data payloads with schema.
Chris Wade • 08:46
The schema is in JSON Schema, so it makes more sense to build the models in JSON Schema, you know, just to be technical for a few minutes. So I think it depends where we are in the types of models we’re building for which domain. It’s a right-tool, right-job type of situation.
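To make Chris's JSON Schema point concrete, here is what a service model for an API-facing domain might look like. The schema and its fields are invented for illustration, and the validator is a deliberately tiny stand-in for a real JSON Schema implementation such as the `jsonschema` package:

```python
import json

# A hypothetical L3VPN service request modeled as JSON Schema.
# The field names are illustrative, not from any standard model.
L3VPN_SCHEMA = {
    "type": "object",
    "required": ["customer", "sites", "bandwidth_mbps"],
    "properties": {
        "customer": {"type": "string"},
        "sites": {"type": "array", "minItems": 2},
        "bandwidth_mbps": {"type": "integer"},
    },
}

def check_required(payload, schema):
    """Tiny stand-in for a full JSON Schema validator: returns the
    required keys that are missing from the request payload."""
    return [k for k in schema["required"] if k not in payload]

request = json.loads('{"customer": "acme", "sites": ["nyc", "sfo"], "bandwidth_mbps": 100}')
print("missing fields:", check_required(request, L3VPN_SCHEMA))
```

The abstraction discussed earlier lives in exactly this layer: a consumer submits intent against the schema, and the orchestration underneath expands it into per-domain, per-vendor configuration.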
Ethan Banks • 09:01
Yeah, and it’s funny. With YANG, years ago, we were saying, man, I hope what happens with YANG isn’t what happened with SNMP, where everything ended up living in the private enterprise MIB and everybody was just building proprietary OIDs out there. That’s exactly what happened with YANG, pretty much. So that’s the story. So I don’t know, as you were pointing out, Chris, how useful YANG is to the average network operator as a person doing things, unless you’re developing and writing to a specific model.
Chris Wade • 09:30
Yeah, something like that. Depending upon how far we want to go backwards, but at the time, we didn’t have controller infrastructure. We didn’t have a lot of this. It was device-by-device configuration, all the time, every time. And I think YANG gave us the benefit of attempting to do that at scale with a lot of rigor and guidelines and control. So much rigor. Yeah, so much rigor.
Chris Wade • 09:52
But you appreciate the rigor when you go through a controller and you can trust that as it pushes something out to 2,000 devices, it’s going to do it in a certain way.
Ethan Banks • 10:00
All right. We also talked about workflows and agentic AI. Agentic AI is a bit of a buzzword. I don’t know that most of us in the network engineering world have a handle on exactly what that means. Can we take a step back and just define what the heck an AI agent is? Is it code? Is it running on a server?
Ethan Banks • 10:17
How is it different from a script? What is this AI agent thing? Can we start there and then talk through how employing agents is going to impact our workflows? So, who wants to take a stab at defining an AI agent? Oof, that’s a tough one, but I’ll try.
Justin Ryburn • 10:33
I’ll go ahead and take a first stab at it. When I think of the term agentic AI, I think of it being based on LLMs, either ones people are using from somebody like OpenAI or ones they’ve built themselves, right? But it’s basically built on a trained large language model, some sort of foundational model. And then, when I think of agentic, I think of it doing multi-part reasoning, not just a single-shot question and answer, right? That’s what I think of as being truly agentic. If you’ve ever played with ChatGPT, or Gemini maybe is a better example, you can go in and do the simple mode, right?
Justin Ryburn • 11:15
The fast mode, as they call it, or the thinking mode, right? That to me is the difference between agentic and non-agentic. The thinking mode, where it does multi-part reasoning, and then, if you’re lucky, it’ll even show you the steps and how it got to its final answer, is what I think of when I hear the term, or when I use the term for that matter, agentic AI.
Chris Wade • 11:33
Okay. Chris, you want to add to that? Yeah, I do agree. Most people’s experience is an NLP interface like ChatGPT, et cetera. And I think a lot of our experiences started there. There are some frameworks out there.
Chris Wade • 11:46
LangChain, or LangGraph, is probably the default, the one people start with. But you ask, what is it? I think you should think about it as just a software process. It has a library. Just like we use a library to connect to a database, we’re using a library to connect to the LLM. It understands how to do that interaction.
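Chris’s description, a software process that uses a client library to talk to an LLM the same way it would use a database driver, can be sketched as a small loop. The LLM client here is a canned stub so the sketch runs offline; a real agent would swap in an actual SDK:

```python
class StubLLM:
    """Stand-in for an LLM client library (hypothetical, canned replies)."""
    def complete(self, prompt: str) -> str:
        if "check_device_health" in prompt:
            return "DONE"  # the step already appears in the history
        if "health" in prompt.lower():
            return "CALL check_device_health"
        return "DONE"

def run_agent(task: str, llm) -> list[str]:
    """Loop: ask the LLM for the next step, record it, repeat until DONE."""
    actions = []
    prompt = f"Task: {task}. What is the next step?"
    for _ in range(5):  # cap iterations so the loop always terminates
        decision = llm.complete(prompt)
        if decision == "DONE":
            break
        actions.append(decision)
        prompt = f"Task: {task}. Completed: {actions}. Next step?"
    return actions
```

The structural point is that nothing here is magic: it is ordinary code, plus a library call, plus a loop over the model’s replies.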
Ethan Banks • 12:06
So a piece of software code has the ability to interface with the LLM and ask it... what? A natural language question? Or not a natural language question, because we’ve deployed it to do some sort of autonomous activity?
Chris Wade • 12:20
So as it reasons through the logic, it’s going to interact with the LLM to perform the reasoning. So depending upon how you built the agent, maybe people are using tools and skills, which might be an MCP server with Kentik and NetBox Labs. So I would basically have an agent that says, I want to provision a VLAN. You give it some prompting, just like you would do in Gemini: your job is, you’re a VLAN provisioning agent, you follow the following steps. So it’s going to reason through how to do that.
Chris Wade • 12:51
And it’s going to say, hey, I want to get an IP address. How do I do that? And if it has the MCP tool to say, I’m going to go get an IP address.
Ethan Banks • 12:57
MCP, Model Context Protocol, basically. I’m going to ask MCP for a tool that can give me back this information.
Chris Wade • 13:03
Exactly. And the nice thing with the guardrails that are built into it is that it can only do the things you’ve provided access to, right? So it’s not jailbreaking your infrastructure. You say, hey, you can talk to NetBox Labs and you can ask for IP addresses. I can ask Kentik what the health of the device is at a given time if I’m trying to make a decision. And then, because I’m trying to provision a VLAN, I’m going to attempt to do that, right? So I’m going to ask, maybe, the LLM, what’s the format for this thing I’m trying to do?
Chris Wade • 13:27
I might go get the IP address, and I’m going to have an output that I think achieves the request that was made of me as an agent.
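The guardrail Chris describes, the agent can only invoke tools it was explicitly granted, can be sketched as a tool registry. Tool names and return values below are invented stand-ins for IPAM (NetBox-style) and telemetry (Kentik-style) calls:

```python
def get_ip_address(site: str) -> str:
    """Stand-in for an IPAM lookup (hypothetical data)."""
    return {"bos1": "10.1.20.0/24"}.get(site, "10.0.0.0/24")

def device_health(site: str) -> str:
    """Stand-in for a telemetry health check."""
    return "healthy"

# Only tools registered here are reachable by the agent.
TOOLS = {"get_ip_address": get_ip_address, "device_health": device_health}

def call_tool(name: str, **kwargs):
    """The only path from agent to infrastructure: registered tools."""
    if name not in TOOLS:
        raise PermissionError(f"tool not granted: {name}")
    return TOOLS[name](**kwargs)

def provision_vlan(site: str, vlan_id: int) -> dict:
    """Scripted stand-in for the agent's reasoning steps."""
    if call_tool("device_health", site=site) != "healthy":
        return {"status": "aborted"}
    prefix = call_tool("get_ip_address", site=site)
    return {"status": "planned", "vlan_id": vlan_id, "prefix": prefix}
```

Anything outside the registry, say a reboot, simply isn’t callable, which is the “not jailbreaking your infrastructure” property.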
Ethan Banks • 13:35
How does the agent know how to ask this stuff? How have I instantiated it with that level of pre-built knowledge?
Chris Wade • 13:43
So some of it’s going to be what you programmed in, like we talked about. Just like I would write a Python script with certain logic, you’re going to program certain logic into the process that’s doing it. And it’s going to have those libraries, so it knows how to take your program and interact with the LLM directly.
Ethan Banks • 14:00
And when you say you, you mean the developer. Well, that’s my next question. Is it a developer, or is it a network engineer, that is writing or creating an AI agent?
Bill Lapcevic • 14:09
I think it just depends on what you allow. I’ve been known to go into NetBox and run AI queries against NetBox to find out interesting information. That’s effectively using natural language, going through the agentic AI interface, and it goes out to the MCP server, pulls all the information, assimilates it, and gives me a response. So the hope is that it’s not only a developer that can do this, right? It becomes a network engineer or a business person. One of the things we talked about on the panel was: how do you get senior management or the business people interested in understanding the network, what you’re trying to accomplish with it, and how well you’re doing?
Bill Lapcevic • 14:53
One of the ways to do that is to give them access to AI tooling that allows them not to have to click around in a Postgres database to see what a network configuration is on a router, but instead ask really interesting questions about how data center A is operating versus data center B, which has the new architecture. Or turn that completely on its head and look at the help desk person, that persona, who says, I’ve got...
Ethan Banks • 15:20
Six help desk tickets that just came in from Boston. What is going on in Boston? And then the AI comes back, deploys agents, figures it out: here’s what we think is going on. There’s a router down. We’ve got a circuit down. The failover circuit is completely congested.
Ethan Banks • 15:34
And here are your possible solutions. Give them some kind of answers like that.
Bill Lapcevic • 15:38
I mean, we talk about adding context to things all the time. And AI is really good at going in and figuring out there’s a trouble ticket for this type of device in this location. Go give me all the context around that that might be interesting, any anomalies, anything that you’re seeing, that a human would have to then go and click around and figure it out. But an AI agent properly set up will easily be able to add that context to the ticket and make it actionable.
Ethan Banks • 16:04
Because the agent knows what tools to summon so that it can query the network in real time, pull statistics back, run a show command, parse it, know what that information means, and then feed it back. And then how am I getting the information back? The agent is feeding it to the LLM that I’m using to put a natural language query in. Yeah.
Justin Ryburn • 16:26
I mean, that’s really where MCP and agent-to-agent, A2A, come in, right? That’s the whole advantage these new protocols bring to this: being able to do that kind of tool chaining, right? To be able to, I’ll just pick a vendor, take a ServiceNow ticket, and in that ticket be able to say, hey, to your point, we got six tickets that came in from Boston, what the heck’s going on in Boston? Hit Kentik’s AI agent, pull down traffic telemetry, go out to Itential, see if any configuration changes were pushed through Itential during that period of time, pull context out of NetBox, and take all that together faster than a human brain could and say, okay, during this period of time, here are all the things that took place across all of the tools in my environment, and here’s the most likely cause. And I’ll take it a step further.
Justin Ryburn • 17:14
What I’m hearing customers really wanting to do is, instead of getting six tickets in the first place, have AI do that pre-correlation before the ticket actually fires, so they only get one ticket at the help desk instead of six.
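Justin’s pre-correlation idea can be sketched as a grouping step that runs before tickets fire. The grouping key, same site within the same time bucket, is a simplifying assumption; real systems correlate on much richer signals:

```python
from collections import defaultdict

def correlate(alerts: list[dict], window_s: int = 300) -> list[dict]:
    """Collapse alerts from the same site in the same time bucket
    into one candidate ticket."""
    groups = defaultdict(list)
    for a in alerts:
        # Bucket timestamps into fixed windows (e.g. 5 minutes).
        groups[(a["site"], a["ts"] // window_s)].append(a)
    return [
        {"site": site, "count": len(g), "alerts": g}
        for (site, _), g in groups.items()
    ]
```

Six Boston alerts inside one window become a single ticket carrying all six as context, which is the “one ticket instead of six” outcome.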
Bill Lapcevic • 17:26
Yeah. Okay. Well, the other important piece to this is that it’s all based on having the model built. Because if you don’t have the model and the structure, then the AI can’t get the context. If it’s a bunch of random things in a bunch of random places, it’s a lot harder for the AI to assimilate that information.
Ethan Banks • 17:43
So, okay, this is actually important, because this becomes part of what my back-end infrastructure looks like. We’re not talking about just a foundation model like Gemini or Llama. We have that, but then we need to augment it. Are we talking about using RAG to teach that model about our specific environment?
Chris Wade • 18:02
Maybe just take a half step back in the discussion. In the previous discussion about how agents work, I always like to draw the correlation to automation, because it’s something we’ve worked through over the past 10 years: what does it mean to automate things? I still think there’s a builder persona who’s going to build these agents, and then there’s an operator role who’s going to interact with them. So when you start to think about RAG, as you brought it up, it’s really going to be a builder persona. I’m building an agent to do something, right? And the foundational model has the understanding.
Chris Wade • 18:36
I usually just say it has the understanding of the internet, right? It is the most generically trained model. So if I need to augment it with something specific, maybe the general model doesn’t know my data center fabric, or maybe it doesn’t understand the type of business I’m in, then I need to augment it so the quality of the data is better for somebody who’s making these queries. But the other nice thing is that as we automate things, we’ve typically had to be very specific about what we wanted to automate, and you had a library of things you could run. Once we build these agents with NLP on the front end, we have much more flexibility in the types of queries we can do.
Chris Wade • 19:13
So, speaking for Bill, instead of having to build as many dashboards for all the data, I can build a generic agent interface that can create, maybe, the dashboard I’m asking for that the builder did not envision, right? So it decouples what I have to build from the flexibility of the user to access it, if that helps.
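The RAG step Chris sketches, retrieving site-specific facts the foundation model cannot know and prepending them to the prompt, can be shown in miniature. Real systems use vector embeddings; plain word overlap is a stand-in here, and the corpus snippets are invented:

```python
# Hypothetical site-specific knowledge the generic model would not have.
CORPUS = [
    "Our data center fabric is a three-stage Clos running EVPN.",
    "Site bos1 uses Juniper leaves and Arista spines.",
    "Change freezes run every Friday 18:00-22:00 UTC.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank corpus snippets by word overlap with the question."""
    q = set(question.lower().split())
    scored = sorted(
        CORPUS,
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str) -> str:
    """Prepend retrieved context so the model answers from our facts."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

This is the builder-persona work: the operator just asks the question, and the augmented prompt carries the environment-specific context.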
Ethan Banks • 19:32
Yeah. I mean, is that real today? Because a lot of that’s so forward-looking. It’s like, yeah, it kind of works today, but about 75% of the time it sucks.
Chris Wade • 19:41
And which part?
Ethan Banks • 19:42
Create the thing that I want, you know, and it creates it, and you’re like, oh, well, kind of. It was almost there.
Bill Lapcevic • 19:48
Well, it has to be iterative, right? It’ll create something, and then you have to refine it and say, okay, that’s close to what I was looking for, but go do it this way instead. It’s not the one-and-done prompt.
Justin Ryburn • 19:58
Right.
Ethan Banks • 19:59
Okay.
Justin Ryburn • 19:59
And some of that is really not that much different from the way you did a search on Google years ago, right? If you don’t write a good search query, you’re going to get terrible results. It’s a little more advanced here, but what we’re really talking about is writing good prompts, right? The better you write a prompt, ignoring RAG for a second, the better your answer is going to be. You ask the right question, you’re more likely to get the right answer, is what I find with a lot of these. It’s really about giving enough context in your question when you write the prompt to get back a reasonable answer.
Chris Wade • 20:32
But I think something you brought up there is, like, is that real? Is it 75% of the way there? So I think a good question as we’re solving these problems is: what are the foundational models good at? What we’ve found is they’re very good high in the stack, meaning they’re very good generically. They understand what it means to provision a VLAN. They understand what a software upgrade is. They understand what an EVPN is.
Chris Wade • 20:53
They might not understand the details of a mixed Juniper and Arista data center fabric, the implementation, the version level.
Ethan Banks • 21:00
I do research with foundation models for things I’m doing, like preparing a podcast on the fundamentals of OSPF routing. You know, give me these things. And at the high level, it’s good. Then you drill into things like, okay, I need a list of all the LSA types, and it trips all over itself. It gets some of them right, but some of them wrong. Confidently wrong, but wrong.
Ethan Banks • 21:22
Yes.
Chris Wade • 21:23
Exactly. So we want AI to be a lot of things, but we’re building around its capabilities, right? We have to figure out what it’s good at and understand where it’s going.
Ethan Banks • 21:32
But this is the skepticism that network engineers have around this technology. We’re working in a domain where we can’t get things wrong, nor can we trust a process that gets things wrong. We need a deterministic answer. We need it to have some level of predictability. So are we just working around the technology when we work around those limitations? Because an LLM, by definition, wants to give you a non-deterministic answer, so it feels more human.
Ethan Banks • 22:03
But we want it to give us the same answer every time, a trustworthy answer. Again, going back to network engineers and their mindsets, this is the skepticism people have: I don’t want something unless I can 100% rely on it; it’s just hype, and I wish they’d stop selling it to me. Rant over.
Justin Ryburn • 22:18
I guess I would challenge that by asking: your coworker who was manually provisioning the network, or running a script to configure the network, were they deterministic? Did they do the same thing every time, without error, without making mistakes? I don’t think they did. So I don’t know. I mean, again.
Ethan Banks • 22:35
This is the self-driving car argument. The self-driving car is going to kill fewer people than actual human drivers.
Bill Lapcevic • 22:42
Back in the day, IBM had runbooks, where you’d have a problem and you’d open up the runbook. And the dream was to automate that, right? The dream was to have self-healing systems. But what ended up happening is somebody was always sitting there saying yes or no and actually pressing the button to take action. And the reason is that they were never able to get to that nirvana of being able to trust anything 100% of the time. I think we see the same thing in every aspect of networks and technology. You always have somebody who needs to take a look and make sure, before you execute it, that it’s the right thing.
Chris Wade • 23:25
Just two comments on that, because I love the skepticism. I will say, in the near term, what we’re doing in our portfolio for AI is we put the determinism between the agent and the network, right? So you do have determinism in between. The agent can only do the things you allow it to do. If it can run a Jinja template, if it can open up a ticket, if it can send a Slack request, you’re constraining it by what it can do.
Ethan Banks • 23:45
Which was a bit of a loaded question on my part. I’ve had other conversations, too, where, yes, it’s true that an LLM is not deterministic, but with the correct guardrails and inputs and so on, you can get a deterministic answer out of it. And I think that’s something a lot of engineers don’t realize yet.
Chris Wade • 24:02
No, for sure. But as we go forward, and as forecasted, token costs will go down by 95%, context windows are going to be 100x. You can see the improvement on the horizon: using it high in the stack, using it for some of the reasoning so I can do more things, and doing less of the data manipulation and less of the error handling by hand, using it for what it’s good at. And then as it continues to improve, and as we augment the foundational LLMs with small language models, hopefully from our vendors, giving us expertise out of the box so we can start to leverage that knowledge, it’s a good future, I think.
Justin Ryburn • 24:40
Yeah. That’s the way we’re approaching it too. To Bill’s point about the playbook, we call it runbooks in our product. Consider it RAG if you want: you can put in a runbook or a playbook that says, hey, when I get an alert that a BGP session is down, this is what I would have trained my operations center to do. These are the steps. You go and look and see when it went down. You check to see if there are interface errors.
Justin Ryburn • 25:03
You try bouncing... well, this is a bad example, because we don’t bounce interfaces, we don’t change the network configuration. But, you know, see if the interface was changed, if a configuration was changed. So, whatever the steps are that your operations team would take to troubleshoot that BGP session, you can create a runbook in our product that tells it those are the steps to take. It’s basically augmenting the LLM, saying: no, don’t just do whatever you want with this alert. Follow the process that we have agreed, as an organization, is how we troubleshoot this. And it actually makes...
Justin Ryburn • 25:35
At least in our experimentation with our product, it makes the outcome much closer to deterministic. It’s still a little bit different. I could run that exact same AI against the exact same alert five times, and the output will look slightly different. It’ll go through the same steps, but it may do them slightly differently. It’s not truly deterministic in the way a script following if-then-else logic would be, but it’s roughly getting to the same outcome at the end of the day.
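The runbook constraint Justin describes, a fixed, ordered set of approved diagnostic steps, can be sketched as a simple executor. Step names are invented; in a real product the checks would be live queries rather than callables in a dict:

```python
# An ordered troubleshooting runbook for a BGP-session-down alert
# (step names are illustrative assumptions).
BGP_DOWN_RUNBOOK = [
    "check_session_last_flap",
    "check_interface_errors",
    "check_recent_config_changes",
]

def run_runbook(steps, checks):
    """Run each named check in order; steps outside the approved
    library are refused outright."""
    results = []
    for step in steps:
        if step not in checks:
            raise KeyError(f"step not in approved runbook library: {step}")
        results.append((step, checks[step]()))
    return results
```

The agent may phrase its findings differently run to run, but the step order and the set of allowed actions stay fixed, which is what pushes the outcome toward deterministic.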
Ethan Banks • 26:02
Okay, so guardrails are an important part of this conversation. And I think understanding how guardrails work with agentic AI is going to help us adopt it, because we’ll know we can trust it if we can put those guardrails in place. What does that look like? It used to look like RBAC: I’ve bounded this user, they can log in and run these commands, it’s audited, and I have an audit trail I can fall back on to see what happened. What do guardrails look like now? Is it just RBAC applied in a different way?
Justin Ryburn • 26:32
I’ll just speak to the way we’re doing it in the Kentik product, which is, yeah, we have RBAC. Only certain users can even use the AI part of our product. And then, if the AI agent is going to log into the device and run show commands, every single time the user has to say, yes, go ahead and do that.
Ethan Banks • 26:51
But the user is bounding what the agent is doing. So it’s a human in the loop that’s providing the guardrails, as opposed to the agent itself acting autonomously but being watched by an RBAC process. It’s a human watching the AI; it’s not RBAC or RADIUS or whatever the enforcement mechanism is.
Justin Ryburn • 27:10
Well, presumably the customer has RBAC and RADIUS on their devices as well, so the user we log into the device as can only run certain commands. So you could lock it down even further to only allow that login on that device, because it’s still logging into the device, at least the way ours works. It SSHes into the device and runs the show commands, right? So you can lock it down and say you’re only allowed to run these narrowly defined commands. But then also, the AI in our product will come back and say, in order to troubleshoot this, I need to run these three show commands, and the human has to say, yes, go ahead and run those. The reason we chose to do it that way, and you’ve probably all experienced this too, is that it’s not unheard of for a vendor to have a show command with a bug that causes an outage, right? I can remember, I won’t pick on the vendor, but you used to do show OSPF neighbor on one of my favorite routers, and it would reboot the router.
Justin Ryburn • 28:06
They had a bug in their software; they fixed it ultimately. The answer to your question is: there are no neighbors. There are no neighbors because the box rebooted, right? So it’s not unheard of for a show command to cause impact, and we wanted to put an extra guardrail in our product. That’s how we’re handling it, but I don’t know if Bill or Chris have a different answer.
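The double guardrail Justin describes, an allow-list of read-only commands plus a human approval for each one, can be sketched as below. The command set and the approver/runner interfaces are illustrative assumptions, not Kentik’s actual implementation:

```python
# Read-only commands the agent is ever allowed to propose (hypothetical).
ALLOWED = {"show interfaces", "show bgp summary", "show ospf neighbor"}

def execute_proposed(commands, approve, run):
    """Run each proposed command only if it is allow-listed AND a human
    approves it. Anything else is silently dropped."""
    ran = []
    for cmd in commands:
        if cmd not in ALLOWED:
            continue          # outside the allow-list: never reaches a device
        if not approve(cmd):  # human in the loop says yes or no
            continue
        ran.append((cmd, run(cmd)))
    return ran
```

Even a “harmless” show command gets a human yes before it touches the box, which covers the buggy-show-command case.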
Chris Wade • 28:25
Guardrails, cool. Well, there are the enterprise features we’re talking about: single sign-on, RBAC, audit, all that kind of stuff, for sure. But the most important guardrail is the part we’ve already talked about, which is feeding it deterministic logic, which is the only thing it can do. So if running a pre-check twice is a negative thing, then we need to talk about it. But you can constrain what it does. And in addition to the audit of how things work, you can also audit the train of thought of your agents as they run. So, back to the VLAN provisioning agent: it goes and gets an IP address as it’s working through.
Chris Wade • 28:59
You can see the actual train of thought and do diff sets, just like, you know, I see all these correlations to automation in my head. The first run is read-only. The first time, a human watches every diff. We can put the human processes around it for adoption and confidence within our environment, and make sure we have the right steps in place. But you get the full train of thought; you can see exactly what happened. And I think a big part of this is human in the loop, on the loop, at the loop, which mirrors automation to a T, right?
Chris Wade • 29:31
Before we ever let automation run a thousand times overnight, we did testing, we did validation, we had humans approving things for some period of time. So I feel like the operational paradigm of this is going to be very, very close to what we did then.
Ethan Banks • 29:45
It’s like a maturity thing.
Justin Ryburn • 29:46
Maturity approach.
Bill Lapcevic • 29:47
Will this mature? Sorry, go ahead. I agree with the maturity aspect of it, and I think it’ll get better and better over time. People will get more used to it over time. And you’ll end up with new processes and new ways of thinking about the AI that will change how we operate.
Ethan Banks • 30:05
Is the goal autonomy? Do we get to the point with AI, the way it is moving, where we graduate from human in the loop to an AI agent that can just go do the thing, and we believe in it?
Justin Ryburn • 30:20
I would say that’s the promise. I don’t know. I think every organization is going to have different levels of comfort with how quickly they’re willing to do that, or want to do that. So, yeah, it’s a hard question to answer.
Ethan Banks • 30:35
We’ve been asking that question forever in different contexts, right? In the automation space, it was like, well, when do we auto-remediate the thing we detect? Well, never, probably. But with AI, with reasoning and guardrails, it’s more human-like in how it behaves. So are we finally there? Is this the technology platform where, at least for certain bounded situations, we can allow the computer, the robot, to act autonomously to remediate a situation?
Justin Ryburn • 31:04
I mean, again, I think it depends on the organization. There are some organizations that have gotten to a level of trust with automation, at least, where they’re doing that. Chris and I have a joint customer that’s detecting DDoS attacks in Kentik and mitigating them with Itential, and it’s fully automated. They didn’t start there; they started with a human in the loop, and they’ve gotten to the point where they trust it enough, the human has said yes so many times, that it’s like, okay, let’s just take the human out of the loop and let it be fully automated. Granted, it’s a somewhat narrow use case.
Justin Ryburn • 31:34
We’re not talking about agentic here; we’re talking about automation and a deterministic outcome. But it took a level of comfort to get to that point. I think the same thing will happen with agentic. It’s going to take some iteration, some maturity, some getting to a level of comfort before we just say, yeah, let it do its thing, and nobody’s even aware of it; we just check on it later.
Ethan Banks • 31:55
There have been talks from the NANOG stage about mistakes that were made, and then we all learned from them. And that’ll be part of the evolution, I bet. Yeah.
Chris Wade • 32:03
Yeah, I would just add to that: AI is a tool, right? We’ve had a lot of tools. I don’t think the goal has changed as much as we now have a reasoning tool that we did not have before. And with certain networks, like self-organizing networks, with a smaller domain and a simpler deployment model, we’ve achieved quite a bit of autonomy in different segments for certain use cases, right? But they typically had those attributes of simplicity, certain attributes that allowed us to do it with the technology we had at the time. We’ve talked about intent-based frameworks.
Chris Wade • 32:36
We’ve tried all sorts of things over time, with different levels of success. But to what you were just saying: with reasoning, does that get us over some of the historical restrictions, the brownfield environments, the non-standard deployment models, et cetera? Can the reasoning overcome that, so we don’t have to have the constraints we’ve had in the past, standard model, standard deployment, standard vendors, standard services, to achieve such an outcome?
Ethan Banks • 33:05
I’ve got to say, there are people watching this right now who want to punch all four of us in the throat because we’re using the words thinking and reasoning to apply to AI. So can we qualify what we actually mean by thinking and reasoning? Because it isn’t what we do as humans. What are your thoughts?
Chris Wade • 33:22
Sure. Yeah, I guess. Many times, we’re replacing what used to be hard-coded scripts with a prompted interface, right? So we’re not just saying, go drive the car down the street. We’re giving it a very detailed prompt for what we’re asking it to achieve, right? And we’re giving it what we think are the appropriate tools and information to make such decisions. To achieve this outcome, you have to overcome certain differences in data or differences in logic at runtime, right?
Chris Wade • 33:54
Those of us who write infrastructure-as-code pipelines all day know that the minute you add a difference in the network, or a different error condition, or a different payload comes back from an API call, you have technical debt in that pipeline, because you have to go tweak it to handle all of those cases, because it’s 100% deterministic, right? So the question is, can I back off a little bit of that and just reason through, maybe, manipulating the data models? We’re not talking about thinking in the sense you were describing. I’m taking the prompted interface, which is the guidance from the human, along with some of that logic, to overcome some of the stuff we couldn’t overcome before, because I had to rigidly define it in a deterministic way. So if we just talk about incrementalism, about making more advancements in our automation journey, we can replace some of the rigidity we’ve had with a little bit of reasoning, which I would not correlate to thinking, exactly to your point. But maybe we can handle three varieties of error messages without having to code three exact error conditions in my pipeline.
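Chris’s closing example, handling several varieties of an error message without hard-coding each exact string, can be sketched as a classifier that normalizes variants into one condition. The patterns are invented; in his framing, the LLM would be doing this normalization step:

```python
import re

# Map one logical condition to many message variants (illustrative
# patterns, not any vendor's actual error strings).
PATTERNS = {
    "auth_failure": re.compile(r"auth|permission|denied", re.I),
    "unreachable": re.compile(r"timeout|timed out|unreachable|refused", re.I),
}

def classify_error(message: str) -> str:
    """Collapse message variants into a single pipeline condition."""
    for label, pattern in PATTERNS.items():
        if pattern.search(message):
            return label
    return "unknown"
```

The pipeline then branches on two conditions instead of a growing list of exact-match strings, which is the rigidity he is describing.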
Justin Ryburn • 34:59
So, you know, I understand the skepticism about the terms thinking and reasoning, but I think as an industry we’ve adopted them because it makes it easier to describe what it’s doing, right? It’s easier to correlate to how a human being would think or reason through a problem. To Chris’s point, with a lot of these, you can actually see the detail of the logic it’s taking. A lot of these prompts will show their homework, if you will. Instead of just saying, this is the answer to the math problem, they’ll show their homework. So you can see the reasoning, to use that term somewhat loosely, how they got there, right?
Justin Ryburn • 35:33
So you can see the steps it was taking, the logic it applied to the problem to try to get to the outcome. Is that thinking and reasoning? Probably not in the way a human brain works, but I think it’s a close enough analogy that it helps people who don’t want to, or need to, fully understand how all the AI engines and LLMs work. It’s a good analogy to help them understand it.
Ethan Banks • 35:56
Okay, fair enough. New topic: speaking the language of the business to the business. We have said, and I’ve heard it said from the stage multiple times now, and I’d refer to Jeff Gray’s talk at AutoCon 4, I believe, or 5, I’ve lost track. It was 4. Okay.
Ethan Banks • 36:18
We’re saying, hey, network engineer, you need to be able to demonstrate, with some accounting-style math and spreadsheets, a return on investment, by coming up with some sort of metric that you can then show to the business: if we buy this, or change this process, or implement this system, we get this win. Not every network engineer is going to do that. It’s just not in their parlance; it’s not the right persona. Have you guys seen, dealing with customers, a typical persona that’s good at this? Someone who can have their feet in both realms?
Ethan Banks • 36:51
They’re an engineer, they deeply understand what’s going on on the network side, and they can speak business and understand what business stakeholders need and what the C-suite cares about.
Justin Ryburn • 37:02
Yeah, I mean, what I see when I work with customers is, to your point, they’re usually two different people, right? There’s the individual contributor, the engineer who understands they need NetBox, they need Kentik, they need Itential; they understand the value prop; they understand how they’re going to use these products. They’re usually not the same person going to the CFO, building the business case, and explaining why they need these products, why they’re going to spend money on these commercial products, and what ROI they’re going to get out of them. There’s usually someone in the organization between that engineer and the CFO who’s building that business case. It’s usually a leadership skill, though an engineer can do this too. To your point, the person doing it needs enough of a grasp of the technology to understand how it works, but enough business and finance knowledge to know how to build the business case and speak about the outcomes and business value the product is going to bring. So it’s usually someone at a leadership level, manager, director, VP, somewhere in that range, who’s taking the technology and translating it into something the business can understand.
Bill Lapcevic • 38:10
Yeah, I very much agree with that. It doesn’t necessarily have a specific title or a specific tier of the organization. It can be a manager of network automation. It can be a VP of infrastructure. I think it comes down to where is the intersection between execution and the pain that the business is trying to solve. There’s some layer in there where on the panel, I talked about speed, or you talk about how many manual changes are causing risk or causing outages. At some level, there’s the execution side of that.
Bill Lapcevic • 38:45
Okay, we don’t want as many manual changes. That’s great. The next layer up is feeling the pain about those manual changes, because somebody from the top down is coming in and saying, this business process is no longer working, or, you know, I don’t want to be on the cover of Newsweek because somebody fat-fingered a router. There’s some person who viscerally feels the pain of the executive, but also has to work with the people who are executing against the task. And wherever that is in your organization, I think that’s the person who can best translate, and go and secure the funds for the network automation project.
Ethan Banks • 39:27
Yeah, I can give an example of an organization I worked at where my manager and that person’s manager were both skilled at not only the engineering needs of the IT organization (that’s where they came from as they moved up the ranks over the course of time), so they deeply understood what was going on in the data center. They got it. But they also had been brought in by the business to be aware of significant business projects: these are the things we’re looking for IT to support us on, and these are the initiatives that matter to us for this quarter. And they would be the go-between between those of us who were in the trenches, working deep in a rack, and the business that was working on an acquisition right now.
Ethan Banks • 40:10
And they were procuring cash and setting up the deal and working on timing, and prioritizing that appropriately at a business level, when they couldn’t talk about it publicly yet, but IT needed to be engaged. There were those two guys in the middle who were really skilled at being able to prioritize those things and explain all of this sort of stuff. And so, like you said, there may not be any specific person. It’s someone in leadership, someone who’s got a leadership capability and a role and is willing to advocate for both sides and be an abstraction layer between the tech and the business, whoever that person or people may be. Yeah, you’re really fortunate when that’s the CTO, for example.
Bill Lapcevic • 40:54
Yes. Right. Some organizations have that and they’re at the forefront of network automation often.
Ethan Banks • 41:00
Yeah. That is a great person to have. And it isn’t, again, it isn’t necessarily one role that is the CTO. It’s not like, oh, it’s always got to be the CTO, but it’s got to be somebody. And I don’t, I guess my point here is I don’t think it has to be everybody. As much as we can tell the network engineers, guys learn to speak business. And yeah, that is important, but not all of them are going to be particularly skilled at it.
Ethan Banks • 41:25
But in my mind, you’ve got to find someone. My take is you’ve got to find what I think of as a liaison role. It would be really, really helpful to have that person within your organization.
Chris Wade • 41:37
All right, guys. If I can get one point in real quick. A lot of people have experienced insourcing or outsourcing some activities at some point. So from an automation activity perspective, I think we’re fairly comfortable a lot of times having those discussions, like a firewall rule change, load balancing, or provisioning a network service. But just one additional dynamic I think is interesting is that, you know, a lot of our line-of-business teams have had shadow IT for a while. A lot of them have been doing cloud for a while. And I think a lot of times we feel like, within the networking teams, we have to position this stuff.
Chris Wade • 42:06
A lot of times, if you go ask your application teams, your line-of-business teams, they can almost help you understand why operating in a certain way is going to help achieve business outcomes. Because ultimately, IT a lot of times is viewed as a cost center, and I think the biggest value is having our IT infrastructure support the business directly. A lot of those line-of-business teams, if you can do same-day service, if you can be more flexible in your delivery model, are going to help you tie it back to the business. You might work at a retail shop, and obviously the more stuff we sell, the better. But the product owner for the retail shop might be able to explain to us why having SD-WAN at the branch, or having same-day service, or provisioning applications more rapidly, or securing them, is going to be a huge advantage. I think a lot of times we’re focused on activity and direct automation, but the true business value is that the network infrastructure we support and evolve supports the business.
Chris Wade • 43:01
And I think the business is tighter to the infrastructure than it’s ever been. So reaching across the aisle can really help us understand why we do what we do versus I got 100 tickets and I have an SLA. That’s a very important aspect, but there’s a reason why that SLA is in place because there’s a business outcome associated with it.
Ethan Banks • 43:19
All right, guys, one more topic: data and telemetry. We talked towards the end of our stage presentation about metrics and how we determine that there’s success going on. And we got into the concept of a data lake as a repository of information that’s got all of our telemetry, and then applying AI to the data lake to pull some kind of meaning out of it. Let’s flesh that out a bit.
Ethan Banks • 43:45
Let’s see what that means. So let’s start with telemetry that every network engineer is familiar with. SNMP data, flow data. What other kind of telemetry are we talking about in 2026?
Justin Ryburn • 43:56
Usually syslogs. We are starting to see people actually really doing streaming telemetry for realsies. I kind of equate it to the IPv4-to-IPv6 migration. Same thing with SNMP and streaming telemetry. SNMP, as much as I personally would love to see it die tomorrow, I’m sure is going to be around for a long period of time, because people are going to have network-attached devices like printers or whatever that are probably never going to speak streaming telemetry. And they’re not going to get replaced just because they don’t speak streaming telemetry.
Justin Ryburn • 44:27
So anyway, all those are data sets that you need to collect to have a good understanding of what’s going on in your network. The other one that I find customers don’t think about, but is important, is some sort of synthetic testing across the paths, right?
Ethan Banks • 44:42
Performance across the path. Well, I think of that as more like APM, what we used to call APM back in the day, application performance monitoring, where I’m doing transactions up at that level. What are you getting at?
Justin Ryburn • 44:53
I mean, that is a higher layer than what I was thinking. I was thinking more at the network layer: pings, traceroutes, that kind of stuff. You know, because you can have a performance issue on a path that won’t show up in syslog, won’t show up in metrics, won’t show up in flow or any of those other telemetry sources we were just talking about, but will still have a very real impact on performance.
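The synthetic tests Justin describes can be as simple as periodically timing connections along a path. As a minimal sketch (not any vendor's implementation), a TCP connect probe can stand in for a ping when ICMP isn't available to the prober; the report shape here is hypothetical:

```python
import socket
import statistics
import time

def tcp_probe(host: str, port: int, attempts: int = 5, timeout: float = 2.0) -> dict:
    """A crude synthetic path test: time TCP connects to a target and
    report loss and median latency. A stand-in for ping/traceroute when
    ICMP isn't available to the probe."""
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.monotonic() - start) * 1000.0)  # milliseconds
        except OSError:
            pass  # failed attempt counts as loss
    loss = 1.0 - len(samples) / attempts
    return {
        "target": f"{host}:{port}",
        "loss_pct": round(loss * 100.0, 1),
        "median_ms": round(statistics.median(samples), 2) if samples else None,
    }
```

A real deployment would run probes like this on a schedule from multiple vantage points, so a latency regression that never surfaces in syslog or flow records still shows up as a trend.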
Ethan Banks • 45:17
All right. Okay. So that is one classification of data. We’re bringing in legacy, if you will, telemetry that we’ve had for forever. We’re adding to that mix streaming telemetry now and then augmenting with yet other kinds of network data. But I want to get you guys’ take on this because in my mind, what is really intriguing at this particular moment is why would we be focused solely on the network when what is usually more interesting is the entire IT stack? There’s an application that we care about that’s running on top of that network.
Ethan Banks • 45:48
How do we correlate the telemetry that might be coming out of that application with all of our network telemetry and then say, this is what’s happening? You know, more of that user-experience kind of measurement, whether that’s digital experience monitoring or whatever acronym you want to use, or troubleshooting some kind of a failure or a gray outage. It’s not completely down. It’s not completely up. It just sucks at the moment. Why? What is going on?
Ethan Banks • 46:13
Well, we all know there are a million reasons why, because there’s a whole lot of stuff in that stack, and the network’s just one layer of it. Are we at the place now, with AI being applied to a massive data set, a data lake full of information? I’m skipping the data normalization problem for the moment, but we can get back to that. Are we at a point where this is what we as businesses should be doing? Network telemetry plus all the rest of the IT stack telemetry coming into one big lake, applying AI to the problem, and seeing in real time, hopefully, what’s actually going on and getting to the root cause of problems very quickly.
Justin Ryburn • 46:52
100%. Yeah. I mean, that’s what I’m hearing from customers, whether they’re doing it in Kentik or they’re building their own data lake, taking data out of Kentik and Netbox and wherever else, putting it into a system, and building their own agent on top of it. One way or another, that’s how they’re trying to solve this problem, right? Have a data lake that the AI has access to and can correlate across. Again, we talked about this a little earlier, but the agent can then correlate across all of those data sets much faster than a human being could dig through and say, okay, wait, I got a syslog over here and I’ve got a trap over here and I’ve got a config change that took place at the same time, right? That’s one thing the AI agents are really good at: correlating all of those time windows, what happened, and what the root cause is.
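The time-window correlation Justin mentions (a syslog here, a trap there, a config change at the same moment) can be sketched as a simple grouping step that runs before any AI reasons over the data. This is a minimal illustration with made-up event shapes, not how Kentik or any particular agent implements it:

```python
from datetime import datetime, timedelta

def correlate(events: list, window: timedelta = timedelta(minutes=5)) -> list:
    """Group events from different telemetry sources into time clusters.

    Each event is a dict like {"ts": datetime, "source": str, "detail": str}.
    Events whose timestamps fall within `window` of the previous event join
    the same cluster; a cluster mixing sources (config + syslog + trap) is
    exactly the kind of bundle an agent would examine for root cause.
    """
    clusters = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        if clusters and ev["ts"] - clusters[-1][-1]["ts"] <= window:
            clusters[-1].append(ev)  # close in time: same incident candidate
        else:
            clusters.append([ev])    # gap too large: start a new cluster
    return clusters
```

An agent doing this across millions of events per hour is the speed advantage over a human paging through three separate tools.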
Bill Lapcevic • 47:37
It’s really interesting that we’re talking about AI in the network now. Because back in the day when I was at New Relic or at Wily Technology, this was always the thing. You could use an APM tool, as you were describing, to say your application is slow and there’s probably a code reason. But if it wasn’t a code reason, then what happened was people just said, oh, it’s the network. Always the network. And there was a wall. Nobody could see past that wall. And then you’d just say the network is slow.
Bill Lapcevic • 48:07
And it’s like hollering off the top of a mountain and hearing your voice echo for a long time. And now, if the promise of AI kind of starts to materialize, like you said, what we’re able to do is go from: my app is slow, and it doesn’t appear to be the code. Maybe AI can correlate between the network traffic, the flows, the configurations, the changes that have recently been made across a vast number of devices and systems in a vast number of locations. And because it can process so much data, it can find patterns that would take humans a lot longer to find. Maybe that’s the promise of adding AI to the network.
Ethan Banks • 48:51
Well, not only that, but mean time to innocence for the network, if the problem is in fact a slow database that’s struggling and adding latency to transactions, those kinds of things.
Bill Lapcevic • 49:03
And the business doesn’t care where the problem is. They just want the problem solved.
Chris Wade • 49:06
Right. Yeah. I will say we kind of designed it this way, though, right? The network and the applications were built independently from each other. So, you know, this idea that there was a decoupling between the application layer and the network layer was on purpose. So now we’re talking about correlating across data patterns that maybe aren’t the same, right?
Chris Wade • 49:29
I would argue that if we put the operational data, because how we do this from a software perspective is we have trace IDs and we have traceability from the requester on the web server all the way down to the log file. So if we talk about using automation to provision these things, if the requester is from the application, I know exactly what infrastructure got provisioned based upon that request. So we have performance data at different layers. You could have transport, IP, all the way up to the application layer. If you bolt some of the operational data on the side, you kind of get traceability through the path. So I’m not saying every single thing, but understanding that this additional storage was attached to this data lake for this application and that request came through a pipeline or some sort of other API, I can kind of give you that tie-together data, which either AI or some other system is going to be able to use to correlate that stuff. So it’s about mashing multiple data streams together, I think.
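Chris's idea of bolting operational data onto the request's trace ID could look something like the following sketch: every resource provisioned by automation is tagged with the trace ID of the request that caused it, so an application-layer trace can later be joined to the infrastructure changes beneath it. The class and field names here are hypothetical, not Itential's actual data model:

```python
import uuid

class ProvisioningLedger:
    """Tag every provisioned resource with the requester's trace ID, so
    application traces can be joined to infrastructure changes later."""

    def __init__(self):
        self._by_trace = {}  # trace_id -> list of provisioning records

    def provision(self, trace_id: str, resource_type: str, detail: str) -> dict:
        """Record a resource provisioned on behalf of a traced request."""
        record = {
            "id": str(uuid.uuid4()),
            "type": resource_type,
            "detail": detail,
            "trace_id": trace_id,
        }
        self._by_trace.setdefault(trace_id, []).append(record)
        return record

    def resources_for_trace(self, trace_id: str) -> list:
        """The join: given a trace ID from the app side, return the
        infrastructure that was provisioned for that request."""
        return self._by_trace.get(trace_id, [])
```

With that tie-together data in place, either an AI or a plain query can answer "this slow request touched storage that was attached to the data lake yesterday" without guessing at time windows.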
Ethan Banks • 50:24
We have interesting telemetry coming right out of code if we want. So we’ve got OpenTelemetry now and a bunch of signaling that comes from that. We get distributed tracing that could be a method we tap into with all the data. Have you guys seen customers getting that specific, where those different sets of telemetry, coming from pretty different knowledge domains or perspectives, are able to be mashed up by an AI to find root causes? This is me hoping; this is my golden path that I’ve been wanting to see happen.
Justin Ryburn • 50:54
I don’t. I see it talked about. I wouldn’t say I’m seeing customers that are actually doing it. I think, you know, part of it goes back to Chris’s point where, as an industry, we’ve siloed the networking team and the app teams intentionally. Yes. Right. And so the data doesn’t get shared.
Ethan Banks • 51:11
But a developer needs to know that the network is more than just: they opened a socket, and then magic happens. We need to flesh that out.
Bill Lapcevic • 51:18
It was the same for a long time with app developers and operations people. And then the DevOps movement happened. Yeah. Right. And then they found bridges across those two divides and started working more closely together. And the application world took off, right? We’re just at the start with that same thing starting to happen with networks and applications and other parts of the business.
Bill Lapcevic • 51:40
And maybe AI bridges that. I think it will.
Justin Ryburn • 51:43
And like, you know, we talked a little bit about this, but maybe to double-click on it: if the way an application person has to interrogate their network telemetry data is doing a SQL query, or even navigating a UI like Kentik’s, then, while we put a lot of time and thought into how we organize the data in our UI, an app person is probably not going to learn how to navigate Kentik’s UI, right? They might, however, take the time to ask a natural language question in a prompt and say, why is my application in US East 1 slow? And the AI can then go and look through all of the telemetry data and come back with a reasonable answer. Or, taking it a step further, use A2A or MCP to pull that directly into New Relic or whatever APM the application teams are working in. So when they’re having a problem with application slowness, they can interrogate the network data without having to log into the network team’s tool.
Justin Ryburn • 52:42
Right. So anyway, long ramble, but where I’m going with that is I think AI has some promise to help break down these silos that have existed for so long.
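The workflow Justin describes (an app person asks a natural-language question, and an agent answers from network telemetry) ultimately bottoms out in a tool the agent can call. A toy example, with entirely made-up data and names, might expose a function like this to an agent over MCP or A2A:

```python
# Toy telemetry store standing in for a real network intelligence backend.
# Records and names are invented for illustration only.
LATENCY_RECORDS = [
    {"region": "us-east-1", "hop": "edge-rtr-1", "p95_ms": 4.1},
    {"region": "us-east-1", "hop": "core-sw-7", "p95_ms": 188.0},
    {"region": "us-west-2", "hop": "edge-rtr-9", "p95_ms": 3.8},
]

def worst_hop(region: str, records=LATENCY_RECORDS):
    """A tool an agent could call to help answer 'why is my app in
    us-east-1 slow?': return the highest-latency hop in a region,
    or None if there is no data for it."""
    in_region = [r for r in records if r["region"] == region]
    return max(in_region, key=lambda r: r["p95_ms"]) if in_region else None
```

The point is less the lookup itself than the interface: the app team asks in natural language, the agent translates that into tool calls like this one, and nobody has to learn the network team's UI.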
Ethan Banks • 52:52
Well, guys, this is a good place for us to wrap. Great discussion on stage. Thank you for making the time for the follow-up. Very much appreciated that we could flesh out some of this stuff. There’s a frustration I go through when we’re time-limited on a stage and we can’t get into the nuts and bolts of how this works, and we have to spend a little more time to flesh out some of these things in a more concrete way that makes it feel more believable. So very much appreciated.
Ethan Banks • 53:15
And again, the talk we’re referring to, if you’re watching this, happened at NANOG 96. It was titled Network Automation: Not Just About Configuring Network Devices. So if you watch that talk and then watch this, a lot of the context will make more sense. And thank you for watching. We appreciate your time.
How Itential, Kentik & Netbox Labs Work Together
This panel reflects something broader than a single conversation. Itential, Kentik, and Netbox Labs are ecosystem partners – complementary platforms that show up together in real customer environments because the problems they solve are adjacent and interconnected.

Netbox Labs provides the network source of truth: the data model, the inventory, the “what is vs. what should be” foundation that automation depends on.

Kentik provides the network intelligence layer: visibility, telemetry, and the analytical context that tells you how a service is actually performing.

Itential provides the platform for a governed, repeatable way to orchestrate service delivery across multi-vendor, hybrid, and brownfield environments – integrating with ITSM platforms like ServiceNow and with CI/CD workflows, operationalizing agentic AI responsibly, and producing the audit trails and reporting that connect network operations to business outcomes.
When these platforms work together, teams can go from knowing what they have (Netbox) to understanding how it’s performing (Kentik) to automating what needs to happen next (Itential) – in a loop that keeps intent and reality in sync.
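The intent-and-reality loop described above is, at its core, a diff between the source of truth and observed state. A minimal sketch (with hypothetical state shapes, not the actual NetBox, Kentik, or Itential data models):

```python
def diff_state(intended: dict, actual: dict) -> dict:
    """Compare intended state (source of truth) against observed state
    (telemetry/discovery) and report drift in three buckets:
    - missing: in the intent but not observed
    - unexpected: observed but not in the intent
    - mismatched: present in both but with different values
    The drift report is what the orchestration layer acts on next."""
    missing = {k: v for k, v in intended.items() if k not in actual}
    unexpected = {k: v for k, v in actual.items() if k not in intended}
    mismatched = {
        k: {"intended": intended[k], "actual": actual[k]}
        for k in intended
        if k in actual and intended[k] != actual[k]
    }
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}
```

Run on a schedule, a diff like this closes the loop: the inventory says what should exist, telemetry says what does, and whatever lands in the drift report becomes the next automation job.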