Reduce Network Troubleshooting Time with Itential MCP & AI

When outages strike or misconfigurations slip through, the clock starts ticking. Every minute spent hunting down root causes means lost productivity, frustrated users, and – in some cases – millions in revenue impact. What if AI could help you cut that time dramatically, without adding risk?

In this demo, Rich Martin and Ankit Bhansali show you how to pair Itential Platform + MCP server with your LLM of choice (Claude, ChatGPT, etc.) to deliver intelligent, safe troubleshooting in a fraction of the time. You’ll see how AI can not only resolve common issues faster but also help engineers build and test new troubleshooting tools that continuously strengthen your operations.


🎯 In this demo, you’ll learn:

  • Integration in Action: Connect MCP with an LLM for context-aware troubleshooting across your ecosystem.
  • Prompting for Ops: Best practices for structuring prompts that drive accurate, focused troubleshooting outcomes.
  • MCP as the AI Enabler: Harness Itential’s inventory and orchestration to accelerate issue resolution.
  • AI as a Builder: Using AI to generate, test, and refine new Itential troubleshooting tools.
  • Demo Notes

    (So you can skip ahead, if you want.)

    00:00 Introduction
    02:38 AI Prompting Strategy
    07:16 Lab Topology Demo
    09:25 Itential LCM & Network Services
    14:54 Configuration Manager Capabilities
    17:06 MCP Server & AI Agent Setup
    19:29 AI Troubleshooting Session: Customer Outage
    31:17 ServiceNow Ticket Updates
    38:38 Configuration Template & Application
    45:53 Wrap Up & Documentation

  • View Transcript

    Rich Martin • 00:06

    Hello, everyone. Welcome and thanks for joining in. My name is Rich Martin, Director of Technical Marketing. I’m joined by Ankit Bansali, Principal Solutions Architect. And today we are covering a really fascinating and, I think, super interesting session: how to leverage Itential Platform plus the Itential MCP server and your LLM of choice for, here’s what we’re going for, network and security troubleshooting. I think this is a super interesting one because where we’re at in the industry right now is: is it safe to give an LLM direct SSH and API access to every device on your network? Probably not, right?

    Rich Martin • 00:51

    Probably we’re not there yet. But it’s important to start leveraging AI in ways that are both safe and effective, that actually have measurable results for a company. And I think troubleshooting is one of those big things. As a setup to this, especially the customers that you talk to or the prospects that you talk to, large enterprises, service providers, government organizations with big, huge networks, they all have this same general problem. They need to troubleshoot broken things. And from the end user perspective, there’s only one thing. It’s broken.

    Rich Martin • 01:26

    Whatever it is that you’re providing for me, it doesn’t work. But what’s the reality of that troubleshooting process? Looking at that network as a whole, you’re typically in a multi-domain, multi-vendor situation. The teams are especially siloed, not just individually with the teams, but also the tools that they use are siloed. And therefore, orchestration is really done through trouble ticketing, right? Which is to say it’s manual orchestration through a trouble ticketing system. And look, I’m going to speak from the perspective of a network engineer.

    Rich Martin • 02:00

    Everybody passes the buck or kicks the can down the road and blames the network.

    Ankit Bansali • 02:06

    The traditional phone number.

    Rich Martin • 02:07

    That’s the traditional flow. And then from there, the network decides it’s not really our problem and it’s somebody else that kicked the can down the road to us. So what does that really result in at the end of the day? Even the simplest of issues are taking way too long to resolve. And in the case of all of those prospects and customers that we have, their network is of vital importance to them, right? In some cases, they can calculate how much revenue they lose for every minute of outage they have. And we’re talking millions of dollars.

    Rich Martin • 02:38

    So this is why this is such an important and, I think, very interesting way of approaching how we can start to use AI and leverage it with our complex network infrastructure. Now we’ve covered several of these kinds of sessions together, Ankit, and the prompting strategy is kind of similar, right? Always use relevant details. That’s important. Like the more description, the less back and forth. And it’s just like you and I working together as two humans, right? The more information we can share with one another initially, the better the response is going to be.

    Rich Martin • 03:14

    Making suggestions on what to check. And the thing I want to reiterate here too is that we’re hitting a point where one of the big use cases for AI was vibe coding, right? But vibe coding doesn’t sound very professional, does it? It’s not very business-like. So now there’s a new terminology for it, right? Pair development or pair troubleshooting. And I think that’s actually a better way of thinking of it.
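
    For readers who want a concrete starting point, here is an illustrative, detail-rich opening prompt in the spirit Rich describes; the customer, service, and ticket details are invented for the example.

    ```text
    A customer reports their L2VPN service between two of their sites is down.
    Before proposing any changes: pull the EVPN/L2VPN service record for this
    customer from Itential Lifecycle Manager, summarize the expected VLAN, VNI,
    VTEPs, and endpoints, then give me three or four ranked theories and the
    read-only checks you would run for each. Do not make any changes yet.
    ```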

    Rich Martin • 03:36

    We’re not telling the AI to do everything and it’s on complete autopilot. It needs the human element to help structure, troubleshoot, and iterate. And so I’m going to go ahead and coin the term with you: pair troubleshooting, right? Letting AI do the things it’s super effective at, but then being able to steer it in the right direction and get the results we want, is still vitally important. And you’re going to show us what that looks like today. And the key here is: again, you don’t give the keys to the entire network.

    Rich Martin • 04:11

    As a whole, to the AI. That’s not very responsible. So, leveraging MCP with the Itential server is a big part of this. It gives you access to tools, workflows, different parts of our platform that we’ve already engineered alongside our customers, who are building workflows and exposing those so that they can be used in a safe and effective manner. So, whether it’s operational data or configuration data from things we’ve stored in Configuration Manager, all of those tool sets are available through workflows and other capabilities exposed by the Itential MCP server. And the cool thing about this, too, is that it allows us to make closed-loop improvements. Not only can we troubleshoot faster with the tools we have, but you can build new tools for even faster resolution.
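
    As a rough illustration of the pattern Rich is describing (and not Itential’s actual MCP server code), an MCP tool can wrap a pre-approved workflow call so the LLM never holds device credentials. Only the published MCP Python SDK calls below are real; the workflow URL, payload shape, and environment variable names are hypothetical.

    ```python
    # Minimal sketch: expose one pre-approved troubleshooting action as an MCP
    # tool instead of giving the LLM raw SSH access to devices.
    import os
    import requests
    from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

    mcp = FastMCP("network-troubleshooting")

    # Hypothetical settings; in practice these point at your Itential instance.
    ITENTIAL_URL = os.environ.get("ITENTIAL_URL", "https://itential.example.com")
    TOKEN = os.environ.get("ITENTIAL_TOKEN", "")

    @mcp.tool()
    def run_approved_show_commands(device: str, commands: list[str]) -> str:
        """Run pre-approved, read-only show commands on a device via a workflow."""
        # Hypothetical API route exposed by the builder team; the LLM only ever
        # sees this tool, never device credentials or an interactive CLI session.
        resp = requests.post(
            f"{ITENTIAL_URL}/workflows/run-show-commands",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"device": device, "commands": commands},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.text

    if __name__ == "__main__":
        mcp.run()  # serves the tool over stdio by default
    ```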

    Rich Martin • 05:02

    You can document root causes and feed that back into the AI so you have your own little local AI of things that have gone wrong in the past, right? So now we can iterate and make this even faster and better. And again, this is key because, for lots of customers, and we won’t talk about specific outages, they do happen, right? Even in the biggest of networks, and they happen for a number of reasons. So this is why this is really important. So, with that, I know we’ve got a lot of stuff to cover. I’m super excited about what you’ve built and what you’re going to show us, Ankit.

    Rich Martin • 05:35

    So, I am going to stop sharing here and give it to you.

    Ankit Bansali • 05:40

    So Rich, thank you for breaking that down. The words you spoke, especially the first four in your sentence, AI, network, security, and troubleshooting, that is a whole lot of craziness going around right now, and to be honest, people need to understand it. Let me tell you, I was on a call this morning, and we also had an on-site session with a customer. Troubleshooting is important, and they are asking how they can leverage AI successfully, because everybody in every organization, especially on the infrastructure and networking side, is being asked how they can leverage AI to make things efficient and optimize current processes. And they gave me really positive feedback that the way you are helping, Rich, in terms of educating, helping people understand the art of the possible, and all the videos you are doing, is super well received, and the industry really wants to see where we can take this. So when we were brainstorming, we thought about the most common services people understand. L2VPN is very common, L3VPN is very common, E-Line, E-LAN, EVPNs, right? These are very general Layer 2 and Layer 3 technologies that people leverage. So what I did is I built a lab around that concept.

    Ankit Bansali • 07:16

    And I’ll first help you break down the topology. I’m going to talk about the customer and the network services I have deployed on that topology. And how does Itential come into this picture, right? Because everybody has infrastructure, and we want to show our Itential customers and future Itential customers how they can seamlessly integrate all of the AI technologies with the networking side of the world. So going into the network topology, basically this is a very standard two-spine, two-leaf design, with two customer equipment devices at the end. So we have tried to build an EVPN service with Arista switches. And I’ve leveraged Containerlab.
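
    For readers who want to reproduce a similar lab, a minimal Containerlab topology in the shape Ankit describes (two spines, two leaves, two CE nodes) might look like the sketch below. Node names, image tags, and the exact wiring are assumptions, not the demo’s actual files.

    ```yaml
    # Illustrative containerlab topology: 2 spines, 2 leaves, 2 CE nodes.
    name: evpn-troubleshooting-lab
    topology:
      nodes:
        spine1: {kind: ceos, image: ceos:4.32.0F}   # Arista cEOS image tag is an assumption
        spine2: {kind: ceos, image: ceos:4.32.0F}
        leaf1:  {kind: ceos, image: ceos:4.32.0F}
        leaf2:  {kind: ceos, image: ceos:4.32.0F}
        ce1:    {kind: linux, image: alpine:3.20}   # CE1 as a plain Alpine Linux host
        ce2:    {kind: linux, image: alpine:3.20}
      links:
        - endpoints: ["leaf1:eth1", "spine1:eth1"]
        - endpoints: ["leaf1:eth2", "spine2:eth1"]
        - endpoints: ["leaf2:eth1", "spine1:eth2"]
        - endpoints: ["leaf2:eth2", "spine2:eth2"]
        - endpoints: ["ce1:eth1", "leaf1:eth3"]
        - endpoints: ["ce2:eth1", "leaf2:eth3"]
    ```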

    Ankit Bansali • 08:03

    So shout out to Containerlab for making things very easy in terms of deploying, getting the images, and setting up these kinds of use cases for lab testing. Customer services. This is very critical. So what I’ve done is I’ve also assigned a bunch of currently running services to customers. So there’s a customer A, a customer B, and a customer C. Each one represents the services they are running. And from that perspective, we get a good understanding of the services currently running for these customers.

    Ankit Bansali • 08:38

    And from an infrastructure standpoint, like I was saying, this is the configuration side of the world where we have spines, leaves, and customer equipment. But this is just an overview and topology of my current network design. Do you have any thoughts on this before I jump into the Itential Platform and tell you guys how this actually gets consumed into the platform?

    Rich Martin • 09:00

    No, this is great. The only thing I would add here is that this is a perfect use case. Every data center that we engage with, with customers and prospects, is most likely going to have some sort of VXLAN/VLAN implementation. It’s the standard today. So whether that’s a service provider or a large enterprise, that’s the technology at heart. And so this is such a relevant thing for everybody, really, that we would interact with.

    Ankit Bansali • 09:25

    Yeah, and this time we have actually thought about network services as a whole product, and that’s why I think it’s very critical to show people more value out of the platform and the use cases with AI. So, thank you for that, Rich. So, Rich, something we have talked about a lot is how Itential’s Lifecycle Manager is very helpful in these kinds of scenarios. So, if I search for my EVPN L2VPN services, you can see those customers are also available inside of the LCM. And this is where things get really interesting, because Itential knows how to provision services and capture all the network service information. We can now leverage this kind of information with AI, to teach AI about the customer. And for customers who currently have issues, their first main complaint during troubleshooting sessions, and I’ve been in a couple of these meetings just in the past month, is data.

    Ankit Bansali • 10:34

    I do not have complete information about the customers. That data is stored in five to six systems of records, and nothing is tied to network services as a concept. And even if it’s tied, it is stored in distributed sources of record. So, what I think I like is having the ability to have that network service inside of lifecycle management, especially with its ability to have structured data, which is mapped to attributes you would want to capture, which is my customer information, my customer devices, VLANs associated to the customers, the neighbors, the BGP routes associated to the customer, the route distinguisher. All of those network services or network-facing services attributes can be stored and can be really used well. And I’ll show that during our use case of troubleshooting one of the outages for one of the customers. But this I found very critical in troubleshooting.
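
    To make the idea concrete, a Lifecycle Manager network-service record of the kind Ankit describes might carry structured attributes like the following. The field names and values here are illustrative only, not Itential’s actual schema.

    ```json
    {
      "service": "evpn-l2vpn-customer-b",
      "customer": "Customer B",
      "vlan_id": 210,
      "vni": 10210,
      "route_distinguisher": "65001:210",
      "endpoints": [
        { "site": "A", "device": "leaf1", "interface": "Ethernet3", "vtep": "10.0.0.11" },
        { "site": "B", "device": "leaf2", "interface": "Ethernet3", "vtep": "10.0.0.12" }
      ],
      "bgp_neighbors": ["10.0.0.11", "10.0.0.12"],
      "ticket_ref": "INC0010042"
    }
    ```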

    Ankit Bansali • 11:33

    And also, a lot of customers complain about not having this data, especially the level 1, level 2 engineers in the NOC.

    Rich Martin • 11:40

    Right, I think this is an excellent point. LCM gives people the ability to create a service, tie in all of the details of the service across many different network domains, the specific resources for that particular service, and even things outside of the network, right? Things like trouble ticket numbers from the initial provisioning, all of those things can be tracked in a single place. And when we talk about AI and making things more efficient with AI: if I can now have all of that in a single API call that describes that service, or what that service ought to be, versus having to tie in five or six different sources of truth to maintain and retrieve it, that makes things more efficient. It’s not that you can’t or shouldn’t do it the other way, but what is more efficient? And in some cases, this might be the thing you need for these types of services.

    Ankit Bansali • 12:36

    Totally, and that’s the feedback I got from customers whenever we presented this, right? Their engineers were like, can we consolidate? Because we have distributed sources of truth, can we have network services built inside of this that track all the attributes? We would love to not duplicate; you can still reference other sources of truth, right? So you’re not asking people to bring everything they have in their other databases into here, but to have meaningful information, because now, to your point, this becomes the context for AI in terms of understanding more about the customer and the network services you are offering. So it really helps AI to understand faster. Awesome. Yeah, and that being said, this is also our way of telling customers that we are not just trying to have the inventory only for network devices. Because we do infrastructure, right?

    Ankit Bansali • 13:35

    If you look at my topology, I have two customer equipment devices. I have access to spines and leaves, which means I can pretty much have access to any type of operating system, whether it’s network devices or infrastructure devices like Linux boxes or Windows machines, where CE1, which you saw, is nothing more than an Alpine Linux box, right? And you can see that you have the ability to look at configuration and run commands on the box without sharing any security-related information with the engineers, which means they can run only the approved commands you allow against these boxes. And at the same time, they also have access to take advantage of backups, right? And in this case, I’m going to show one of the spine configurations. So from an inventory perspective, you can manage everything inside of Itential, from network services to the inventory those network services actually run on, which is the infrastructure here. Yeah, so this gives a lot of value in terms of having that single application that does your backups and golden configurations and also gives you the ability to run commands against those boxes.

    Ankit Bansali • 14:54

    So that was my topology, which is completely available inside of Config Manager. So I have full access to run the commands I would like to run against these boxes. Which means now, when I expose these kinds of services to an agent, I am much more relaxed, because I’m not offering any native SSH CLI access. I’m not giving the agent direct access to all these devices or sharing the secrets and passwords around. And this is again something customers really like, because the security team always asks this: bringing in AI, how does this actually touch the network? They cannot have devices being driven directly by an AI, because it’s too much of a liability; you need guardrails to actually guide the tool in the right direction. And this is where I think what we are trying to show people is the next

    Ankit Bansali • 15:53

    You know, step which you can really trust, because you’ll have Itential in the middle and you leverage all the best practices around how to interact with these kinds of systems.

    Rich Martin • 16:05

    Yeah, yeah, exactly. And I think we said it earlier, too: is that within the platform, there are multiple levels of guardrails that are requirements nowadays, not only for human interaction with the network, but also AI interaction with the network, right? So you can think of an AI basically as just a much faster human. So we can’t get rid of things that are already necessary, like RBAC and auditing and guardrails within the tools that we expose through a workflow, things like that. Those don’t go away with AI, they actually become more important.

    Ankit Bansali • 16:38

    Correct. Very, very critical for how you expose and consume things. So now let’s move into the exciting part. And I just got a notification on my Slack. There’s a ticket that’s just opened. Okay. So let’s go take a look at the ticket. So Rich, what we are going to do is, I have told Claude here, which is my desktop agent, to actually help me troubleshoot the network.

    Ankit Bansali • 17:06

    And I have integrated this agent with a bunch of Itential services through our Itential MCP server. The good thing about the Itential MCP server is you get full control and you can actually do dynamic binding, which means you can expose dedicated services for your dedicated agents. This makes it a very easy, clean, and efficient way to expose tools to your agent. Definitely. And in this case, think about this as my troubleshooting agent.
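
    For context, wiring an MCP server into Claude Desktop uses the standard claude_desktop_config.json mcpServers format sketched below. The command, arguments, and environment variable names shown for the Itential MCP server are placeholders; consult the itential-mcp documentation for the actual options.

    ```json
    {
      "mcpServers": {
        "itential-troubleshooting": {
          "command": "uvx",
          "args": ["itential-mcp"],
          "env": {
            "ITENTIAL_HOST": "itential.example.com",
            "ITENTIAL_TOKEN": "<redacted>"
          }
        }
      }
    }
    ```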

    Rich Martin • 17:39

    Okay. So you’re building a specific agent that only has tools to help aid in troubleshooting, specifically.

    Ankit Bansali • 17:46

    Yes, and that’s what our customers can do. That’s where when I was having this conversation earlier with the customer, there’s two ways to do this. You can think about builder personas where we can help you build things. You can think about troubleshooting personas. You can think about AI ops personas. And it’s up to you on how you define that persona. And then you can leverage the MCP, which can be dedicated or shared resource across multiple agents, depending on how you deploy the MCP server.

    Rich Martin • 18:17

    That’s perfect. That’s exactly in alignment with everything else we’ve been doing in the platform, right? Being able to build things that are specific and limit what capabilities are accessible to whether it’s a human or an AI. In this case, building a custom agent that can only have tools that are accessible for this particular task.

    Ankit Bansali • 18:36

    Absolutely, yep. And it’s very secure and fits very well with the governance model of deploying MCP with agents. Cool. So guess what? While we were talking, we got our first ticket. There’s been an outage reported for one of our customers, affecting their operations, and they’re having issues on VLAN 210. So let’s see how our agent does with this. And the way we’re going to do this is going to be very organic.

    Ankit Bansali • 19:12

    And this is the way where we want people to get into the idea and the mindset of working with AI and working with AI in a very governed and secure way where you can do responsible things and everything can be tracked and audited.

    Rich Martin • 19:28

    Correct. So got it. Yep.

    Ankit Bansali • 19:29

    Yep. So let’s feed in this. Okay.

    Rich Martin • 19:33

    So our end user has reported an outage, given us some details, or it could have come from the end user to our NOC or first level of support with these details.

    Ankit Bansali • 19:45

    Correct. And what I can do is understand what it’s doing, and I already know I have a network service that’s affected by this. So what I’ll do is first ask it: can you please check the network services, the EVPN L2VPN service, for customer B and get some details first? So, this is where I was saying how LCM helps you build that context for the AI. So, you’re not just blindly going out and trying to, you know, boil the ocean. You’re trying to make it very easy for us to leverage this.

    Ankit Bansali • 20:38

    Excellent. The authentication is working, and we can see that it was able to identify the LCM network service, which was the L2VPN service. It identified the customer B information, and now it has complete context on what the customer is working with, what the network configurations are, and what the endpoints are. So, if you look, it understands the site A and site B context, right? So, there’s already a context that’s built in this process. So, now is the time to include much more information about the ticket. So, when it comes to that, let’s go into ServiceNow and we’ll copy this information.

    Ankit Bansali • 21:22

    This is what is being reported. How can we troubleshoot this better to fix the problem? First, give me some recommendations and then we will test the theory. So you’re making sure that you’re in full control and you’re leveraging the good bits of AI and its expertise along with your understanding of the network and services. So if you look at the troubleshooting strategies, it identified three to four theories, right? So, what it’s telling you is that we can check the VXLAN tunnel between Leaf 1 and Leaf 2. That could be down.

    Ankit Bansali • 22:26

    That’s the first thing. So, test NVE interface status, the VXLAN state, and check for reachability. Theory two, BGP could be a problem. Theory three, we could have local access port issues. Theory four, and so on. Now the thing is, AI can always give you a lot of suggestions. And this is where you need to understand how to filter that noise down to the information that’s actually going to be helpful for you. And this is where we keep saying that AI is supposed to aid or enable our experts to work efficiently rather than handing over the keys to the kingdom, like you said earlier.
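
    On Arista EOS, the read-only checks behind theory one typically map to commands along these lines. This is an illustrative list only; exact syntax varies by EOS version, and the addresses are hypothetical.

    ```text
    show interfaces Vxlan1
    show vxlan vtep
    show vxlan vni
    show bgp evpn summary
    show vlan 210
    ping 10.0.0.12 source 10.0.0.11
    ```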

    Ankit Bansali • 23:10

    We are not suggesting that. We want to bring the art of possible in the best possible way, leveraging the best technologies we can at the moment.

    Rich Martin • 23:20

    Correct. Yeah, and this puts the engineer still in the operator’s seat. Making sure, like, you know, if you think of AI, we’re providing very safe tools for AI to use in coordination with these recommendations. So you can get them executed lightning fast, but let’s still have a human overseeing what’s going on.

    Ankit Bansali • 23:42

    Yeah, absolutely. And in this case, what we’re going to do is continue with the recommendation. So we still have more control, and we are selecting which theory to test against. So since theory one is most likely, that’s what the AI said, maybe we show some faith and ask it, from its perspective: can you run a bunch of commands based on theory one and deduce some findings around it? So there’s two ways to do this. You can leverage Itential workflows to go make calls for you.

    Ankit Bansali • 24:17

    You can leverage an Itential command template to actually run those commands. Or you could write a Python script of your choice. So there are different ways the building side of the team can actually help the operations team. So it’s again the flexibility of leveraging the right tool for the right job. And internally, folks can leverage the technologies they understand best with the skills they have. So we are not trying to force people into an idea of what they should do. We are the least opinionated solution architects here.

    Ankit Bansali • 24:55

    We want people to consume more AI technologies in the best possible way. And I keep repeating those words again. But that’s very important.

    Rich Martin • 25:02

    That’s super important. Yeah.

    Ankit Bansali • 25:05

    Yeah. So, I mean, you can see, another way to check is just, if you took backups in Itential, maybe yesterday, you could go down that path where you just want to take a backup and do a diff, right? You don’t want to do a lot of troubleshooting.

    Rich Martin • 25:21

    Exactly. Has anything of substance been changed? That’s always the first question. Like, you get a call at four in the morning and somebody says something’s down. The next question out of my mouth is, what was changed? Right? So the tools can be in the hands of the AI for you to go, okay, what was changed?

    Ankit Bansali • 25:39

    And in this case, we’re going to make an assumption that the troubleshooting engineer does not have access to the backup. Maybe they do not have the right tools; with Itential you can, but we are just assuming we don’t have the backup in place, so we’ll try to troubleshoot. So, how about we continue this? And we ask the AI: I also like theory one. What are the commands you’re planning to run against the boxes? Let me confirm first, right? And we can go as fast as possible, or we can take baby steps to build our confidence in this idea, because it’s still very new, right?

    Rich Martin • 26:36

    Right.

    Ankit Bansali • 26:42

    So, if you look, this is how it’s going. It’s saying it would run some eight commands and then try to investigate the output. So, how about we start with something related to the interface first? Okay. And then we check the VLAN ID and maybe then let AI do some additional investigation. I like those ideas. Leverage the Itential run command service, do those checks, and get back to me.

    Ankit Bansali • 27:36

    Based on the finding, right? So, we are not giving any engineers direct access to the box. We are not giving any AI direct access to the box. So, if you can look at this, right, it found there is already a service available for it, and it’s able to send the command and run it against the boxes. And you can see what commands it’s running. So, you still have full ownership in this case. So, it’s agentic but human-driven, right?

    Ankit Bansali • 28:11

    And eventually, we want to do the agent-to-agent context, but there is still time, and we need to first build confidence with baby steps in the process we currently have, especially for large organizations. So, if you look: the command syntax isn’t working, let me use the correct Arista VXLAN commands. So, it’s looking, and it found a command that works. Good news, it is checking with additional commands. The issue is detected. The VTEP addresses are wrong. Let me verify.

    Ankit Bansali • 28:52

    Because we had LCM and because we had the ability to actually cross-check what we had provisioned earlier, we can go back in time and understand what has changed. And with commands and network troubleshooting construct, we can pretty much identify what’s wrong here. So, look at all the checks it’s doing. It’s checking for BGP sessions. It’s checking for the route distinguisher, right? And the reachability. So, it’s also doing a ping test from source and destination.
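
    The cross-check Ankit describes boils down to a simple intended-versus-observed comparison between the LCM service record and what the device reports. Here is a toy sketch of that logic with hypothetical values; it is not Itential code, just the shape of the check.

    ```python
    # Toy sketch of the LCM-versus-device cross-check; all values are hypothetical.
    intended = {"vlan_id": 210, "vni": 10210, "remote_vtep": "10.0.0.12"}   # from Lifecycle Manager
    observed = {"vlan_id": 210, "vni": 10210, "remote_vtep": "10.0.0.22"}   # parsed from 'show vxlan vtep'

    # Collect only the fields where the running state drifts from the intended state.
    drift = {k: (intended[k], observed[k]) for k in intended if intended[k] != observed[k]}
    for field, (want, got) in drift.items():
        print(f"MISMATCH {field}: intended={want}, observed={got}")
    ```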

    Ankit Bansali • 29:29

    So, it just verified that the underlay is working. And now it’s going to check customer-facing interfaces, which is what our customer actually complained about. So, giving it time, it’s going to run those commands, process them, and put on its network troubleshooting hat, which is very cool to be honest, because this is no hands, right? Like, we are not driving anymore after providing the right context. We were confident that whatever the AI was trying to understand, it was able to understand. So, there you go.

    Rich Martin • 30:05

    And this may be the real breath of fresh air, the analysis reporting of what actually happened, aka the documentation that no network engineer wants to deal with after they’ve solved a problem.

    Ankit Bansali • 30:19

    And Rich, there is no swivel chair. Now, here’s the kicker. What I’m going to do is, I also have another service that will go and update the ticket that was raised. So I can take my findings, which I just spent time on, and put them back into the ticket without having to swivel-chair. So what I’ll do is I’ll go back, grab the ticket number. So I would say: using the Itential ServiceNow update ticket service, please add your findings there. This is the ticket number that needs updating.

    Ankit Bansali • 31:17

    So, again, no swivel-chairing. Imagine my builder team, the engineering team that has built a lot of services in Itential; I can pretty much take advantage of that right now, without having to swivel-chair, and leverage all the right information along with all the work the AI did. And if I say allow, it’s supposed to kick off a workflow. And again, when it has trouble with formatting, it will retry with the right JSON formatting. So we’ll see how this goes. Perfect. The ServiceNow update workflow has been initiated.

    Ankit Bansali • 31:52

    Let me check the job status. You have two options. You can let it check the status or you can ignore it. For this case, I want to ignore it because I really want to take you guys into the platform. So I’ll go in and we will monitor and see how that goes. So operations manager, we’ll see all jobs. And there you go.

    Ankit Bansali • 32:14

    If you look, 12:40, that’s the current time. And we already ran the service which one of our builders had built for us. And guess what? We know it completed. We should see the data that came in here. And we should also see the notes section. So let me hit refresh here.

    Ankit Bansali • 32:36

    And let’s go there. Awesome. So, if you look, we can present this in an HTML format, make it colorful, right? But we’re having this information come directly from the AI’s investigation, identifying the root cause, and if you look, this was investigated by the Itential Network Troubleshooting Agent. It provides the next step; it’s telling you what’s wrong. So, you have pretty much captured everything worth considering from the investigation you did with the agent. And everything is being tracked from an organization perspective, without having to swivel-chair, from a troubleshooting standpoint.

    Ankit Bansali • 33:17

    Right. And this isn’t exposing anything from a security aspect.

    Rich Martin • 33:22

    100%. And this is a great first step, too. Like, maybe this is going to the NOC team, and now we’re arming them with an AI that can safely use tools they were not allowed to have before. Right. And now, if you think about it, this could be the step that passes this on to a network engineer. So maybe we’re not doing the fix, just the troubleshooting. Maybe for the NOC, we’re only allowing them, as a first step for safety, to gather all of the details that normally they couldn’t gather.

    Rich Martin • 33:53

    So then it gets into the hands of somebody who needs to approve it and they go, oh, this is everything we need. And then eventually the next step could be, all right, let’s turn on the ability for a certain subset of changes that we can make because this could be happening a lot, right? And if the change is simple enough, we need to go back to, because think about what we just did. It didn’t just guess, it looked at operational data, it looked at configuration data, both historic from our config manager and what should have been, what ought to have been in the lifecycle manager app for that particular service, and said, aha, this is the data that it should be. It’s not that. This is the proposed change. So it’s not making a guess.

    Rich Martin • 34:34

    It’s actually using the data that is available to it. And of course, we could build workflows to validate that data in even more ways, for other systems, if that were required. So this opens up a number of possibilities to go as quickly as possible for your organization. But like I said at the very beginning, the troubleshooting piece of it still has a significant impact. Just think of the engineer that gets the call that says, hey, something’s broken, and now you can say: not only is it broken, but here’s the fix. Go ahead and verify it and look at it.

    Rich Martin • 35:06

    And then can we make the change or do you want to make the change?

    Ankit Bansali • 35:10

    And Rich, one thing AI is good at is analyzing text, right? And combined with actual network troubleshooting skills, that just blows up the whole industry right now. Because what we just saw is: we ran eight commands against each node. That’s 16 commands. We analyzed 16 command outputs. We brought in our skills about network troubleshooting. After understanding the scenarios, it also understood what was broken from that perspective, which may not always be correct, but this is where we are going.

    Ankit Bansali • 35:45

    That we are not letting AI do everything by itself. I, as an engineer, am guiding it to do things the way I’m confident with, based on my understanding of the network. And this is where we really want people to see the value. And from an organization standpoint, you’re leveraging whatever you have built, the way you built it: scripts, APIs, workflows, whatever. But from a new-technology, agent perspective, I’m able to do this real fast, real quick. And now, to your point, this can be handed off to another engineer, who can look at this and propose configuration, or you can ask AI to also propose the configuration and fix it, like it’s mentioning in the next step.

    Ankit Bansali • 36:26

    So, depending on the flow you build in ServiceNow or any ticketing tool, you can now do a very graceful handoff. Because guess what I was told, Rich? Whenever an L2 looks at the ticket of an L1, they re-troubleshoot the whole thing, because they don’t always trust it. It’s not distrust of the engineer for its own sake; it’s just the human practice of seeing things for myself and believing what I see rather than what somebody else did. And again, the network changes all the time. If you pass this ticket one hour later,

    Ankit Bansali • 37:01

    It might have changed already, right? So I have to redo the troubleshooting. Yeah. So this is where I think you can go faster by giving more confidence to the engineers.

    Rich Martin • 37:11

    Yeah, that’s an interesting point. Not only does this, so to your point around, like if a certain amount of time occurs, a long amount of time occurs before this gets into the hands of somebody who can actually fix it, then it almost requires me to do a double check. But if I can get this to you and it’s what we would say hot off the griddle, right? It just came off of the, you know, it just came out a few minutes ago. This is the most up-to-date information. And we trust the tool set that it’s using. That’s the key point, too.

    Rich Martin • 37:39

    I think a lot of times is maybe this person didn’t troubleshoot this the way that I would troubleshoot it. Maybe they’re not looking at it with the degree of scrutiny that I would look at it. And I should go and double check that again. But if you’re using a consistent set of tools, perhaps built by the network engineers, right, to do this troubleshooting and then relying on the analytical prowess of an AI, maybe we increase the degree of trust now when certain of these problems are being troubleshot through an AI plus the operations team, where we can start to go, okay, I do trust this over time. I do trust this more. It’s my tools. I know the process.

    Rich Martin • 38:21

    And not only that, these are things that are pretty common to occur in the network anyway, perhaps, right? There’s that aspect of it. So that does shrink the time to resolution, not just from a process perspective, but a human trust perspective between the handoffs between different teams.

    Ankit Bansali • 38:38

    So, Rich, going back to our setup here, it’s recommending a bunch of fixes, right? So, the first thing I wanted to do is create the template, a Jinja2 template, so I can understand what type of configuration it’s going to push to the box. So, let’s say: I like the recommendation. Before you run the service that applies the config, leveraging the Itential apply config service on the device, I want you to create a Jinja2 template for this fix. Also, make sure you add conf t, and end and write memory, in the config. So, now what it’s doing is, I’m keeping a record of the change I’m proposing in a Jinja2 file.

    Ankit Bansali • 40:06

    So, I understand that this is what we are planning on applying. And that way, we exactly know the fix we are trying to apply. So, excellent. I’ve created a Jinja2 template. So, let’s go take a look in the platform because this is where I’m trying to build trust with our engineers to actually have, you know, let AI build things, but you get full control out of this. So, if you look at this, this is what it’s proposing. Right?
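
    The demo doesn’t show the generated template itself, but a fix template in the shape described (conf t at the top, end and write memory at the bottom, values parameterized) might look roughly like this. The interface names, VLAN/VNI variables, and exact EOS syntax are assumptions for illustration only and should be validated against your own design.

    ```jinja2
    {# Illustrative fix template only; not the template generated in the demo. #}
    configure terminal
    interface Vxlan1
       vxlan vlan {{ vlan_id }} vni {{ vni }}
    !
    interface {{ access_interface }}
       switchport mode trunk
       switchport trunk allowed vlan {{ vlan_id }}
    end
    write memory
    ```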

    Ankit Bansali • 40:37

    Yep. So, it makes total sense from our understanding that this should be good from a configuration standpoint as a troubleshooting scenario. So, what we’ll do is we’ll allow it. So we don’t need to render the template because we already saw the template. Yes, so it finished the first job and it’s asking can I go apply this on the device? Yes, go ahead. So now it’s trying to apply that on the leaf one, right?

    Ankit Bansali • 41:32

    So we’ll see what the result comes back with. So we know the exact template that’s been pushed out is being applied, and we have full confidence. And guess what? You would think it would stop at just applying the configuration, but to be confident as a troubleshooting agent, which is a good practice, you should always do your post-checks. Right. Right?

    Ankit Bansali • 41:54

    So guess what it’s doing? Now it’s checking if the configuration is applied or not. And if you look, that’s the response that came back, and it’s running through those commands and then checking if the trunk port is configured correctly. Now it’s checking whether the connectivity is in place for the right VNI. Like, come on, this is so much work that you would have to do behind the scenes to get this right. And this is us talking in a one-hour session. But imagine not demoing and just doing this by yourself.

    Ankit Bansali • 42:31

    In less than five to ten minutes, you should be able to identify the issue, propose configuration, review configuration, and then apply configuration. And make sure it’s successfully applied by checking again. And guess what you’re going to do next, right, Rich? Since you already did the work and restored the service, guess what? Please post this and a summary on the ticket for me to review. Right?

    Ankit Bansali • 43:03

    So, what we can do is ask it to go create a summary and post it on the ticket, and then we can go ahead and close the ticket, or we can call a flow that can close the ticket, right? It says: I’ll create a comprehensive review. Let me format this professionally. Oh, I didn’t say where, so I guess I need to say ServiceNow. So fast, so fast. Please post this and a summary on the ServiceNow ticket leveraging the Itential service. Yeah, so it’s looking for the services which are available.

    Ankit Bansali • 44:13

    So there you go. Ticket updates, it found that. So it’s looking for the workflow. I found the ServiceNow workflow updates here. Oh, so it’s not keeping memory here. There you go. This is again an instance of the context window limit, right?

    Ankit Bansali • 44:29

    And in this case, it’s telling me that even though I gave it the ticket number before, it does not remember. So I guess I have to go back and give it the ticket number one more time.

    Rich Martin • 44:43

    Well, I’m willing to forgive one copy and paste for all the efficiency we’ve just gained.

    Ankit Bansali • 44:48

    I know, right?

    Rich Martin • 44:50

    But I think this becomes a point, too, and that’s interesting. Depending on how you want to leverage this and what your organization is willing to do, a lot of this stuff could be up to the reasoning of the LLM, or it could be built into the workflow, or both, right? A fusion of both things. And so I can see leveraging a lot of your embedded troubleshooting ability within a workflow, but also maybe that opens up some context, some reasoning there, so it can hold those kinds of things. But it’s interesting, because previous to AI kind of changing everything on the scene, we gave the same flexibility in how we wanted to allow network engineers to build pre-checks, post-checks, all kinds of things like that. The same flexibility is now available in how you want to leverage AI to do those things, to help augment your staff to become better at things like troubleshooting.

    Ankit Bansali • 45:53

    This is complete augmentation in this case: leveraging the best tools at hand with the best capabilities available. And look how beautiful this is. Like, this is a crazy detailed report, right? Absolutely. And it’s telling you exactly what it did, how it did it, what it proposed, where the failure was, and that the root cause was identified. This was the Jinja2 template associated with the fix we were trying to push, and you can review the configuration. Like, I don’t think I have seen this amount of quality work in the amount of time we spent on troubleshooting, especially on the documentation side, the RCA review; documentation is always an afterthought.

    Ankit Bansali • 46:40

    Because, guess what? There’s already another ticket waiting for you. So, you don’t have that much time to do it. And now, leveraging AI in these kind of domains just helps the organization to track changes better.

    Rich Martin • 46:51

    Yeah, absolutely. I agree on all of those points, especially the one about documentation.

    Ankit Bansali • 46:58

    Man, they’re not.

    Rich Martin • 46:59

    And I think back to the beginning of what we were talking about, too, on one of the slides: you know, this is putting it in ServiceNow. This could easily be in a database that you create for RCAs later, which can also be leveraged by AI to determine how often something happens. Or, you know, it came up with several theories of what could be wrong, but it could reorder and reprioritize those based on past experience with these types of services, or that specific service, by saving this in another data store as something that could be referenced by AI later. Again, making this whole process work much better, much faster than it is with humans alone.

    Ankit Bansali • 47:43

    Mm-hmm. Documentation has always been an afterthought. RCA, too, because there’s always another ticket waiting for the engineer. But look at how beautifully represented that information is, especially in ServiceNow, from AI building this report for us. It talks about the incident, it talks about the services and the customer that had been impacted and restored. It talks about all the other attributes around those services. It identifies the root cause analysis.

    Ankit Bansali • 48:14

    It does the troubleshooting process for you and tells you all the commands it ran to understand the root cause. And then also the resolution that it applied, which is basically telling you the template it generated and then applied it successfully. And then it did post-check. Like all of this super amazing, super well documented, beautifully represented, all done from an AI perspective. And now the organization can always go back in time and take a look at things and learn from it, especially if the reports are this beautiful and built so fast.

    Rich Martin • 48:51

    Yeah, I would agree. Now, seeing this, I definitely agree that I’ve never seen RCA reports as beautiful as this, at least this quickly done. You know, maybe if you give me some time, but to your point, like, I’m either wanting to go back to sleep because somebody woke me up and I’m not supposed to be working and I just fixed this. I’ll get to it in the morning, and I’ve already lost the details in my head, right? I’ve dreamt them away, or nobody really has the time to put, you know, even if it’s immediately after and I have time, nobody has the effort that puts this kind of effort into it. And yet, these are the things we need to make our organizations more efficient. And not only that, these are the things that AI is great at doing.

    Rich Martin • 49:33

    So we should be leveraging this as part of what we do going forward.

    Ankit Bansali • 49:38

    Absolutely. Absolutely. This is beautiful. Super impressed.

    Rich Martin • 49:43

    Well, that was an excellent example of basic troubleshooting, right? Obviously, this could go much further than that. But that being said, this is kind of more of, what is the art of the possible? Where we’re at in the world today in regards to AI and what we’re willing to accept is an interesting inflection point. And obviously, things are changing fast. But organizations, especially the folks that you’re talking to, are really struggling over what this looks like and how they can start to use it today without putting all the keys to the kingdom in the hands of something we don’t quite trust yet. We trust our people, right? But maybe we don’t trust this yet, so how can we leverage what it’s good at?

    Rich Martin • 50:26

    And I think to me, what you’ve shown us, Ankit, and really this comes from your experience talking to prospects and customers, is exactly the kind of thing they’re looking at getting into right now.

    Ankit Bansali • 50:38

    Yeah, I think there’s a lot of questions around how to leverage the technology. People are also very scared because networks are very sensitive and outages are very expensive. Correct. And security is also very scared because they do not understand how to put guardrails around something that is going to be that intelligent. So I think leveraging the Itential stack with the gateways, bringing in the scripts, the Ansible services, and the workflows gives you the flexibility of deterministic flows with nice reasoning capabilities. And on the consumption side, the northbound interfaces to the platform, leveraging MCP, mean I can help teams build things, and I can also help them consume things in the best possible way, with the best guardrails you can find.

    Ankit Bansali • 51:34

    So I think that really brings everything home.

    Rich Martin • 51:37

    I love it. Thanks for summarizing everything so well and showing us and walking us through exactly how this can be done in our platform. And with that, I think we’ll end the webinar. Thank you again, Ankit, for not only joining us, but doing such a great job in building and explaining how all this fits together.

    Ankit Bansali • 51:55

    Absolutely. My pleasure. Thank you.

    Rich Martin • 51:57

    Thank you. And thanks, everyone, for joining. And we’ll catch you later. Cheers, peace.
