Rich Martin • 00:00
Hello and welcome everyone to another Itential webinar. Today is going to be a special and more interactive webinar. As always, my name is Rich Martin, Director of Technical Marketing at Itential. And today we will be talking about how to integrate Kentik and Itential for automated closed loop remediation of infrastructure events. And I promise we’ll break this down because I have the help of my colleague and friend, Leon Adato. Leon, give us a little background on yourself so that everybody knows who you are.
Leon Adato • 00:30
Sure, well, first of all, thank you for having me on. And yeah, we will make this as interactive and fun as we possibly can. My name, as you said, is Leon Adato. I am a Principal Technical Evangelist at Kentik. What we do is network monitoring, network observability. Our goal is to help you ask basically any question about your network that you can possibly think of in the easiest way possible. Myself, I’ve been working in tech
Leon Adato • 00:56
Oh, gosh. For 35 years, for those people who remember, that’s back when you could get Windows 286 for free on 12 5.25-inch floppies. And out of those 35 years, I have been working in monitoring and observability for 25 of those years. So, I’ve used everything from Tivoli and OpenView and Nagios and janky Perl scripts all the way through to the newer observability tools and janky Python scripts and things like that. So, observability and monitoring is kind of my jam and definitely the network side of things as well.
Rich Martin • 01:31
Yeah, that’s awesome. Well, thanks for that. So let’s get started and break down this topic. Let’s start with what is the technology we’re talking about? Now, you said network observability and you’ve given us a really great purview over the history of that. But I’m going to let you spend a little more time here, breaking this down so that everybody can understand what we mean by network observability.
Leon Adato • 01:54
Sure, because there’s a lot of people who hear that and they’re like, that’s an oxymoron, that’s not really a thing. It really is. I’m simplifying massively. So for those people who fought and perhaps lost blood in the observability meme wars on Twitter or whatever, I give my respect and I am not trying to minimize any of that. But simply put, monitoring is more focused on the known unknowns: we know what’s gonna go wrong, we just don’t know when. Observability is about the unknown unknowns: we do not know what’s gonna go wrong and we do not know when.
Leon Adato • 02:32
And usually the reason for that is because you’re dealing with a metric butt ton of data and there’s simply no way for a human brain to reason about that much information. And that’s where we leverage machine learning. I would never say AI because it’s a BS marketing term. But anyway, you use machine learning to parse through that volume of data and understand the things that are happening that you might not be able to see with your own eyes or even know to ask about. And then you use that essential insight from the network layer, which is sort of very traditional and has been there from the dawn of computing almost, to understand how everything at the top layer, the applications, is really interacting and to provide even more information. So that’s what we mean when we say network observability.
Leon Adato • 03:22
And I will say that networks were possibly the first observable layer of computing because it would tell you how it was doing without being asked in the form of messages and traps and things like that. So I’m going to stand here on the side of the network and say that we were observable first.
Rich Martin • 03:40
I’m not going to argue that point. As a longtime network engineer and nerd myself, I agree. So now let’s talk a little bit about what Itential does, which is network automation and orchestration. Similar to your journey, network automation has had kind of a rocky past, but what we’re seeing now is things starting to get really good. At Itential, we look at where a lot of folks are with automation right now, especially in large, complex networks with multiple domains and multiple technologies. There’s not only the need for automation, but a lot of different tools for automation. And what we typically see is that automation is really comprised of individual, task-focused changes, at least initially, right?
Rich Martin • 04:32
And initially it’s a network engineer with a little bit of a bravado saying, I’m going to figure out how to pick up Python to write janky Python code. No, maybe initially janky Python code or Ansible to write playbooks. And, you know, I’m going to take this stack of, you know, 20 different CLI commands that I have to do X amount of times a day, and I’m going to figure out how to automate that piece of it. And it’s really around making network changes, usually to traditional network routers and switches using CLI. That’s a great first start, but it really is just the first start of this entire technology. And we see it as a journey, really, right?
Rich Martin • 05:10
So as you evolve from that, now, perhaps it’s more than just you, maybe now you’ve identified other colleagues on your team or maybe hired some folks on the team that are now building more and more of these automation tasks. And it could be using a variety of different tools because, as you know, in the world of networking, there’s a lot of silos. The data center team and the SD-WAN team are using different tools, right? Go across the silos, even cloud, which has a lot of networking in it. Even though it’s the cloud, you’re starting to see a lot of networking engineers get involved in the cloud, especially at scale, when you really need to start treating those virtual cloud networks more like traditional networks with BGP and routing and things like that. So you’re starting to see all these tools develop. So now the question is, how do we tie all those tools together? The assets, the automations built with all these tools.
Rich Martin • 06:06
So if I’m using Terraform for the cloud, if I’m using Python for data center automation, and in the SD-WAN world, where solutions always come with controllers that are tied strictly to those solutions, I’m using a controller on the SD-WAN side, how do I tie all of those things together? Because in order to bring up a service or application, or even give access to one of those things, I might have to touch multiple domains of networking and make changes across all of that. So now we’re talking about orchestration. How can we now look at tying together not only those automations, but all of the processes leading up to making network changes?
Rich Martin • 06:43
And this is the fun stuff, like opening up ServiceNow tickets and filling them out with complex pieces of documentation that come from five or six different places. You know, all the fun stuff of network engineering, filling out documentation.
Leon Adato • 06:56
Paperwork, always the paperwork, Wazowski.
Rich Martin • 06:58
Paperwork, right? And this becomes part of what orchestration looks like. So you go from network automation to orchestration, getting all of those end-to-end processes and integrating all of those different systems together. And that could be data gathering processes, like going to your NetBox and pulling up, you know, the IP address that I need to make the network change. So all of that goes into it, as well as integrating with other network observability and different systems like Kentik. So that’s what we do. And then I think the last piece that may be overlooked when it comes to automation is ensuring configuration compliance so that you’re making all these changes through automated means.
Rich Martin • 07:35
We need to ensure that we’re not breaking any best practices or maybe any specific policies that need to be in place. So we need to maintain configuration compliance so that we have the highest level of security and performance, at least from the perspective of making changes to the network.
Leon Adato • 07:52
Right. Yeah, that consistency and the assurance that everything is happening the way it’s supposed to be happening and nothing is breaking. I’m going to say right up front that network engineers have been really, really skittish about the concept of automation for a really long time. And I think part of it is not so much, we’re worried Skynet’s going to take over everything. It’s not quite that, but it’s more, I don’t know if you can see in the background, but I’ve got a thing from the Sorcerer’s Apprentice from Fantasia, where this apprentice had this one spell and cast it, and it just went completely off the rails. And before they knew it, they were completely drowning in unintended automation. And I think that network engineers have a vision of that Sorcerer’s Apprentice moment where they create automation that’s supposed to go off and do all these things, and then just keeps on mindlessly going off and doing it and breaking everything along the way. So knowing that Itential has the ability to validate and stop and have all these logic gates for it is the assurance that a lot of us are looking for. Absolutely. Absolutely.
Rich Martin • 08:58
All right. Well, I think as you and I have spent some time together and talked, this bubbled up as a common discussion point for both of our technologies and both of our markets. So I’m going to pose the question to you, and then I’ll take the question for myself. For network observability, why not just build it yourself?
Leon Adato • 09:17
Right, so I will also step back and say that this is the question that we who are IT practitioners ask about everything all the time, build versus buy. It’s always the question. And in some cases, it is an immediate question. Right now, do I need to build it or right now, do I need to buy it? That is my choice. But I think that far more often, it’s not a question of whether or not, but when. You talked about it.
Leon Adato • 09:41
You said, you know, you create a few janky Python scripts and hopefully they become less janky over time and more sophisticated. But the very beginning is the wrong time to say, oh, I’m going to buy a whole automation and orchestration solution or I’m going to buy a whole, you know, monitoring and observability solution. Sometimes, you know, a couple of pings every once in a while is really all you need to validate the thing that you need to validate. Now, I will say that with monitoring and observability, that hockey sticks pretty early on. That ping and that traceroute and that kind of stuff is simply not enough. And it begins to be not enough almost at the beginning. You know, once you have more than three routers that you have to keep track of, it’s over. You need a monitoring tool pretty quickly.
Leon Adato • 10:28
But with automation, that curve may happen a little bit later down the line where, you know, you get by with a library of scripts and snippets and shortcuts and key bindings and things like that until you can’t, until you don’t, until you have too many people who are making too many edits to those scripts or have their own versions of them and nobody knows who ran which version. In the monitoring space, you’ve got multiple different people running the same command on the same boxes until finally you have what I call observer bias, where you’ve just pinged the router to death because, you know, you’ve basically done your own human-based DDoS attack on your own equipment because you have, you know, 15 network engineers who are all running SNMP walks on the same router multiple times or whatever. So you get to that point and say, I can’t keep maintaining and building it myself, I have to move on. And that’s, you know, that’s the case for monitoring and observability. It’s when you suddenly realize that not only do you have this neat idea and you’ve written a couple of scripts, but you own those scripts, you have to keep maintaining those scripts, and now you’re maintaining those scripts or those automations for other people who are asking for things. That sounds remarkably like a feature request. And then you have to keep versions, and now you have a database of past results, and then you have to build a dashboard, and long before any of the stuff I just mentioned, you’re ready to buy.
Leon Adato • 12:06
You know, that’s why you don’t build it yourself, because you have better things to do. I will also point out for people who are career-minded, that when you become irreplaceable, meaning that those scripts are yours and you’re the only one who understands them, that’s fantastic. It feels really good, right? You’re valuable. But irreplaceable is also unpromotable. You will never have another job. And so that also is a reason not to build it yourself, or not to build it and try to build it up yourself.
Leon Adato • 12:36
Use it until you’ve validated the use case. Use it until you see that monitoring and observability is the thing you need, and then also you will have the insight to know what you need out of a monitoring and observability tool, and you’ll be able to make an informed choice. That’s on the monitoring and observability side. Rich, I’m going to ask you, I hinted at it a little bit, but for network automation and orchestration, why wouldn’t I just, look, Chef and Puppet and Ansible, those are all things where I could just use free software and make it all work myself. Why wouldn’t I do that?
Rich Martin • 13:13
No, it’s a different song, but the same story. It’s about what the organization looks like, the skill set. It’s very different in some ways, but I think it works until it doesn’t, like you said. So at that point where it doesn’t work, really what you’re talking about is solving for things that we didn’t anticipate in a lot of ways. So for instance, a lot of network automation starts very organically. It’s all about, hey, I’m going to pick up Ansible, or I’m going to pick up Python, I’m going to start doing stuff. This is awesome.
Rich Martin • 13:45
And then the manager of that particular network engineer says, fantastic. Share that with your team. And then it’s like, uh-oh, now I’ve got to solve for this. And there are a lot of ways to do that, valid ways to do that. But then there’s never just one thing you have to solve for. How do I share this? How do I version?
Rich Martin • 14:03
How do I update? How do I secure it? How do I audit all of these scripts? And then over time, what they tend to find out is they’re building a platform. And this is just for the automation piece. When you get to the orchestration piece, now you’re talking about building integrations to all of the systems that you have today that are part of an end-to-end process, right? Kentik, ServiceNow, Teams, Slack, whatever’s part of that process.
Rich Martin • 14:31
And the process keeps moving, like growing too. It’s like, oh, we could add this, we should add that, now that you’re automating all this stuff. Well, those things change over time. Not only at the API level, but whole systems go away. We were using Jira, now we’re using ServiceNow. Now, in a couple of years, we’ll be using something else.
Rich Martin • 14:49
So that’s always a moving target. And what ends up happening, like you said, similar to the network observability side is when you do it yourself, there’s going to be somebody that has to maintain this. And if they were a network engineer initially, taking that bold step to learn something to automate, they now find themselves very quickly building and managing a platform. And that’s not necessarily something they want to do.
Leon Adato • 15:10
Right. You know who doesn’t want to be a dev? Network engineers. They never want to be a dev. They never want to grow up and become a developer.
Rich Martin • 15:20
So it’s a very similar thing. Obviously at Itential we, you know, we want to take all of that network automation using whatever tools they want. You know, one of our philosophies is, you know, our customers know what the right tools for the job are when it comes to automating tasks. They’re all going to have different tools and different silos. Ultimately, you’ve got to solve the problem of sharing that stuff, and then orchestrating it. And then ultimately something around self-service is really where this starts to end up. How do we now take all of this and, instead of sharing it with just a group of very technical people, whether they’re networking folks or other IT technology domains, what about the end user, right?
Rich Martin • 15:58
The end user is ultimately who we should be allowing to click a button and order something, whether that end user is internal or truly an end customer, but ultimately that’s what you want to do. And getting from building task-based automations to that point, there’s a lot of ground to cover. It can be done in a do-it-yourself way, or it can be done with platforms, or it can be done using both. So we just want to, you know, bubble that up and allow our customers, much like yours, to understand where those things are.
Leon Adato • 16:27
Exactly. Very nice. Okay, so.
Rich Martin • 16:31
Let’s now talk a little bit about what we both bring to the table as far as our solutions, and then we’ll talk a little bit about how combining them is really awesome. On the Itential side, obviously, we’ve been talking about network automation and orchestration, automating changes across all of your infrastructure. As you have solutions for these different domain infrastructures, SD-WAN always comes with some solution that does some automation in a controller through a GUI dashboard, and probably has APIs. That’s where we can now leverage that type of tool that you’re using for automation in that domain, along with your Python, your Ansible, or whatever else you have in all these different domains to be able to automate and orchestrate changes across all your infrastructure. Then there are codeless integrations to any of these systems: we have a unique way of providing the customer the ability to generate an integration whenever they want without any additional cost so that they can automate more of their network and orchestrate more of their infrastructure. We do that simply by leveraging open APIs or any kind of API documentation specification. Ideally, an OpenAPI or Swagger spec makes it very simple to drag and drop that file into our platform and then generate an integration to your system or network controller or whatever it may be.
Rich Martin • 17:50
You talked about making sure network engineers have confidence or assurance that things are gonna go right when they click the magic button that starts to automate things. So we want to make it very easy for them to implement robust pre-checks, post-checks and config compliance so that they can have that level of confidence. And then ultimately we wanna be able to integrate with other intelligent platforms like Kentik so that we can do this secure closed loop automation and orchestration, which is really ideal when it comes to a lot of things like generating alerts for things that need to be responded to very, very quickly, right? Right. And then ultimately delivering self-service networking and IT to end users, whether those are application teams or even the end customer themselves depending on the nature of the business you’re in.
Leon Adato • 18:42
Right. Over on the other side of the slide, the network observability and monitoring piece, what Kentik brings to the table and addresses in all of the challenges that we’ve just been talking about. First of all, visibility to all your networks, whether it’s on-premises or in the Cloud, whether it’s a single Cloud or multi-Cloud or multi-provider. If you have data moving in a particular direction, Kentik can see it and tell you about it. Also, a view of all your telemetry, which is a fancy word that means data. So, you know, the ability to see all of it, and I think more importantly, to see all of it in context, to understand what is happening, you know, the cloud-based data, which has a particular format and a particular view of the world, but also your on-premises data, and to understand all of it so you can line it up. You can say that, oh, this thing happened, or this data is moving in these directions or whatever it is, but understand holistically what’s happening. The ability to query with context, and I’m just going to give away the bottom line here, ask any question, any which-a-way, about any what-a-thing that you have, just, you know, the ability to ask the question about the data that you are collecting and sending to Kentik in any way to let you know, and, okay, I apologize, there’s that word AI, we all understand that it really just means, you know, three if statements in a trench coat, but that’s okay, it’s, you know, machine learning, AI, whatever makes you happy, whatever makes my marketing team happy, it is insights that are driven by a machine that can reason about vast quantities of telemetry or data, and bring to surface things that are unique.
Leon Adato • 20:23
The idea here is that by watching all the data that’s coming in, and creating baselines for that, that’s what we’re doing: you know, seeing what normal looks like, what does normal look like on a Tuesday afternoon in December, you know, for these systems, and then being able to say, all right, but I just saw something that’s not normal. I just saw something that varied away within this time, or this context, or whatever it is. I’m going to bring this to your attention and say, maybe this is interesting, maybe this is useful, maybe it’s nothing, who knows. But there was no way for a human to get to that view. And so, the machines, again, machine learning, whatever term you want to use for it, are constantly analyzing the data flow and bringing it to the attention of the human to then make a decision about. So, that’s really what Kentik is all about. Whether it is your flow-based data or your metric-based data or any of the other stuff.
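As a rough illustration of the baseline idea described above, here is a minimal Python sketch. It is not Kentik’s actual method, which the speakers do not describe; it assumes a simple "mean plus or minus a few standard deviations" baseline, and the function name `is_anomalous`, the sample values, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag a sample that deviates from the historical baseline.

    Assumption: the baseline is the mean of past samples taken in the same
    context (for example, the same hour on the same weekday), and "not normal"
    means more than `threshold` standard deviations away from that mean.
    """
    if len(history) < 2:
        return False  # not enough data to know what normal looks like
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return observed != baseline
    return abs(observed - baseline) / spread > threshold


# Hypothetical traffic samples (Mbps) from previous Tuesday afternoons.
tuesday_afternoon_mbps = [410.0, 395.0, 402.0, 420.0, 398.0]

print(is_anomalous(tuesday_afternoon_mbps, 405.0))  # close to the baseline
print(is_anomalous(tuesday_afternoon_mbps, 900.0))  # far from the baseline
```

A real system would keep a separate history per context and per metric, and would likely use something more robust than a single mean, but the shape of the decision is the same: learn what normal looks like, then surface what deviates from it for a human to judge.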
Rich Martin • 21:19
Fantastic. All right. Let’s talk about some of the use cases when you combine the power of Itential and Kentik together. Where we start off usually is around use cases on onboarding and admitting network devices. When it comes to deploying new network devices, there’s a lot of day 0, day 1, those are the terms we throw out there. It’s really what’s the baseline configuration, the starting configuration just to get it up on the network just so it can be managed by another team. For a lot of our customers, those are some of the first use cases they’re looking at, especially if they’re deploying something that’s new, something greenfield. How do we quickly configure things and push configurations to devices and then ship them off or rack and stack them into an environment?
Rich Martin • 22:09
But for the Kentik side of things, this becomes important because sometimes what’s overlooked is ensuring that the device, once it’s inserted into the network and into actual service, is being monitored and observed.
Leon Adato • 22:23
Right. Anybody who’s monitored anything ever knows that there’s usually a configuration change that has to happen on the network device itself, either an SNMP community string or NetFlow information, or back in the day, IP SLA configuration for elements or whatever it is. Even if you’re not configuring that device, what you have to do is configure something else to allow the monitoring to occur, to allow the data to flow. A lot of times there’s a firewall or two or seven or whatever, that are in between the monitoring system and the device to be monitored, and you have to allow the communication, that data to flow, and so that requires a change. This example is beginning at the beginning, that when you put something on the network, either it needs to be reconfigured in a particular way to allow and facilitate monitoring and observability, or something else has to be configured in between to allow that data to pass. The network automation means that a human doesn’t have to be a code monkey, an IOS CLI monkey, and type these things in and then fat finger them and get them wrong and oh my gosh. There’s also a piece, and I know I’m jumping ahead but not really, which is what happens when things change? Oh my gosh, we have to change the SNMP community strings.
Leon Adato • 23:46
Oh my goodness, we want less NetFlow messaging. We want more NetFlow messaging. We want different kinds of data to be moving. How do I go back to my entire fleet of network devices, 100, 1,000, 5,000, 10,000 devices, and how do I make sure that that is done consistently across the board and that it’s done for all the new devices going forward? Because that’s something that can really kill your monitoring observability tool is the need to change and then a couple of devices don’t or something in between doesn’t or whatever or doesn’t happen fast enough and you end up with 927,000 tickets, which I may or may not have seen in my career. So those are all the things that Kentik cares about when it comes to that, again, consistency of deployment that we’re talking about.
Rich Martin • 24:39
Yeah. And that’s a great point. It’s deploying it initially, making sure it’s on the network, but also being observed by Kentik. So you need both the configuration on the device itself to be updated to reflect that, and then Kentik to be aware that the device exists and that it should be monitoring it. So that’s something that we’ve done with you for our customers to be able to do that. In some cases, it’s to synchronize what’s in NetBox with what should be in Kentik as well, not just putting the device in Kentik and onboarding it there, but putting those configurations in place so that Kentik has access to all that data from those devices.
Rich Martin • 25:16
And like you said, networks change over time. That’s the nature of networks. And those changes should be things like updating SNMP community strings and things like that as part of proper protocol and discipline around security. A lot of times, though, those changes are made and other things are not updated, which can break your observability. So that also ties in with Itential being able to do those golden configuration templates so that you can build a rule set that says this complies to this. And part of that compliance is you should have an SNMP string that matches this. And part of the automation should also go back and say, and it should also be set up in Kentik correctly as well.
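To make the golden configuration idea concrete, here is a minimal Python sketch of a compliance check. It is not Itential’s golden configuration feature; it only assumes that "compliance" means every required line appears verbatim in the device configuration. The function name `missing_required_lines`, the rule list, the hostname, and the community strings are hypothetical, and the addresses come from a documentation range.

```python
def missing_required_lines(device_config: str, required_lines: list[str]) -> list[str]:
    """Return the required lines that do not appear in the device configuration.

    Assumption: a rule is satisfied only by an exact line match after
    surrounding whitespace is stripped. Real rule sets are usually richer
    (regexes, hierarchy, forbidden lines).
    """
    present = {line.strip() for line in device_config.splitlines()}
    return [line for line in required_lines if line not in present]


# Hypothetical golden rules: the SNMP community string the monitoring tool expects,
# plus one unrelated baseline setting.
GOLDEN_RULES = [
    "snmp-server community EXAMPLE-RO RO",
    "logging host 192.0.2.10",
]

# Hypothetical running configuration that still carries the old community string.
running_config = """
hostname edge-rtr-01
snmp-server community OLD-STRING RO
logging host 192.0.2.10
"""

violations = missing_required_lines(running_config, GOLDEN_RULES)
print(violations)
```

An empty result would mean the device is compliant. A non-empty result is the list of lines an automation would need to push, after which the same community string would also need to be updated in the monitoring platform so that polling does not break.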
Rich Martin • 26:04
So those are some of the automations that you can tie together with Kentik and Itential to really get you jump-started and observing the network, having a network that functions and having a network that’s reporting back to the brain on how it’s functioning. Which then kind of leads us into more of the day two plus scenario of networking, which is kind of what’s going on now that it’s running, now that these services are running or we’re adding new services or removing services, i.e., making changes to the network, right?
Leon Adato • 26:38
So what are some of the things that happen there? So yeah, sorry. No worries. So yeah, I think that what’s interesting is, people often hyper-focus on the first use case that we gave because it is the first thing they try to do with network automation and also monitoring and observability. But the second use case is the one that is far more impactful.
Leon Adato • 27:01
It also highlights the need for both, right? Both Kentik and Itential. I need the automation and orchestration, but I also need the monitoring because it’s not just make this change, but make this change and then validate that the change didn’t hurt anything and that the change fixed the thing I meant to fix. In the second example, we’re responding to a network event. Kentik is actually the beginning, the tip of the spear in this particular case, where it sees something go wrong. We were prepping for this webinar and I was joking that every network situation can be boiled down to something couldn’t reach something else.
Leon Adato • 27:42
The data started here and went there and it couldn’t make it, and there’s the network, the end. Something tried to contact something else and either could or couldn’t, and now we have a problem of some kind. In the case here, AWS has blocked traffic alerts. But whatever it is, Kentik is the thing that’s going to notice that the traffic isn’t going where it needs to go. Then it’s going to throw up a flag, meaning an alert, and say, this is wrong. Now, at that point, automation kicks in, go fix it. But did it?
Leon Adato • 28:15
Did it fix it? I think that any of us who’ve been in IT and tech for more than 15 minutes have been in a situation where we get the message, the e-mail, the page, I just dated myself, the text message on our phone, the whatever it is, and we respond to it and we jump on our computer, and we bang out some new lines of code or some new configuration elements, and we’re like, all right, there we go, it’s fixed. Fifteen minutes later, we get a call saying, it’s still down, what’s wrong? Like, no, I fixed it. No, you didn’t. So Kentik is part of that feedback loop where something happened, we throw up an alert, we engage the automation, meaning Itential. Itential goes and does a set of steps, but it’s now Kentik that will validate that those steps actually worked, that they actually fixed the problem, or if additional steps are needed. So that’s a little bit about this one.
Leon Adato • 29:13
I don’t know if there’s anything, Rich, that you want to add to it.
Rich Martin • 29:15
Yeah, no, I would just add on that this is where, you know, the wealth of use cases is just going to start flowing like crazy, right? Because it hits everything from, you know, security, right? Oh, we have to respond to a security event. Well, responding as quickly as possible is probably a big deal, especially if it’s a very critical security event, right? So this is where the intelligent observability and alerting, along with the intelligent automation, being tied together in a closed-loop way, helps you respond very quickly, but also correctly. The other piece of this is, in this case, it could be opening up a service that shouldn’t be blocked without having to involve additional folks just doing stuff like clicking buttons in a GUI dashboard. But it also could be looking at things like the statistics around bandwidth utilization on some critical connections on the network and being able to use the Kentik platform to determine, hey, we should be doing something here.
Rich Martin • 30:21
Well, what is that something? Well, that something is defined as an automation that was written by the network team to handle this case. And now you can go directly from Kentik saying, this is something we need to look at to now this has been responded to and tested and validated.
Leon Adato • 30:36
Yeah. Two points I want to bring up. One is there is nothing as powerful as being able to respond immediately at the moment of error. And the only thing that’s right there at the moment of error is your monitoring and observability tool. It is right there. It is witness to that exact millisecond when things went wrong. And if in that exact millisecond, it can then engage and do a prearranged set of things, there’s nothing as impactful as being able to attack the problem immediately in that moment. Nobody has to move from one screen to another.
Leon Adato • 31:14
Nobody has to get out of bed and throw on their fuzzy bunny slippers. Nobody has to deal with it. It’s right there. The other piece, though, you know, we talk about taking action. But I want to highlight that sometimes the action is information, that the information can be as or more powerful than just issuing a particular command, saying, hey, this thing happened, go out and gather these other pieces of insight and then add them into the ServiceNow ticket, add them into the stream and combine those two ideas. Try this, and if the problem isn’t resolved, now you have this feedback loop, right? I see a problem, I try some automated action, I double check with my monitoring tool, it says, nope, problem persists.
Leon Adato • 31:59
Fine, I’m gonna gather some more information about what went wrong, I’m gonna inject that into the ticket and I’m gonna try the next step. And then I wait, and then the monitoring tool says, nope, still broken. Not a problem, I gather more information about what’s going on and I try something else. And you keep on doing that until either the problem is fixed or you’re at the end of that logic chain, at which point the human who’s responding has this entire list of things. I tried this and this was the result. I tried this and this was the result. I tried this and this was the result.
Leon Adato • 32:31
So they are able to jump in on the problem with a rich and valuable history of information of what was going on. And they can then use their human brain for the things that human brains are best at, coming up with the next creative step that no automation could have foreseen. And that’s incredibly powerful too. And again, it all speaks to why it’s not an either-or, like, well, why don’t I just use Kentik and I don’t need anything else? Or why don’t I use Itential and I don’t need that? No, you need us both, really. And I’m not just saying that because we’re on a webinar today.
Leon Adato • 33:02
Really, you need both the ability to take action and the ability to gather insight.
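The feedback loop Leon describes (try a step, re-check with monitoring, gather more context, try the next step, then hand the full history to a human) can be sketched in Python as follows. This is a minimal illustration and not code from either platform: `remediate`, `Attempt`, and the three callables passed in are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """One entry in the history handed to the human responder."""
    step: str      # name of the remediation step that was tried
    result: str    # what the step reported back
    context: str   # extra insight gathered right after the step


def remediate(
    steps: List[Tuple[str, Callable[[], str]]],
    problem_persists: Callable[[], bool],
    gather_context: Callable[[], str],
) -> Tuple[bool, List[Attempt]]:
    """Run steps in order until monitoring says the problem is gone.

    Returns (fixed, history). When fixed is False the logic chain is
    exhausted and the history is what a human would pick up from.
    """
    history: List[Attempt] = []
    for name, action in steps:
        result = action()                      # try an automated action
        context = gather_context()             # gather more information
        history.append(Attempt(name, result, context))
        if not problem_persists():             # double check with monitoring
            return True, history
    return False, history
```

In a real deployment `problem_persists` would be a query back to the monitoring tool and each `Attempt` would be appended to the ticket; here they are plain callables so the control flow stays visible.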
Rich Martin • 33:07
Absolutely. All right, so now as we start to round the corner into DemoVille, let’s talk really quickly about how we generally work together, and then we’ll talk about the demo overview and we’ll get right to it. Sure. So give us the first two steps on the Kentik workflow here.
Leon Adato • 33:23
So it’s a lot of what I just described, right? So the observability and monitoring is listening and looking at the network. I will, once again, channel my inner Pixar and Monsters, Inc.: always watching, Wazowski. So it’s always watching. It’s always looking at the network and again, seeing what’s normal and seeing what’s not normal. And in this particular use case, it’s, you know, policy alerting, it’s triggering based on the fact that something tripped a threshold. Something didn’t work, not only in a way that we didn’t expect it to not work (I know that’s a triple negative), but also in a way that we explicitly said, if it does this, I need to know about it.
Leon Adato • 34:08
I need to do something about it. And once it has that information, both the what happened and the who it happened to and the when it happened, it’s able to pass that along to Itential to pick up from there.
Rich Martin • 34:19
Yeah, and then from the Itential side, as automations are executed, in fact, this is an orchestration, it’s more of an end-to-end process, but we need to ensure that you’re validating not only technically, but also the fact that you’re adhering to some business rules as well. A lot of that has to do with policy, documentation, all the fun things, like I said, of networking. Compliance, again, is about making sure that whatever changes we make don’t break any kind of standard, which is going to have a reverberation into other problems normally, performance or security, and then actually making the change itself across all the different types of infrastructure that may require a change to happen in order to bring up some sort of application or service. And then finally, passing it over to Kentik for this last step here.
Leon Adato • 35:09
Right. And the validation step, this is what we’ve been talking about, is I made a bunch of changes to fix a problem, did I? Did it, is it fixed? And if the answer is yes, fantastic, if the answer is no… There’s something else. Now, in some cases, we made a configuration change, it actually made things worse or it didn’t help, fine, roll it back or get a human involved or whatever it is. But it’s that validation that here’s the state of the environment now that we’ve done these things.
Rich Martin • 35:42
Fantastic. Well, let’s roll into what we’re talking about today in regards to tinkering under the hood of both Kentik and Itential and seeing them work together. In this case, we’ve just set up a demo environment to illustrate how you can utilize both these platforms working together to respond to a particular event. In this case, we’ve got a web server that’s hosted in AWS. You’ll see here very quickly that it doesn’t have any inbound rules, which means nothing can get to it. If I have access, I’m going to try to access it from my home with my particular IP address, and we’re not going to be able to get to it. Normally, if I should have access to this, if this were a business environment, I’d have to open a ticket, call IT, they’d have to find somebody on networking or maybe in cloud, because they’ve got to figure out if it’s a network or a cloud issue and determine what my IP address is, and there’s this long drawn-out process just to get me access to this particular resource, this web server. In this case, we’re going to utilize Kentik to monitor the AWS flow logs, and then what’s going to happen?
Leon Adato • 36:47
At that point, it’s going to see that there’s no traffic, it’s going to see that nothing is working, and it’s going to identify the specifics. Like I said, it’s going to identify, this is the box, again, the web server that’s not getting traffic, that’s not normal, that’s not okay, and it is now going to throw an alert in Kentik, but the alert action is going to use an API call to trigger Itential. It’s going to pass the who, what, where, when as a JSON payload into Itential to pick up the ball and then, again, make a couple of changes.
Rich Martin • 37:26
That’s right. And so on the Itential side, it’ll launch a workflow that’s been built by a networking team to be able to not only respond to the event, but determine the source IP from all that data that was passed from Kentik. And all that data is important, but in some cases, we would need to extract that data in order to make a network change. So in this case, we want to not only make the network change, but because this is more of an orchestrated process, we want to do the pieces that aren’t network focused as well, which is let’s open up a ServiceNow ticket, update those tickets with information. Also talk to the notification system with MS Teams to document and alert the network team of what’s going on, because a lot of times as you start utilizing more and more network automations, it’s all about oversight, right? Oversight into what’s going on with these automations. And then finally, to actually make that change to the AWS security group. With that, let me flip over and share my screen here. We’re going to start off with AWS. This is a security group. Remember, we have a web server set up that I should be able to access, but I can’t. The reason why is because there’s no inbound rule here that would allow me access from my IP address. There’s no rule. The idea here is that I’m going to try to access this.
Rich Martin • 38:53
I can’t access this. In fact, I’ll do that right now. So, if I try to hit the public… That was from my previous test. Right now, you’ll see I’m spinning. It’s cached. So, I’m spinning, and I’m not able to get to it right now because, again, a rule in the AWS security group doesn’t exist for my particular IP address. And so, what this is going to do, though, is it’s going to generate a flow log in AWS that Kentik has been configured to observe and do something with.
Rich Martin • 39:29
So, let me flip over now to the Kentik platform, and I’m going to let you, Leon, walk me through some of the sections here, especially around how this is set up to trigger an alert and then launch the automation on the Itential side through an API interface.
Leon Adato • 39:45
I’m going to start off and I just want to provide some visual information about what Kentik sees and how it sees it. There’s some really cool visualizations. If you’ll go up to what we affectionately call the hamburger menu up in the upper left corner, and if you’ll go to Kentik map, the second item down on the left there, and click on the little Amazon icon there under clouds, since that’s the only cloud we have, and show view topology. This isn’t a fake topology, this is a real topology, and it’s showing me everything that it can see. It can see the customer gateways, it can see the VPN connections. If you just hover over that customer gateway in the upper left side there. Now, again, cute animation, but what it’s showing is that data is flowing out of US East 1 and all of those VPCs that are down there, all those objects down there, and it’s flowing out to the customer gateway.
Leon Adato • 40:40
What I’d like to do is go to under US East 1, I want to go to the Shared Lab. Yeah, go ahead and click on that. We can see the objects that are inside the Shared Lab, and I want to click on US East B1, and a little down arrow that’s to the right of US, the Shared Lab there, yeah, that down arrow, and do show connections for me. Okay. So, again, what I’m seeing is that traffic is actually traveling in both directions to and from the internet gateway, but it’s only traveling outbound from the lab. There’s no traffic going in to lab, which is exactly what we’re seeing. It’s actually not able to pass traffic from that gateway in there.
Leon Adato • 41:29
Now, that’s a lot to infer from a bunch of squiggly arrows that are flowing around the screen, nor do I ever expect that any user is going to be staring at a screen waiting for a bunch of squiggly lines to stop. My point here is simply to illustrate that Kentik is watching the data and even able to show you the flow of data to, from, in, out as it goes. But really what we care about is the actual flow and the actual situation. So, if you’ll go back up to the hamburger menu again, and go to network explorer right there. Perfect. This is showing us all the traffic that’s happening. Now, again, this would be on-premises and in all of the clouds that we are monitoring.
Leon Adato • 42:19
You can see that two of my devices are incomplete. They’re missing flow data. So, obviously, that’s something that we would need to do and would look to Itential to help us do more consistently without having to get a human involved. But in this case, again, we know the problem is US East 1. So, go ahead and click on US East 1 down below. Open that out. And we can see the flow of data in, out.
Leon Adato • 42:40
And what I’d like to do is click on the devices. Down in the middle there, there you go. So these are my two VPCs and it’s showing me the inbound and outbound traffic. And I can see that that outbound traffic on that second VPC is very minimal. So go ahead and we’ll click on that just for the last step here. And this is showing me where the traffic is coming in and out from. And again, all I’m doing is showing the visualization of all of the information that we have going to and from the environment.
Leon Adato • 43:19
But that’s not the trigger. That’s not the error. That is simply showing you that the data is moving in and out. I just wanted to be able to show that. Really what we’re talking about is I want to know about the alert. I want to know about what triggered and when and why. So one more time, back up to the hamburger menu in the upper left corner. And over on the left hand side, click on alerting.
Leon Adato • 43:44
Just the way you think it was. And here we can see the different alarms. Now, what I need you to do is go in and turn on the cleared alerts, because we actually have alerts that have already gone through and cleared. These are the ones that we’re talking about. The policy, which is Kentik’s term for an alert rule, the policy is US policy rejected web server traffic. Since we know the problem immediately right now, we know the problem is the web server isn’t getting traffic, that makes sense. You know, this is the policy that has been violated, and I get a view of what and where and when that has occurred. If I want to see what the policy is all about, I can go up to the upper right corner and click on manage policies. And I see I have only one policy enabled, and that is that web server traffic. If I want to see more about it, I can click on that one to expand it out. And what we’re looking at (and this is where I’m going to stop, I’m not going to get into the nitty gritty of Kentik, that’s not what this webinar is about) is the data elements that I’m thinking about. Not all the data elements that we are collecting, we’re collecting a whole lot more than this, but the ones that I care about for this policy are the source country, the AS number, the IP CIDR source, as well as what firewall actions are occurring, the destination IP and CIDR, the destination security group and the port number. Now, that’s the information I want to know about that I’m able to include in a ticket or in the trigger element that we pass to Itential. But there’s also a filter, and that’s what you see below. The firewall action is REJECT.
Leon Adato • 45:26
That’s the one that I want to know about. The IP CIDR range is the 10.0.18 slash 24 range. The country is only the US and the destination port is only 80. Now I have narrowed down that broad amount of traffic that could be coming from any country and any IP CIDR and so on and so forth. I’m narrowing it down to say only the ones that match this, and really the key one, of course, is the firewall action equals reject. When you see a rejected firewall rule that fits within these other things, now that’s a problem. And I’m going to check to see if this has occurred every two minutes.
Leon Adato • 46:01
I’m going to evaluate 22 rows at a time and so on and so forth. Again, I’m not going to dig too deep into this, but that’s what’s going to trigger this policy, pass the information along to Itential for it to pick up and go do some stuff. So Rich, what is the stuff that it’s going to do?
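The filter Leon walks through (firewall action REJECT, a /24 range, country US, destination port 80) amounts to a predicate applied to each flow record. The sketch below expresses that predicate in plain Python to make the logic concrete. It is not Kentik’s policy syntax: the record field names are hypothetical, the range is written as `10.0.18.0/24` on the assumption that this is what “10.0.18 slash 24” refers to, and applying the range to the destination address is also an assumption.

```python
import ipaddress


def matches_policy(
    flow: dict,
    cidr: str = "10.0.18.0/24",   # assumed reading of the spoken range
    country: str = "US",
    port: int = 80,
) -> bool:
    """Return True when a flow record matches every filter condition."""
    return (
        flow["firewall_action"] == "REJECT"      # the key condition
        and flow["src_country"] == country
        and flow["dst_port"] == port
        and ipaddress.ip_address(flow["dst_ip"]) in ipaddress.ip_network(cidr)
    )


# Example record; 203.0.113.7 is a documentation address, not real data.
sample_flow = {
    "firewall_action": "REJECT",
    "src_country": "US",
    "src_ip": "203.0.113.7",
    "dst_ip": "10.0.18.25",
    "dst_port": 80,
}
```

All four conditions are combined with `and`, which mirrors the narrowing Leon describes: each extra condition removes traffic from consideration until only rejected web traffic to the lab range is left.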
Rich Martin • 46:19
Fantastic. Yeah. So based off of that intelligent observability and the ability to trigger those alerts, that’s mapped to an API call into our platform, which would run this automation. Now there’s a lot going on here, and we’ll step through this in a little more detail. But first, a note on this particular workflow in the Itential platform: by the way, we’re an automation studio where you build automation, so this is the automation that will be run, but this is also where you can actually build and create these kinds of automated and orchestrated workflows. So in this case, we are going to grab a lot of information from that Kentik call into our platform, right? A lot of the information that Leon was talking about gathering is actually passed into this API call, because that’s important. There’s a lot of information there and we want to be able to utilize that when necessary.
Rich Martin • 47:20
In this case, we want to know what the source IP address is, because what we want to do is compare it to a list of allowed source addresses. And this is my IP address right here. So it’s allowed to have access into AWS for this particular web server, but the infrastructure, the AWS security group, doesn’t have this in its list. So this is what this particular workflow needs to figure out. So this is kind of the source of truth of what should be permitted. And so the first step is what we call a transformation. By the way, this list could have come from any number of data sources.
Rich Martin • 47:57
Could have come from a source-of-truth database, could have come from GitLab or GitHub as a file. We’re just kind of statically doing it here just to make it simple. But the first thing we want to do is we want to run this transformation after that step. And what is a transformation in the world of Itential workflows? A transformation is super powerful. It allows us to take a set of data, typically data that’s returned from an API call.
Rich Martin • 48:23
But in this case, it’s also going to be data that’s coming from this particular allowed IPs list and the data that’s coming in from the call into Itential from the Kentik platform, all that JSON payload. And a transformation allows you just to take lots of data and extract that data out so that it can be used in subsequent tasks in a workflow. Because what you’ll find is you might be able to get data. Here’s a classic example. I can query NetBox for an IP address if it’s my IP address manager. I can query NetBox for the next IP so I can configure an interface, but it might give me the netmask in a format that I can’t use for a CLI command. It might give me the netmask in CIDR format instead of the dotted quad that a Cisco CLI interface command needs.
Rich Martin • 49:10
So a transformation allows you to visually extract that data and even manipulate that data so that it can be applied to a configuration. So generally, that’s what these transformations do. So in this case, this transformation is going to extract the data we need to compare, to do an evaluation based off of the information we have. The first evaluation here is where we start building logic into a workflow. So we want to map what is a normal process for network teams when they’re responding to these in person. We want to map that into the workflow here. So it’s going to have some logic steps.
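Rich’s netmask example, an address returned in CIDR notation that has to become a dotted-quad mask for a CLI command, can be shown with Python’s standard `ipaddress` module. This is only an illustration of the data manipulation involved, not how an Itential transformation is implemented; the function names are made up for this sketch.

```python
import ipaddress
from typing import Tuple


def to_dotted_netmask(cidr_address: str) -> Tuple[str, str]:
    """Split an address like '10.0.18.5/24' into (ip, dotted-quad mask)."""
    iface = ipaddress.ip_interface(cidr_address)
    return str(iface.ip), str(iface.netmask)


def cisco_interface_line(cidr_address: str) -> str:
    """Build the 'ip address <ip> <mask>' line used under an IOS interface."""
    ip, mask = to_dotted_netmask(cidr_address)
    return f"ip address {ip} {mask}"
```

For example, `cisco_interface_line("10.0.18.5/24")` yields `ip address 10.0.18.5 255.255.255.0`.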
Rich Martin • 49:44
And in this case, this is going to do an evaluation. Is the IP address that Kentik just gave to me as the source IP that tried to access this website, is it in the accepted or allowed IP list? And if it’s not, it’s going to take this path down here. Here’s another transformation. What is this for? This is to generate an API message that we can send to MS Teams that says, hey, this IP address tried to access this web server, it wasn’t in the allowed list, fail. It’s not going to do anything to the network, but it’s going to send this to a child job, which is another workflow.
Rich Martin • 50:18
In fact, this is a modular workflow that can be used in any number of workflows, and you’re just going to pass a message to it that was created by this transformation. That message is essentially, this IP address tried to access the web server, it wasn’t in the allowed list, just wanted you to know it’s going to not make any network changes and end the automated workflow at that point, because there’s nothing else for it to really do. The other path that it could take is when the evaluation here is done, it says, the IP address is in the allowed list. Let’s now, before we make a change to the network, create a change request in ServiceNow, again, a fun process. We can automate those changes. When ServiceNow comes back, it’s going to create a net new service change request, and it’s going to have some net new information that we need. These query steps allow us to query that information.
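The first evaluation, pull the source IP out of the alert payload, compare it with the allowed list, and either continue or send a notice and stop, might look like this in Python. The payload shape (`payload["alert"]["src_ip"]`), the allowed-list entries, and the function names are all assumptions made for this sketch; the real Kentik payload and the Itential evaluation task are not reproduced here.

```python
import ipaddress
from typing import Iterable

# Hypothetical allowed list; in the demo it is defined statically in the workflow.
ALLOWED_SOURCES = ["203.0.113.7/32", "198.51.100.0/24"]


def is_allowed(source_ip: str, allowed: Iterable[str] = ALLOWED_SOURCES) -> bool:
    """True when source_ip falls inside any allowed address or range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in allowed)


def evaluate(payload: dict, allowed: Iterable[str] = ALLOWED_SOURCES) -> dict:
    """Choose the workflow path for an incoming alert payload."""
    source_ip = payload["alert"]["src_ip"]      # assumed payload layout
    if is_allowed(source_ip, allowed):
        return {"path": "allowed", "source_ip": source_ip}
    # Denied path: no network change, only a message for the notification job.
    return {
        "path": "denied",
        "source_ip": source_ip,
        "message": (
            f"{source_ip} tried to access the web server but is not in "
            "the allowed list; no network changes were made."
        ),
    }
```

Matching against networks rather than exact strings lets the same list hold single hosts (`/32`) and whole ranges.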
Rich Martin • 51:10
This is an API call into ServiceNow. It’s going to tell us what the new change request ID and change system ID are. We need that because when we update that ticket with more documentation about what we’re about to do, that information is required for that ServiceNow update call. That’s why we extract that data there. Normally, what you would be doing in a manual process is going into ServiceNow, logging in through the interface, copying and pasting from different sources, maybe from something in Kentik, maybe from something in the network and pasting all this in, maybe something from AWS and pasting all of this in. These steps save you all of that time. There’s another transformation here that will
Rich Martin • 51:52
manipulate data so that we can send a notification in Teams. This notification particularly just says, hey, we just saw access from an IP address, and it meets the criteria of being allowed. Then it’s going to decide what it should do next. In this case, it’s going to make another decision as it does this evaluation. The first thing it does is now it’s going to go into AWS and describe this particular security group. What we mean by describing it is that’s an API call into AWS. We give it the security group name because this is something that we may have to change, but we want to see what’s in there first.
Rich Martin • 52:28
Describing the security group says, hey, give us a list of all the existing rules that are already in there because if there’s a rule that exists for this particular IP address, we probably shouldn’t put it in there, it’s already in there. Maybe there’s another issue at play, and that’s exactly what this evaluation does. We describe the security group, we query for the group ID, and then we do an evaluation to determine, is this IP that Kentik just sent us, is it already in the security group? Because if it’s in the security group, then that’s clearly not the problem of why it can’t access this web server. Maybe there’s something else going on, and that’s what this path down here does. It creates now a ServiceNow incident. This is not a change request because we’re not making changes to the network.
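The second evaluation, whether the source IP is already covered by an inbound rule, can be sketched against the response shape that the EC2 DescribeSecurityGroups call returns (`SecurityGroups` → `IpPermissions` → `IpRanges` → `CidrIp`). The function below works on an already-fetched response dictionary so it runs without AWS credentials; with boto3 that dictionary would come from the EC2 client’s `describe_security_groups` call. The function name and the sample data are assumptions for illustration only.

```python
import ipaddress


def rule_exists(response: dict, source_ip: str, port: int = 80) -> bool:
    """True if any inbound rule already admits source_ip on the TCP port."""
    addr = ipaddress.ip_address(source_ip)
    for group in response.get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            proto = perm.get("IpProtocol")
            if proto == "-1":                       # "-1" means all protocols
                covers_port = True
            elif proto == "tcp":
                covers_port = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
            else:
                covers_port = False
            if not covers_port:
                continue
            for ip_range in perm.get("IpRanges", []):
                network = ipaddress.ip_network(ip_range["CidrIp"], strict=False)
                if addr in network:
                    return True
    return False


# Sample responses with a placeholder group ID.
empty_group = {"SecurityGroups": [{"GroupId": "sg-example", "IpPermissions": []}]}
group_with_rule = {
    "SecurityGroups": [{
        "GroupId": "sg-example",
        "IpPermissions": [{
            "IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
            "IpRanges": [{"CidrIp": "203.0.113.7/32"}],
        }],
    }]
}
```

A `True` result corresponds to the incident path Rich describes (the rule is already there, so something else is wrong); `False` corresponds to the path where the workflow goes on to add the rule.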
Rich Martin • 53:11
This is an incident so that a human can take a look at this perhaps and say, figure out the problem. They have access to the web server. We validated that the security group is right. Kentik gave us the IP address that they’re coming from. Human being, take a look at this and maybe there’s another problem going on here that is not related to the security group. That’s what this does. It opens up the incident request, populates that ticket, and then sends a notification again in MS Teams. The entire team can be aware and have oversight in what’s going on.
Rich Martin • 53:41
Ideally, when they solve this problem and they figure out what it is, that could be another step in this automation as they iterate through and figure out all the possibilities of what could be wrong when these events happen, making this an even more intelligent automation. So the last bit of this is kind of the final success path, if you will. Continuing to go here, this means that the IP address Kentik sent us matches the whitelist that we compared it to, and it does not exist in the AWS security group, so that must be our issue. So now we want to actually make a change to the infrastructure, in this case, update the security group, so that it can get access to it. And then once that’s done, we update the change request with this new information, documenting the changes that were made. And then we send a child job out to MS Teams to update the team again on this particular outcome, which is: we made the change to the security group, here are the details of the change, and validating that everything’s working.
Rich Martin • 54:50
So you’ll notice MS Teams kind of becomes the central hub of a lot of what’s going on. And it makes perfect sense, right? This is where we can now observe, in a very succinct way, what’s happening in our automations, not just for a single person who might be running it. And in this case, nobody’s running it, right? But for an entire team to understand what’s happening. Remember, when we started, Leon, we clicked on the web server. I did a request, it looked like I could get to it, but that was cached, and when I tried again, it failed.
Rich Martin • 55:24
This would have triggered that whole set of steps on the Kentik platform that we went through, running the automation that we just went through on the Itential side. Remember all the updates that were going on to ServiceNow and in Teams. This is what’s so useful about all the data and the transformation and the ability to orchestrate across multiple systems: now if I’m looking at Teams, I can create an automation that gives me details directly into all the different platforms. So I’ll just show you here, if I click this, we generated this in that automation, where I can go into the Kentik platform and look at the very specific rule and policy alert that was done on the Kentik platform. So that if I needed to investigate that further, I could, but in some cases I might want to take a look at the Itential job itself. So now I can go into the platform and take a look at all the jobs as well and see how they ran. So this is an executed job. So this is a job that was actually run, not one that we were looking at in the automation studio where you build them, but this is a job that was run and you can kind of see the success path as you go through here.
Rich Martin • 56:31
It’s the same order of things that I showed you, but now you can see how it was executed. And if you double-click into any one of these tasks, you can see the data that was passed to it and the data that was output from every task. So now you can kind of troubleshoot these things. And by the way, this is all the data that the Kentik platform sent to us when it triggered that alert and did the API call into our system. So this is all that data that was given to us that we just operated on. And then finally, and maybe in some cases, the most important piece is the ServiceNow piece of it, being able to automate ticket creation, but ticket creation is not that big of a deal. The bigger deal is populating the ticket with all of the details of changes.
Rich Martin • 57:14
So everything from the payload that was sent to us to the same kind of information of what was actually done and links into everything on all these different systems so that any particular person in any particular group that has oversight into this can go into this and investigate it further if they need to. And if they don’t, the documentation is there and network teams didn’t have to do any kind of swivel chairing between any of those systems to be able to do all of this. Then finally, the proof is in the pudding. I hit the reload and we can get access to it. Here’s the security group just for the last piece of it. So this is that one automated task in our workflow that added this into the security group as well, which is why we can now access that server.
Leon Adato • 58:00
I have to emphasize throughout all this, everything from me starting it off to Rich wrapping it up, the incredibly visual nature of this, and that’s not a dumbing down or a minimizing or simplifying. It goes back to the joke I made at the beginning of this webinar, which is that, you know who doesn’t want to be a developer? Network engineers, they don’t want to be a developer. They don’t want to have to deal with line after line after line of code and syntax and things like that. Not because they can’t, we certainly know that network engineering has its own incredible level of complexity, but if it doesn’t have to be, it shouldn’t be. And setting up the alert in Kentik is a very visual process. It’s selecting from lists.
Leon Adato • 58:48
And the process that you were going through of setting up the automation flow is again, a visual process. It is a way of being able to look and say, yes, this flow makes logical sense to my brain through my eyes. And if I need to add something or move something, I can see it. I’m not trying to parse out lines of syntactically rich code to do the same thing, which is the way that other tools do it. You know, they require you to go in and look through, you know, SNMP lists or whatever it is, you know. So I just, it is delightful that when something doesn’t have to be code, it’s not.
Rich Martin • 59:23
Yeah, exactly, exactly. And I appreciate that. That is a big part of what we do is just trying to simplify it based off of, simplify automation based off of skill level and the abilities of our customers and their network teams. But with that, I think that brings us to the very, very end. So I just first of all want to say thank you to the audience. And second of all, say thank you to Leon as well for joining us here.
Leon Adato • 59:53
I really appreciate you having me here and letting me yammer on about stuff. It was really wonderful. I will say what I say at the end of any of my public talks. Not only thank you, but we are ready for your – we’ve been ready the whole time. But we are ready for your questions. You can find us on social media. You can talk to us right now.
Leon Adato • 01:00:11
But let us know what your thoughts are, and let us help – honestly, make your day better and make your weekend longer because the fewer alerts that you have to respond to physically means the less you get interrupted away from your time with your family and resting and recovering. So I like to say that monitoring observability and automation help make your weekend longer.
Rich Martin • 01:00:37
Yeah. Thank you very much again, Leon, and everybody out there. We will catch you next time. Goodbye. Take care.