Automation at Scale: Tackle Configuration Chaos Where It Counts

Start Creating Config Consistency Where It Counts

When it comes to automation, configuration consistency is one of the biggest drivers of scale — and one of the hardest to nail down. Between legacy infrastructure, mergers and acquisitions, and years of drift, most teams are staring at a config landscape that’s anything but uniform.

Too often, the instinct is to fix everything at once. But treating consistency like an all-or-nothing initiative is a recipe for stall-outs.

If you’re struggling with stalled automation due to legacy sprawl or endless config debates, join Holly Holcomb and William Collins for their second installment of our deep-dive series on invisible automation roadblocks to discover how to break through with focused, high-impact changes.

What You’ll Learn:

Why all-or-nothing config projects create more drag than progress.
How to identify the config elements that matter most (and ignore the noise).
A pragmatic approach to iteration that avoids the paralysis of perfection.
Real-world examples of teams driving impact with targeted consistency.

Demo Notes

(So you can skip ahead, if you want.)

00:00 Introduction
01:03 What is Config Consistency?
05:27 Learning the Real Goal
11:43 Builder vs. Operator Persona
17:42 Starting Off Small
24:48 Staying Consistent With the Right Tools
28:23 Scalable Solutions
30:07 Wrap-Up

William Collins • 00:09

In case you missed it, check out the previous webinar that I did with Holly. We talked about basically overcoming some of those invisible roadblocks holding back automation at scale. But today, I just want to welcome you again. So my name is William, Director of Technical Evangelism at Itential. And with me again is Holly Holcomb, Program Director for Strategic Accounts. How are you? How are you doing, Holly?

Holly Holcomb • 00:38

Not too shabby. Excited to talk about config consistency today.

William Collins • 00:44

Yeah. So, like, it was one of the things we kind of glided over in the last webinar. Just a little bit of a touch along with some other things. But now we want to dig in, you know, dig in a little bit further.

Holly Holcomb • 00:56

Yeah, I know that when we talk to our customers, we talk about this topic a lot. So William, I’m really curious to hear your side. What do you think of when you hear people talk about config consistency or config compliance? What does that mean for you?

William Collins • 01:12

Yeah, great and very important question. You know, really defining it at its core. So, to me, configuration consistency is really all about a predictable network. And predictability is, you know, everything in our world at this point. You want your networks to be deterministic and you want to be able to predict their behavior. So, I think about it like this. Config drift is like one of those big invisible blockers we talked about last time, and it kind of creeps in a little bit quietly, right? So, if you think of, like, just in network engineering, you know, changes can often get applied very urgently. You have an outage, you go in, you make a change to some network device or multiple devices during an outage, you know, because that service has to be restored ASAP. You can’t take your time and do this, that and the other. You all get on a big call and someone makes a change to fix it. And, you know, these ad hoc, or I call them emergency changes, I guess, they can lead to configuration inconsistencies, obviously, or config drift. So, you know, this is where I would say device settings are going to diverge from the intended baseline or standardized policies that you may have defined. And it’s, you know, it’s cause and effect here. So, like, undocumented changes, or these manual overrides that you have to make under pressure, or these fixes, they don’t get propagated into the bigger picture of your architecture, they don’t get rolled back. So then you have this situation where, you know, you can introduce vulnerabilities, you can have performance issues or, you know, future outages, if not addressed. It’s like trying to conduct an orchestra where every musician is kind of reading from a different set of sheet music. And here’s the kicker: it’s incredibly hard to scale when every device is kind of doing its own thing. You end up spending more time, you know, managing these exceptions than actually moving your business forward. But, you know, that’s just my take on it, and my take is, you know, from when I worked on the customer side. But you have this unique opportunity to talk to a lot of big, you know, companies out there and kind of see probably different themes across those companies. So what’s your take, Holly?
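To make the idea of drift from an intended baseline concrete, here is a minimal Python sketch that diffs a device's running config against a golden baseline. It is an illustration only, not anything shown in the webinar: the function name `config_drift`, the sample config lines, and the IP addresses are all assumptions made up for this example.

```python
import difflib


def config_drift(baseline: str, running: str) -> list[str]:
    """Return the lines that differ between a baseline and a running config.

    Lines prefixed with "-" are in the baseline but missing from the device;
    lines prefixed with "+" are on the device but not in the baseline.
    """
    base_lines = [ln.rstrip() for ln in baseline.strip().splitlines()]
    run_lines = [ln.rstrip() for ln in running.strip().splitlines()]
    diff = list(
        difflib.unified_diff(
            base_lines, run_lines,
            fromfile="baseline", tofile="running",
            lineterm="", n=0,  # n=0: no surrounding context lines
        )
    )
    # The first two entries are the file headers; "@@" hunk markers are skipped.
    return [ln for ln in diff[2:] if ln.startswith(("+", "-"))]


# Hypothetical configs: an emergency change repointed logging and added a route.
BASELINE = """
ntp server 10.0.0.1
logging host 10.0.0.50
snmp-server community EXAMPLE-RO ro
"""
RUNNING = """
ntp server 10.0.0.1
logging host 10.0.0.99
snmp-server community EXAMPLE-RO ro
ip route 0.0.0.0 0.0.0.0 192.0.2.1
"""

drift = config_drift(BASELINE, RUNNING)
for line in drift:
    print(line)
```

A line-level diff like this is order-sensitive and knows nothing about config hierarchy, so it is best read as a way to surface undocumented changes for review rather than as a compliance engine.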

Holly Holcomb • 03:51

I think that it is very much what you’ve just mentioned. And I think in addition to that, it’s a lot of diversity in terms of tooling and vendors out there that they have to manage. So they’ve got a lot of complexity. The configuration standards vary across vendors. They vary across the functional role of a device or infrastructure as a whole. And I think a lot of what we’re hearing from our customers is that there is a certain degree of sprawl when it comes to the number of tools that they have to use in order to manage configuration consistency across their infrastructure. So I think I read a recent survey that said 70% of enterprises have at least three separate tools to manage device configuration. I mean, that feels like an understatement from what I see with customers on a regular basis because they’ve got a tool that’s specific to cloud. They’ve got a tool that’s specific to their controller for SD-WAN. They’ve got another tool out there that helps manage their VM infrastructure, their Linux and Windows boxes. So I think that from our perspective, what we’re seeing with our customers is very much that config consistency, if it was only isolated to a single device type and a single role in the network, would be one task. And then when you take a bird’s eye view at the diversity in the network today and infrastructure today, it becomes even harder. So I think we see a lot of our customers that are struggling with this issue.

Holly Holcomb • 05:26

So we’ve talked a little bit about what config consistency is and the challenges that our customers are facing in terms of trying to manage it across all of their infrastructure. William, from your perspective, I know you just mentioned it a moment ago in terms of resiliency, but what is it that customers are really trying to achieve? Config consistency isn’t the goal. What are they trying to achieve?

William Collins • 05:54

That’s a really simple and really complicated question all at the same time. If I go on a rant here, throw something at me and make me stop talking because I’m kind of passionate about this area. But I think more than anything, we want our networks to be reliable. We want them to be secure. We want them to be efficient and optimized and also adaptable. And that’s what we’re here for. Thank you.

William Collins • 06:21

So, when it comes to maintaining correct state across a large distributed system like a network, we find that configuration consistency and compliance are very tightly coupled. So take the restaurant industry for example, I’m not sure why I keep bringing up restaurants in these webinars, maybe it’s because I’m always hungry, I don’t know, but it actually I think fits quite nicely. So, in a restaurant chain, let’s just call it, you know, Holly’s Taco Empire. Every location will probably have to follow some identical procedures. So think of these like golden configs. And this is to essentially comply with health codes from like bodies like the FDA and like local health departments. You know, maybe one of these procedures is, oh.

William Collins • 07:23

Maintaining refrigerators at precise temperatures to prevent bacterial growth. I mean, I don’t know if that’s a real thing. It must be a real thing if you don’t want bacteria on your stuff. Let’s say a few different restaurants were really, really busy. Holly’s really, really busy. Everybody’s busy. And maybe 5 out of those 100 restaurants forgot to do those temperature checks in favor of getting through a dinner rush much quicker. Well, let’s say those restaurants had, I don’t know, I’m out of my territory here, E. coli. E. coli outbreaks or something, and lots of customers got sick.

William Collins • 08:01

And then the investigation traced it back to like store-level failures where you had improper food holding temperatures that allowed bacteria to thrive. So it is things like this that directly erode compliance. And then you’re, you know, looking at like regulatory violations and penalties that directly impact the business as a whole, you know, financially, which isn’t good. And then, you know, reputational damage and so on. So to me, this is pretty comparable to what really you want out of your network and your network management. You know, so if you take a reactive approach to config consistency, then that means… you’re really taking a risk of being, you know, if you’re not proactive, you’re gonna be reactive with compliance.

William Collins • 08:54

So, you know, these misconfigurations could expose networks to threats and also directly impact your network’s availability, both to your employees and the customers you serve. Sorry if that was a little long-winded, but that’s, I guess, how I see it.

Holly Holcomb • 09:09

No, I totally agree. There’s two different aspects of the compliance that the food industry has to deal with. One is, like, you should do good, so if you don’t do good, then we’re going to publicly shame you by putting your score out for everybody to see it, right? But then there’s, like, the secondary piece of, like, people could get hurt because you’re feeding them food, right? The network could go down because you’re not compliant. So, like, I think it’s, like, two: there’s, like, the risk-averse piece of compliance and the public, like, defamation aspect of it, but there’s also the service impact side of it that’s, you know, how much money are you going to lose because an application went down that’s either internal or external facing.

Holly Holcomb • 09:56

I think when we talk to our customers about the value that config consistency brings, it’s the point of compliance for a lot of them, especially those who are having to deal with PCI DSS compliance, just as an example out there. There’s also, I think you touched on it earlier when you’re using your orchestra example. The expectation that I have if I’m playing with you in a band is that you’re going to read the sheet music and you’re going to interpret the music as it’s written on the sheet. If both of us are interpreting things in our own way, it’s very hard to scale and to build an orchestra, a whole band around it. So agility is also a piece that I feel like we see with a lot of our customers and the assumptions that you have on what exists on the network to make sure that your deployments are successful. So, I totally agree. I think that there’s a lot to be said for resiliency of your network, security of your network, compliance status of your network, and then agility and being able to build on top of it.

William Collins • 11:02

Yeah, and they’re all interwoven, so you have to really get the foundations right, or else, you know, hey, if everybody’s playing different sheet music at the same time, it’s still music. It’s still music, but it’s not the type of music that you want to listen to. And you know, that’s the same for network, you know, you’re going to get a bad reputation wherever you work if your network’s always going down, it’s always BGP, it’s always DNS, it’s always this and it’s that, and this snowflake configuration over here, which broke, you know, X, Y, and Z, and then everybody has a bad experience because everybody, you know, uses the network directly or indirectly, everything goes on the network.

Holly Holcomb • 11:42

When we talk about defining a standard that you’re actually going to manage and audit across the network, that’s usually, you know, it’s usually an effort that a lot of people take part in, in one way or another. What’s your background? Do you see yourself as having more of a deployment mindset when you’re defining what’s going to be deployed on the network, more of a greenfield kind of perspective, or do you see most of your bread-and-butter background as more on the day-two ops side of things?

William Collins • 12:16

That’s a good question. And those were two very different jobs for me. So I did both of them for quite a while. Earlier on in my career, I focused more on ops. And then, you know, the more I learned, and the more I got my hand in many cookie jars, I got into the engineering and the architecture side. And these are just two very different areas. And I truly believe that the ability for your ops team to actually operate, troubleshoot, and maintain the health of the network is dependent on engineering and architecture doing the best job to put together the best design, knowing, like, how big is your ops team? What tools do they have to manage the network? What products do you choose based on how you can support the, you know, the network that you’re building? So these are, in my experience, anyway, these were two very different jobs. And earlier on in my career, like, some of the behavior that I would see sometimes was kind of like going back to why DevOps was invented: hey, you can’t just put together this design and throw it over the fence at us and, you know, cross your fingers and say goodbye. You know, that’s sort of changed for the better. And that was something that was really important to me, especially when I got into the engineering side, was saying, okay, we’re building this new network, and we really want to have a solid operational handover to that network operations center, or whoever it was that was actually going to be operating things, so that they can support it. And not only that, like, if you do this the right way, you’re doing your job the best that you can, but you’re also not going to be woken up as much at 2am, 3am, and have all that type of stuff happen, which I’ve been there too. And that’s, like, no fun, throws the whole week into chaos. So, in a sense, it’s like the engineers and architects are almost driving greenfield deployments, and then your ops teams obviously are going to be inundated with all the various brownfields that you might have in circulation.

Holly Holcomb • 14:35

That’s a really great way to put it. I know that in a lot of the conversations we have, we are usually talking to either those who are managing greenfield deployments or those who are managing day-two operations, and in some rare situations we have them both in the room. But do you feel like it gave you a lot of empathy, having done the operations role, to kind of fine-tune the way that you consider deployment for greenfield?

William Collins • 15:05

Yes, 100%. Because if you don’t feel the pain, you don’t think about these things of, okay, you know, somebody else, like, okay, if I’m putting together a network, do I want to go back to my CCIE handbook and really, like, over-engineer this network to, you know, to space? Or do I want to build something that meets requirements that we can actually support? Those are two different… It’s a fork in the road, so you can choose either way. You control your own destiny.

Holly Holcomb • 15:35

It’s almost like defining those standards when you do your deployment is… It’s a technical question or discussion, but it’s also cultural as well. Understanding your organization and the people well enough to define what the right fit is going to be.

William Collins • 15:57

That’s a really good point. So, thinking of it in terms of… Yeah, just thinking of it in terms of, like, let’s say greenfield deployments, it’s relatively, I don’t want to say it’s easy, but everything’s new. You can use whatever the newly defined standards are and whatever tooling you have in place at that time. And guess what? You’re probably not going to break anything because production traffic isn’t rolling over that network yet. So, you know, it is what it is, but you can’t just forget about brownfield, right? Like you can’t just say that stuff’s over there.

William Collins • 16:34

We’re just going to leave it. And, you know, those ops teams, they’ll figure it out. You know, brownfield’s a completely different beast. You’ve got years and years and years of legacy configurations with, you know, institutional knowledge just sprinkled ever so slightly everywhere. And you know, today’s networks, I don’t think I’ve ever worked on a network that didn’t span multi-vendor hardware in the last 10 years, you know, very hybrid setups and legacy systems accumulated through organic growth. You grow and the business grows and you have more stuff. And then there’s M&A, too.

William Collins • 17:10

So I worked for a lot of companies that were M&A heavy. And that was always, I mean, for some companies it was such a big thing with so much work that we had different teams that handled M&A only, basically. And this in turn, like, if you’re zooming out to try and see the forest for the trees and you have all those moving pieces and you don’t take, you know, the standard stuff very seriously, it’s going to lead to snowflake devices with unique configurations that are very good at resisting uniformity. One thing too that I’ve seen repeated over and over and over again is, you know, before we automate anything or before we want to get into this standard stuff, let’s figure out every outcome that could possibly happen between now and where we want to get to. Let’s do it all at once. Let’s clean up all our data before we ever get started. And by all, I mean all the brownfields, all the new greenfields that we’re using to even fund the automation program in the first place.

Holly Holcomb • 18:13

How do you create a standard out of that? I’m legitimately curious. You know, it’s a very challenging thing. Some might refer to this approach as boiling the ocean. And, you know, what’s easier than boiling the ocean is boiling water in your cooking pot or tea kettle. Start small.

That’s right. Oh, that’s such a good point. You posted something on social media recently that was very much, I feel like, in line with what we see with our customers. It’s almost always a topic that people want to tackle when they first begin their orchestration journey. They know that config consistency has so many great outcomes, like we just discussed a little while ago. And then when we check back in with them a few years down the line to see how that effort is going, how that body of work is going, it seems like many of them stall at defining the standard because they’re trying to build every edge case into the standard, instead of taking away chunks of the config that you want to say, these are, you know, consistent across a large swath of our infrastructure. Trying to solve for all of the edge cases, and solve for any challenges that you might find in a really diverse and complex infrastructure, trying to solve for all of those before you even make any progress, I think, is something that we see with almost every customer we talk to. Defining a standard can be really challenging because you want to solve all of the problems before you actually start building something that’s going to generate value.

Yeah. Yeah, you’ve got to reconcile, you know, so you have…

William Collins • 19:56

You know, when you embark on these big programs, like, to the business, you have to show some sort of value, like short-term value. You have to show that you’re doing something. Like, if you’re sitting on your thumbs for a whole year, and you’ve got funding to buy tools and to maybe get some professional services, or however it is that you’re going to go about, you know, trying to solve the problem for that business, and you don’t show anything and you’re just in planning, in analysis paralysis for all of time, that’s not good for the business. It’s not good for you or your team. You’ve got to show that you’re doing something, but you also don’t want to sacrifice, like, long-term value as well. You don’t want to take shortcuts. I would say that, like, if you really want to take all of this stuff very seriously, there are no huge shortcuts. You know, for some of these things you have to put in some of the time and, you know, you don’t want to cut too many corners, I would say.

Holly Holcomb • 20:58

I think another thing that we see, and I love the part that you just mentioned about no shortcuts, is in some cases going into a brownfield network and trying to build the bridge with the ops team that’s going to help enforce that standard that you build. Usually, from what I’ve seen at least, there’s a team that owns tooling around managing config consistency throughout the infrastructure at large. And then there’s a team that has kind of the day-to-day operational responsibility of managing, updating config on the infrastructure, things like that. And because the team with the tooling is oftentimes the one that defines some of the deployment architecture as well, for them, this is a simple problem to solve, which is get the standard, and then we’ll create an auditing mechanism that goes out and helps us figure out how far away from our goal we are. And when you talk to the ops team, who see all of the nuance across your infrastructure, they see all of the different ways that maybe it doesn’t align with the standard, but it’s there for a reason because this person made this change in order to make sure that this outage stopped, you know, two months ago, three months ago, a year ago. I think that one thing that we’ve noticed is that defining a standard, kind of like what we talked about earlier, culturally, is also about building a bridge between teams so that you’re all on the same page with what the standard should be.

Holly Holcomb • 22:32

And being able to have that feedback loop of, if this thing is consistently out of line in terms of our standard, how do we start moving towards a place of baking it into the standard? How do we start formalizing that into something that we can then manage instead of consistently excluding it from our audits or consistently saying it’s, you know, something that needs to be, it needs to be there, but it needs to be flagged every month or so when we do our checks.
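The auditing mechanism and feedback loop described here can be sketched in a few lines of Python. This is a hedged illustration rather than how any particular product works: the function name `audit`, the device names, and the two-line "standard" are hypothetical. It checks each device for a small set of required lines, reports how far the fleet is from the goal, and counts which required lines are missed most often so those can be raised with the ops team as candidates to revisit in the standard.

```python
from collections import Counter


def audit(standard: list[str], configs: dict[str, str]):
    """Audit device configs against a small standard snippet.

    Returns (compliance_ratio, missing_by_device, miss_counts), where
    miss_counts shows how many devices lack each required line.
    """
    missing_by_device: dict[str, list[str]] = {}
    for device, config in configs.items():
        present = {ln.strip() for ln in config.splitlines()}
        missing = [ln for ln in standard if ln not in present]
        if missing:
            missing_by_device[device] = missing

    compliant = len(configs) - len(missing_by_device)
    ratio = compliant / len(configs) if configs else 1.0
    miss_counts = Counter(
        ln for lines in missing_by_device.values() for ln in lines
    )
    return ratio, missing_by_device, miss_counts


# Hypothetical standard and fleet: two devices log to a different host.
STANDARD = ["ntp server 10.0.0.1", "logging host 10.0.0.50"]
CONFIGS = {
    "rtr-a": "ntp server 10.0.0.1\nlogging host 10.0.0.50",
    "rtr-b": "ntp server 10.0.0.1\nlogging host 10.0.0.99",
    "rtr-c": "ntp server 10.0.0.1\nlogging host 10.0.0.99",
    "rtr-d": "ntp server 10.0.0.1\nlogging host 10.0.0.50",
}

ratio, missing, miss_counts = audit(STANDARD, CONFIGS)
print(f"compliant: {ratio:.0%}")
for line, count in miss_counts.most_common():
    print(f"{count} device(s) missing: {line}")
```

When the same line shows up as missing on many devices month after month, that is a signal to have the conversation about whether the deviation is an error to remediate or a legitimate variant that belongs in the standard.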

William Collins • 23:00

I love that. I love that. I was actually just talking to a customer not too long ago, and that’s exactly their frame of thinking when they were building some different things. There’s one example they gave that was, I thought, rather valuable: they do a lot of, like, firewall rule changes in a lot of different areas, on-prem, cloud, everywhere. And for some of their firewalls, I think it was in colos, you know, as these rules get added and added and added, they weren’t getting removed. And so they basically set up this process and, you know, they put this into the standard to say, okay, when you submit this form, it adds the firewall rule.

William Collins • 23:44

We have all the data and the metadata surrounding that firewall rule. If that firewall rule doesn’t get any hits, no matter what, they had a time frame threshold, it automatically gets pulled back out. Because what they were finding is they had all these rules being added and added and added over time, and then they had just, you know, firewall rules sprawl everywhere. And how do you audit that? It’s very hard to audit that. And moreover, if something isn’t being used for X amount of months, then why is it there? So they had this auto-remediation process, which I thought was really cool. And, you know, one of the questions I’d ask them is, you know, have you ever caused an outage by removing rules? Or maybe it started taking traffic the day after you decided to remove the rule or something. And they’re like, no, we never had that happen. I thought, you know, as long as you understand the threshold, you understand the rules and the constraints and the process, and it’s in the standard, that’s very smart, very tactical. And I loved it.
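The customer's rule-expiry process can be approximated with a short Python sketch. The details below are assumptions for illustration: the transcript does not say what the time threshold was or how hit data was collected, so the 90-day value, the `FirewallRule` fields, and the rule names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FirewallRule:
    rule_id: str
    created: datetime
    last_hit: Optional[datetime]  # None means the rule has never matched traffic


def stale_rules(rules, now: datetime, threshold: timedelta) -> list[str]:
    """Return IDs of rules with no hits within the threshold window.

    A rule that has never been hit is measured from its creation time,
    so brand-new rules get a grace period before they become candidates.
    """
    stale = []
    for rule in rules:
        last_activity = rule.last_hit or rule.created
        if now - last_activity > threshold:
            stale.append(rule.rule_id)
    return stale


# Hypothetical rule set and an assumed 90-day threshold.
NOW = datetime(2025, 6, 1)
RULES = [
    FirewallRule("web-443", datetime(2024, 1, 10), datetime(2025, 5, 30)),
    FirewallRule("tmp-vendor", datetime(2024, 11, 1), None),
    FirewallRule("new-app", datetime(2025, 5, 15), None),
    FirewallRule("old-ftp", datetime(2023, 3, 1), datetime(2024, 12, 1)),
]

candidates = stale_rules(RULES, NOW, timedelta(days=90))
print(candidates)
```

The sketch only identifies removal candidates. Whether removal is fully automatic, as in the customer's case, or gated by a review step is a process decision that belongs in the standard itself.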

Holly Holcomb • 24:48

So, William, with all of the diversity in the network and all of the different ways that you could come and tackle this problem, like, how do you make progress? Like, how do you make the right decisions on what to do first? How do you not boil the ocean?

William Collins • 25:06

So, one could come into a network device and say, hey, I want to model the whole config and get all my data in a row before I do anything. And if you do that across everything, then maybe you just won’t get anything done. But if you take small snippets that are very important, like one example, say that you have a routine change where you have to go and rotate SNMP strings across your environment. Maybe it’s a, you know, some sort of security compliance, just something you have to do. Like every, you know, this many months we rotate SNMP strings. Well, start there. If that’s one of those changes that you’re doing very frequently, which is also very important, it’s a change you don’t want to mess up because, hey, if SNMP gets messed up on the device, guess what?

William Collins • 26:04

If you’re plugged into SolarWinds or Graylog or whatever it is that’s monitoring the device, you’re going to have a problem if it’s using SNMP strings. You’re not going to have visibility. You’re not going to have eyes on that device. So that’s a problem from operations because they’re not going to see, okay, neighbor changes or, you know, something goes down or, you know, the data that’s coming in via SNMP to that monitoring tool. You’re not going to have it. So keeping that consistent, and that’s not something that’s super hard to template out.

Holly Holcomb • 26:37

You know, make changes. So I think it’s a good starting point. Like, taking little snippets like that is something that’s actually real, and it’s a problem that you’re trying to solve, and it’s something you do pretty routinely. Whereas, I mean, you could go after the parts in the config that don’t change a ton or that aren’t a big problem, and you’re not going to be able to show as much value. You know, I think the SNMP example is probably a really good one.

Mm-hmm. I think what we’ve been seeing with customers is that it’s really tempting to want to do the big project first because, you know, it’s going to give you a ton of ROI. But it’s so important to sprinkle in some quick wins and small iterations so that it doesn’t take eight months to a year just to see some ROI on all the effort that you’ve put into the config consistency goal. So I think you’re totally right: small config snippets, identify the things that are going to make the biggest impact that don’t require months of investigation in order to figure out, like, how to implement it in the correct way. I totally agree with you.
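As a rough illustration of how small the SNMP rotation example can be when treated as a templated snippet, here is a Python sketch using only the standard library. The platform keys, device names, and community value are hypothetical, and the two config lines are IOS-style and Junos-style examples chosen for illustration, not a statement of what any specific environment requires.

```python
from string import Template

# One small template per platform, covering only the SNMP community snippet.
TEMPLATES = {
    "ios": Template("snmp-server community $community ro"),
    "junos": Template("set snmp community $community authorization read-only"),
}


def render_snmp(inventory: dict[str, str], community: str) -> dict[str, str]:
    """Render the SNMP snippet for every device in the inventory.

    inventory maps device name -> platform key. Unknown platforms raise
    an error instead of being skipped, so gaps in coverage are visible.
    """
    rendered = {}
    for device, platform in inventory.items():
        template = TEMPLATES.get(platform)
        if template is None:
            raise ValueError(f"no SNMP template for platform {platform!r} ({device})")
        rendered[device] = template.substitute(community=community)
    return rendered


# Hypothetical inventory and a placeholder community string for this rotation.
INVENTORY = {"edge-rtr-01": "ios", "core-sw-01": "junos"}
snippets = render_snmp(INVENTORY, "EXAMPLE-2025Q3")
for device, line in snippets.items():
    print(f"{device}: {line}")
```

Because the template covers a single routine change rather than the whole device config, it can be reviewed quickly, reused every rotation, and extended to another platform by adding one entry.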

William Collins • 27:55

Yeah, and yeah, I love this. It’s like pick your battles, you know, start with those config elements that cause you the most pain and, you know, maybe they represent the biggest amount of risk for whatever reason. Really important. And then second, make it easy for teams to stay consistent by giving them the right tools and the right guardrails and the wherewithal to actually do this. Make it easy to do the right thing. And I think as we sort of go and we close this up, do you have any, you know, given your experience and exposure to seeing a lot of these firsthand? You know, do you have any tips for the folks out there that can, you know, what can they think about as they try to kickstart their config consistency journey?

William Collins • 28:43

You know, where should they start, Holly? Absolutely.

Holly Holcomb • 28:46

Well, I’ll put in a quick plug. We’ve got a white paper on our website that is all around a methodology for achieving config consistency. And it talks about some of the things we’ve mentioned in this webinar. It talks about focusing on config snippets. And it also talks a little bit about how you overcome the problems of defining a standard. There’s some really interesting content in there about AI as well. So I definitely would say a good first step just to understand resources that are out there would be to check out that white paper.

Holly Holcomb • 29:24

And then, like you mentioned before, isolate out the things that are going to give you immediate value that don’t involve boiling the ocean. And sprinkle in those quick wins with the strategic ones that will take a little bit more time and finesse in order to get to the goal. So I think that those are two things to keep in mind when it comes to where to get started with config consistency.

William Collins • 29:51

I love that. We’ll link that methodology guide from our website. And you know, remember a predictable network is a stable, secure one. You know, that’s the foundation for automation that actually works. So thanks everyone for tuning in. If you want to discuss this further, feel free to reach out to Holly or myself on LinkedIn. Or if you want to learn more about Itential just in general and the technology that we’ve built to tackle this problem, among others, you know, at enterprise scale, make sure to visit itential.com and check it out. And thanks everybody for tuning in. Thanks guys.
