
The Network Refresh Problem That Scripts Could Never Solve

Principal Architect – AI Solutions & Strategy

Key Points

- Scripts automate homogeneous environments. A network refresh is the moment of maximum heterogeneity, which is exactly why it has never been a scripted automation problem.
- FlowAgents reason about what they find rather than pattern-match against what they expected, handling the one-off policies and legacy configurations that break scripts.
- Three use cases (Arista EOS to Cisco IOS-XR, Juniper JunOS to Nokia SROS, and MPLS to Segment Routing) show how specialized FlowAgents and a persistent migration instance turn refresh from a crisis into an operational capability.
- The migration report becomes a real deliverable: structured, queryable proof of every decision made, usable for CAB close-out, by future engineers, and in compliance audits.

Every network has a lifecycle. Hardware reaches end-of-life. Vendor support contracts expire. Platform capabilities that were state-of-the-art five years ago become constraints against modern operational requirements: Segment Routing, model-driven telemetry, closed-loop remediation. So begins the network refresh: the planned replacement of infrastructure, often across vendors, always across risk.

Network refresh is one of the most capital-intensive investments an operator makes. It is also one of the most feared, not because the engineering is impossible (it isn't), but because the tooling that exists to support it has never matched the nature of the problem.

Scripts automate homogeneous environments. A network refresh is, by definition, the moment of maximum heterogeneity.

Two vendors, two operating systems, two CLI models, two sets of feature semantics, coexisting in the same network, carrying production traffic, while the migration team works to collapse that duality into a single, coherent new state.

That is not a scripted automation problem. That is an intelligent automation problem.

AI agents (goal-based, reasoning, adaptive) are uniquely suited to it. FlowAgents on the Itential Platform are how we put that pattern into production. This post is about why it works.

What a Network Refresh Actually Involves

A brownfield migration does not start with a clean slate. It starts with a production network that has been accumulating configuration for years. Route policies written by engineers who have since left. ACLs that nobody is sure are still needed. Workarounds for vendor bugs patched three releases ago. Customer-specific configurations that are nowhere documented outside the device itself.

Before a single line of new configuration is written, someone has to understand what is actually running. In a realistic production network, this discovery work takes weeks when done manually.

Then comes translation. The way Arista EOS expresses a BGP route policy (as a route-map) is fundamentally different from the way Cisco IOS-XR expresses the same policy (as a route-policy written in its Routing Policy Language, RPL). The way Juniper JunOS structures a VRF (a routing-instance of type vrf) differs from the way Nokia SROS handles the same concept (a VPRN service). The feature semantics may be identical; the way they are expressed is not. Many features have no direct equivalent at all and require redesign rather than translation.

Then comes the Method of Procedure, the formal change document the Change Advisory Board approves before any maintenance window opens. Then execution: a maintenance window, a cutover, real-time monitoring of BGP convergence and traffic restoration, and constant vigilance for anything that does not behave as planned.

Each of these phases, done manually across a fleet of devices, represents weeks to months of senior engineering time. A missed dependency in the MOP, a translation error in a critical route policy, an ACL that worked on EOS but behaves differently on IOS-XR: any of these can cause a service impact during the cutover window.

This is why network refresh programs have historically been multi-year efforts. Not because the work is impossible, but because the preparation is enormous and the margin for error is small.

Why FlowAgents, Not Scripts

The instinct when facing a large, complex engineering task is to automate it. For many parts of network operations, scripted automation is exactly right.

But scripts have a fundamental constraint. They handle the cases you anticipated when you wrote them. A production network refresh is defined by the cases you did not anticipate: the one-off customer policy with a vendor-specific match condition, the BGP peer configured differently from every other peer for reasons nobody remembers, the QoS policy customized three times that no longer matches the template it came from. Scripts fail silently or catastrophically on these cases. An engineer has to be involved anyway, which means the script did not actually reduce the work. It just changed where the work happened.

FlowAgents handle this differently because they reason about what they find rather than pattern-match against what they expected to find. A discovery FlowAgent on a device with an unusual configuration does not fail. It reads it, understands it in context, and represents it accurately. A translation FlowAgent encountering a feature with no direct equivalent on the target platform does not produce a broken config. It generates the closest functional equivalent, annotates the decision, and flags it for human review with a clear explanation of why.

The agent handles the unexpected case gracefully. And it documents what it did, so the engineer reviewing the output is not discovering the problem during the cutover window at midnight.

The Architecture: Specialized FlowAgents, Shared State

A refresh pipeline decomposes into FlowAgents with defined roles, each contributing to a shared migration instance that persists across the entire process. Pre-migration: discovery, analysis, translation. Execution: validation, cutover, steady-state verification. Each FlowAgent is stateless. Each one reads from and writes to a single migration instance held in Lifecycle Manager. Device access flows through Itential Gateway. External tools and data sources flow through FlowMCP Gateway.

The migration instance is the thread that runs through every stage. Not a log. The authoritative record of this device’s migration: what was discovered, what the analysis concluded, what translation decisions were made, what was validated before cutover, what was verified after. Every subsequent FlowAgent reads from it. Every FlowAgent writes back to it.
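
To make the shared-state pattern concrete, here is a minimal Python sketch. `MigrationInstance`, the agent functions, and the section names are hypothetical illustrations, not the Itential Platform or Lifecycle Manager API; the sketch only shows agents that hold no state of their own and communicate exclusively through one persistent, append-audited record.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class MigrationInstance:
    """Hypothetical stand-in for the per-device record held in Lifecycle Manager."""
    device: str
    sections: Dict[str, Any] = field(default_factory=dict)
    history: List[Dict[str, str]] = field(default_factory=list)

    def read(self, section: str) -> Any:
        return self.sections.get(section)

    def write(self, section: str, value: Any, agent: str) -> None:
        # Each write is also appended to the history, so the record shows
        # which agent produced which section, and when.
        self.sections[section] = value
        self.history.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "section": section,
        })


# Stateless agents: plain functions that receive the instance and keep nothing
# between calls. The data values are illustrative placeholders.
def discovery_agent(inst: MigrationInstance) -> None:
    inst.write("discovery", {"bgp_as": 65000, "vrfs": ["CUST-A", "CUST-B"]}, "discovery")


def analysis_agent(inst: MigrationInstance) -> None:
    found = inst.read("discovery")
    inst.write("analysis", {"vrf_count": len(found["vrfs"])}, "analysis")


inst = MigrationInstance(device="core-rtr-01")
for agent in (discovery_agent, analysis_agent):
    agent(inst)

print(inst.read("analysis"))                 # {'vrf_count': 2}
print([h["agent"] for h in inst.history])    # ['discovery', 'analysis']
```

Because every agent reads its inputs from the instance, any stage can be rerun or replaced without losing context; the history list is what later makes the migration report possible.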

Use Case 1: Arista EOS to Cisco IOS-XR

The most common multi-vendor refresh pattern in service provider networks is the transition between major routing platforms. Consider a provider that built its core on Arista 7500R-series platforms running EOS and is now planning a refresh to Cisco NCS 5500 running IOS-XR. End-of-life timelines, Segment Routing at scale, or a vendor consolidation strategy may drive the work. Whatever the driver, the starting point is the same: production L3VPN customers, an active iBGP mesh, EVPN services, and custom QoS policies built up over years.

Discovery: Reading Intent, Not Just Config

The discovery FlowAgent connects to each Arista device through Itential Gateway and does not simply pull the running configuration. It extracts intent (what the device is actually doing) in a vendor-neutral structured form. BGP AS and peer role counts. Active VRFs, unique customers, PE-CE BGP sessions. EVPN VTEP, VNIs, route-targets. QoS service policies. ACL counts with estimated IOS-XR equivalent line count. Nothing in this output is raw CLI text. The FlowAgent understood what it was reading and represented it in a form every downstream agent can consume.
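
The difference between pulling text and extracting intent can be sketched in a few lines. This is a deliberately narrow illustration, assuming simplified EOS-style syntax and looking only at `router bgp`, `neighbor ... remote-as`, and `vrf instance` lines; the `BgpIntent` fields and the rule that any neighbor inside a BGP `vrf` block is a PE-CE session are assumptions for the example. A real discovery agent reasons over far more than regular expressions can capture.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class BgpIntent:
    """Hypothetical vendor-neutral summary; field names are illustrative."""
    local_as: Optional[int] = None
    peer_roles: Counter = field(default_factory=Counter)
    vrfs: Set[str] = field(default_factory=set)


def extract_intent(config: str) -> BgpIntent:
    intent = BgpIntent()
    in_bgp = False
    current_vrf: Optional[str] = None
    vrf_indent = 0
    for raw in config.splitlines():
        text = raw.strip()
        if not text or text == "!":
            continue
        indent = len(raw) - len(raw.lstrip())
        if indent == 0:
            # Any top-level line closes the previous stanza.
            in_bgp, current_vrf = False, None
            if m := re.fullmatch(r"router bgp (\d+)", text):
                in_bgp, intent.local_as = True, int(m.group(1))
            elif m := re.fullmatch(r"vrf instance (\S+)", text):
                intent.vrfs.add(m.group(1))
            continue
        if not in_bgp:
            continue
        if current_vrf is not None and indent <= vrf_indent:
            current_vrf = None  # left the "vrf" block under router bgp
        if m := re.fullmatch(r"vrf (\S+)", text):
            current_vrf, vrf_indent = m.group(1), indent
        elif m := re.fullmatch(r"neighbor (\S+) remote-as (\d+)", text):
            if current_vrf is not None:
                role = "pe-ce"   # assumption: VRF neighbors are customer sessions
            elif int(m.group(2)) == intent.local_as:
                role = "ibgp"
            else:
                role = "ebgp"
            intent.peer_roles[role] += 1
    return intent


SAMPLE = """\
vrf instance CUST-A
vrf instance CUST-B
!
router bgp 65000
   neighbor 10.0.0.2 remote-as 65000
   neighbor 10.0.0.3 remote-as 65000
   neighbor 192.0.2.1 remote-as 64512
   vrf CUST-A
      neighbor 198.51.100.1 remote-as 64600
"""
intent = extract_intent(SAMPLE)
print(intent.local_as, dict(intent.peer_roles), sorted(intent.vrfs))
```

The output is counts and roles rather than CLI lines, which is the shape downstream analysis and translation stages can consume regardless of the source vendor.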

Analysis: The Technical Assessment

The analysis FlowAgent produces the pre-migration assessment: what the CAB needs to approve the change, what the engineers need to plan the MOP, what the NOC needs to monitor the cutover.

The output is segmented by translation category. Roughly 73% automated, no human review required (BGP, prefix-sets, community-sets, OSPF, static routes, interfaces, SNMP, NTP, AAA). Roughly 22% translated with engineer review (route-maps to route-policies, BGP peer-groups to neighbor-groups, QoS policy-maps, MLAG to Bundle-Ether). Roughly 4% requires redesign (EVPN type-5 redistribution, ACLs with EOS-specific match conditions, BFD timer profiles). And a feature-gap section calling out platform-specific items like EOS event-handler scripts.
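
The category breakdown itself is simple arithmetic once each discovered item is classified. The sketch below assumes a hypothetical lookup table built from the feature names in the report above; in practice the classification is the hard, judgment-driven part, and the table only stands in for it.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical feature-to-category table, mirroring the report's categories.
CATEGORY_BY_FEATURE = {
    **dict.fromkeys(["bgp", "prefix-set", "community-set", "ospf", "static-route",
                     "interface", "snmp", "ntp", "aaa"], "automated"),
    **dict.fromkeys(["route-map", "peer-group", "qos-policy-map", "mlag"], "review"),
    **dict.fromkeys(["evpn-type5-redistribution", "acl-eos-specific-match",
                     "bfd-timer-profile"], "redesign"),
    "event-handler": "feature-gap",
}


def categorize(features: Iterable[str]) -> Dict[str, float]:
    """Percentage of discovered config items in each translation category."""
    # Unknown features default to "review": nothing unrecognized is auto-translated.
    counts = Counter(CATEGORY_BY_FEATURE.get(f, "review") for f in features)
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


discovered = ["bgp", "ospf", "interface", "route-map",
              "evpn-type5-redistribution", "vendor-oddity"]
print(categorize(discovered))  # {'automated': 50.0, 'review': 33.3, 'redesign': 16.7}
```

The default-to-review rule is the important design choice: an unrecognized construct lands in front of an engineer instead of being silently translated.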

The risk assessment names the primary risk factor, recommends the off-peak window, defines the rollback window, and sequences the migration by dependency. This report is what gets submitted to the CAB. It is produced in hours, not weeks. And it reflects what is actually deployed, because the FlowAgent built it from every line of the running configuration rather than from documentation or templates.
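
Sequencing by dependency is a topological sort. As a sketch, assuming a hypothetical stage graph loosely based on the cutover order described below, Python's standard `graphlib.TopologicalSorter` can group stages into waves in which every stage's prerequisites are already complete, and it rejects a plan whose dependencies form a cycle.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical stage dependencies: each stage maps to the stages that must finish first.
DEPENDENCIES: Dict[str, Set[str]] = {
    "rib-snapshot": set(),
    "vrfs": {"rib-snapshot"},
    "interfaces": {"vrfs"},
    "prefix-sets": {"rib-snapshot"},
    "route-policies": {"prefix-sets"},
    "ospf": {"interfaces"},
    "bgp-neighbors": {"ospf", "route-policies"},
}


def migration_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group stages into waves; every stage in a wave has all its prerequisites done."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the dependencies loop
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves


for n, wave in enumerate(migration_waves(DEPENDENCIES), 1):
    print(n, wave)
```

Stages within a wave are independent of each other, so the MOP can list them in any order (or parallelize them), while stages in later waves never run before their prerequisites.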

Translation: Configuration with Commentary

The translation FlowAgent produces IOS-XR configuration from the Arista baseline, organized by feature class, with embedded annotation for every non-trivial decision. A route-map becomes a route-policy. The sequential match/set model is restructured into the IOS-XR conditional model. Community format changes from plain integers to AS:value notation. The translation includes the source EOS comment, the translation note explaining the restructuring, and the resulting IOS-XR block.
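
Two pieces of that translation are mechanical enough to sketch. A BGP community is a 32-bit value, so its plain-integer form converts to AS:value notation by splitting the high and low 16 bits. And because a route-map evaluates clauses in sequence and stops at the first match, its clauses map onto a single RPL `if`/`elseif` chain, with the route-map's implicit deny made explicit as `else drop`. The model below is a hypothetical simplification: it handles only prefix-list matches, local-preference and community sets, and assumes the prefix-sets were already translated under the same names. It omits the source comments and annotations a real translation would carry.

```python
from dataclasses import dataclass
from typing import List, Optional


def community_to_as_value(value: int) -> str:
    """Render a 32-bit integer community as AS:value (high 16 bits : low 16 bits)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"community out of range: {value}")
    return f"{value >> 16}:{value & 0xFFFF}"


@dataclass
class RouteMapEntry:
    """Simplified EOS route-map clause (hypothetical model; prefix-list matches only)."""
    seq: int
    action: str                          # "permit" or "deny"
    match_prefix_list: str
    set_local_pref: Optional[int] = None
    set_community: Optional[int] = None  # plain-integer form, as written in the source


def _actions(entry: RouteMapEntry) -> List[str]:
    if entry.action == "deny":
        return ["drop"]
    acts = []
    if entry.set_local_pref is not None:
        acts.append(f"set local-preference {entry.set_local_pref}")
    if entry.set_community is not None:
        acts.append(f"set community ({community_to_as_value(entry.set_community)})")
    acts.append("pass")
    return acts


def to_route_policy(name: str, entries: List[RouteMapEntry]) -> str:
    """Turn first-match route-map clauses into one RPL if/elseif chain."""
    lines = [f"route-policy {name}"]
    for i, entry in enumerate(sorted(entries, key=lambda e: e.seq)):
        keyword = "if" if i == 0 else "elseif"
        lines.append(f"  {keyword} destination in {entry.match_prefix_list} then")
        lines += [f"    {a}" for a in _actions(entry)]
    if entries:
        # The route-map's implicit deny at the end becomes an explicit else-drop.
        lines += ["  else", "    drop", "  endif"]
    lines.append("end-policy")
    return "\n".join(lines)


print(community_to_as_value(4259840100))  # 65000:100
print(to_route_policy("CUST-A-IN", [
    RouteMapEntry(20, "deny", "BOGONS"),
    RouteMapEntry(10, "permit", "CUST-A-PREFIXES", set_local_pref=200, set_community=4259840100),
]))
```

Sorting by sequence number before emitting the chain preserves the route-map's first-match order even if clauses were discovered out of order.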

Items that require human review are not skipped. They are scaffolded. The EVPN type-5 redistribution case ships with source behavior, target behavior, a starting-point IOS-XR block to validate, and a clear flag: EVPN team review required before cutover. The engineer reviewing this output is not discovering the problem. They are making a decision with the context already gathered.
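
A scaffolded item is a record rather than a silent gap, and the gate is simple: the window does not open while a flagged item lacks sign-off. The sketch below assumes a hypothetical `TranslationItem` shape; angle-bracket strings are placeholders, not real configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TranslationItem:
    """One translated feature; field names are illustrative, not a product schema."""
    feature: str
    source_behavior: str
    target_behavior: str
    candidate_config: str               # starting point, not final config
    note: str = ""
    review_team: Optional[str] = None   # set when a human must decide
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        self.approved_by = reviewer


def cutover_blockers(items: List[TranslationItem]) -> List[str]:
    """Features that still need sign-off before the maintenance window opens."""
    return [f"{i.feature} (awaiting {i.review_team})"
            for i in items if i.review_team and not i.approved_by]


items = [
    TranslationItem("prefix-list CUST-A-PREFIXES", "EOS prefix-list", "IOS-XR prefix-set",
                    "<translated prefix-set>", note="automated"),
    TranslationItem("evpn type-5 redistribution", "<EOS behavior>", "<IOS-XR behavior>",
                    "<starting-point IOS-XR block>", review_team="EVPN team"),
]
print(cutover_blockers(items))   # ['evpn type-5 redistribution (awaiting EVPN team)']
items[1].approve("evpn-reviewer")
print(cutover_blockers(items))   # []
```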

Cutover & Verification

The cutover FlowAgent executes the MOP in sequence, monitoring convergence at each stage and writing status to the migration instance in real time. RIB snapshot captured, interfaces and VRFs applied, prefix-sets and route-policies applied, OSPF converged, BGP neighbors established.
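
The execution discipline (apply a step, wait for it to converge, record the result, and stop rather than push on) can be sketched as a small runner. `MopStep`, its `apply` and `converged` hooks, and the in-memory `status` list are hypothetical stand-ins for device operations and for writes to the migration instance; the convergence check is simulated.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MopStep:
    name: str
    apply: Callable[[], None]      # push this step's config (hypothetical hook)
    converged: Callable[[], bool]  # health check for this step (hypothetical hook)
    timeout_s: float = 60.0


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 0.01) -> bool:
    """Poll check() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while not check():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
    return True


def run_mop(steps: List[MopStep], status: List[Tuple[str, str]]) -> bool:
    """Run steps strictly in order; stop at the first step that does not converge."""
    for step in steps:
        status.append((step.name, "applying"))
        step.apply()
        ok = wait_until(step.converged, step.timeout_s)
        status.append((step.name, "converged" if ok else "not converged"))
        if not ok:
            return False  # later steps are never pushed onto an unhealthy device
    return True


polls = {"bgp": 0}

def bgp_established() -> bool:
    polls["bgp"] += 1
    return polls["bgp"] >= 3   # simulated: sessions come up on the third poll

status: List[Tuple[str, str]] = []
steps = [
    MopStep("interfaces-and-vrfs", lambda: None, lambda: True),
    MopStep("ospf", lambda: None, lambda: True),
    MopStep("bgp-neighbors", lambda: None, bgp_established, timeout_s=1.0),
]
print(run_mop(steps, status))   # True
print(status[-1])               # ('bgp-neighbors', 'converged')
```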

One issue showed up during a representative cutover: a single eBGP peer failed to establish. The FlowAgent investigated in context (peer IP unreachable from the new device, OSPF route present but BGP next-hop not yet redistributed), added a temporary static route, watched the BGP session establish, then removed the static route after convergence. The decision and reasoning are recorded in the migration instance. An engineer reviewing the change log six months later can see exactly what happened.
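
The temporary-workaround pattern maps naturally onto a context manager: apply the fix, record why, and guarantee its removal when the block exits. `FakeDevice` and the `temporary_config` helper are hypothetical, and the command strings are illustrative only; they are stored, not sent anywhere.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class FakeDevice:
    """Stand-in for a device session; it just records what was applied."""
    def __init__(self) -> None:
        self.applied: List[str] = []

    def apply(self, config: str) -> None:
        self.applied.append(config)


@contextmanager
def temporary_config(device: FakeDevice, add: str, remove: str,
                     reason: str, record: List[Dict[str, str]]) -> Iterator[None]:
    """Apply a workaround, and always take it back out when the block exits."""
    device.apply(add)
    record.append({"action": "added", "config": add, "reason": reason})
    try:
        yield
    finally:
        device.apply(remove)
        record.append({"action": "removed", "config": remove, "reason": reason})


dev: FakeDevice = FakeDevice()
record: List[Dict[str, str]] = []
with temporary_config(dev,
                      add="router static address-family ipv4 unicast 192.0.2.1/32 10.1.1.1",
                      remove="no router static address-family ipv4 unicast 192.0.2.1/32 10.1.1.1",
                      reason="eBGP peer unreachable; BGP next-hop not yet redistributed",
                      record=record):
    pass  # here the agent would wait for the BGP session to establish and converge
print([r["action"] for r in record])  # ['added', 'removed']
```

Removing the workaround in `finally` means it is never left silently in place, even if the wait fails. A team might instead choose to keep it and escalate on failure; either way, the record captures what was done and why.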

The verification FlowAgent does not compare configurations. It compares behavior against the baseline captured before the cutover. BGP sessions established. OSPF adjacencies up. RIB and EVPN counts within tolerance. Traffic throughput approaching baseline. All L3VPN customer services verified. QoS spot-check passes. The migration instance now holds the complete record: pre-migration discovery, analysis, translation decisions, cutover timeline, verified post-migration state. Proof of record for the CAB close-out.
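
Behavioral verification reduces to comparing post-cutover counters with the pre-cutover baseline, each within its own tolerance. In this sketch the metric names, tolerances, and numbers are all illustrative assumptions; metrics without a tolerance, such as established BGP sessions, must match exactly.

```python
from typing import Dict, Tuple


def verify_against_baseline(baseline: Dict[str, float], observed: Dict[str, float],
                            tolerance: Dict[str, float]) -> Dict[str, Tuple[bool, str]]:
    """Compare post-cutover counters to the pre-cutover baseline.

    tolerance holds a relative fraction per metric; metrics without an entry
    must match exactly.
    """
    results = {}
    for metric, expected in baseline.items():
        actual = observed.get(metric)
        if actual is None:
            results[metric] = (False, "not observed")
            continue
        allowed = abs(expected) * tolerance.get(metric, 0.0)
        ok = abs(actual - expected) <= allowed
        results[metric] = (ok, f"expected {expected} ± {allowed:g}, got {actual}")
    return results


# Illustrative numbers only.
baseline = {"bgp_established": 42, "ospf_adjacencies": 8, "rib_routes": 120000, "evpn_routes": 5400}
observed = {"bgp_established": 42, "ospf_adjacencies": 8, "rib_routes": 119400, "evpn_routes": 5200}
tolerance = {"rib_routes": 0.01, "evpn_routes": 0.02}

for metric, (ok, detail) in verify_against_baseline(baseline, observed, tolerance).items():
    print("PASS" if ok else "FAIL", metric, detail)
```

Here the RIB is within 1% of baseline and passes, while the EVPN route count is about 3.7% low and fails its 2% tolerance, which is exactly the kind of result that should hold the CAB close-out until someone explains it.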

Use Case 2: Juniper JunOS to Nokia SROS

The Arista-to-Cisco example shows the challenge of translating operational model differences. The Juniper-to-Nokia scenario shows a different class of problem: migrating deeply entrenched MPLS services where the forwarding semantics, not just the configuration syntax, differ between platforms.

Both platforms support RSVP-TE, LDP, L3VPN, and carrier-grade services. But the way they structure traffic engineering tunnels, handle LDP-over-RSVP, and express QoS through hierarchical scheduling differs in ways that require genuine understanding to translate correctly. Juniper’s configuration model nests these relationships hierarchically; Nokia SROS uses a CLI model that is conceptually different in how it references them. A direct line-by-line translation produces configuration that is syntactically invalid.

The analysis FlowAgent identifies the real migration risk. RSVP-TE tunnel mesh: JunOS auto-bandwidth and adaptive CSPF differ from SROS RSVP-TE behavior, so the recommendation is to re-engineer the traffic engineering strategy against the current traffic matrix rather than translate it directly. Hierarchical QoS does not map one-to-one and requires policy-level redesign. LDP-over-RSVP and BGP route reflector configuration carry medium risk, with semantic equivalents that need careful verification. The dependency flag is clear: migrate P routers before PE routers, and obtain traffic engineering team sign-off before any TE tunnel reconfiguration.

The analysis does not just say high risk. It explains why, which aspect of the migration creates the risk, and what the recommended approach is. The traffic engineering team that reads this report gets directly actionable guidance.
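
One way to enforce "explain why, not just high risk" is to make an unexplained risk impossible to record. The sketch below assumes a hypothetical `Risk` record that rejects entries without a reason and a recommendation, renders them by severity with any sign-off gate, and applies the P-before-PE rule as a stable sort over (name, role) pairs. The device names are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LEVELS = {"high": 0, "medium": 1, "low": 2}
ROLE_ORDER = {"P": 0, "PE": 1}   # the report's rule: P routers before PE routers


@dataclass
class Risk:
    area: str
    level: str
    why: str
    recommendation: str
    signoff: Optional[str] = None   # team that must approve before this area changes

    def __post_init__(self) -> None:
        # A risk without a reason and a recommendation is not actionable; reject it.
        if self.level not in LEVELS:
            raise ValueError(f"unknown level {self.level!r}")
        if not (self.why and self.recommendation):
            raise ValueError(f"{self.area}: missing why/recommendation")


def render(risks: List[Risk]) -> str:
    out = []
    for r in sorted(risks, key=lambda r: LEVELS[r.level]):
        gate = f" [sign-off: {r.signoff}]" if r.signoff else ""
        out.append(f"{r.level.upper()}: {r.area}{gate}\n"
                   f"  why: {r.why}\n  recommend: {r.recommendation}")
    return "\n".join(out)


def order_devices(devices: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (name, role) pairs so every P router precedes every PE router."""
    return sorted(devices, key=lambda d: ROLE_ORDER[d[1]])


risks = [
    Risk("BGP route reflectors", "medium",
         "semantic equivalents exist but need careful verification",
         "verify semantic equivalence before cutover"),
    Risk("RSVP-TE tunnel mesh", "high",
         "JunOS auto-bandwidth and adaptive CSPF differ from SROS RSVP-TE behavior",
         "re-engineer TE against the current traffic matrix", signoff="traffic engineering"),
]
print(render(risks))
print(order_devices([("pe1", "PE"), ("p1", "P"), ("pe2", "PE"), ("p2", "P")]))
```

Python's `sorted` is stable, so devices keep their original relative order within each role.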

Use Case 3: MPLS to Segment Routing

Not every network refresh is a vendor swap. Technology migration within a platform is equally complex and equally resistant to scripted automation.

A service provider moving their core from LDP-based MPLS to Segment Routing on Cisco IOS-XR faces a different challenge. The vendor and hardware are unchanged. But the forwarding plane changes fundamentally, and every service that depends on label distribution has to migrate without dropping traffic.

The SR migration challenge is sequencing and service impact analysis. A FlowAgent that understands the topology can determine the correct order of IGP-SR enablement, identify which services have LDP dependencies that must be migrated to SR before LDP is withdrawn, and generate the migration MOP in dependency order.

The phased plan: enable IS-IS SR alongside active LDP (no disruption); shift traffic to SR paths while LDP labels remain as backup; withdraw LDP only after per-service validation; then enable TI-LFA for sub-50ms failure protection. In a representative network, the FlowAgent identifies 47 L3VPN services with explicit LDP tunnel dependencies and 6 with non-standard tunnel preferences that require human review before LDP withdrawal proceeds. These are exactly the cases a blanket enable-SR-everywhere script would miss, and exactly the cases that cause service impacts during a live migration.
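
The gate on the third phase can be sketched as a per-service check. The `Service` fields and service names are hypothetical; the rule mirrors the plan above: every LDP-dependent service must be validated on SR paths, and any non-standard tunnel preference must be reviewed by a human, before LDP is withdrawn.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    name: str
    ldp_dependent: bool
    nonstandard_tunnel_pref: bool = False
    reviewed: bool = False          # human sign-off on a non-standard preference
    validated_on_sr: bool = False   # per-service check after traffic moved to SR


def ldp_withdrawal_blockers(services: List[Service]) -> List[str]:
    """Reasons LDP cannot be withdrawn yet; an empty list means phase 3 may proceed."""
    blockers = []
    for s in services:
        if not s.ldp_dependent:
            continue
        if s.nonstandard_tunnel_pref and not s.reviewed:
            blockers.append(f"{s.name}: non-standard tunnel preference needs human review")
        if not s.validated_on_sr:
            blockers.append(f"{s.name}: not yet validated on SR paths")
    return blockers


services = [
    Service("vpn-blue", ldp_dependent=True, validated_on_sr=True),
    Service("vpn-green", ldp_dependent=True, nonstandard_tunnel_pref=True, validated_on_sr=True),
    Service("vpn-red", ldp_dependent=False),
]
for reason in ldp_withdrawal_blockers(services):
    print(reason)   # vpn-green: non-standard tunnel preference needs human review
```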

The Migration Report: An Artifact That Stands Alone

One of the most important outputs of a FlowAgent-driven network refresh is the one easiest to overlook: the migration report.

After a traditional refresh, the post-migration artifact is a change ticket marked complete and a job log showing automation ran. If something goes wrong three months later, reconstructing what changed requires hours of investigation.

A FlowAgent-driven migration produces a structured report that is a genuine deliverable. Scope. Window and actual duration. CAB reference. Pre-migration analysis breakdown. Cutover outcome with session counts and service verification. Decisions recorded, each linked to the migration instance section that holds the supporting data.

Every section links to data in the migration instance. Every decision has a recorded rationale. The engineer who worked the cutover does not write this report. It is assembled from structured records FlowAgents created throughout the process. Proof of controlled change for the CAB close-out. Reference for future engineers. Timestamped, structured record for compliance. Governed by default.
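
Assembling the report is then a rendering step over data that already exists. In this sketch the instance schema, the CAB reference, and the `cutover.events[7]` section path are hypothetical placeholders; the one rule it enforces is that no decision is published without a rationale and a link back to its supporting data.

```python
from typing import Any, Dict


def build_report(instance: Dict[str, Any]) -> str:
    """Assemble a plain-text report from a migration instance (hypothetical schema)."""
    lines = [
        f"Migration report: {instance['device']}",
        f"CAB reference: {instance['cab_ref']}",
        f"Window: {instance['window']}",
        "Decisions:",
    ]
    for n, d in enumerate(instance["decisions"], 1):
        # Refuse to publish a decision that has no rationale or no link to its data.
        if not d.get("rationale") or not d.get("section"):
            raise ValueError(f"decision {n} lacks a rationale or a source section")
        lines.append(f"  {n}. {d['summary']}")
        lines.append(f"     why: {d['rationale']} [see {d['section']}]")
    return "\n".join(lines)


instance = {   # illustrative values only
    "device": "core-rtr-01",
    "cab_ref": "CHG-EXAMPLE",
    "window": "planned 3h",
    "decisions": [
        {"summary": "Temporary static route for eBGP peer, removed after convergence",
         "rationale": "peer unreachable until BGP next-hop was redistributed",
         "section": "cutover.events[7]"},
    ],
}
print(build_report(instance))
```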

Where FlowAgents Genuinely Differentiate

It is worth being specific about which parts of this pipeline represent genuine agent capability versus what could be scripted.

- Intent extraction. A script parses BGP blocks. A FlowAgent understands what the BGP configuration is doing: how policies chain, what communities signal, which peers are in which operational role. That understanding is what makes translation possible at all.
- Migration complexity analysis. Assessing which features translate cleanly, which are risky, what the dependency ordering should be, and what the recommended maintenance window looks like. This is technical judgment. It has no scripted equivalent.
- Translation with annotation. Generating target-platform configuration, explaining each decision, and scaffolding the cases that need human review. The annotation is as important as the configuration.
- Adaptive execution. When something unexpected happens during a cutover, the cutover FlowAgent investigates in context, classifies the issue, and decides how to proceed. A script fails. An agent troubleshoots.
- Report synthesis. Assembling a coherent, human-readable migration report from data collected across multiple FlowAgents, multiple devices, and multiple stages. Done manually, this takes days and often never gets done at all.

The Business Case

Network refresh programs are capital investments. The risk profile of those investments determines whether they proceed on schedule, get deferred, or get cancelled.

Risk reduction changes the economics. When a migration has a complete pre-migration analysis, engineer-reviewed translations, validated configuration before the maintenance window opens, and a tested rollback procedure, the failure modes are known and bounded. MTTR if something goes wrong is measured in minutes, not hours. That risk profile is what allows organizations to run more refresh cycles, more frequently, without requiring heroic effort from engineering teams.

Preparation time is the real bottleneck. A migration cutover that takes three hours is preceded by weeks of discovery, analysis, and translation work. FlowAgent-driven pipelines compress that preparation from weeks to hours, which means the engineering team’s time is spent on decisions, not data collection. Organizations that have been deferring refresh cycles because the preparation cost was prohibitive can treat migration as a routine operational capability.

Documentation is not optional. For carriers with regulatory obligations, enterprises with audit requirements, and government networks with formal change management mandates, the post-migration record matters as much as the migration itself. When that record is a natural output of the migration process rather than additional work, compliance becomes a byproduct of good automation rather than a separate project.

Vendor flexibility is a strategic capability. An organization that can migrate between vendors efficiently has genuine leverage in hardware procurement, support contract negotiations, and technology strategy decisions. Vendor lock-in is partly technical, but much of it is operational: migration has simply been too expensive and risky to contemplate. FlowAgent-driven refresh changes that calculation.

What This Isn’t

FlowAgent-driven network refresh is not a replacement for network architects. Decisions about what to migrate, in what order, and with what target design require expertise that FlowAgents augment, not replace. The requires-redesign category in the analysis report marks the boundary of agent autonomy. Those decisions belong to engineers.

This is also not a path to zero-touch migration. The value is not that engineers are removed from the process. It is that engineers are removed from the mechanical parts of the process and engaged on the parts that actually require their expertise. Discovery, translation, and validation are mechanical. Design decisions and risk acceptance are not.

And it is not magic on poorly documented networks. Discovery FlowAgents read what is actually on the device. Years of accumulated configuration, inconsistencies, and technical debt will be faithfully represented in the analysis. FlowAgents make complexity visible and manageable. They do not make it disappear.

A Different Relationship with Infrastructure Change

Network refresh has historically been treated as a crisis to be managed. The preparation is enormous. The risk is high. The window is short. The stakes are real. So refresh cycles get deferred, hardware runs past end-of-life, and the technical debt accumulates until the crisis forces the issue.

FlowAgent-driven migration changes those economics. When discovery takes hours instead of weeks, when analysis produces a CAB-ready report automatically, when translation is largely automated with clear annotation of the cases that need review, the refresh cycle is no longer a crisis. It is an operational capability. A vendor transition that once required a twelve-month program can be conducted as a rolling series of controlled, documented migrations. Each one faster than the last.

That is not just a better migration tool. That is a different posture toward infrastructure change, and another example of the practical benefits of agentic infrastructure operations.

Ankit Bhansali

Ankit Bhansali is a Principal Architect – AI Solutions & Strategy at Itential. Drawing on a strong research background in software and networking, he designs innovative solutions to address the industry’s most complex challenges. His strategic approach empowers businesses to achieve transformative growth through robust automation and end-to-end orchestration.
