Data-driven operations
Seeing the big picture, solving the right problem and finding the answer faster.
What is data-driven operations?
Networks are becoming more complex. Software-defined networking (SDN), network functions virtualization (NFV) and network slicing, combined with customer-managed services that let customers add, modify and delete services through portals, make it harder and harder for operations teams to stay on top of things.
More than that, customers are becoming more sophisticated and demanding with regard to the services they purchase, especially when it comes to quality. Virtualization makes it easier for them to change service providers, so quality absolutely matters.
Detecting and fixing issues before the customer even notices—this is no longer just a wish.
By leveraging big data and analytics, operations teams can now proactively see potential service-impacting issues as they arise, develop prioritized plans of attack to address them, execute corrective actions and verify fixes without any customer complaints.
Are you ready to take your customers’ quality of experience to the next level?
EXFO has the expertise and solutions you need to become a data-driven, customer-focused operations team.
Challenges
How will data-driven operations impact carriers?
Modern networks need to rely heavily on automation due to scale and complexity concerns. Automation, in turn, relies heavily on analytics that can sort through a sea of data to find the hidden trends and issues that are causing unwanted behavior in the network: congestion, dropped calls, even equipment failures. And in a virtual network, some of these issues are only detectable by correlating many small changes in seemingly unrelated performance indicators. Having the right data and understanding the full impact will be critical to effective operations in the transformed network.
Good analytics needs good data
The quality of any assessment is only as good as the information used to make it. For data-driven operations, this means having a complete set of real-time, precise key performance indicators (KPIs) for both the network and the services it carries. In fact, in an SDN/NFV network where there may be no direct correlation between the physical topology and service topology, it’s important to have KPIs for every service. Correlating changes in service KPIs, even before they indicate an issue, can lead to the discovery of hidden but more widespread problems, impacting many services.
Ideally, KPI generation should be part of the service definition itself, being measured at the service endpoints and in a consistent manner for all services. This end-to-end view provides the closest measurement to what the customer is actually experiencing.
Finding a needle in a haystack
The good news is you’re generating KPIs for every service in your network: 100% service visibility. The bad news is you’re now being swamped with KPIs: 200,000 services each generating 5 real-time KPIs per minute means you’re generating 1.44 billion KPIs per day! With this volume of data, detecting even major service issues could be challenging for a human. But what about trying to find services that are just starting to trend towards problems, or looking at trends in utilization that could trigger network expansion?
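As a quick check of the scale involved, the sketch below multiplies out the daily KPI count for 200,000 services with 5 KPIs each. The reporting interval is treated as an assumption here: one report per minute gives 1.44 billion KPIs per day, while one report per second would give 86.4 billion.

```python
SERVICES = 200_000
KPIS_PER_SERVICE = 5  # KPIs each service reports per interval


def kpis_per_day(services: int, kpis_per_service: int, intervals_per_day: int) -> int:
    """Total KPI samples produced in one day."""
    return services * kpis_per_service * intervals_per_day


# Assumption: the reporting interval is either one minute or one second.
per_minute_reporting = kpis_per_day(SERVICES, KPIS_PER_SERVICE, 24 * 60)
per_second_reporting = kpis_per_day(SERVICES, KPIS_PER_SERVICE, 24 * 60 * 60)

print(f"one report per minute: {per_minute_reporting:,} KPIs per day")
print(f"one report per second: {per_second_reporting:,} KPIs per day")
```

Either way, the volume is far beyond what a person can review by hand.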
This is where big-data analytics shines. Making sense of a sea of data means looking for subtle changes in KPIs that may otherwise go undetected. And being able to correlate changes in KPIs across many, possibly unrelated, services can lead to network insights that would have otherwise been hidden. Proactively detecting and managing changes in both network and services is the key to data-driven operations.
It’s never just one thing
The challenge for any operations team is deciding which problem to fix first. Seldom do problems develop in isolation, nor is it likely that the root cause is located where the problem was first noticed. Take latency, for example. A trend toward higher latency at an endpoint is often a sign that the network is becoming congested somewhere, quite possibly because of an over-utilized Ethernet port or a protection switch elsewhere. Without the ability to scan millions of KPIs, detect the ones that are trending in the wrong direction and then correlate them to find a common potential source, your teams may spend days chasing the wrong problem.
Seeing the ‘big picture’ and being able to correlate and prioritize all the known issues is what data-driven operations is all about. Fixing the right problem first can often mean solving many problems at the same time.
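The latency example can be sketched in a few lines. This is only an illustration under simple assumptions, not a description of any particular product: each service has a short series of latency samples, each service's path is known as a list of network elements, and a least-squares slope above a chosen threshold counts as "trending in the wrong direction". The names `slope`, `common_suspects`, `min_slope` and the sample data are all hypothetical.

```python
from collections import Counter
from statistics import mean


def slope(samples):
    """Least-squares slope of the samples against their index (needs 2+ samples)."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def common_suspects(latency_ms, paths, min_slope=0.5):
    """Flag services whose latency is rising, then count the elements they share."""
    trending = [svc for svc, samples in latency_ms.items() if slope(samples) >= min_slope]
    counts = Counter(elem for svc in trending for elem in paths[svc])
    return trending, counts.most_common()


# Hypothetical data: latency samples (ms) and the elements each service crosses.
latency_ms = {
    "svc-a": [10, 11, 12, 13, 14],
    "svc-b": [20, 20, 22, 23, 25],
    "svc-c": [15, 15, 14, 15, 15],
}
paths = {
    "svc-a": ["port-1", "port-7"],
    "svc-b": ["port-3", "port-7"],
    "svc-c": ["port-1", "port-3"],
}

trending, suspects = common_suspects(latency_ms, paths)
print("trending services:", trending)
print("most shared element:", suspects[0])
```

In this toy data the two services with rising latency share only one element, so that element is the first place to look. A production system would use far richer statistics, but the principle of correlating trends against shared topology is the same.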
Keeping track of the ever-changing network
One of the key benefits of an SDN/NFV-based network is the ability to continuously optimize the infrastructure and service topology to address changes. If a virtual router is reaching capacity, create a new virtual router to share the load. If there’s a failure or high congestion on a particular link, re-route traffic to avoid the problem link. And since SDN/NFV networks rely on orchestration to make these changes, there may not even be a human involved.
Keeping track of how the network and services change, in real time, is essential for both service provisioning and fault isolation. Knowing there’s adequate bandwidth available to provision a new service, or knowing which virtual and physical elements a service runs through, are critical for effective operations.
Real-time topology mapping must become a fundamental building block of any solution, both for virtual and hybrid virtual/physical networks. Without it, operations teams can waste significant effort trying to piece together what they need, and even then, because of automated orchestration, things may change without notice.
Solutions
How will carriers benefit from data-driven operations?
Networks are in transition. SDN and NFV will completely change the way networks are planned, built and managed. At the same time, carriers continue to experience attrition in their pool of skilled, experienced personnel and for some carriers, many of these positions may not be backfilled. Data-driven operations allow carriers to change their operation paradigm, enabling their shrinking, younger workforce to ‘do more with less’. Having the right data combined with actionable, policy-driven insight is critical to successfully managing this transformation.
Cost-effective operations
With the scale and complexity of today’s networks growing all the time, there can be no doubt that automation will need to touch almost every aspect of operations, from capacity planning to service provisioning to troubleshooting and maintenance. Carriers are making significant investments to adopt and embed automation into as many aspects of their business as possible. An important benefit of this transformation will be the offloading of many manually intensive tasks that typically require highly trained and experienced technicians. For example, troubleshooting and root-cause analysis often require a second- or third-level technician with a deep understanding of the way the network works to come up with a suitable solution. With a data-driven operations model, this knowledge is embedded in the automation systems, allowing optimal solutions that take additional dependencies and network impacts into account to be developed in minutes rather than hours or days. The result is less reliance on expensive technicians, less chance of human error and greater consistency with network policies.
Proactive rather than reactive operations
Imagine being able to identify and fix issues before they become problems. Imagine if your customers never complained about service quality because it’s never an issue. Imagine never getting penalized for violating an SLA.
While this may seem like a dream, this is exactly what data-driven operations is all about: identifying negative trends as they appear, understanding any potential impact on customers and the network, prioritizing fixes to maximize impact and minimize service issues, reducing the cost of operations and providing insight for capacity and service planning so that your network can continue to grow.
Having 100% visibility into service and network performance, and knowing how best to address emerging issues, is the key to proactively eliminating them before they become service-impacting.
First things first
Anyone who works in a network or service operations center can tell you there’s no such thing as an issue-free network. In fact, some days it can feel like the network is a ‘sea of red’. On those days, when the pressure is on to get the most important services back up and running, sometimes the biggest problem is knowing where to start. Which customers are the most critical? Is there a common root cause? Can services be rerouted without causing even more problems? Which faults are ‘must fix’ versus ‘nice to fix’? And as networks become virtualized, it becomes harder for humans to visualize the service and network topology, making it more difficult to know where to start.
Data-driven operations can eliminate this uncertainty altogether. By applying big-data analytics, driven by consistent policy enforcement across all network and service KPIs, a plan of attack can quickly be developed that takes into account everything from getting critical customers and services back up, to total time of outage, to impact on revenue and cost, all without any guesswork or emotional bias.
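One simple way to picture a policy-driven plan of attack is a weighted score per fault. The sketch below is purely illustrative: the `Fault` fields, the weights in `POLICY_WEIGHTS` and the sample faults are all assumptions chosen for the example, and a real policy would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    name: str
    critical_customers: int   # critical customers currently affected
    outage_minutes: float     # how long the fault has been open
    revenue_at_risk: float    # estimated revenue exposure


# Hypothetical policy: one weight per factor, applied the same way to every fault.
POLICY_WEIGHTS = {
    "critical_customers": 100.0,
    "outage_minutes": 1.0,
    "revenue_at_risk": 0.01,
}


def score(fault, weights=POLICY_WEIGHTS):
    """Weighted sum of the policy factors for one fault."""
    return sum(weight * getattr(fault, field) for field, weight in weights.items())


def plan_of_attack(faults, weights=POLICY_WEIGHTS):
    """Faults ordered from highest to lowest policy score."""
    return sorted(faults, key=lambda f: score(f, weights), reverse=True)


faults = [
    Fault("edge-port-errors", critical_customers=0, outage_minutes=240, revenue_at_risk=2_000),
    Fault("vrouter-overload", critical_customers=1, outage_minutes=5, revenue_at_risk=10_000),
    Fault("core-link-down", critical_customers=3, outage_minutes=20, revenue_at_risk=50_000),
]

for fault in plan_of_attack(faults):
    print(f"{fault.name}: {score(fault):.0f}")
```

Because the same weights are applied to every fault, the ordering is repeatable and can be recomputed whenever conditions change.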
Managing change
There’s a new paradigm emerging for customer-managed services. Driven by competition and a demand for more application-specific services, customers are now able to acquire and manage their own services. Through online portals, customers can now do things like order new services or change an existing service, perhaps requesting more bandwidth, and have it happen within minutes. And while this is great for attracting and retaining customers, it makes the job of service and capacity planning more challenging. Can the network support an additional service without impacting existing services? Is there enough capacity to support the additional bandwidth requested?
Without this information, there’s a real risk of not being able to meet customer commitments. Data-driven operations can address these questions in real time. With visibility into things like unused capacity, a new or modified service can quickly be configured. Additionally, if the new service looks like it might cause problems elsewhere in the network, service routing can quickly be optimized across a larger portion of the network to ensure the new service can be added. Finally, since the system is always aware of capacity throughout the network, capacity planning can follow a just-in-time approach to upgrades, thereby reducing or eliminating unused capacity.
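The capacity questions above come down to a headroom check on every link a service would cross. The following is a minimal sketch under stated assumptions: link capacities and current reservations are available as simple dictionaries, and candidate paths are supplied by some other system. The function names and the sample figures are hypothetical.

```python
def can_admit(path, requested_mbps, capacity_mbps, reserved_mbps):
    """True if every link on the path has enough unreserved capacity."""
    return all(
        capacity_mbps[link] - reserved_mbps.get(link, 0) >= requested_mbps
        for link in path
    )


def first_feasible_path(candidate_paths, requested_mbps, capacity_mbps, reserved_mbps):
    """First candidate path that can carry the request, or None if none can."""
    for path in candidate_paths:
        if can_admit(path, requested_mbps, capacity_mbps, reserved_mbps):
            return path
    return None


# Hypothetical network state, in Mbit/s.
capacity_mbps = {"A-B": 1000, "B-C": 1000, "A-D": 1000, "D-C": 1000}
reserved_mbps = {"A-B": 900, "B-C": 400, "A-D": 300, "D-C": 500}
candidate_paths = [["A-B", "B-C"], ["A-D", "D-C"]]

small_request = first_feasible_path(candidate_paths, 200, capacity_mbps, reserved_mbps)
large_request = first_feasible_path(candidate_paths, 600, capacity_mbps, reserved_mbps)
print("200 Mbit/s goes via:", small_request)
print("600 Mbit/s goes via:", large_request)
```

Here the 200 Mbit/s request is steered around the nearly full A-B link, while the 600 Mbit/s request finds no feasible path, which is exactly the kind of signal that would feed a just-in-time upgrade decision.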
Solutions
Operations teams are being driven to cut expenses, make better decisions and shorten the time taken to identify and resolve issues. Basically, they must do more with less. And at the same time, they need to make sure customer satisfaction remains high. Data-driven operations are essential to meeting these goals. Having the right information at the right time to make the right decision will drive efficiencies throughout the organization. Do you have the right systems in place to be a data-driven operations team?
KPI generation
For data-driven operations to be successful, you need data. Specifically, you need a broad range of key performance indicators for services, networks and customers. Active probing solutions are the best way to derive end-to-end service KPIs, while passive methods such as SNMP polling are good for getting equipment and network KPIs. Derived KPI metrics, such as a mean opinion score (MOS), provide good insight into the customer’s experience.
When deriving service metrics using active probing, it is becoming more and more critical to have one-way metrics, which provide an independent view of the transmit and receive directions. Many services are highly asymmetric in nature and therefore may not experience the same delay in both directions. Additionally, in SDN networks, there’s no guarantee that the transmit and receive directions will follow the same path.
EXFO has an extensive portfolio of active probing solutions, both physical and virtual, as well as the tools and systems to gather, correlate and analyze network, service and customer KPIs from many sources, including third-party devices.
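To see why one-way metrics matter, consider a probe exchange with four timestamps. The sketch below assumes the two endpoints have synchronized clocks, which one-way measurement requires in practice; the `ProbeTimestamps` type and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeTimestamps:
    """Timestamps in milliseconds; assumes both endpoints share a synchronized clock."""
    t1_sent_by_a: float
    t2_received_by_b: float
    t3_sent_by_b: float
    t4_received_by_a: float


def one_way_delays(p: ProbeTimestamps):
    """Forward (A to B) and reverse (B to A) delay."""
    forward = p.t2_received_by_b - p.t1_sent_by_a
    reverse = p.t4_received_by_a - p.t3_sent_by_b
    return forward, reverse


def round_trip(p: ProbeTimestamps) -> float:
    """Round-trip delay with the responder's processing time removed."""
    return (p.t4_received_by_a - p.t1_sent_by_a) - (p.t3_sent_by_b - p.t2_received_by_b)


# Hypothetical asymmetric path: 40 ms forward, 10 ms back, 5 ms spent at the responder.
probe = ProbeTimestamps(t1_sent_by_a=0.0, t2_received_by_b=40.0,
                        t3_sent_by_b=45.0, t4_received_by_a=55.0)

forward, reverse = one_way_delays(probe)
rtt = round_trip(probe)
print(f"forward {forward} ms, reverse {reverse} ms, round trip {rtt} ms, half RTT {rtt / 2} ms")
```

Halving the round-trip figure would report 25 ms in each direction and hide the fact that the forward direction is four times slower than the reverse.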
Correlation and inference
By instrumenting every service and extracting indicators from the network, carriers will have all the data they need to understand how the network and services are performing. The problem is, most of this detail is simply unstructured data and virtually impossible to work with. What’s really needed is a filtered view of the data that highlights trends and out-of-specification performance, and shows any possible correlation between seemingly unrelated events. Having this allows operations teams to see the bigger picture and understand how many subtle, otherwise undetected changes may be adding up to cause a service-impacting event. And by working on real-time KPI data, the time needed to detect an issue, determine its root cause and repair it is minimized.
EXFO has KPI correlation and analytics tools, including real-time topology mapping to address the ever-changing SDN environment, to enable your operations team to see the big picture and find that proverbial needle in the haystack.
Topology mapping
A critical part of any network operations function is an accurate view of the network infrastructure and service topology. Keeping this view up-to-date and accurate has long been an issue, even in traditional physical networks. The transformation of the network towards one that is hybrid or fully virtualized, leveraging SDN and NFV, will make this task even harder. SDN, by its very nature, will continuously optimize the network to address performance issues and failures. At the same time, the orchestration system may create new virtual elements, such as routers, to address capacity issues. Since much of this is expected to happen autonomously, keeping an up-to-date view becomes even more difficult.
Real-time, automated topology mapping provides an ideal solution to this problem by constantly ‘discovering’ changes in the service topology and providing the live topology as a service to applications that may need it for critical activities like fault correlation or capacity reservation.
EXFO has the analytics and topology discovery tools needed to enable operations teams to keep the network running flawlessly.
Actionable insight
In the end, data-driven operations is all about actionable insight. Having the data, and correlating and analyzing it, is all good, but what’s most important is answering the question: “What needs to be done next?” Of course, the answer to this question depends on many factors, each of which needs to be considered in an unbiased, policy-driven way to minimize risk to the company and maximize efficiency.
Data-driven operations addresses this question. By having visibility into the ‘big picture’ and consistently applying policy rules, a prioritized plan of attack can be quickly developed and updated as conditions change, ensuring the best use of operations personnel and least impact on customers.
EXFO has the analytics and topology discovery tools needed to allow a carrier’s operations team to keep the network running flawlessly.