It takes a significant amount of time and money to properly design and implement an enterprise-class network performance monitoring (NPM) foundation. And because of this, you need to understand not only what features are available within an NPM package, but whether those features will actually be beneficial to your infrastructure team.
You may find that only the basic features are needed today, but you may want the ability to add more complex tools in the future. Others might find it’s best to deploy a fully functional NPM on day one. Most, however, will likely fall somewhere in between. In this article, we examine why these performance-oriented network monitoring systems bear watching and which features are most important.
Downtime is becoming increasingly unacceptable
In a perfect world, QoS will be properly configured from one end of the network to the other. But oftentimes, QoS is either not configured or poorly configured somewhere along the data path.
One obvious trend driving NPM is the need to quickly resolve downtime issues that arise. While the ideal solution would be to create a fully redundant network from end to end, in many cases this isn’t possible. This can be due to limitations in the architecture itself, an inability to provide physical redundancy, or budgets that preclude a fully redundant approach. When automated failover isn’t possible, the next best thing is to develop and deploy an advanced network monitoring system platform to identify and alert staff when an outage is occurring — or about to occur. The faster a problem can be identified, the faster it can be fixed.
In some cases, this simply means implementing tools to monitor network devices and individual links. Alerting based on collected log messages is another common tool. In other cases, monitoring all the way to the application layer is required. The vast majority of network monitoring systems today offer the ability to monitor network-only functions or monitor and alert on both network and application issues that arise. Additionally, deep packet inspection appliances can rapidly find performance issues at critical points on the network.
Applications are becoming more time-sensitive
Thanks to a dramatic increase in real-time collaboration applications like voice and video — as well as the growth of distributed application architectures — data traversing networks is more time-sensitive than ever. As a result, data streams for low-latency applications must be identified, marked and treated with a higher priority than other data running across the same network connections. The primary tool to perform these types of tasks is quality of service (QoS). Layer 2 and 3 devices, such as routers and switches, are configured with QoS policies and queuing actions based on those policies.
In a perfect world, QoS will be properly configured from one end of the network to the other. But oftentimes, QoS is either not configured or poorly configured somewhere along the data path. This one mistake can cause major problems for time-sensitive communications. Identifying these problems manually often requires logging in and verifying each QoS configuration along the path. Many network monitoring systems, on the other hand, have QoS analysis capabilities, using NetFlow or sFlow, to automatically identify ineffective or incorrectly configured QoS policies.
Network architecture is growing in complexity
Data center virtualization and network overlays often mask underlying network problems. Suddenly, administrators have to troubleshoot both the underlying physical foundation as well as accompanying virtualized networks in order to find and resolve performance problems. Many IT departments only have tools to monitor one or the other. And if they have the ability to monitor both, they may be completely independent tools.
Many modern NPMs can monitor both physical and virtualized architectures and determine on which network plane the problem resides. This gives support administrators complete visibility into the network, an increasingly important requirement as more virtualization and overlay techniques are added.
Event correlation and root cause analysis is ineffective
Finding and resolving network and application problems is one thing. Finding the root cause of the problem is another. On very large and complex networks, it’s very possible to implement fixes or workarounds that resolve the immediate issue, yet never address the underlying cause. Many times, this leads to drastic and inefficient network changes to fix a problem — when the root cause was actually due to upper-layer problems that went unchecked.
Many network monitoring systems offer added intelligence to collect and analyze various network and application events. By doing so, reports can be created that correlate — or at least isolate — the origin of the initial problem began. When properly configured and tuned, this significantly reduces root cause investigations by helping the administrator focus on the problem and verify the correlated information. And since modern NPMs collect data up to the application level, many root causes that previously went unnoticed can now be identified and properly remediated.
Seeking single-pane-of-glass monitoring and troubleshooting
The potential of integrating so many useful network and performance monitoring tools into a single, unified system is highly appealing. Gone are the days of independent SNMP monitors, logging servers, NetFlow collectors and packet sniffers. We now have the ability to unify all of these useful features into a single NPM product. What’s more, by creating a single pane of glass, we also create a single data repository for which reports and intelligent decisions can be made with powerful data correlation methods.