In-Band Network Telemetry: Next Frontier in Network Visualization and Analytics

In-Band Network Telemetry: The Next Frontier in Network Visualization and Analytics, and Why Enterprise Customers Care

By: Gautam Chanda, Global Product Line Manager DC Networking Analytics, HPE

Let’s first answer the important question: Why do we need Network Visualization and Analytics?

Data center networks have become cloud scale, and deployment of hyper-converged networks is increasing. Telecom networks will enable faster connectivity everywhere, with higher bandwidth delivering 5G wireless services. All of these next-generation networks not only require much higher bandwidth, they also require real-time telemetry to deliver services with a good Quality of Experience (QoE).

A network with detailed real-time visibility enables better reliability and real-time control. Here are the key reasons customers need Network Visualization and Analytics now more than ever:

  • Ability to Pinpoint Traffic Patterns for Dynamic Applications: Data centers now have increasingly complex network deployments with network virtualization and overlay/tunnel technologies; SDN/NFV; silicon programmability; multi-tenancy; growing application volume; mobility; hybrid cloud; bare-metal and virtualized servers (VMs/containers); vSwitches; NIC virtualization; orchestration; and the list goes on. This gives rise to increasingly complicated traffic patterns in the data center, and network operators need greater visibility into those patterns to understand whether their DC network infrastructure is performing optimally.
  • Security Challenges: Complicated IT environments bring more security concerns, stricter regulatory compliance requirements, and more cybersecurity attacks originating from both inside and outside the data center. Detailed visibility is critical to defend against these attacks and to make sense of complex traffic patterns.
  • Intent-Based Networking
  • Network Analytics (Visibility, Validation, Optimization & Upgrade, Troubleshooting, Policy Enforcement) is increasingly important for modern DC and Cloud deployments.

Old network management tools such as SNMP are not up to the task in these very high-speed networks as we move from 10G to 25G to 100G and beyond in short order.

The figure below demonstrates very well the need for Network Visualization and Analytics:

Figure 1: The Need for Network Visualization and Analytics

INTBlogPhoto1.png

This brings us to In-Band Network Telemetry (INT).

Let’s pause for a minute:

  • Let’s assume you’re interested in the behaviour of your live user-data traffic.
    • What is the best source of information?
  • Well… probably the live user-data traffic itself.
    • Let’s add meta-data to all interesting live user-data traffic.

This is the essence of In-Band Network Telemetry.

The figure below contrasts the two approaches. In traditional network monitoring, an application polls the host CPU to gather aggregated telemetry every few seconds or minutes, which does not scale well in next-generation networks. In-Band Network Telemetry, by contrast, enables packet-level telemetry by having key details about packet processing added to the data plane packets, without consuming any host CPU resources:

Figure 2: Traditional vs New Way

INTBlogPhoto2.png

In-Band Network Telemetry (INT) is a sophisticated and flexible telemetry feature, usually supported in hardware within the network devices. As explained above, INT allows the data plane to collect and report detailed latency, congestion, and network state information without requiring intervention or work by the control plane. INT-enabled devices insert this valuable metadata in-band, without affecting network performance, and it can later be extracted and interpreted by a collector/sink/network management software such as HPE IMC.
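
To make this concrete, here is a minimal Python sketch of the kind of per-hop metadata an INT-capable device appends and how a sink could total the hop latencies along a path. The field names loosely follow the public P4.org INT specification, but the layout, units, and values are illustrative assumptions, not HPE's implementation or an actual wire format.

from dataclasses import dataclass
from typing import List

@dataclass
class IntHopMetadata:
    """Illustrative per-hop INT metadata (names loosely follow the P4.org INT spec)."""
    switch_id: int          # which device processed the packet
    ingress_port: int
    egress_port: int
    ingress_ts_ns: int      # timestamp when the packet entered the device
    egress_ts_ns: int       # timestamp when the packet left the device
    queue_occupancy: int    # queue depth observed while the packet waited

    @property
    def hop_latency_ns(self) -> int:
        return self.egress_ts_ns - self.ingress_ts_ns

@dataclass
class IntPacket:
    """A user-data packet carrying an INT metadata stack that grows hop by hop."""
    payload: bytes
    hops: List[IntHopMetadata]

    def add_hop(self, hop: IntHopMetadata) -> None:
        # Each INT transit device pushes its own metadata onto the stack.
        self.hops.append(hop)

    def end_to_end_latency_ns(self) -> int:
        # A sink/collector can reconstruct the path and total the device latencies.
        return sum(h.hop_latency_ns for h in self.hops)

# Example: a packet crossing two switches.
pkt = IntPacket(payload=b"user data", hops=[])
pkt.add_hop(IntHopMetadata(1, 10, 12, 1_000, 4_500, 7))
pkt.add_hop(IntHopMetadata(2, 3, 48, 9_000, 11_000, 2))
print([h.switch_id for h in pkt.hops], pkt.end_to_end_latency_ns(), "ns")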

INT enables a number of very useful customer use cases, such as:

  • Network troubleshooting
    • When packets enter/exit networks
    • Which path was taken by individual flows associated with Specific Applications
    • How long packets spend at each hop
    • How long packets spend on each link
    • Which switches are seeing congestion?
    • Microburst detection (see the sketch after this list)
  • Real-time control or feedback loops:
    • Collector might use the INT data plane information to feed back control information to traffic sources, which could in turn use this information to make changes to traffic engineering or packet forwarding. (Explicit congestion notification schemes are an example of these types of feedback loops).
  • Network Event Detection:
    • If the collected path state indicates a condition that requires immediate attention or resolution (such as severe congestion or violation of certain data plane invariants), the collector could generate immediate actions in response to the network events, forming a feedback control loop either in a centralized or a fully decentralized fashion (a la TCP).
  • List Goes On…..
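
As a hedged illustration of the troubleshooting bullets above (congested switches, microburst detection), the following Python sketch shows how a collector might flag congested hops from per-packet hop latencies reported by INT. The record layout and the latency threshold are hypothetical and would need tuning for a real network.

from collections import defaultdict
from statistics import mean

# Hypothetical per-packet records as a collector might store them:
# (flow_id, switch_id, hop_latency_ns)
records = [
    ("flowA", 1, 4_000), ("flowA", 2, 3_500), ("flowA", 3, 90_000),
    ("flowB", 1, 4_200), ("flowB", 3, 85_000),
]

LATENCY_THRESHOLD_NS = 50_000   # illustrative "this hop looks congested" cutoff

def congested_switches(recs, threshold=LATENCY_THRESHOLD_NS):
    """Group hop latencies by switch and report those whose mean exceeds the threshold."""
    by_switch = defaultdict(list)
    for _flow, switch_id, latency in recs:
        by_switch[switch_id].append(latency)
    return {sw: mean(lats) for sw, lats in by_switch.items() if mean(lats) > threshold}

# Bucketing the same data into millisecond time windows instead of by switch
# would expose microbursts in the same way.
print(congested_switches(records))   # e.g. {3: 87500.0} -> switch 3 is seeing congestion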

The figure below shows an end-to-end INT customer use case in a data center:

Figure 3: End To End INT

INTBlogPhoto3.png

Figure 3 above shows how In-Band Network Telemetry is used to “Track in Real Time the Path and Latency of Packets and Flows Associated with Specific Applications”:

  • Collect the physical path and hop latencies hop-by-hop for every packet.
  • INT can be initiated, transited, or terminated by either a switch or a NIC (Network Interface Card) in a host such as a server.
  • INT metadata is encapsulated and exported to the collector (e.g. HPE IMC).

Use Cases

  • Case 1a: Real-time fault detection and isolation or alert: Congested/oversubscribed links and devices, imbalanced links (LAG, ECMP), loop.
  • Case 1b: Interactive analysis & troubleshooting: On-demand path visualization; Traffic matrix generation; Triage incidents of congestion.
  • Case 1c: Path Verification of bridging/routing, SLA, and configuration effects.
  • Enhanced visibility for all your Network traffic
  • Network provided telemetry data gathered and added to live data
    • Complement out-of-band OAM tools like SNMP, ping, and traceroute
    • Path / Service chain verification
  • Record the packet’s trip as meta-data within the packet
    • Record path and node (i/f, time, app-data) specific data hop-by-hop and end to end
    • Export telemetry data via NetFlow/IPFIX/Kafka to Controller/Apps (a sketch follows this list)
  • In-band Network Telemetry can be implemented without forwarding performance degradation
  • Network ASIC vendors have started to add INT as a built-in function within their newest ASICs
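
To illustrate the Kafka export path mentioned in the list above, the sketch below publishes per-flow INT records to a Kafka topic using the kafka-python client. The broker address, topic name, and record fields are assumptions made for the example and are not part of the HPE FlexFabric solution.

import json
from kafka import KafkaProducer  # pip install kafka-python

# Illustrative broker address and topic name.
producer = KafkaProducer(
    bootstrap_servers="kafka.example.local:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def export_int_record(flow_id: str, path: list, total_latency_ns: int) -> None:
    """Publish one telemetry record so controllers/apps can consume it downstream."""
    producer.send("int-telemetry", {
        "flow_id": flow_id,
        "path": path,                  # ordered list of switch IDs
        "latency_ns": total_latency_ns,
    })

export_int_record("flowA", [1, 2, 3], 97_500)
producer.flush()  # make sure the record is actually delivered before exiting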

HPE FlexFabric Network Analytics solution is leading the way towards this next frontier in Network Visualization and Analytics.

Related:

Watch4net/ViPR SRM: Traffic flow listener throws exception while parsing netflow packets

Article Number: 516827 Article Version: 6 Article Type: Break Fix



Watch4net

The Flow Listener throws an exception while parsing a NetFlow packet that does not contain NBAR data.

See errors similar to:

SEVERE - [2016-08-30 01:53:21 EDT] - FlowListener$FlowWorker::run(): Error while listening
com.watch4net.events.processing.ProcessingException: ForwardRule (nbar-status.xml:34)
    at com.watch4net.events.processing.processor.RuleResult.fromResult(SourceFile:89)
    at com.watch4net.events.processing.processor.rules.AbstractRule.setBitField(SourceFile:226)
    at com.watch4net.events.processing.processor.rules.AbstractRule.evaluate(SourceFile:257)
    at com.watch4net.events.processing.processor.rules.RuleChain.doEvaluate(SourceFile:87)
    at com.watch4net.events.processing.processor.rules.AbstractRule.evaluate(SourceFile:253)
    at com.watch4net.events.processing.processor.rules.SwitchRule.doEvaluate(SourceFile:258)
    at com.watch4net.events.processing.processor.rules.AbstractRule.evaluate(SourceFile:253)
    at com.watch4net.events.processing.processor.rules.RuleChain.doEvaluate(SourceFile:87)
    at com.watch4net.events.processing.processor.rules.AbstractRule.evaluate(SourceFile:253)
    at com.watch4net.events.processing.processor.rules.RootRule.evaluate(SourceFile:93)
    at com.watch4net.events.processing.processor.EvaluatorStreamHandler.handleEvent(SourceFile:57)
    at com.watch4net.events.common.processing.DirectStreamSource.handleEvent(DirectStreamSource.java:66)
    at com.watch4net.events.processing.processor.rules.ForwardRule.doEvaluate(SourceFile:46)
    at com.watch4net.events.processing.processor.rules.AbstractRule.evaluate(SourceFile:253)
    at com.watch4net.events.processing.processor.rules.RuleChain.doEvaluate(SourceFile:87)
    at com.watch4net.events.processing.processor.rules.AbstractRule.evaluate(SourceFile:253)
    at com.watch4net.events.processing.processor.rules.SwitchRule.doEvaluate(SourceFile:258)
    at com.watch4net.events.processing.processor.rules.AbstractRule.evaluate(SourceFile:253)
    at com.watch4net.events.processing.processor.rules.RuleChain.doEvaluate(SourceFile:87)
    at com.watch4net.events.processing.processor.rules.AbstractRule.evaluate(SourceFile:253)
    at com.watch4net.events.processing.processor.rules.RootRule.evaluate(SourceFile:93)
    at com.watch4net.events.processing.processor.EvaluatorStreamHandler.handleEvent(SourceFile:57)

This occurs if the NBAR option is enabled within the configuration of the Solution Pack (SP).

This is a Solution Pack problem: the SP cannot handle NBAR data being absent from the event.

Apply the attached patch to the Event-Property-Tagger for Traffic Flow.

To apply the Patch, please do the following:

1) Go to Centralized Management, click Packages Management, and then click Upload.

2) Upload the package, which can be found in this KB article.


3) Once uploaded, go to Logical Overview > Events > Event-Property-Tagger::generic-traffic-flow.

4) Click the “Manually update to Latest Version” dropdown and select Update to version 3.0u2.


5) Keep everything default (if correct) and click Update

Restart Event Processing Manager for Traffic Flow.

Restart Tomcat.

This patch is mandatory for all Traffic Flow installations on M&R 6.7.

Related:

Watch4net | EMC M&R 6.8u2 and up – Traffic Flows Solution Pack: “Unknown field type for template” “Template not found” error in EPM logs

Article Number: 463760 Article Version: 4 Article Type: Break Fix



Watch4net 6.6, Watch4net 6.4u1, Watch4net 6.4u3, Watch4net 6.5u1, Watch4net 6.5u2, Watch4net 6.5u3, Watch4net 6.5u4, Watch4net 6.6u1



The following errors might be seen in processing log files when collecting from a device on Netflow v9:

WARNING - [2016-01-07 04:52:27 PST] - LogMitigator::inc(): Unknown field type for template. Key: '[routerSrc=10.xxx.x.xx, sourceId=1, templateId=257, fieldType=346]', Count: 1000.
WARNING - [2016-01-07 04:52:27 PST] - LogMitigator::inc(): Unknown field type for template. Key: '[routerSrc=10.xxx.x.xx, sourceId=1, templateId=257, fieldType=56701]', Count: 1000.
WARNING - [2016-01-07 04:52:27 PST] - LogMitigator::inc(): Unknown field type for template. Key: '[routerSrc=10.xxx.x.xx, sourceId=1, templateId=257, fieldType=56702]', Count: 1000.

The above errors indicate that field types 346, 56701, and 56702 are not recognized by the Event-Processing-Manager.

Also, some warnings in newer versions of Watch4net contain the following:

WARNING - [2018-06-29 09:12:08 EDT] - LogMitigator::inc(): Template not found. Key: 'FlowTemplateId [routerSrc=XX.XXX.XX.XX, sourceId=3203342338, templateId=256]', Count: 1.

The Traffic Flows Solution Pack is currently designed to accept NetFlow v5 templates. Thus, when collecting from v9 devices, an unknown template or unknown field type message might be seen in the logs.

Note: From a support perspective, these issues are handled on a best-effort basis, but EMC Professional Services might need to be engaged if the error received is an unknown template error.

If you receive an unknown field type error as this article describes, the field types should be added to the nf9Fields.properties file. A Palo Alto firewall device is used in this example; searching the vendor's website for documentation yielded the definitions of the unknown field types.

In this example, the field types are as follows with their corresponding values:

346=privateEnterpriseNumber

56701=App-ID

56702=User-ID

To add the field types:

  • From centralized-management, select the Event-Processing-Manager server for Traffic Flows
  • Go to Modules > Event-Processing > Flow-Listener::generic-traffic-flow
  • Click Configuration Files and edit nf9Fields.properties
  • Add the field types that were provided by vendor (using Palo Alto in this example):
# Palo Alto support
346=privateEnterpriseNumber
56701=App-ID
56702=User-ID
  • Go to Modules > Event-Processing > Event-Processing-Manager::generic-traffic-flow
  • Restart the service
  • Check the logs for warnings similar to “Unknown field type for template”. If none are present, the data is going through.

Note: If the field types are not known, or the Event Processing logs do not contain the needed details, do the following:

1) Take a packet capture of the flows which are coming in from the router.

2) Open the PCAP in Wireshark or a similar tool and look for template ID 256.


3) Make sure the “FieldTypes” present in these templates are mapped in the nf9Fields.properties file.

4) Restart the Traffic Flows EPM.

5) Verify the EPM logs and make sure the errors are no longer there
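
If a protocol-aware tool is not available, the hedged Python sketch below performs the check that steps 2 and 3 describe: it walks the raw UDP payload of one NetFlow v9 export packet (for example, extracted from the PCAP), lists the field types declared in its template flowsets, and reports any that are not yet mapped in nf9Fields.properties. The file names are placeholders and real exports can span multiple packets, so treat this as a starting point rather than a supported utility.

import struct

def template_field_types(payload: bytes):
    """Yield (template_id, field_type) pairs found in template flowsets of one v9 export."""
    version, _count = struct.unpack_from("!HH", payload, 0)
    assert version == 9, "not a NetFlow v9 export packet"
    offset = 20                                   # the fixed v9 export header is 20 bytes
    while offset + 4 <= len(payload):
        flowset_id, length = struct.unpack_from("!HH", payload, offset)
        if length < 4:                            # malformed flowset; stop rather than loop forever
            break
        if flowset_id == 0:                       # flowset id 0 carries template records
            pos, end = offset + 4, offset + length
            while pos + 4 <= end:
                template_id, field_count = struct.unpack_from("!HH", payload, pos)
                pos += 4
                for _ in range(field_count):
                    field_type, _field_len = struct.unpack_from("!HH", payload, pos)
                    pos += 4
                    yield template_id, field_type
        offset += length                          # jump to the next flowset

def known_types(path="nf9Fields.properties"):
    """Field types already mapped in nf9Fields.properties (lines such as 346=privateEnterpriseNumber)."""
    types = set()
    with open(path) as fh:
        for line in fh:
            key = line.split("=", 1)[0].strip()
            if key.isdigit():
                types.add(int(key))
    return types

# Placeholder input file: the UDP payload of one export packet, e.g. saved from Wireshark.
payload = open("netflow_v9_export.bin", "rb").read()
missing = {ftype for _tid, ftype in template_field_types(payload)} - known_types()
print("Field types not yet mapped:", sorted(missing))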

Related:

Watch4net: Enhancements and changes introduced in Traffic Flows v3.1 and later

Article Number: 520317 Article Version: 3 Article Type: How To



Watch4net

Here is a partial list of changes that have been introduced to Traffic Flows SP in M&R 6.7u1.

1. The major difference between Traffic Flows v2.1 and v3.1 is the introduction of a combined collector and database.

With Traffic Flows 3.1, it is now possible to install the frontend, collector, and database on a single setup.

2. With respect to flow handling, the collector in both versions is capable of handling 10,000 flows per second. Apart from this, the structure of the database tables has also changed, which means there have been XML mapping file changes as well.

3. Changes in the reports: The reports show more or less the same information as before, but their structure and design have changed to accommodate the changes introduced in the database.

4. The Traffic Flows SP is officially qualified to work with routers only, but it can be made to work with switches as well by making changes to flow-global-enrichment.xml (a Professional Services engagement).

5. Router information is collected from the device's SNMP data; based on that, the reports show interfaces/ports together with the corresponding information collected through NetFlow packets. If the SNMP data does not include information for an interface, the data collected for that interface goes to the unmanaged exports.

6. There is no task that cleans up data when an interface goes down, but as long as data keeps arriving in NetFlow packets, newly added interfaces will appear in the reports. The SNMP data is used only to cross-reference the interface number with the interface name fetched through SNMP (a small illustration follows).
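
As a small illustration of that SNMP cross-referencing, the Python sketch below enriches NetFlow interface indexes with names learned via SNMP and sends anything without a match to the unmanaged bucket. The values and field names are made up for the example; in the product this correlation happens inside the Solution Pack.

# Interface names learned via SNMP (ifIndex -> ifName); values here are illustrative.
snmp_if_names = {1: "GigabitEthernet0/0", 2: "GigabitEthernet0/1"}

# Interface indexes and byte counts as reported in NetFlow packets.
netflow_records = [
    {"if_index": 1, "bytes": 125_000},
    {"if_index": 2, "bytes": 87_000},
    {"if_index": 9, "bytes": 4_000},   # not present in the SNMP data
]

def enrich(records, names):
    """Attach the SNMP-learned name; anything without one goes to the unmanaged exports."""
    managed, unmanaged = [], []
    for rec in records:
        name = names.get(rec["if_index"])
        (managed if name else unmanaged).append({**rec, "if_name": name or "unmanaged"})
    return managed, unmanaged

managed, unmanaged = enrich(netflow_records, snmp_if_names)
print(len(managed), "managed,", len(unmanaged), "unmanaged")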

Related:

Qradar – how to find IPs that solely talk externally?

Hello,
We’re trying to identify source IPs that solely talk externally – they don’t have comms with internal clients. We’re trying to isolate those clients who *only* talk to internet destinations. We have netflow and can easily filter down to just outbound comms, but we’re unsure of how to structure the query so that clients who have previously talked internally are also removed.
Many thanks for any clues/help!

-Rick

Related:


7021231: Can you install netflow Collector manager on top of collector manager appliance?

This document (7021231) is provided subject to the disclaimer at the end of this document.

Environment

Sentinel 7.4 Collector Manager

Sentinel 8.0 Collector Manager
Sentinel Netflow Collector Manager

Situation

There are no existing licenses for a full SUSE install, so the goal is to do a NetFlow install on top of an existing Collector Manager Appliance so that it overwrites the Collector Manager component but leaves the OS in place.

Resolution

This cannot be done on any version. The soft client sees the install as a full overwrite, so it will not let the Collector Manager be removed. To install the NetFlow connector, it must be installed on a full OS.

The installer reports messages similar to the following:

“Sentinel Collector Manager is already installed. We recommend that you do not install Sentinel Netflow Collector Manager over an existing Sentinel Collector Manager install.”

“A partial installation of Sentinel Netflow Collector Manager exists. Execute uninstall-netflow before attempting to re-install.”

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented “AS IS” WITHOUT WARRANTY OF ANY KIND.

Related:

Netflow direction in QRadar showing L2R instead of R2L

In NetFlow data, we are observing inbound firewall deny traffic as outbound. The flow direction is showing as reversed: the actual traffic comes from an external IP to an internal IP, but in the flow details the source shows as the internal IP and the destination as the external one. Kindly provide your inputs.

Related:

Help with replication in Lotus Notes

Hi

I got really weird situation going on and I’d be really glad if you guys could give me a little help.

I have a Notes server replicating to another server. When I look at the log in the administrator client, it says that Notes replicated 1.5 GB per day, but when I check NetFlow it shows 23 GB per day.

There is no other application on my server; only Lotus Notes is running there.

What is rather odd is that my server is almost all the time sending 138.4 MBytes and receiving 2.2 MBytes.

Would you guys know what it could be ??

Thank you very much!

Related:

The business case for deploying network monitoring systems

It takes a significant amount of time and money to properly design and implement an enterprise-class network performance monitoring (NPM) foundation. And because of this, you need to understand not only what features are available within an NPM package, but whether those features will actually be beneficial to your infrastructure team.

You may find that only the basic features are needed today, but you may want the ability to add more complex tools in the future. Others might find it’s best to deploy a fully functional NPM on day one. Most, however, will likely fall somewhere in between. In this article, we examine why these performance-oriented network monitoring systems bear watching and which features are most important.

Downtime is becoming increasingly unacceptable

One obvious trend driving NPM is the need to quickly resolve downtime issues that arise. While the ideal solution would be to create a fully redundant network from end to end, in many cases this isn’t possible. This can be due to limitations in the architecture itself, an inability to provide physical redundancy, or budgets that preclude a fully redundant approach. When automated failover isn’t possible, the next best thing is to develop and deploy an advanced network monitoring system platform to identify and alert staff when an outage is occurring — or about to occur. The faster a problem can be identified, the faster it can be fixed.

In some cases, this simply means implementing tools to monitor network devices and individual links. Alerting based on collected log messages is another common tool. In other cases, monitoring all the way to the application layer is required. The vast majority of network monitoring systems today offer the ability to monitor network-only functions or monitor and alert on both network and application issues that arise. Additionally, deep packet inspection appliances can rapidly find performance issues at critical points on the network.

Applications are becoming more time-sensitive

Thanks to a dramatic increase in real-time collaboration applications like voice and video — as well as the growth of distributed application architectures — data traversing networks is more time-sensitive than ever. As a result, data streams for low-latency applications must be identified, marked and treated with a higher priority than other data running across the same network connections. The primary tool to perform these types of tasks is quality of service (QoS). Layer 2 and 3 devices, such as routers and switches, are configured with QoS policies and queuing actions based on those policies.

In a perfect world, QoS will be properly configured from one end of the network to the other. But oftentimes, QoS is either not configured or poorly configured somewhere along the data path. This one mistake can cause major problems for time-sensitive communications. Identifying these problems manually often requires logging in and verifying each QoS configuration along the path. Many network monitoring systems, on the other hand, have QoS analysis capabilities, using NetFlow or sFlow, to automatically identify ineffective or incorrectly configured QoS policies.
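
As a hedged sketch of what such automated QoS analysis might look like, the Python snippet below scans flow records (assumed to be already parsed from NetFlow or sFlow into dictionaries) for voice-port traffic that is missing the expected EF marking, the kind of mismatch that points to a missing or misconfigured policy along the path. The port numbers, field names, and expected DSCP value are illustrative.

# Illustrative flow records as an NPM might hold them after NetFlow/sFlow parsing.
flows = [
    {"exporter": "edge-rtr-1", "dst_port": 5060, "dscp": 46},  # EF, as expected for voice
    {"exporter": "core-sw-2",  "dst_port": 5060, "dscp": 0},   # best effort: likely a missing policy
    {"exporter": "core-sw-2",  "dst_port": 443,  "dscp": 0},
]

VOICE_PORTS = {5060, 5061}   # hypothetical signaling ports used to spot voice traffic
EXPECTED_VOICE_DSCP = 46     # EF (Expedited Forwarding)

def misconfigured_qos(records):
    """Return exporters seeing voice-port traffic without the expected EF marking."""
    return sorted({
        r["exporter"]
        for r in records
        if r["dst_port"] in VOICE_PORTS and r["dscp"] != EXPECTED_VOICE_DSCP
    })

print(misconfigured_qos(flows))   # ['core-sw-2'] -> check the QoS config on that device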

Network architecture is growing in complexity

Data center virtualization and network overlays often mask underlying network problems. Suddenly, administrators have to troubleshoot both the underlying physical foundation as well as accompanying virtualized networks in order to find and resolve performance problems. Many IT departments only have tools to monitor one or the other. And if they have the ability to monitor both, they may be completely independent tools.

Many modern NPMs can monitor both physical and virtualized architectures and determine on which network plane the problem resides. This gives support administrators complete visibility into the network, an increasingly important requirement as more virtualization and overlay techniques are added.

Event correlation and root cause analysis is ineffective

Finding and resolving network and application problems is one thing. Finding the root cause of the problem is another. On very large and complex networks, it’s very possible to implement fixes or workarounds that resolve the immediate issue, yet never address the underlying cause. Many times, this leads to drastic and inefficient network changes to fix a problem — when the root cause was actually due to upper-layer problems that went unchecked.

Many network monitoring systems offer added intelligence to collect and analyze various network and application events. By doing so, reports can be created that correlate, or at least isolate, where the initial problem began. When properly configured and tuned, this significantly reduces root cause investigation time by helping the administrator focus on the problem and verify the correlated information. And since modern NPMs collect data up to the application level, many root causes that previously went unnoticed can now be identified and properly remediated.

Seeking single-pane-of-glass monitoring and troubleshooting

The potential of integrating so many useful network and performance monitoring tools into a single, unified system is highly appealing. Gone are the days of independent SNMP monitors, logging servers, NetFlow collectors and packet sniffers. We now have the ability to unify all of these useful features into a single NPM product. What's more, by creating a single pane of glass, we also create a single data repository from which reports and intelligent decisions can be made with powerful data correlation methods.

Source: http://searchnetworking.techtarget.com/feature/The-business-case-for-deploying-network-monitoring-systems

Related: