By: Gautam Chanda, GPLM DC Networking, HPE
In today’s world of hyper-speed business decisions, where companies must be agile enough to stay ahead of the competition, match or exceed new market demands, and meet demanding customer expectations, your customer-facing applications will succeed or fail based primarily on their performance.
This puts a tremendous responsibility on network engineers (aka network operators) in enterprises and hyperscale data centers, because most unexplained application performance shortfalls result from an underlying network infrastructure that does not provide proper performance and scale on demand, or that was not adequately designed for optimal application performance. Network operators must ensure that their networks are responsive, “always on,” and capable of meeting the ever-growing demands of the applications they run. Providing operators with deeper instrumentation and telemetry data about the network helps them diagnose network issues, plan and fine-tune the network for improved performance, and make optimal use of network resources.
One of the main causes of these unexplained application performance issues is latency caused by underlying network congestion. Among the causes of network congestion is an elusive type called the “microburst.” As the name suggests, microbursts are sub-second periods during which major bursts of traffic arrive at line rate and can temporarily overflow the switch buffers, causing packet loss or backpressure.
Traditionally, “congestion” has been associated with switch ports running at close to line rate. In a congestion scenario, the switch may drop packets, or flows may experience backpressure, due to a lack of buffer space. However, more recent analysis has revealed that microbursts occur more frequently than we might have guessed, and there was no good way to detect them. As a result, network engineers were left looking for the proverbial “needle in a haystack” to find the causes of unexpected application performance issues in their networks.
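To make the buffer-overflow mechanism concrete, here is a minimal sketch under loudly hypothetical assumptions: two ingress ports send to a single 100GbE egress port at full line rate, the egress port drains at line rate, and the port has a 4 MiB tail-drop buffer (an arbitrary example value, not the buffer size of any particular switch). Because traffic arrives twice as fast as it can leave, the queue grows by 12,500 bytes every microsecond, so the buffer fills in roughly a third of a millisecond.

```python
# Minimal sketch of a microburst overflowing an egress buffer.
# Hypothetical assumptions: two ingress ports send to one 100GbE egress
# port at full line rate; the egress drains at line rate; the 4 MiB
# tail-drop buffer is an arbitrary example, not any real switch's size.

LINE_RATE_BPS = 100_000_000_000                  # 100GbE, bits per second
BYTES_PER_US = LINE_RATE_BPS // 8 // 1_000_000   # 12,500 bytes per microsecond
BUFFER_BYTES = 4 * 1024 * 1024                   # hypothetical 4 MiB buffer


def simulate_burst(burst_us: int, ingress_ports: int = 2) -> tuple[int, int]:
    """Return (peak queue bytes, dropped bytes) during a line-rate burst
    lasting burst_us microseconds, simulated in 1-microsecond steps."""
    queue = peak = dropped = 0
    for _ in range(burst_us):
        # Each ingress port delivers one microsecond of line-rate traffic.
        queue += ingress_ports * BYTES_PER_US
        # Tail drop: whatever does not fit in the buffer is lost.
        if queue > BUFFER_BYTES:
            dropped += queue - BUFFER_BYTES
            queue = BUFFER_BYTES
        peak = max(peak, queue)
        # The egress port transmits one microsecond's worth at line rate.
        queue = max(0, queue - BYTES_PER_US)
    return peak, dropped


for duration_us in (100, 1000):
    peak, dropped = simulate_burst(duration_us)
    print(f"{duration_us} us burst: peak queue {peak:,} B, dropped {dropped:,} B")
```

With these example numbers, a 100-microsecond burst fits in the buffer, while a 1-millisecond burst starts dropping packets after about 335 microseconds, even though either burst would barely register in a one-second average.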
Typically, these microbursts do not last long enough to be detected by traditional counter-based monitoring, such as SNMP-polled port statistics. This is because traditional tools for monitoring network traffic patterns, such as RMON and SNMP, are based on a polling model in which data is typically collected at intervals of one second or longer. What about the events that occur within these polling intervals? With the move to 100GbE attachment in the data center, within even a one-second interval a 100GbE interface could go from idle to forwarding over 280 million minimum-size packets (counting both directions of the full-duplex link) and back again. In a traditional SNMP/RMON polling model, this 280-million-packet burst is averaged across the polling interval and can become invisible.
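The packet-rate figure can be checked with standard Ethernet framing arithmetic: each 64-byte minimum frame occupies 84 bytes on the wire once the 8-byte preamble/start-of-frame delimiter and the 12-byte inter-frame gap are included. That gives about 148.8 million packets per second in each direction of a 100GbE link, so a figure above 280 million per second is reached only when both directions are counted. The `polled_utilization` helper is a deliberately simplified, hypothetical model of a poller that divides a counter delta by the polling interval.

```python
# Minimal sketch: line-rate packet arithmetic for 100GbE, plus what a
# counter-based poller would report for a short burst (simplified model).

PREAMBLE_AND_SFD_BYTES = 8   # preamble + start-of-frame delimiter on the wire
INTER_FRAME_GAP_BYTES = 12   # minimum idle gap between frames
MIN_FRAME_BYTES = 64         # minimum Ethernet frame size


def max_packets_per_second(line_rate_bps: int, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Packets per second that one direction of a link carries at line rate."""
    wire_bits = (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8
    return line_rate_bps / wire_bits


def polled_utilization(burst_s: float, poll_interval_s: float = 1.0) -> float:
    """Average utilization a poller reports if the link ran at line rate for
    burst_s seconds and was idle for the rest of one polling interval."""
    return burst_s / poll_interval_s


pps = max_packets_per_second(100_000_000_000)
print(f"{pps / 1e6:.2f} Mpps per direction")        # 148.81 Mpps per direction
print(f"{2 * pps / 1e6:.2f} Mpps both directions")  # 297.62 Mpps both directions
print(f"{polled_utilization(0.005):.1%}")           # 0.5%
```

A 5-millisecond burst at full line rate, enough to overflow many switch buffers, shows up in a one-second poll as just 0.5% average utilization.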
Let’s look at the potential business impact of these microbursts in a High-Frequency Trading (HFT) environment:
- In a NetworkingWorld article, Charles Thompson, NI manager of system engineering, stated: “When trading floors open at 9:30 am Eastern time, their networks are flooded with a ridiculous number of trades that have been queued up since the night before. To analyze performance issues, network managers often have to break out a one-second period into smaller, microscopic intervals. So, they’ll chop up the one-second interval into 100-millisecond intervals, 10-millisecond intervals, or 5-microsecond intervals for investigations. When you get to a sub-second resolution, it’s referred to as a microburst. It’s a small period of time when a major burst of usage occurred... But, we’ve had many customers requiring 100-microsecond increments, who will take advantage of this drill down capability.”
- An InformationWeek article stated, “A 1-millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm.”
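The drill-down Thompson describes, re-binning the same one-second window at progressively finer intervals, can be sketched in a few lines. The trace here is synthetic (a steady background of one packet per millisecond plus a hypothetical 5,000-packet burst packed into half a millisecond), and `peak_rate_by_resolution` is a hypothetical helper, not part of any monitoring product. Timestamps are integer microseconds to avoid floating-point binning errors.

```python
from collections import Counter


def peak_rate_by_resolution(timestamps_us, bin_widths_us):
    """For each bin width, return the busiest bin's packet count
    converted to packets per second."""
    peaks = {}
    for width in bin_widths_us:
        # Count packets falling into each fixed-width time bin.
        bins = Counter(t // width for t in timestamps_us)
        busiest = max(bins.values()) if bins else 0
        peaks[width] = busiest * 1_000_000 // width
    return peaks


# Synthetic one-second trace: one packet every 1 ms ...
trace_us = list(range(0, 1_000_000, 1_000))
# ... plus a hypothetical burst: 10 packets per microsecond for 500 us.
trace_us += [500_000 + i // 10 for i in range(5_000)]

peaks = peak_rate_by_resolution(trace_us, [1_000_000, 100_000, 10_000, 100])
for width, rate in peaks.items():
    print(f"{width:>9} us bins: peak {rate:,} pkt/s")
```

At 1-second resolution the trace looks like a modest 6,000 packets per second; at 100-microsecond resolution the same traffic peaks at 10,010,000 packets per second, which is the burst that coarse polling hides.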
Networks are critical to the business because they deliver applications and services to the rest of the organization. Networks must deliver high performance, low latency, reliability, and security. Network and data center downtime is expensive and affects business outcomes. Proactively detecting these elusive network microbursts allows network operators to run their networks at optimal performance levels.
Learn more about the HPE FlexFabric 5950 100G TOR (Top-of-Rack) Switch. This switch can detect these elusive microbursts using BroadView™ Instrumentation analytics embedded in the switch.