First, note whether the path is cycling between GOOD, BAD, and DEAD while the link is idle or while it is under load (carrying traffic).
The steps below apply in all cases, but these are the most common root causes for each scenario:
Idle – speed/duplex mismatch, ARP issues, IPS/IDS devices, and so on.
Under load – speed/duplex mismatch, misconfigured speed settings in the NetScaler SD-WAN configuration, MTU issues, and so on.
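The idle versus under-load split above works like a small decision table. The sketch below encodes it as a hypothetical triage helper (`likely_causes` is not an SD-WAN feature); it only restates the causes listed above, and the observation is entered by hand rather than read from the appliance.

```python
# Hypothetical triage helper: maps *when* the path flaps (idle vs. under load)
# to the most common root causes listed in this article. It does not query
# the appliance; the observation is supplied by the person troubleshooting.

COMMON_CAUSES = {
    "idle": [
        "speed/duplex mismatch",
        "ARP issue",
        "IPS/IDS device",
    ],
    "load": [
        "speed/duplex mismatch",
        "misconfigured speed settings in the SD-WAN configuration",
        "MTU issue",
    ],
}


def likely_causes(flaps_when_idle: bool, flaps_under_load: bool) -> list[str]:
    """Return candidate root causes, de-duplicated, in the article's order."""
    causes: list[str] = []
    if flaps_when_idle:
        causes.extend(COMMON_CAUSES["idle"])
    if flaps_under_load:
        causes.extend(COMMON_CAUSES["load"])
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(causes))


if __name__ == "__main__":
    # A path that only goes BAD/DEAD when user traffic is flowing.
    print(likely_causes(flaps_when_idle=False, flaps_under_load=True))
```

A speed/duplex mismatch appears in both lists, which is why it is worth checking first regardless of when the flapping occurs.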
When a new link is installed, perform the following steps:
Test the speed of the new link. If the SD-WAN speed settings are configured for higher throughput than the link can actually carry, the SD-WAN will try to send at the full configured rate (at this time, there is no bandwidth auto-discovery capability). For example, if the SD-WAN tries to send 5 Mbps of traffic down a 3 Mbps link, it will experience loss (the path goes “BAD”), and when the loss becomes too high, the path is declared “DEAD.” Once the path returns to a “GOOD” state, the SD-WAN again tries to send data at the full rate, and the cycle repeats. If a speed test is not possible, try lowering the configured WAN link speed to see whether performance improves. For a more permanent solution, run a speed test across the network using UDP port 4980 to validate that UDP 4980 traffic passes through the network and that the ISP is not dropping or handling UDP traffic differently than expected.
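To make the 5 Mbps over 3 Mbps example concrete, the following sketch estimates how much traffic cannot fit through the link. It assumes, purely for illustration, that the SD-WAN sends at the configured rate and that everything above the real capacity is dropped; real loss also depends on buffering in the provider's equipment, and `expected_loss_fraction` is a hypothetical name.

```python
# Rough model of why an over-configured WAN link speed produces loss.
# Assumption (illustration only): traffic is sent at the configured rate and
# anything above the link's real capacity is dropped. Real behavior also
# depends on buffering and queueing in the provider's equipment.


def expected_loss_fraction(configured_mbps: float, actual_mbps: float) -> float:
    """Fraction of sent traffic that cannot fit through the link."""
    if configured_mbps <= 0 or actual_mbps <= 0:
        raise ValueError("rates must be positive")
    excess = max(0.0, configured_mbps - actual_mbps)
    return excess / configured_mbps


if __name__ == "__main__":
    # The article's example: 5 Mbps configured on a 3 Mbps link.
    print(f"{expected_loss_fraction(5, 3):.0%} of traffic dropped")  # 40% of traffic dropped
    # Lowering the configured speed to the measured 3 Mbps removes the excess.
    print(f"{expected_loss_fraction(3, 3):.0%} of traffic dropped")  # 0% of traffic dropped
```

Roughly 40% loss is far more than a path can sustain before being marked BAD, which is why matching the configured speed to a measured speed matters more than any other setting here.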
Disable the SD-WAN service and ping all WAN link VIPs. All pings should fail; a reply means another device on the network is using that VIP address (a duplicate IP). To disable the appliance, go to Configuration > Virtual WAN > Enable/Disable – Purge Flows.
The SD-WAN appliance auto-detects a duplicate IP address and disables itself, which could be the reason a path reports a “Dead” Path State. However, if the duplicate IP address resides where the SD-WAN cannot detect its MAC address, the same symptoms can occur without the appliance detecting the conflict.
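The duplicate-VIP check can be scripted from a host on the same segment. This is a minimal sketch assuming a Linux host with the iputils `ping` command (`-c` for count, `-W` for a timeout in seconds); `find_duplicate_vips` and `linux_ping` are hypothetical helpers, and the ping function is injectable so the logic can be exercised without a network.

```python
# Sketch of the duplicate-VIP check: with the SD-WAN service disabled, no VIP
# should answer a ping, so any reply points to another device using that IP.
# Assumption: a Linux host with iputils `ping` (-c count, -W timeout in seconds).
import subprocess
from typing import Callable, Iterable


def linux_ping(ip: str) -> bool:
    """Return True if a single ICMP echo to `ip` gets a reply within 1 s."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_duplicate_vips(
    vips: Iterable[str], ping: Callable[[str], bool] = linux_ping
) -> list[str]:
    """Return the VIPs that answered while the SD-WAN service was disabled."""
    return [ip for ip in vips if ping(ip)]
```

Usage: with the SD-WAN disabled, call `find_duplicate_vips(["192.0.2.10", "198.51.100.10"])` with your own VIP list (those addresses are documentation placeholders). An empty list is the expected result; any address returned deserves a look at the ARP tables of the surrounding devices.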
Next, work through the network layers to troubleshoot the issue.
Verify the Ethernet settings by going to Configuration > Appliance Setting > Network Adaptors > Ethernet tab.
As shown in the following screenshot, the interfaces are set to auto-negotiate. The greyed-out values show the speed and duplex each port has negotiated. In the screenshot, ports 1/1 and 1/3 have negotiated 100Mb/Full, while port 1/2 has negotiated 1000Mb/Full. If a connected port has been hard-coded to 100/Full while the SD-WAN is set to auto-negotiate, you might see 100Mb/Half. This is a duplex mismatch: the auto-negotiating side can detect the speed but falls back to half duplex, while the peer runs full duplex.
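When checking many ports, it can help to flag suspicious readings mechanically. The sketch below parses the speed/duplex strings in the format shown on the Ethernet tab (for example "100Mb/Full") and flags half duplex or a mismatch with the peer's setting. The helper names and the port 1/4 reading are hypothetical; nothing here reads from the appliance.

```python
# Hypothetical helper that flags suspicious negotiated link settings, using
# speed/duplex strings like those on the Ethernet tab (e.g. "100Mb/Full").
# Port readings are typed in by hand; port 1/4 below is an assumed example.
from typing import Optional


def parse_link(setting: str) -> tuple[int, str]:
    """Split '100Mb/Full' (or '100/Full') into (100, 'full')."""
    speed, duplex = setting.split("/")
    return int(speed.strip().lower().removesuffix("mb")), duplex.strip().lower()


def check_port(port: str, negotiated: str, peer: Optional[str] = None) -> list[str]:
    """Return warnings for one port; an empty list means nothing looks wrong."""
    speed, duplex = parse_link(negotiated)
    warnings = []
    if duplex == "half":
        warnings.append(
            f"{port}: half duplex ({negotiated}); "
            "peer may be hard-coded while this side auto-negotiates"
        )
    if peer is not None and parse_link(peer) != (speed, duplex):
        warnings.append(f"{port}: mismatch, local {negotiated} vs peer {peer}")
    return warnings


if __name__ == "__main__":
    ports = {
        "1/1": ("100Mb/Full", None),
        "1/2": ("1000Mb/Full", None),
        "1/4": ("100Mb/Half", "100/Full"),  # assumed: peer hard-coded to 100/Full
    }
    for port, (negotiated, peer) in ports.items():
        for warning in check_port(port, negotiated, peer):
            print(warning)
```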
Go to Monitor > Virtual WAN > Statistics and, under Show, select Ethernet from the drop-down list. Check for interface errors.
Examine the speed/duplex settings and error counters on all connected external devices (switch, firewall, router).
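Interface error counters are usually cumulative, so a single reading says little; what matters is whether they keep increasing. This sketch compares two snapshots taken a few minutes apart. The counter names are hypothetical and should be mapped to whatever the SD-WAN Ethernet statistics or the switch actually reports. Collisions that grow on a port that should be running full duplex are a classic sign of a duplex mismatch.

```python
# Sketch: compare two snapshots of cumulative interface counters and report
# the ones that grew. Counter names are hypothetical placeholders.
from typing import Mapping


def growing_errors(before: Mapping[str, int], after: Mapping[str, int]) -> dict[str, int]:
    """Return counters that increased between snapshots, with the increase.

    Counters that went down (e.g. after a reset) are ignored.
    """
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


if __name__ == "__main__":
    t0 = {"rx_errors": 12, "tx_errors": 0, "collisions": 40}
    t1 = {"rx_errors": 12, "tx_errors": 0, "collisions": 95}
    # Growing collisions on a full-duplex port suggest a duplex mismatch.
    print(growing_errors(t0, t1))  # {'collisions': 55}
```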
Go to Monitoring > Virtual WAN > Statistics and, under Show, select ARP from the drop-down list.
Check whether the reply age of the gateway’s ARP entry exceeds 1000 ms. The SD-WAN ARPs for the gateway once per second; if a reply is not received within 1000 ms (1 second), the SD-WAN declares the path down. Some devices have an ARP threshold (or ARP DoS protection setting) that must be adjusted or turned off.
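The 1000 ms rule can be applied to a series of readings. This sketch assumes you have copied the gateway's ARP reply ages (in milliseconds) from the ARP statistics page over several refreshes; the sample values are made up, and `late_arp_replies` is a hypothetical helper. A reply that is not received in less than 1000 ms counts as late, so the comparison is `>=`.

```python
# Illustration of the 1000 ms ARP rule. Assumption: reply ages (ms) were
# copied by hand from the ARP statistics page; sample values are made up.

ARP_TIMEOUT_MS = 1000  # the gateway must answer in less than 1 second


def late_arp_replies(reply_ages_ms: list[int]) -> list[int]:
    """Return the reply ages that would cause the path to be declared down."""
    return [age for age in reply_ages_ms if age >= ARP_TIMEOUT_MS]


if __name__ == "__main__":
    ages = [120, 340, 1250, 90, 1010]
    late = late_arp_replies(ages)
    print(f"{len(late)}/{len(ages)} ARP replies at or over {ARP_TIMEOUT_MS} ms")
    # prints: 2/5 ARP replies at or over 1000 ms
```

Late replies that appear intermittently, rather than all the time, point toward a rate limit or ARP DoS protection on the gateway.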
Ping from one WAN router to the other and check for drops. Loss on this router-to-router path, which does not involve the SD-WAN, could indicate a service provider issue.
Ping from one WAN router to the other with the Don't Fragment (DF) bit set so that packets are not fragmented, increasing the packet size to find the largest size that still gets a reply; that size indicates the path MTU. Take a packet capture on the SD-WAN to see the largest packet size in use. Adjust the WAN link MTU in the SD-WAN configuration if necessary.
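When running long router-to-router pings, it is easier to compare runs if the loss is computed the same way each time. This sketch extracts sent/received counts from a ping summary. It assumes either a Unix-style summary ("5 packets transmitted, 4 received") as printed by Linux or BSD/macOS ping, or a Cisco IOS-style line ("Success rate is 80 percent (4/5)"); other router platforms will need their own pattern. `packet_loss_percent` is a hypothetical helper.

```python
# Sketch: compute packet loss from ping summary text.
# Assumed formats: Unix ping ("N packets transmitted, M received") and
# Cisco IOS-style ("Success rate is P percent (M/N)").
import re
from typing import Optional

UNIX_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
IOS_SUMMARY = re.compile(r"Success rate is \d+ percent \((\d+)/(\d+)\)")


def packet_loss_percent(ping_output: str) -> Optional[float]:
    """Return loss as a percentage, or None if no summary line is found."""
    m = UNIX_SUMMARY.search(ping_output)
    if m:
        sent, received = int(m.group(1)), int(m.group(2))
    else:
        m = IOS_SUMMARY.search(ping_output)
        if not m:
            return None
        received, sent = int(m.group(1)), int(m.group(2))
    if sent == 0:
        return None
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    linux = "100 packets transmitted, 97 received, 3% packet loss, time 99123ms"
    ios = "Success rate is 80 percent (4/5), round-trip min/avg/max = 1/2/4 ms"
    print(packet_loss_percent(linux))  # 3.0
    print(packet_loss_percent(ios))  # 20.0
```

Computing loss from the counts, rather than trusting the printed percentage, avoids rounding differences between platforms.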
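Rather than stepping through packet sizes by hand, the DF-bit search can be done as a binary search. This is a sketch assuming a Linux host with iputils `ping` (`-M do` sets DF, `-s` sets the ICMP payload size) and a 1500-byte Ethernet MTU as the upper bound; the probe function is injectable so the search can be tested offline. For IPv4 without options, MTU = payload + 20-byte IP header + 8-byte ICMP header. `find_path_mtu` is a hypothetical helper.

```python
# Sketch: binary-search the largest ICMP payload that passes with DF set.
# Assumptions: Linux iputils ping (-M do = set DF, -s = payload size),
# IPv4 without options, and a 1500-byte MTU ceiling (payload 1472).
import subprocess
from typing import Callable, Optional

IP_ICMP_HEADER_BYTES = 28  # 20-byte IPv4 header + 8-byte ICMP header


def linux_df_ping(host: str, payload: int) -> bool:
    """Send one ICMP echo with DF set and the given payload size."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_path_mtu(
    host: str,
    probe: Callable[[str, int], bool] = linux_df_ping,
    low: int = 0,
    high: int = 1472,
) -> Optional[int]:
    """Return the path MTU in bytes, or None if even the smallest probe fails."""
    if not probe(host, low):
        return None
    # Invariant: `low` is known to pass; the answer lies in [low, high].
    while low < high:
        mid = (low + high + 1) // 2
        if probe(host, mid):
            low = mid
        else:
            high = mid - 1
    return low + IP_ICMP_HEADER_BYTES
```

Because each size is probed once, a single unrelated drop can make the search underestimate the MTU; on a lossy path, repeat the run or require several replies per size before trusting the result.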
In some deployments, the SD-WAN connects to the WAN router through multiple switches. In this case, there could be an MTU misconfiguration or duplex issue outside the SD-WAN's own physical connections. This typically shows up as loss on the SD-WAN when user traffic is pushed through the WAN link, with little or no loss while the link is idle.
Check whether IPS/IDS firewall features are inspecting UDP port 4980 traffic. On many SOHO routers, IPS is enabled by default and can degrade performance.
Check whether the WAN link carries traffic that does not flow through the SD-WAN, or other unaccounted-for traffic. If the SD-WAN shares the link's bandwidth with LAN devices whose traffic bypasses the SD-WAN, configure the firewall or router that terminates the WAN link to allocate a set bandwidth to the SD-WAN and to the competing bypass traffic, then configure the SD-WAN WAN link speed to match its allocation. Also consider turning up the congestion sensitivity threshold when the SD-WAN shares the link; otherwise, the SD-WAN significantly backs off its use of that WAN link when it encounters contention.
If a path is consistently BAD or is known to have a certain level of packet loss, consider disabling the Bad_Loss_Sensitive function, or use the path state configurability feature introduced in release 9.0 to better control when the SD-WAN moves the path into a BAD state.
Enabling this feature overrides the SD-WAN’s default behavior of intelligently detecting when link quality starts to degrade because the line's characteristics have changed. Use it only with that trade-off in mind, and only after investigation of the poor-quality line has determined that the loss on the WAN link is expected and that the desired behavior is for the SD-WAN to keep using the link in that state without backing off its send rate or usage.
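The bandwidth split for a shared link is simple arithmetic, sketched below as a worked example. It assumes the terminating router or firewall reserves a fixed amount for traffic that bypasses the SD-WAN and that the SD-WAN WAN link speed is set to the remainder of the measured link speed. The helper name and the numbers are illustrative, not recommendations.

```python
# Hypothetical worked example of splitting a shared WAN link. Assumption: the
# router/firewall terminating the link reserves a fixed amount for traffic
# that bypasses the SD-WAN; the SD-WAN WAN link speed is set to the rest.


def sdwan_link_speed(measured_link_mbps: float, bypass_reserved_mbps: float) -> float:
    """Return the speed to configure on the SD-WAN for its share of the link."""
    if not 0 <= bypass_reserved_mbps < measured_link_mbps:
        raise ValueError("bypass reservation must be between 0 and the link speed")
    return measured_link_mbps - bypass_reserved_mbps


if __name__ == "__main__":
    # A 20 Mbps link (measured by speed test) with 5 Mbps reserved for bypass traffic.
    print(sdwan_link_speed(20, 5))  # 15
```

Starting from a measured speed rather than the contracted speed keeps the SD-WAN's share within what the link can actually deliver.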