NetScaler Networking and VLAN Best Practices

The NetScaler uses VLANs to determine which interface should be used for which traffic. In addition, NetScaler does not participate in Spanning Tree. Without the proper VLAN configuration, the NetScaler is unable to determine which interface to use and it can function more like a HUB than a switch or router in these instances (it will try to use ALL interfaces for each conversation).

Symptoms of VLAN Misconfiguration

This type of issue can manifest itself in many forms, including performance issues, inability to establish connections, randomly disconnected sessions, and in severe situations, network disruptions seemingly unrelated to the NetScaler itself. The NetScaler may also report MAC Moves, muted interfaces, and/or management interface transmit or receive buffer overflows, depending on the exact nature of the interaction with your network.

MAC Moves: (counter nic_tot_bdg_mac_moved) This indicates that the NetScaler is using more than one interface to communicate with the same device (MAC address), because it could not properly determine which interface to use.

Muted interfaces: (counter nic_err_bdg_muted) This indicates that the NetScaler has detected that it is creating a routing loop due to VLAN configuration issues, and as such, it has shut down one or more of the offending interfaces in order to prevent a network outage.

Interface buffer overflows, typically referring to management interfaces: (counter nic_err_tx_overflow) This can be caused if too much traffic is being transmitted over a management interface. Management interfaces on the NetScaler is not designed to handle large volumes of traffic, which may result from network and VLAN misconfigurations triggering the NetScaler to use a management interface for production data traffic. This often occurs because the NetScaler has no way to differentiate traffic on the VLAN / subnet of the NSIP (NSVLAN) from regular production traffic. It is highly recommended that the NSIP be on a separate VLAN and subnet from any production devices such as workstations and servers.

Orphan ACKs (counter tcp_err_orphan_ack): This counter indicates that the NetScaler received an ACK packet that it was not expecting, typically on a different interface than the ACK’d traffic originated from. This situation can be caused by VLAN misconfigurations where the NetScaler transmits on a different interface than the target device would typically use to communicate with the NetScaler (often seen in conjunction with MAC moves)

High rates of retransmissions / retransmit giveups: (counters: tcp_err_retransmit_giveups, tcp_err_7th_retransmit, various other retransmit counters) The NetScaler will attempt to retransmit a TCP packet a total of 7 times before it gives up and terminates the connection. While this situation can be caused by network conditions, it often occurs as a result of VLAN and interface misconfiguration.

High Availability Split Brain: Split Brain is a condition where both HA nodes believe they are Primary, leading to duplicate IP addresses and loss of NetScaler functionality. This is caused when the two HA nodes cannot communicate with each-other using HA Heartbeats on UDP Port 3003 using the NSIP, across any interface. This is typically caused by VLAN misconfigurations where the native VLAN on the NetScaler interfaces do not have connectivity between NetScalers.

Best Practices for VLAN and Network Configurations

  1. Each subnet should be associated with a VLAN.

  2. More than one subnet can be associated with the same VLAN (depending on your network design).

  3. EACH VLAN SHOULD BE ASSOCIATED TO ONLY ONE INTERFACE (for purposes of this discussion, a LA Channel counts as a single interface).

  4. If you require more than one subnet to be associated with an interface, the subnets must be tagged.

  5. Contrary to popular belief, the Mac-Based-Forwarding (MBF) feature on the NetScaler is not designed to mitigate this type of issue. MBF is designed primarily for the DSR (Direct Server Return) mode of the NetScaler, which is rarely used in most environments (it is designed to allow traffic to purposely bypass the NetScaler on the return path from the backend servers). MBF may hide VLAN issues in some instances, but it should not be relied-upon to resolve this type of problem.

  6. Every interface on NetScaler requires a native VLAN (unlike Cisco, where native VLANs are optional), although the TagAll setting on an interface can be used so that no untagged traffic will leave the interface in question.

  7. The native VLAN can be tagged if necessary for your network design (this is the TagAll option for the interface).

  8. The VLAN for the subnet of your NetScaler’s NSIP is a special case. This is called the NSVLAN. The concepts are the same but the commands to configure it are different and changes to the NSVLAN require a reboot of the NetScaler to take effect. If you attempt to bind a VLAN to a SNIP that shares he same subnet as the NSIP, you will get “Operation not permitted.” This is because you have to use the NSVLAN commands instead. See CTX123172 for details. Also, on some firmware versions, you cannot set an NSVLAN if that VLAN number already exists via the “add VLAN” command. Simply remove the VLAN and then set the NSVLAN again.

  9. HA Heartbeats always use the Native VLAN of the respective interface (optionally tagged if the TagAll option is set on the interface).

  10. There must be communication between at least one set of Native VLAN(s) on the two nodes of an HA pair (this can be direct or via a router). The native VLANs are used for HA heartbeats. If the NetScalers cannot communicate between native VLANs on any interface, this will lead to HA failovers and possibly a split-brain situation where both NetScalers think they are primary (leading to duplicate IP addresses, amongst other things).

  11. The NetScaler does not participate in spanning tree. As such, it is not possible to use spanning tree to provide for interface redundancy when using a NetScaler. Instead, use a form of Link Aggregation (LACP or manual LAG) for this purpose.

    Note: If you wish to have link aggregation between multiple physical switches, you must have the switches configured as a virtual switch, using a feature such as Cisco’s Switch Stack.

  12. The HA Synchronization and Command Propagation, by default, uses the NSIP/NSVLAN. To separate these out to a different VLAN, you can use the SyncVLAN option of the set HA node command.

  13. There is nothing built-in to the NetScaler’s default configuration that denotes that a management interface (0/1 or 0/2) is restricted to management traffic only. This must be enforced by the enduser through VLAN configuration. The Management interfaces are not designed to handle data traffic, so your network design should take this into account. Management interfaces, contained on the Netscaler motherboard, lack various offloading features such as CRC offload, larger packet buffers, and other optimizations, making them much less efficient in handling large amounts of traffic. In order to separate production data and management traffic, the NSIP should not be on the same subnet/VLAN as your data traffic.

  14. If it is desired to use a management interface to carry management traffic, it is best practice that the Default Route be on a subnet other than the subnet of the NSIP (NSVLAN). In many configurations, the default route will be relied-upon for workstation commmunications (in an internet scenario). If the default route is on the same subnet as the NSIP this will lead to such traffic using the management interface, which can cause the interface to be overloaded.

  15. Additionally to #10-on an SDX-the SVM, XenServer, and all Netscaler instance NSIP’s should be on the same VLAN and subnet. There is no “backplane” inside of the SDX that allows for communication between SVM/Xen/Instances. If they are not on the same VLAN/subnet/interface, traffic between them must leave the physical hardware, be routed on your network, and return. This can lead to obvious connectivity issues between the instances and SVM and as such, is not recommended. A common symptom of this is a Yellow Instance State indicator in the SVM for the VPX instance in question and the inability to use the SVM to reconfigure a VPX instance.

  16. In the event that some VLANs are bound to subnets and some are not, during an HA failover, GARP packets will not be sent for any IP addresses on any of the subnets that are not bound to a VLAN. This can cause dropped connections and connectivity issues during HA failovers, as the NetScaler cannot notify the network of the change of MAC ownership of IP addresses on non-VMAC-configured Netscalers. Symptoms of this are that during/after a HA failover, the ip_tot_floating_ip_err counter increments on the former primary NetScaler for more than a few seconds, indicating that the network did not receive or process GARP packets and the network is continuing to transmit data to the new secondary NetScaler.

Additional Resources

Related:

Leave a Reply