Failover issues with SGOS on ESX and Cisco

I need a solution

Hi there,

I'm having an issue with failover between two virtualized Symantec (Blue Coat) proxies running on two ESX hosts in two datacenters connected via Cisco switches.

I can see the multicast traffic leaving the proxy and going out over the Cisco switches until the firewall blocks it. The packets should be delivered at Layer 2 to the other switch and reach the other proxy running on the ESX host there.

But on the other host I don't see any incoming multicast traffic. As a result, both proxies consider themselves responsible for the virtual IP, which causes problems with Skype etc.

Has anyone seen such an issue before? On ESX we have already activated promiscuous mode for that VLAN/subnet, but that didn't change anything.

The hardware proxies in the same network see the incoming multicast traffic from the virtual machines and behave accordingly. Since the virtual proxies don't receive any multicast traffic, each of them always assumes it is the master, because the other one appears not to be sending any updates.

I could understand there being an issue between the two Cisco switches so that multicast traffic is not forwarded from one to the other. Another idea is that there is a special setting on the ESX host I'm not aware of. Any ideas?

Thanks in advance,

Manfred


Cisco Small Business Series Switches Simple Network Management Protocol Denial of Service Vulnerability

A vulnerability in the Simple Network Management Protocol (SNMP) input packet processor of Cisco Small Business Sx200, Sx300, Sx500, ESW2 Series Managed Switches and Small Business Sx250, Sx350, Sx550 Series Switches could allow an authenticated, remote attacker to cause the SNMP application of an affected device to cease processing traffic, resulting in the CPU utilization reaching one hundred percent. Manual intervention may be required before a device resumes normal operations.

The vulnerability is due to improper validation of SNMP protocol data units (PDUs) in SNMP packets. An attacker could exploit this vulnerability by sending a malicious SNMP packet to an affected device. A successful exploit could allow the attacker to cause the device to cease forwarding traffic, which could result in a denial of service (DoS) condition.

Cisco has released firmware updates that address this vulnerability. There are no workarounds that address this vulnerability.

This advisory is available at the following link:
https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190515-sb-snmpdos

Security Impact Rating: High

CVE: CVE-2019-1806


Best Practices for Configuring Provisioning Services Server on a Network

This article provides best practices when configuring Citrix Provisioning, formerly Citrix Provisioning Server, on a network. Use these best practices when troubleshooting issues such as slow performance, image build failures, lost connections to the streaming server, or excessive retries from the target device.

Disabling Spanning Tree or Enabling PortFast

With Spanning Tree Protocol (STP) or Rapid Spanning Tree Protocol (RSTP), ports are placed into a blocking state while the switch transmits and listens for Bridge Protocol Data Units (BPDUs) to ensure the ports are not part of a loop.

The amount of time it takes to complete this convergence process depends on the size of the switched network, which might allow the Pre-boot Execution Environment (PXE) to time out, preventing the machine from getting an IP address.

Note: This does not apply after the OS is loaded.

To resolve this issue, disable STP on edge-ports connected to clients or enable PortFast or Fast Link depending on the managed switch brand. Refer to the following table:

Switch Manufacturer   Fast Link Option Name
-------------------   -------------------------
Cisco                 PortFast or STP Fast Link
Dell                  Spanning Tree FastLink
Foundry               Fast Port
3COM                  Fast Start

Auto Negotiation

Auto negotiation requires a network device and its switch to negotiate a link speed before communication begins. This can cause long start times and PXE timeouts, especially when starting multiple target devices with different NIC speeds. Citrix recommends hard coding the speed of all Provisioning Server ports (server and client) on the NIC and on the switch.
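As a hedged illustration for Cisco IOS managed switches, an edge port facing a Provisioning Services target device could be configured along the following lines (the interface name and speed are placeholders; equivalent commands vary by vendor, as shown in the table above):

interface GigabitEthernet0/10
 description PVS target device edge port
 speed 1000
 duplex full
 spanning-tree portfast

The portfast setting addresses the STP convergence delay described earlier, while the hard-coded speed and duplex avoid auto-negotiation delays during PXE boot.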

Stream Service Isolation

New advancements in network infrastructure, such as 10 Gb networking, may not require the stream service to be isolated from other traffic. If security is of primary concern, Citrix recommends isolating or segmenting the PVS stream traffic from other production traffic. However, in some cases, isolating the stream traffic can lead to a more complicated networking configuration and actually decrease network performance. For more information on whether the streaming traffic should be isolated, refer to the following article:

Is Isolating the PVS Streaming Traffic Really a Best Practice?

Firewall and Server to Server Communication Ports

Open the following ports in both directions (a sample firewall rule follows the list):

  • UDP 6892 and 6904 (For Soap to Soap communication – MAPI and IPC)

  • UDP 6905 (For Soap to Stream Process Manager communication)

  • UDP 6894 (For Soap to Stream Service communication)

  • UDP 6898 (For Soap to Mgmt Daemon communication)

  • UDP 6895 (For Inventory to Inventory communication)

  • UDP 6903 (For Notifier to Notifier Communication)
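As an illustrative sketch, assuming Windows Firewall is used on the Provisioning Servers (the rule name is arbitrary), the ports listed above could be opened with a single inbound rule; repeat with dir=out if outbound filtering is enforced, and mirror the rule on each server:

netsh advfirewall firewall add rule name="Citrix PVS server-to-server (UDP)" dir=in action=allow protocol=UDP localport=6892,6894,6895,6898,6903,6904,6905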

Note: DisableTaskOffload is still required.
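For reference, DisableTaskOffload is a DWORD value under the TCP/IP parameters key; a minimal sketch of setting it from an elevated command prompt on the target device or master image (a reboot is typically required for the change to take effect):

reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v DisableTaskOffload /t REG_DWORD /d 1 /f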


Isilon: If the Smartconnect Service IP (SSIP) is assigned to an aggregate interface, the IP address may go missing under certain conditions or move to another node if one of the lagg ports is shut down.

Article Number: 519890 Article Version: 13 Article Type: Break Fix



Isilon, Isilon OneFS 8.0.0.6, Isilon OneFS 8.0.1.2, Isilon OneFS 8.1.0.2

The Smartconnect SSIP or network connectivity can be disrupted on a node if a link aggregation interface in LACP mode is configured and one of the port members in the lagg interface stops participating in the LACP aggregation.

The issue occurs when a node is configured with any of the following link aggregation interfaces:

10gige-agg-1

ext-agg-1

and one of its port members is not participating in the lagg interface:

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:07:43:09:3c:77
        inet6 fe80::207:43ff:fe09:3c77%lagg0 prefixlen 64 scopeid 0x8 zone 1
        inet 10.25.58.xx netmask 0xffffff00 broadcast 10.25.58.xxx zone 1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: cxgb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>>      laggport: cxgb1 flags=0<>

This will cause OneFS to internally set the link aggregation interface to ‘No Carrier’ status, due to a bug in network manager software (Flexnet):

# isi network interface list
LNN  Name          Status      Owners                       IP Addresses
--------------------------------------------------------------------------
1    10gige-1      No Carrier  -                            -
1    10gige-2      Up          -                            -
1    10gige-agg-1  No Carrier  groupnet0.subnet10g.pool10g  10.25.58.46

Possible failures causing the issue:

  1. Failed switch port
  2. Incorrect LACP configuration at switch port
  3. Bad cable/SFP, or other physical issue
  4. The switch connected to a port failed or was rebooted
  5. BXE driver bug reporting a port as not full duplex (KB 511208)

Failures 1 to 4 are external to the cluster, and the issue should go away as soon as they are fixed. Failure 5 can be a persistent failure induced by a known OneFS BXE bug (KB 511208).

  A. If the node has the lowest node ID in the pool and the Smartconnect SSIP is configured there, then:
    a. If failure 1, 2, or 3 happens, the SSIP will move to the next lowest node ID that is clear of any failure.
    b. If failure 4 is present, the SSIP will not be available on any node, and DU (data unavailability) is expected until the workaround is implemented, the patch is installed, or the switch is fixed or becomes available again after a reboot.
    c. If failure 5 is present:
      i. If only one port has failed, the SSIP will move to the next available lowest node ID not affected by the issue.
      ii. [DU] If all nodes in the cluster are BXE nodes and all are affected by the bug, the SSIP will not be available; expect DU until the workaround or patch is applied.
  B. If the link aggregation interface in LACP mode is configured in a subnet/pool whose defined gateway is the node's default route, then:
    a. If the issue happens while the node is running and the default route is already set, the default route will remain configured and available, and connectivity for already connected clients should keep working.
    b. [DU] If the node is rebooted with any of the persistent failures present, then after it comes back up the default route will not be available, causing DU until the external issue is fixed, the workaround is applied, or the patch is installed.

If any of these failures is present during an upgrade to 8.0.0.6 or 8.1.0.2, a DU is expected after the rolling reboot due to the cases described in A.c.ii or B.b above. A check must be made prior to the upgrade to confirm the cluster is clear of the described failures; a sketch of such a check follows.
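As a minimal pre-upgrade sketch using commands already shown in this article (the grep patterns simply match the 'No Carrier' and 'flags=0<>' signatures above; isi_for_array runs the check on every node):

isi network interface list | grep "No Carrier"

isi_for_array -s 'ifconfig | grep "laggport.*flags=0"'

If either command returns output, investigate and clear the underlying port failure before starting the rolling upgrade.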



Workaround


Workaround to immediately restore the link aggregation interface if only one member port is persistently down (failed switch, failed cable/SFP, BXE bug, or other persistent issue):

Step 1:

Identify the failed member port on the link aggregation interface:

# ifconfig
lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
        ether 00:0e:1e:58:20:70
        inet6 fe80::20e:1eff:fe58:2070%lagg1 prefixlen 64 scopeid 0x8 zone 1
        inet 172.16.240.xxx netmask 0xffff0000 broadcast 172.16.255.xxx zone 1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
>>      laggport: bxe1 flags=0<>
        laggport: bxe0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>


Step 2:

Manually remove the port member with the following command:

ifconfig lagg1 -laggport bxe1

The network should recover within 10-20 seconds of executing the command.

This change will be lost after a reboot.

After the external failure on the port has been identified and fixed, and the port is available again, reconfigure the port back into the link aggregation configuration with the following command:

ifconfig lagg1 laggport bxe1

A permanent fix will be included in the following OneFS maintenance releases once they become available:

  • OneFS 8.0.0.7
  • OneFS 8.1.0.4

A roll-up patch is now available for:

  • 8.0.0.6 (bug 226984) – patch-226984
  • 8.1.0.2 (bug 226323) – patch-226323

NOTE: This issue affects the following OneFS versions ONLY:

  • OneFS 8.0.0.6
  • OneFS 8.0.1.2
  • OneFS 8.1.0.2
  • OneFS 8.1.1.1


High Availability Traffic/Heartbeats are not seen on NetScaler Tagged Channel Network Interfaces

If you are optimizing traffic on a multi-tenant server network with numerous VLANs while isolating management traffic, you might encounter a problem where heartbeat packets are not visible on all interfaces.

This is common on NetScaler high availability pairs using link aggregation on EtherChannel switch ports (in this example, Cisco switches). The following output demonstrates the issue:

> show node
1) Node ID: 0
   IP: 10.187.125.21 (ns01)
   Node State: UP
   Master State: Primary
   Fail-Safe Mode: OFF
   INC State: DISABLED
   Sync State: ENABLED
   Propagation: ENABLED
   Enabled Interfaces : 0/1 LA/1
   Disabled Interfaces : 1/8 1/7 1/6 1/4 1/3 1/2 0/2
   HA MON ON Interfaces : 1/8 1/7 1/6 1/4 1/3 1/2 0/1 0/2 LA/1
   Interfaces on which heartbeats are not seen : LA/1
   Interfaces causing Partial Failure: None
   SSL Card Status: UP
   Hello Interval: 200 msecs
   Dead Interval: 3 secs
   Node in this Master State for: 0:21:42:50 (days:hrs:min:sec)
2) Node ID: 1
   IP: 10.187.125.22
   Node State: UP
   Master State: Secondary
   Fail-Safe Mode: OFF
   INC State: DISABLED
   Sync State: SUCCESS
   Propagation: ENABLED
   Enabled Interfaces : 0/1 LA/1
   Disabled Interfaces : 1/8 1/7 1/6 1/4 1/3 1/2 0/2
   HA MON ON Interfaces : 1/8 1/7 1/6 1/4 1/3 1/2 0/1 0/2 LA/1
   Interfaces on which heartbeats are not seen : LA/1
   Interfaces causing Partial Failure: None
   SSL Card Status: UP
   Local node information:
   Critical Interfaces: 0/1 LA/1
Done

In most situations the heartbeat packets are dropped because of a VLAN tagging mismatch on the switch. Review the following article for additional information: CTX109843 – How to Configure a NetScaler Appliance Using Link Aggregation to Connect Pairs of Interfaces to the Cisco Switches
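As a hedged sketch from the NetScaler CLI (VLAN 100 is a placeholder), the channel and its VLAN binding can be reviewed and, if necessary, re-bound as tagged so that it matches the switch-side EtherChannel trunk configuration:

show channel LA/1

show vlan

bind vlan 100 -ifnum LA/1 -tagged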


Can HTTPS be monitored with Network Monitor 15.1?

I need a solution

Hi,

I have two Network Monitor 15.1 detection servers on two different core switches, but when checking incidents, HTTPS incidents are generated even though the HTTPS protocol is not enabled. Can someone explain why this type of incident is generated, or does Network Monitor in this new version already detect encrypted HTTPS traffic natively?

Thanks and regards.


Introducing Dell EMC Ready Solutions for Microsoft WSSD: QuickStart Configurations for ROBO and Edge Environments

The QuickStart Configurations for ROBO and Edge environments are four-node, two-switch, highly available, end-to-end HCI configurations that simplify design, ordering, deployment, and support.

High_Level_Overview.png

Key Features:

  • Pre-configured and sized Small, Medium and Large solution templates to simplify ordering/sizing.
  • Available in two platform options – R640 All-Flash & R740XD Hybrid S2D Ready Nodes.
  • Includes two fully redundant, half-width Dell EMC S4112-ON switches in 1U form factor.
  • Includes optional ProDeploy and mandatory ProSupport services providing the same solution level support as S2D Ready Nodes.
  • Detailed QuickStart deployment guide to de-risk deployment and accelerate time to bring up.
  • Additionally, the QuickStart configurations also include a rack, PDUs, and blanking panels right-sized for the solution. These Data Center Infrastructure (DCI) solutions help our customers save time and resources, reduce effort, and improve the overall experience.



The network fabric for these QuickStart configurations implements a non-converged topology for the in-band/out-of-band management and storage networks. The RDMA-capable QLogic FastLinQ 41262 adapters carry the 25 GbE SFP28 storage traffic, while the rNDC provides the 10 GbE SFP+ bandwidth used for host management and VM network traffic.



Non-Converged_Option-2 (1).png

This optimized network architecture of the switch fabric ensures redundancy of both the storage and management networks. In this design, the storage networks are isolated to each respective switch (Storage 1 to TOR 1; Storage 2 to TOR 2). Because storage traffic (typically) never traverses the customer LAN, the VLT bandwidth has been optimized to fully support redundancy of the management network up to the customer data center uplink.



For sample switch configurations, see https://community.emc.com/docs/DOC-70310.



The operations guidance for Dell EMC Ready Solutions for Microsoft WSSD provides instructions for performing the day-0 management and monitoring onboarding tasks and for lifecycle management of the Storage Spaces Direct cluster.


Large Dataset Design – Environmental and Logistical Considerations

In this article, we turn our attention to some of the environmental and logistical aspects of cluster installation and management.



In addition to available rack space and physical proximity of nodes, provision needs to be made for adequate power and cooling as the cluster expands. New generations of nodes typically deliver increased storage density, which often magnifies the power draw and cooling requirements per rack unit.



The larger the cluster, the more disruptive downtime and reboots can be. To this end, the recommendation is for a large cluster's power supply to be fully redundant and backed up with a battery UPS and/or power generator. In the worst case, if a cluster does lose power, the nodes are protected internally by file system journals, which preserve any in-flight uncommitted writes. However, the time to restore power and reboot a large cluster can be considerable.



Like most data center equipment, the cooling fans in Isilon nodes and switches pull air from the front to the back of the chassis. To complement this, most data centers use a hot aisle/cold aisle rack configuration, where cool, low-humidity air is supplied in the aisle at the front of each rack or cabinet, either at the floor or ceiling level, and warm exhaust air is returned at ceiling level in the aisle to the rear of each rack.



Given the high power draw and heat density of cluster hardware, some data centers are limited in the number of nodes each rack can support. For partially filled racks, the use of blank panels to cover the front and rear of any unfilled rack units can help to efficiently direct airflow through the equipment.



The use of intelligent power distribution units (PDUs) within each rack can facilitate the remote power cycling of nodes, if desired.



For Gen6 hardware, where chassis depth can be a limiting factor, 2RU horizontally mounted PDUs within the rack can be used in place of vertical PDUs. If front-mounted, partial depth Ethernet switches are deployed, horizontal PDUs can be installed in the rear of the rack directly behind the switches to maximize available rack capacity.



Cabling and Networking

With copper (CX4) Infiniband cables, the maximum cable length is limited to 10 meters. After factoring in the cable dressing needed to maintain some level of organization and proximity within the racks and cable trays, all the racks containing Isilon nodes need to be in close physical proximity to each other, either in the same rack row or close by in an adjacent row.



Support for multi-mode fiber (SC) for Infiniband and for Ethernet extends the cable length limitation to 150 meters. This allows nodes to be housed on separate floors or on the far side of a floor in a data center if necessary. While solving the floor space problem, this has the potential to introduce new administrative and management issues. The table below shows the various optical and copper backend network cabling options available.

Cable Type   Model      Connector   Length   Ethernet Cluster   Infiniband Cluster
----------   --------   ---------   ------   ----------------   ------------------
Copper       851-0253   QSFP+       1m       Yes                -
Copper       851-0254   QSFP+       3m       Yes                -
Copper       851-0255   QSFP+       5m       Yes                -
Optical      851-0224   MPO         10m      Yes                Yes
Optical      851-0225   MPO         30m      Yes                Yes
Optical      851-0226   MPO         50m      Yes                Yes
Optical      851-0227   MPO         100m     Yes                Yes
Optical      851-0228   MPO         150m     Yes                Yes

With large clusters, especially when the nodes may not be racked in a contiguous manner, having all the nodes and switches connected to serial console concentrators and remote power controllers is highly advised. However, to perform any physical administration or break/fix activity on nodes you must know where the equipment is located and have administrative resources available to access and service all locations.



As such, the following best practices are highly recommended:



  • Develop and update thorough physical architectural documentation.
  • Implement an intuitive cable coloring standard.
  • Be fastidious and consistent about cable labeling.
  • Use the appropriate length of cable for the run and create a neat 12” loop of any excess cable, secured with Velcro.
  • Observe appropriate cable bend ratios, particularly with fiber cables.
  • Dress cables and maintain a disciplined cable management ethos.
  • Keep a detailed cluster hardware maintenance log.
  • Where appropriate, maintain a ‘mailbox’ space for cable management.



Disciplined cable management and labeling for ease of identification is particularly important in larger Gen6 clusters, where the density of cabling is high. Each Gen6 chassis can require up to twenty-eight cables, as shown in the table below:

Cabling Component      Medium                                    Cable Quantity per Gen6 Chassis
--------------------   ---------------------------------------   -------------------------------
Back-end network       10 or 40 Gb Ethernet or QDR Infiniband    8
Front-end network      10 or 40 Gb Ethernet                      8
Management interface   1 Gb Ethernet                             4
Serial console         DB9 RS-232                                4
Power cord             110V or 220V AC power                     4
Total                                                            28

The recommendation for cabling a Gen6 chassis is as follows:

  • Split cabling in the middle of the chassis, between nodes 2 and 3.
  • Route Ethernet and Infiniband cables towards lower side of the chassis.
  • Connect power cords for nodes 1 and 3 to PDU A and power cords for nodes 2 and 4 to PDU B.
  • Bundle network cables with the AC power cords for ease of management.
  • Leave enough cable slack for servicing each individual node’s FRUs.



hardware_6.png



Consistent and meticulous cable labeling and management is particularly important in large clusters. Gen6 chassis that employ both front and back end Ethernet networks can include up to twenty Ethernet connections per 4RU chassis.



hardware_7.png



In each node's compute module, there are two PCI slots for the Ethernet cards (NICs). Viewed from the rear of the chassis, in each node the right-hand slot (HBA Slot 0) houses the NIC for the front-end network, and the left-hand slot (HBA Slot 1) houses the NIC for the back-end network. In addition to this, there is a separate built-in 1Gb Ethernet port on each node for cluster management traffic.



While there is no requirement that node 1 aligns with port 1 on each of the backend switches, it can certainly make cluster and switch management and troubleshooting considerably simpler. Even if exact port alignment is not possible, with large clusters, ensure that the cables are clearly labeled and connected to similar port regions on the backend switches.



Servicing and FRU Parts Replacement

Isilon nodes and the drives they contain have identifying LED lights to indicate when a component has failed and to allow proactive identification of resources. The 'isi led' CLI command can be used to proactively illuminate specific node and drive indicator lights to aid in identification.

Drive repair times depend on a variety of factors:

  • OneFS release (determines Job Engine version and how efficiently it operates)
  • System hardware (determines drive types, amount of CPU and RAM, etc)
  • File system: Amount of data, data composition (lots of small vs large files), protection, tunables, etc.
  • Load on the cluster during the drive failure

The best way to estimate future FlexProtect run-time is to use old repair run-times as a guide, if available.
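For example, on an OneFS 8.x cluster, previous FlexProtect runs and their durations can be reviewed from the Job Engine history (a minimal sketch; report formats vary by release):

isi job reports list | grep -i flexprotect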

Gen 6 drives have a bay-grid nomenclature similar to that of the HD400, where A-E indicates the sled and a number indicates the drive position within the sled. The drive closest to the front is 0, whereas the drive closest to the back is 2, 3, or 5, depending on the drive sled type.



For Gen5 and earlier hardware running OneFS 8.0 or prior, the isi_ovt_check CLI tool can be run on a node to verify the correct operation of the hardware.

Hardware Refresh

When it comes to updating and refreshing hardware in a large cluster, swapping nodes can be a lengthy process of somewhat unpredictable duration. Data has to be evacuated from each old node during the Smartfail process prior to its removal, and then restriped and balanced across the new hardware's drives. During this time there will also be potentially impactful group changes as new nodes are added and the old ones removed. An alternative and efficient approach can often be swapping the drives into new chassis. In addition to being considerably faster, the drive swapping process concentrates the disruption into a single whole-cluster-down event. Estimating the time to complete a drive swap, or 'disk tango', is simpler and more accurate, and the process can typically be completed in a single maintenance window.



For Gen 5 and earlier 4RU nodes, a drive tango can be a complex procedure due to the large number of drives per node (36 or 60 drives).



With Gen 6 chassis, the available hardware ‘tango’ options are expanded and simplified. Given the modular design of these platforms, the compute and chassis tango strategies typically replace the disk tango:



Replacement Strategy   Component              Gen 4/5   Gen 6   Description
--------------------   --------------------   -------   -----   -----------
Disk tango             Drives / drive sleds   Yes       Yes     Swapping out data drives or drive sleds
Compute tango          Gen6 compute modules   -         Yes     Rather than swapping out the twenty drive sleds, it is usually cleaner to exchange the four compute modules
Chassis tango          Gen6 chassis           -         Yes     Typically only required if there is an issue with the chassis mid-plane



Note that any of the above ‘tango’ procedures should only be executed under the recommendation and supervision of Isilon support.


Large Dataset Design – Hardware Layout & Installation Considerations

In this next article in the series, we’ll take a look at some of the significant aspects of large cluster physical design and hardware installation.



Most Isilon nodes utilize a 35 inch depth chassis and will fit in a standard depth data center cabinet. However, high capacity models such as the HD400 and A2000 have 40 inch depth chassis and require extended depth cabinets such as the APC 3350 or Dell EMC Titan-HD rack.



hardware_1.png

Additional room must be provided for opening the FRU service trays at the rear of the nodes and, in Gen6 hardware, the disk sleds at the front of the chassis. Isilon nodes are either 2RU or 4RU in height (with the exception of the 1RU diskless accelerator and backup accelerator nodes).



Note that the Isilon A2000 nodes can also be purchased as a 7.2PB turnkey pre-racked solution.



Weight is another critical factor to keep in mind. Individual 4RU chassis can weigh up to around 300 lbs each, and the floor tile capacity of each individual cabinet or rack must be taken into account. For the large archive node styles (HD400 and A2000), the considerable node weight may prevent racks from being fully populated with Isilon equipment. If the cluster uses a variety of node types, installing the larger, heavier nodes at the bottom of each rack and the lighter chassis at the top can help distribute weight evenly across the cluster racks' floor tiles.



There are no lift handles on a Gen6 chassis. However, the drive sleds can be removed to provide handling points if no lift is available. With all the drive sleds removed, but leaving the rear compute modules inserted, the chassis weight drops to a more manageable 115lbs. It is strongly recommended to use a lift for installation of Gen6 chassis and the 4RU earlier generation nodes.

Ensure that smaller Ethernet switches are drawing cool air from the front of the rack, not from inside the cabinet, as they are shorter than the IB switches. This can be achieved either with switch placement or by using rack shelving. Cluster backend switches ship with the appropriate rails (or tray) for proper installation of the switch in the rack. These rail kits are adjustable to fit NEMA front-rail to rear-rail spacing ranging from 22 in to 34 in.

Note that the Celestica Ethernet switch rails are designed to overhang the rear NEMA rails to align the switch with the Generation 6 chassis at the rear of the rack. These require a minimum clearance of 36 in from the front NEMA rail to the rear of the rack, in order to ensure that the rack door can be closed.

Consider the following large cluster topology, for example:



hardware_2.png

This contiguous eleven-rack architecture is designed to scale up to ninety-six 4RU nodes as the environment grows, while keeping cable management simple and taking the considerable weight of the Infiniband cables off the connectors as much as possible.



Best practices include:



  • Pre-allocate and reserve adjacent racks in the same isle to fully accommodate the anticipated future cluster expansion
  • Reserve an empty 4RU ‘mailbox’ slot above the center of each rack for pass-through cable management.
  • Dedicate the central rack in the group for the back-end and front-end switches – in this case rack F (image below).



Below, the two top Ethernet switches are for front-end connectivity and the lower two Infiniband switches handle the cluster’s redundant back-end connections.



hardware_3.png



Image showing cluster Front and Back-end Switches (Rack F Above)



The 4RU “mailbox” space is utilized for cable pass-through between node racks and the central switch rack. This allows cabling runs to be kept as short and straight as possible.



hardware_4.png



Rear of Rack View Showing Mailbox Space and Backend Network Cabling (Rack E Above)



Excess cabling can be neatly stored in 12” service coils on a cable tray above the rack, if available, or at the side of the rack as illustrated below.



hardware_5.png

Rack Side View Detailing Excess Cable Coils (Rack E Above)



Successful large cluster infrastructures depend heavily on the proficiency of the installer and their optimizations for maintenance and future expansion.



Note that for Hadoop workloads, Isilon is compatible with the rack awareness feature of HDFS to provide balancing in the placement of data. Rack locality keeps the data flow internal to the rack.

Related: