The last couple of articles generated several questions around the location, identification and management of hard drives in the new Gen 6 platform.
As a quick recap, each Gen 6 chassis comprises four nodes, each of which contains a compute node module and a set of five drive sleds, all of which plug into the chassis midplane. The chassis design focuses on a modular architecture that helps maximize density, simplify serviceability, and avoid single points of failure through a mantra of redundancy everywhere.
The compute module houses all of the node hardware components with the exception of the data drives. These include a single processor, RAM, a SAS controller, a SAS expander, battery backup, a vault drive, up to two SSDs, a front-end Ethernet NIC, and a back-end network card.
The four nodes within each chassis are grouped into compute node pairs. Each pair shares power from its power supplies – one power supply per compute node. These compute node pairs use Intel’s Non-Transparent Bridge (NTB) technology to enable high-speed connectivity between nodes via the PCIe interface. The Non-Transparent Bridge connection between nodes is used to mirror the OneFS journal to the partner node, and vice versa.
The Gen 6 data drives are mounted in drive sleds, up to five sleds per node.
There are three different sled types found in the Gen 6 platform, depending on the chassis type. For 3.5 inch SATA drives, there is either a three drive ‘short sled’ (most prevalent), or a four drive ‘long sled’ (used in the A2000). The drives in these are placed longitudinally.
For 2.5 inch SAS drives, there’s a short sled that houses up to six drives, lying transversely in the tray (used in the F800 and H600).
The Gen 6 chassis uses a bay-grid numbering system: the nodes are numbered 1 through 4 from left to right, looking at the front of the chassis where the drive sleds are housed. Each node’s five drive sleds are arranged vertically and referenced as A through E, from top to bottom. For example, in the following diagram, sled E from node 1 is shown removed:
Within each sled, the drives are numbered sequentially from the front to the back of the tray. The drive closest to the front is always number 0, whereas the drive closest to the back is either 2, 3, or 5, according to the drive sled type.
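This node/sled/drive naming can be sketched as a small helper. To be clear, the function and sled-type names below are illustrative assumptions for this article, not OneFS code:

```python
# Hypothetical helper illustrating Gen 6 drive naming; not part of OneFS.
# Sled types and drive counts per the article: short SATA sled (3 drives),
# long SATA sled (4 drives, A2000), and 2.5" SAS sled (6 drives, F800/H600).
SLED_DRIVE_COUNTS = {"short_sata": 3, "long_sata": 4, "sas_2_5": 6}

def drive_location(node: int, sled: str, drive: int, sled_type: str) -> str:
    """Return a location string such as '1E2' for node 1, sled E, drive 2."""
    if not 1 <= node <= 4:
        raise ValueError("Gen 6 chassis nodes are numbered 1-4")
    if len(sled) != 1 or sled not in "ABCDE":
        raise ValueError("sleds are lettered A (top) through E (bottom)")
    max_drives = SLED_DRIVE_COUNTS[sled_type]
    if not 0 <= drive < max_drives:
        raise ValueError(f"drive must be 0-{max_drives - 1} for a {sled_type} sled")
    return f"{node}{sled}{drive}"
```

For instance, `drive_location(1, "E", 2, "short_sata")` yields `"1E2"` – the example worked through below.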
For example, consider an H500 chassis with the following data drive specs:
- Drive Type: 3.5” SAS/SATA
- Sleds per Node: 5
- Drives per Sled: 3
- Total Drive Count: 60
- Drive Capacity: 4TB
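The total drive count follows directly from the chassis geometry. A quick arithmetic check (the raw capacity figure here is simple multiplication, before any formatting or protection overhead):

```python
# Sanity check of the H500 chassis drive count and raw capacity.
nodes_per_chassis = 4
sleds_per_node = 5
drives_per_sled = 3
drive_capacity_tb = 4

total_drives = nodes_per_chassis * sleds_per_node * drives_per_sled
raw_capacity_tb = total_drives * drive_capacity_tb

print(total_drives)      # 60 drives per chassis
print(raw_capacity_tb)   # 240 TB raw, before protection overhead
```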
Here’s an image of the drive and sled arrangement in node 1 of this H500:
The drive at the back of the bottom sled in the leftmost stack of sleds would be node 1, sled E, drive 2 – or 1E2.
The location of a drive bay can be seen by viewing the drives in a node via ‘isi devices drive list’:
This is in contrast with previous generations of Isilon nodes, where ‘location’ is a single digit, since there is no concept of a drive sled.
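The two conventions can be distinguished mechanically. The parser below is a hedged sketch for illustration only – the exact location formats assumed here are not taken from OneFS output:

```python
import re

# Illustrative parser for drive location strings. The formats assumed here
# ('1E2' for a Gen 6 node/sled/drive triple, a bare number for a legacy
# drive bay) are for demonstration and may not match actual isi output.
GEN6_LOCATION = re.compile(r"^([1-4])([A-E])(\d)$")

def parse_location(loc: str):
    m = GEN6_LOCATION.match(loc)
    if m:
        node, sled, drive = m.groups()
        return {"node": int(node), "sled": sled, "drive": int(drive)}
    if loc.isdigit():  # legacy node: bay number only, no sled concept
        return {"bay": int(loc)}
    raise ValueError(f"unrecognized drive location: {loc!r}")
```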
Similarly, consider an H600 with the following data drive makeup:
- Drive Type: 2.5” SAS
- Sleds per Node: 5
- Drives per Sled: 6
- Total Drive Count: 120
- Drive Capacity: 600GB
The drive highlighted in red would be node 1, sled C, drive 3 – or 1C3.
The front left of each drive sled presents a display panel. This panel contains 3 LEDs and a push button.
The top ‘blue’ LED indicates power and drive activity for the sled. Under this is a yellow warning LED that reports a sled fault. Below this is a white ‘not safe to remove’ LED.
When illuminated, this white ‘hand’ LED indicates that the sled’s drives are in use and the sled is not safe to remove. Do NOT remove a sled from the chassis until its white LED is extinguished, as doing so may cause data loss.
Each chassis is also equipped with a front panel display. This front panel display is hinged, so it can swing clear of the drive sleds behind it. It’s attached to the midplane by a ribbon cable that runs down the length of the chassis.
The front panel display contains an LCD panel that reports cluster status, alerts, and so on. There are also four numbered power/fault indicator buttons, one for each node: green for normal operation, amber if there is a fault on that node. Additionally, a five-button illuminated touch keypad controls the display panel functionality.
Under normal operating conditions, the blue power LED and the white ‘hand’ LED will be illuminated. However, if a drive faults or fails for any reason (or a cluster admin issues a proactive drive smartfail), OneFS will illuminate the amber fault light on the appropriate sled’s display panel, in addition to raising amber warning lights and LCD notifications for the affected node.
Each drive bay in a sled also has amber fault indicator LEDs (in addition to slot labeling) to make it easy to identify the appropriate drive for servicing:
The procedure to replace a failed drive is as follows:
- When a drive is faulted and ready for replacement, OneFS will illuminate the Front Panel Fault LED associated with that node, the Drive Sled Fault LED, and the Fault LED associated with that drive.
- Identify the node whose front panel fault LED is illuminated, remove its bezel, and then locate the drive sled with its fault LED lit.
- Push the ‘request for service’ button on the sled display panel to notify OneFS that the sled is about to be removed.
- The white hand ‘not safe to remove’ LED will immediately blink to acknowledge the button press.
- OneFS then prepares itself to lose up to six working drives (on a 2.5” sled) and, when ready, switches off the white hand LED.
- When the white hand LED has turned off, it is now safe to pull the sled.
- Remove the failed drive, insert the replacement drive into the empty slot, and slide the sled back into the chassis.
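The button-press sequence above can be sketched as a small state model. The state names and transitions below are illustrative assumptions for this article, not the actual sled firmware logic:

```python
from enum import Enum

# Illustrative model of the sled-removal LED sequence described above.
class HandLed(Enum):
    SOLID = "solid"        # drives in use: not safe to remove
    BLINKING = "blinking"  # service request acknowledged, OneFS preparing
    OFF = "off"            # safe to pull the sled

class SledService:
    def __init__(self):
        self.led = HandLed.SOLID

    def press_service_button(self):
        # The white LED blinks immediately to acknowledge the press.
        if self.led is HandLed.SOLID:
            self.led = HandLed.BLINKING

    def drives_quiesced(self):
        # OneFS has prepared for the loss of the sled's drives.
        if self.led is HandLed.BLINKING:
            self.led = HandLed.OFF

    def safe_to_remove(self) -> bool:
        return self.led is HandLed.OFF
```

The key property the model captures is ordering: the sled only becomes safe to remove after the button press is acknowledged and OneFS finishes preparing.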
As we will see in the next article, OneFS 8.1 introduces the Automatic Replacement Recognition feature for simplified drive replacement.