Article Number: 514517 | Article Version: 12 | Article Type: Break Fix
Dell EMC Unity All Flash, Dell EMC Unity 300F, Dell EMC Unity 350F, Dell EMC Unity 400F, Dell EMC Unity 450F, Dell EMC Unity 500F, Dell EMC Unity 550F, Dell EMC Unity 600F, Dell EMC Unity 650F
This issue occurs only with Dynamic Pools that are built on System drives (DPE drive slots 0, 1, 2, & 3), with Unity OE versions 4.2.0 & 4.2.1, where a system drive has reported itself as End Of Life (EOL), and where the array does not have an available Spare drive (i.e., unused drive) to spare into the Dynamic Pool when the EOL event occurs.
The Data Unavailable (DU) condition will only occur if one of the SPs is rebooted after the EOL drive has been physically replaced with a new drive, and no other drive was previously available to spare into the Dynamic Pool as a replacement.
If, after a single SP reboot, you encounter a DU situation where resources are unavailable because a Pool is reported offline, immediately reboot the alternate SP (the one that has not yet been rebooted). This resolves the DU and restores the Pool to an online status, with no further risk of DU.
Multiple Warning Alerts, Degraded System State for EOL Drive:
Warning 14:60515 System unity550f has experienced one or more problems that have left it in a degraded state.
Warning 14:6027c DPE Disk 1 is reaching the end of its service life and needs to be replaced.
Warning 14:60340 Storage pool Dynamic1 has 1 drive(s) predicted to exceed end-of-life thresholds within 0 day(s)…
Unisphere or UEMCLI may display the following for the EOL System Drive:
The system has started an automatic copy of data from this drive that is wearing out to a spare drive.
Note: The above message can be misleading. If the array has no Spare drives available, the message actually means that the user extents on the system drive are being copied off onto Spare Extents within the Dynamic Pool. It does not mean that the DU scenario will be avoided.
CRITICAL System State and Pool Offline if one SP rebooted:
Critical 14:6032b Storage pool Dynamic1 is offline. The pool is offline. Contact your service provider.
Critical 14:60514 System unity550f has experienced one or more problems that have had a critical impact
In Unity OE 4.2.0 and 4.2.1, Dynamic Pools built on System drives are subject to the following issue. When a System drive reports itself as going EOL, the system has no spare drive available to automatically “replace” the affected drive in the Pool, and the affected System drive is later physically replaced with a new drive, a DU event is possible when a single SP is rebooted. The cause is that the Pool incorrectly retains the EOL flag on the drive that was used to physically replace the EOL System drive, even though the GUI and UEMCLI report the system as operating normally. The single SP reboot causes the associated Dynamic Pool to go offline, resulting in loss of access to any LUN or File System objects built on that Pool.
The Resolution section below describes how to recover from an existing DU situation (reboot the alternate SP), and how to prevent the DU situation from occurring through a series of preventive steps.
Conditions that must exist before a DU Pool Offline event would occur:
1. System drive is reporting itself as End of Life (EOL)
2. The system is running Unity OE 4.2.0 or 4.2.1
3. There are no eligible Spare drives on the system to automatically replace the EOL drive in the Dynamic Pool
4. The EOL drive is subsequently replaced with a new drive by user action
5. A single SP is rebooted, leading to Pool Offline and DU situation
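The five conditions above can be read as a simple decision rule. The following Python sketch is illustrative only: it does not query the array, and the `PoolState` fields and the `classify` function are hypothetical names introduced here, not part of Unisphere or UEMCLI. It assumes you have already determined each condition by hand.

```python
from dataclasses import dataclass

# Unity OE releases named in this article as affected (major, minor, patch).
AFFECTED_RELEASES = {("4", "2", "0"), ("4", "2", "1")}


@dataclass
class PoolState:
    """Hypothetical record of the five conditions, filled in manually."""
    system_drive_reported_eol: bool   # condition 1
    oe_version: str                   # condition 2, e.g. "4.2.1"
    spare_drive_available: bool       # condition 3 (True means NOT at risk)
    eol_drive_replaced: bool          # condition 4
    single_sp_rebooted: bool          # condition 5


def classify(state: PoolState) -> str:
    """Map the conditions onto the scenarios described in this article."""
    # Compare only the first three version components, so a longer
    # build string still matches its release.
    release = tuple(state.oe_version.split(".")[:3])
    conditions_1_to_4 = (
        state.system_drive_reported_eol
        and release in AFFECTED_RELEASES
        and not state.spare_drive_available
        and state.eol_drive_replaced
    )
    if not conditions_1_to_4:
        return "conditions 1-4 not all met"
    if state.single_sp_rebooted:
        return "scenario 1: pool offline, reboot the alternate SP"
    return "scenario 2: prevent DU, do not reboot a single SP"


if __name__ == "__main__":
    print(classify(PoolState(True, "4.2.1", False, True, False)))
```

The sketch only restates the logic of the list; the actual state of the array must still be confirmed from the Alerts and Pool properties.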
Scenario #1: Pool Offline, DU Occurring (Conditions 1-5 have taken place)
If all the conditions outlined above have taken place and you are experiencing a loss of access to objects built from Dynamic Pools (i.e., the Pool is offline), you can recover from the DU situation by immediately rebooting the alternate SP that has not already been rebooted. This restores access and permanently resolves the issue, with no further risk of DU.
Scenario #1 Comments:
If you have a system drive matching an EOL condition, have replaced the EOL drive, and are now experiencing a DU situation after rebooting an SP, you will see Alerts or a Pool message that “The pool is offline”. You can restore access and place the affected pool back online by immediately rebooting the opposite SP, that is, the SP that has not already been rebooted. This removes the lingering EOL attribute from the Dynamic Pool and allows the pool to come back online. If the DU situation occurs during an NDU upgrade, the pool will go offline when the first SP reboots for the upgrade, and will return to online status after the second SP has rebooted for the upgrade.
Scenario #1 Customer Resolution:
1. Reboot the alternate SP that has not yet been rebooted, or in the case of an NDU (Non-disruptive upgrade), allow the NDU to complete.
Scenario #2: Preventing DU (Only conditions 1-4 have taken place)
If only conditions 1-4 exist, DU has not yet occurred since a single SP reboot has not taken place. Use the following steps to help prevent a possible DU situation.
Scenario #2 Customer Resolution:
1. If you have a spare drive that is of the same Type (e.g., SAS Flash 2) and Size (size can be larger but not smaller), and you have an Open drive slot on the array, insert the Spare drive and it will automatically spare into the Pool and eliminate the potential DU issue.
2. After 10-15 minutes, a system Alert and Status should report Normal (System xxx is operating normally).
3. Under Pool properties > Drives (pool associated with the EOL system drive), you should see that the Spare drive is now in use and has replaced the EOL drive.
4. At this point, no further action is required. However, if you cannot adequately verify that the Spare drive has replaced the EOL system drive in the Pool, please contact your Service Provider for assistance and reference this Article number.
5. If you do not have a Spare drive that can be added to any slot in the array, contact your Service Provider, reference this KB article, and do not reboot any single SP. Your provider will be able to perform non-disruptive steps that clear the system of any potential DU related to the issue described in this article.
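The spare drive requirements in step 1 (same Type, Size equal or larger, and an Open slot) can be expressed as a short check. This is a minimal sketch under stated assumptions: the `Drive` class, its fields, and `can_spare_into_pool` are hypothetical names for illustration, and the capacities in the usage example are made-up values. The array applies its own sparing rules; this only restates the rule as written in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    """Hypothetical description of a drive, entered by hand."""
    drive_type: str   # e.g. "SAS Flash 2"
    size_gb: int      # usable capacity in GB


def can_spare_into_pool(spare: Drive, eol_drive: Drive, open_slot: bool) -> bool:
    """Return True when the spare meets the requirements stated in step 1."""
    same_type = spare.drive_type == eol_drive.drive_type
    # The spare may be larger than the EOL drive, but not smaller.
    large_enough = spare.size_gb >= eol_drive.size_gb
    return open_slot and same_type and large_enough


if __name__ == "__main__":
    eol = Drive("SAS Flash 2", 800)        # illustrative values only
    candidate = Drive("SAS Flash 2", 1600)
    print(can_spare_into_pool(candidate, eol, open_slot=True))
```

If the check fails for every drive on hand, step 5 applies: contact your Service Provider and do not reboot any single SP.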
This issue is fixed in Unity OE 4.3 and later. However, as noted above, if conditions 1-4 already exist and have not been corrected, the Dynamic Pool will go offline during the NDU upgrade to 4.3 after the first SP is rebooted for the upgrade, resulting in DU. After the second SP has been upgraded and rebooted, the Pool will be brought back online and the DU event will cease.
See the latest Dell EMC Unity Family Release Notes for more information.
Please contact Dell EMC Technical Support or your Authorized Service Representative, and quote this Knowledgebase article ID.