OneFS MediaScan

As we’ve seen previously, OneFS utilizes file system scans to perform such tasks as detecting and repairing drive errors, reclaiming freed blocks, etc. These scans are typically complex sequences of operations which may take many hours to run, so they are implemented via syscalls and coordinated by the Job Engine. These jobs are generally intended to run as minimally disruptive background tasks in the cluster, using spare or reserved capacity.

The file system maintenance jobs which are critical to the function of OneFS are:

FS Maintenance Job    Description

AutoBalance           Restores node and drive free space balance
Collect               Reclaims leaked blocks
FlexProtect           Replaces the traditional RAID rebuild process
MediaScan             Scrubs disks for media-level errors
MultiScan             Runs the AutoBalance and Collect jobs concurrently
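
These job types, along with the rest of the Job Engine's jobs, can be listed from any node's CLI. For example, assuming a recent OneFS release:

# isi job types list

The output shows each job type together with attributes such as its enabled state, impact policy, priority and schedule (where one is configured).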

MediaScan’s role within the file system protection framework is to periodically check for and resolve drive bit errors across the cluster. This proactive data integrity approach helps guard against a phenomenon known as ‘bit rot’, and the resulting specter of hardware-induced silent data corruption.

The MediaScan job reads all of OneFS’ allocated blocks in order to trigger any latent drive sector errors, in a process known as ‘disk scrubbing’. Drive sector errors may occur due to physical effects which, over time, could negatively affect the protection of the file system. Periodic disk scrubbing helps ensure that sector errors do not accumulate and lead to data integrity issues.

Sector errors are a relatively common drive fault. They are sometimes referred to as ‘ECCs’ since drives have internal error correcting codes associated with sectors. A failure of these codes to correct the contents of the sector generates an error on a read of the sector.

ECCs have a wide variety of causes. There may be a permanent problem, such as physical damage to the platter, or a more transient problem, such as the head not being located properly when the sector was read. For transient problems, the drive has the ability to retry automatically. However, such retries can be time-consuming and prevent further processing.

OneFS typically has the redundancy available to overwrite the bad sector with the proper contents. This is called Dynamic Sector Repair (DSR). It is preferable for the file system to perform DSR rather than wait for the drive to retry and possibly disrupt other operations. When supported by the particular drive model, a retry time threshold is also set so that disruption is minimized and the file system can attempt to use its redundancy.

In addition, MediaScan maintains a list of sectors to avoid after an error has been detected. Sectors are added to the list upon the first error. Subsequent I/Os consult this list and, if a match is found, immediately return an error without actually sending the request to the drive, minimizing further issues.

If the file system can successfully write over a sector, it is removed from the list, on the assumption that the drive will reallocate the sector on write. If the file system can’t reconstruct the block, it may be necessary to retry the I/O, since there is no other way to access the data; to allow this, the kernel’s ECC list must be cleared. This is done automatically at the end of each MediaScan job run, but occasionally must also be done manually to access a particular block.
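
To make the avoidance-list behavior described above more concrete, here is a minimal, purely illustrative Python sketch. It is not OneFS source code, and all of the names in it (FakeDrive, EccAvoidanceList, read_sector, repair_sector) are invented for the example; in OneFS the equivalent bookkeeping happens in the kernel.

# Illustrative sketch only, not OneFS code. Models a per-drive ECC
# 'avoidance list': the first error adds the sector, later reads short-circuit,
# a successful overwrite (DSR) removes it, and clear() mimics the
# end-of-job cleanup described above.

class FakeDrive:
    """In-memory stand-in for a disk; sectors listed in 'failing' return ECC errors."""
    def __init__(self, failing=()):
        self.blocks = {}
        self.failing = set(failing)

    def read(self, sector):
        if sector in self.failing:
            return None, False            # simulate a sector (ECC) read error
        return self.blocks.get(sector, b"\x00"), True

    def write(self, sector, data):
        self.blocks[sector] = data
        self.failing.discard(sector)      # drives are expected to remap the sector on write


class EccAvoidanceList:
    def __init__(self):
        self.bad_sectors = set()          # sectors that have already returned ECC errors

    def read_sector(self, drive, sector):
        if sector in self.bad_sectors:    # consult the list before issuing real I/O
            raise IOError("sector is on the ECC list; request not sent to the drive")
        data, ok = drive.read(sector)
        if not ok:
            self.bad_sectors.add(sector)  # first error: remember the sector and avoid it
            raise IOError("ECC error reported by the drive")
        return data

    def repair_sector(self, drive, sector, reconstructed):
        # Dynamic Sector Repair: write redundant data over the bad sector;
        # a successful overwrite removes the sector from the avoidance list.
        drive.write(sector, reconstructed)
        self.bad_sectors.discard(sector)

    def clear(self):
        # Analogous to clearing the kernel's ECC list at the end of a MediaScan run.
        self.bad_sectors.clear()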

The drive’s own error-correction mechanism can handle some bit rot. When it fails, the error is reported to the MediaScan job. In order for the file system to repair the sector, the owner must be located. The owning structure in the file system has the redundancy that can be used to write over the bad sector, for example an alternate mirror of a block.

Most of the logic in MediaScan handles searching for the owner of the bad sector; the process can be very different depending on the type of structure, but is usually quite expensive. As such, it is often referred to as the ‘haystack’ search, since nearly every inode may need to be inspected to find the owner. MediaScan works by directly accessing the underlying cylinder groups and disk blocks via a linear drive scan, and it has more job phases than most Job Engine jobs for two main reasons:

  • First, significant effort is made to avoid the expense of the haystack search.
  • Second, every effort is made to try all means possible before alerting the administrator.

Here are the eight phases of MediaScan:

Phase 1 (Drive Scan): Scans each drive using the ifs_find_ecc() system call, which issues I/O for all allocated blocks and inodes.

Phase 2 (Random Drive Scan): Finds additional “marginal” ECCs that would not have been detected by the previous phase.

Phase 3 (Inode Scan): Inode ECCs can be located more quickly from the LIN tree, so this phase scans the LIN tree to determine the (LIN, snapshot ID) referencing any inode ECCs.

Phase 4 (Inode Repair): Repairs inode ECCs with known (LIN, snapshot ID) owners, plus any LIN tree block ECCs where the owner is the LIN tree itself.

Phase 5 (Inode Verify): Verifies that any ECCs not fixed in the previous phase still exist. First, it checks whether the block has been freed; then it clears the ECC list and retries the I/O to verify that the sector is still failing.

Phase 6 (Block Repair): Drives are scanned and compared against the list of ECCs. When ECCs are found, the (LIN, snapshot ID) is returned and the restripe process repairs the ECCs in those files. This phase is often referred to as the “haystack search”.

Phase 7 (Block Verify): Once all file system repair attempts have completed, ECCs are again verified by clearing the ECC list and reissuing the I/O.

Phase 8 (Alert): Any remaining ECCs after repair and verification represent a danger of data loss. This phase logs the errors at the syslog ERR level.

MediaScan falls within the job engine’s restriping exclusion set, and is run as a low-impact, low-priority background process. It is executed automatically by default at 12am on the first Saturday of each month, although this can be reconfigured if desired.
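
For reference, the job’s current impact policy, priority and schedule can be checked, and the schedule changed, with the isi job types commands. The syntax below assumes a recent OneFS release and the schedule string is only an example; check the isi job types modify help output on your cluster for the exact schedule grammar it accepts:

# isi job types view mediascan

# isi job types modify mediascan --schedule "every Saturday at 12:00 AM"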

In addition to scheduled job execution, MediaScan can also be initiated on demand. The following CLI syntax will kick off a manual job run:



# isi job jobs start mediascan
Started job [251]

# isi job jobs list
ID   Type       State    Impact  Pri  Phase  Running Time
----------------------------------------------------------
251  MediaScan  Running  Low     8    1/8    1s
----------------------------------------------------------
Total: 1

The MediaScan job’s progress can be tracked via a CLI command as follows:

# isi job jobs view 251
ID: 251
Type: MediaScan
State: Running
Impact: Low
Policy: LOW
Pri: 8
Phase: 1/8
Start Time: 2018-08-30T22:16:23
Running Time: 1m 30s
Participants: 1, 2, 3
Progress: Found 0 ECCs on 2 drives; last completed: 2:0; 0 errors
Waiting on job ID: -
Description:

A job’s resource usage can be traced from the CLI as follows:

# isi job statistics view
Job ID: 251
Phase: 1
CPU Avg.: 0.21%
Memory Avg.
    Virtual: 318.41M
    Physical: 28.92M
I/O
    Ops: 391
    Bytes: 3.05M

Finally, upon completion, the MediaScan job report, detailing all eight stages, can be viewed by using the following CLI command with the job ID as the argument:

# isi job reports view 251
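
Should a manually started run need to be interrupted, the job can also be paused, resumed, or cancelled from the CLI (shown here against the example job ID above):

# isi job jobs pause 251

# isi job jobs resume 251

# isi job jobs cancel 251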

Related:

Re: [ScaleIO] Virtual disk bad block medium error is detected.

Hello all,

We got several [Storage Service Virtual disk bad block medium error is detected.: Virtual Disk X (Virtual Disk X)] logs in scaleIO server.

I believe this disk went bad and caused an unrecovered read error.

What I want to ask is: when I check [query_all_sds] and [query_sds], this SDS seems to be healthy and all disks appear normal.

Is this normal?

Also, can you please share the details of how disks in SDSs are monitored? Does the MDM check all disks in all SDSs frequently? Or, when an SDC cannot connect to a disk, does the SDC inform the MDM, which then changes the mapping towards the SDCs?

Thanks.


Related:

Why the system health check reports lots of storage disks with grown defect errors?

Rule : DM042
Issue Detected : Potential bad sectors on disk
Severity : Medium
Components : disk7[spa2.encl5](HWID: 1294) (from catalog) – Disk grown defects 691 is more than 50
disk13[spa2.encl7](HWID: 1390) (from catalog) – Disk grown defects 476 is more than 50
disk5[spa1.encl11](HWID: 2105) (from catalog) – Disk grown defects 423 is more than 50
disk1[spa2.encl11](HWID: 1558) (from catalog) – Disk grown defects 351 is more than 50
disk14[spa2.encl4](HWID: 1256) (from catalog) – Disk grown defects 271 is more than 50
disk14[spa2.encl8](HWID: 1436) (from catalog) – Disk grown defects 254 is more than 50
disk2[spa2.encl11](HWID: 1559) (from catalog) – Disk grown defects 252 is more than 50
disk16[spa1.encl9](HWID: 2026) (from catalog) – Disk grown defects 235 is more than 50
disk17[spa1.encl4](HWID: 1802) (from catalog) – Disk grown defects 228 is more than 50
disk7[spa1.encl10](HWID: 2062) (from catalog) – Disk grown defects 224 is more than 50
disk12[spa1.encl3](HWID: 1752) (from catalog) – Disk grown defects 218 is more than 50
disk20[spa2.encl2](HWID: 1172) (from catalog) – Disk grown defects 215 is more than 50
disk24[spa1.encl12](HWID: 2169) (from catalog) – Disk grown defects 209 is more than 50
disk6[spa2.encl7](HWID: 1383) (from catalog) – Disk grown defects 189 is more than 50
disk4[spa1.encl7](HWID: 1924) (from catalog) – Disk grown defects 187 is more than 50
disk7[spa2.encl6](HWID: 1339) (from catalog) – Disk grown defects 185 is more than 50
disk7[spa2.encl2](HWID: 1159) (from catalog) – Disk grown defects 184 is more than 50
disk12[spa1.encl8](HWID: 1977) (from catalog) – Disk grown defects 180 is more than 50
disk18[spa1.encl12](HWID: 2163) (from catalog) – Disk grown defects 180 is more than 50
disk24[spa2.encl7](HWID: 1401) (from catalog) – Disk grown defects 177 is more than 50
disk8[spa2.encl11](HWID: 1565) (from catalog) – Disk grown defects 177 is more than 50
disk7[spa1.encl4](HWID: 1792) (from catalog) – Disk grown defects 171 is more than 50
disk20[spa2.encl7](HWID: 1397) (from catalog) – Disk grown defects 170 is more than 50
disk18[spa2.encl1](HWID: 1125) (from catalog) – Disk grown defects 166 is more than 50
disk20[spa2.encl3](HWID: 1217) (from catalog) – Disk grown defects 161 is more than 50
disk8[spa2.encl8](HWID: 1430) (from catalog) – Disk grown defects 161 is more than 50
disk8[spa2.encl5](HWID: 1295) (from catalog) – Disk grown defects 159 is more than 50
disk19[spa1.encl5](HWID: 1849) (from catalog) – Disk grown defects 155 is more than 50
disk20[spa1.encl4](HWID: 1805) (from catalog) – Disk grown defects 151 is more than 50
disk15[spa1.encl7](HWID: 1935) (from catalog) – Disk grown defects 149 is more than 50
disk22[spa1.encl3](HWID: 1762) (from catalog) – Disk grown defects 144 is more than 50
disk21[spa1.encl9](HWID: 2031) (from catalog) – Disk grown defects 144 is more than 50
disk24[spa1.encl5](HWID: 1854) (from catalog) – Disk grown defects 138 is more than 50
disk12[spa2.encl3](HWID: 1209) (from catalog) – Disk grown defects 138 is more than 50
disk8[spa1.encl1](HWID: 1658) (from catalog) – Disk grown defects 135 is more than 50
disk1[spa1.encl11](HWID: 2101) (from catalog) – Disk grown defects 135 is more than 50
disk21[spa1.encl4](HWID: 1806) (from catalog) – Disk grown defects 134 is more than 50
disk3[spa2.encl8](HWID: 1425) (from catalog) – Disk grown defects 134 is more than 50
disk21[spa2.encl4](HWID: 1263) (from catalog) – Disk grown defects 133 is more than 50
disk3[spa1.encl6](HWID: 1878) (from catalog) – Disk grown defects 131 is more than 50
disk12[spa2.encl12](HWID: 1614) (from catalog) – Disk grown defects 130 is more than 50
disk18[spa1.encl4](HWID: 1803) (from catalog) – Disk grown defects 128 is more than 50
disk23[spa2.encl9](HWID: 1490) (from catalog) – Disk grown defects 127 is more than 50
disk4[spa2.encl6](HWID: 1336) (from catalog) – Disk grown defects 126 is more than 50
disk19[spa2.encl11](HWID: 1576) (from catalog) – Disk grown defects 124 is more than 50
disk19[spa1.encl4](HWID: 1804) (from catalog) – Disk grown defects 123 is more than 50
disk14[spa1.encl10](HWID: 2069) (from catalog) – Disk grown defects 121 is more than 50
disk23[spa2.encl4](HWID: 1265) (from catalog) – Disk grown defects 120 is more than 50
disk5[spa1.encl3](HWID: 1745) (from catalog) – Disk grown defects 119 is more than 50
disk6[spa1.encl5](HWID: 1836) (from catalog) – Disk grown defects 118 is more than 50
disk3[spa1.encl4](HWID: 1788) (from catalog) – Disk grown defects 116 is more than 50
disk9[spa1.encl6](HWID: 1884) (from catalog) – Disk grown defects 114 is more than 50
disk1[spa1.encl12](HWID: 2146) (from catalog) – Disk grown defects 114 is more than 50
disk11[spa2.encl8](HWID: 1433) (from catalog) – Disk grown defects 112 is more than 50
disk7[spa1.encl12](HWID: 2152) (from catalog) – Disk grown defects 111 is more than 50
disk10[spa2.encl1](HWID: 1117) (from catalog) – Disk grown defects 110 is more than 50
disk15[spa2.encl9](HWID: 1482) (from catalog) – Disk grown defects 110 is more than 50
disk6[spa2.encl6](HWID: 1338) (from catalog) – Disk grown defects 109 is more than 50
disk3[spa1.encl3](HWID: 1743) (from catalog) – Disk grown defects 108 is more than 50
disk3[spa2.encl9](HWID: 1470) (from catalog) – Disk grown defects 108 is more than 50
disk3[spa1.encl11](HWID: 2103) (from catalog) – Disk grown defects 107 is more than 50
disk2[spa2.encl8](HWID: 1424) (from catalog) – Disk grown defects 107 is more than 50
disk10[spa1.encl1](HWID: 1660) (from catalog) – Disk grown defects 106 is more than 50
disk21[spa1.encl7](HWID: 1941) (from catalog) – Disk grown defects 106 is more than 50
disk2[spa2.encl6](HWID: 1334) (from catalog) – Disk grown defects 105 is more than 50
disk6[spa2.encl1](HWID: 1113) (from catalog) – Disk grown defects 102 is more than 50
disk24[spa1.encl7](HWID: 1944) (from catalog) – Disk grown defects 101 is more than 50
disk18[spa2.encl2](HWID: 1170) (from catalog) – Disk grown defects 101 is more than 50
disk16[spa2.encl9](HWID: 1483) (from catalog) – Disk grown defects 101 is more than 50
disk12[spa1.encl11](HWID: 2112) (from catalog) – Disk grown defects 99 is more than 50
disk4[spa2.encl2](HWID: 1156) (from catalog) – Disk grown defects 99 is more than 50
disk22[spa2.encl3](HWID: 1219) (from catalog) – Disk grown defects 99 is more than 50
disk6[spa2.encl8](HWID: 1428) (from catalog) – Disk grown defects 98 is more than 50
disk11[spa1.encl1](HWID: 1661) (from catalog) – Disk grown defects 97 is more than 50
disk14[spa2.encl12](HWID: 1616) (from catalog) – Disk grown defects 97 is more than 50
disk2[spa1.encl10](HWID: 2057) (from catalog) – Disk grown defects 97 is more than 50
disk5[spa2.encl7](HWID: 1382) (from catalog) – Disk grown defects 97 is more than 50
disk15[spa2.encl10](HWID: 1527) (from catalog) – Disk grown defects 96 is more than 50
disk22[spa2.encl10](HWID: 1534) (from catalog) – Disk grown defects 96 is more than 50
disk13[spa1.encl8](HWID: 1978) (from catalog) – Disk grown defects 95 is more than 50
disk20[spa1.encl11](HWID: 2120) (from catalog) – Disk grown defects 95 is more than 50
disk17[spa2.encl12](HWID: 1619) (from catalog) – Disk grown defects 95 is more than 50
disk10[spa2.encl8](HWID: 1432) (from catalog) – Disk grown defects 95 is more than 50
disk12[spa1.encl7](HWID: 1932) (from catalog) – Disk grown defects 94 is more than 50
disk7[spa2.encl7](HWID: 1384) (from catalog) – Disk grown defects 94 is more than 50
disk18[spa2.encl8](HWID: 1440) (from catalog) – Disk grown defects 94 is more than 50
disk16[spa2.encl10](HWID: 1528) (from catalog) – Disk grown defects 94 is more than 50
disk1[spa1.encl10](HWID: 2056) (from catalog) – Disk grown defects 93 is more than 50
disk5[spa1.encl6](HWID: 1880) (from catalog) – Disk grown defects 92 is more than 50
disk23[spa1.encl1](HWID: 1673) (from catalog) – Disk grown defects 91 is more than 50
disk18[spa1.encl9](HWID: 2028) (from catalog) – Disk grown defects 91 is more than 50
disk21[spa1.encl12](HWID: 2166) (from catalog) – Disk grown defects 91 is more than 50
disk2[spa1.encl12](HWID: 2147) (from catalog) – Disk grown defects 89 is more than 50
disk8[spa2.encl2](HWID: 1160) (from catalog) – Disk grown defects 88 is more than 50
disk1[spa2.encl7](HWID: 1378) (from catalog) – Disk grown defects 86 is more than 50
disk19[spa1.encl3](HWID: 1759) (from catalog) – Disk grown defects 85 is more than 50
disk22[spa1.encl4](HWID: 1807) (from catalog) – Disk grown defects 85 is more than 50
disk16[spa2.encl3](HWID: 1213) (from catalog) – Disk grown defects 83 is more than 50
disk6[spa1.encl3](HWID: 1746) (from catalog) – Disk grown defects 82 is more than 50
disk11[spa1.encl5](HWID: 1841) (from catalog) – Disk grown defects 82 is more than 50
disk8[spa1.encl7](HWID: 1928) (from catalog) – Disk grown defects 82 is more than 50
disk24[spa2.encl2](HWID: 1176) (from catalog) – Disk grown defects 82 is more than 50
disk24[spa2.encl5](HWID: 1311) (from catalog) – Disk grown defects 81 is more than 50
disk19[spa2.encl6](HWID: 1351) (from catalog) – Disk grown defects 80 is more than 50
disk7[spa2.encl10](HWID: 1519) (from catalog) – Disk grown defects 80 is more than 50
disk8[spa1.encl12](HWID: 2153) (from catalog) – Disk grown defects 79 is more than 50
disk4[spa1.encl10](HWID: 2059) (from catalog) – Disk grown defects 78 is more than 50
disk8[spa2.encl7](HWID: 1385) (from catalog) – Disk grown defects 78 is more than 50
disk12[spa1.encl1](HWID: 1662) (from catalog) – Disk grown defects 77 is more than 50
disk2[spa1.encl7](HWID: 1922) (from catalog) – Disk grown defects 76 is more than 50
disk13[spa2.encl5](HWID: 1300) (from catalog) – Disk grown defects 76 is more than 50
disk10[spa1.encl2](HWID: 1705) (from catalog) – Disk grown defects 75 is more than 50
disk14[spa1.encl1](HWID: 1664) (from catalog) – Disk grown defects 74 is more than 50
disk17[spa1.encl1](HWID: 1667) (from catalog) – Disk grown defects 74 is more than 50
disk15[spa1.encl12](HWID: 2160) (from catalog) – Disk grown defects 73 is more than 50
disk1[spa2.encl8](HWID: 1423) (from catalog) – Disk grown defects 73 is more than 50
disk7[spa1.encl3](HWID: 1747) (from catalog) – Disk grown defects 72 is more than 50
disk13[spa2.encl12](HWID: 1615) (from catalog) – Disk grown defects 72 is more than 50
disk17[spa1.encl12](HWID: 2162) (from catalog) – Disk grown defects 71 is more than 50
disk18[spa1.encl7](HWID: 1938) (from catalog) – Disk grown defects 70 is more than 50
disk24[spa2.encl8](HWID: 1446) (from catalog) – Disk grown defects 70 is more than 50
disk6[spa2.encl10](HWID: 1518) (from catalog) – Disk grown defects 70 is more than 50
disk4[spa1.encl1](HWID: 1654) (from catalog) – Disk grown defects 69 is more than 50
disk9[spa1.encl3](HWID: 1749) (from catalog) – Disk grown defects 68 is more than 50
disk18[spa1.encl2](HWID: 1713) (from catalog) – Disk grown defects 68 is more than 50
disk12[spa2.encl2](HWID: 1164) (from catalog) – Disk grown defects 68 is more than 50
disk20[spa1.encl1](HWID: 1670) (from catalog) – Disk grown defects 67 is more than 50
disk19[spa2.encl12](HWID: 1621) (from catalog) – Disk grown defects 67 is more than 50
disk4[spa2.encl10](HWID: 1516) (from catalog) – Disk grown defects 67 is more than 50
disk14[spa1.encl5](HWID: 1844) (from catalog) – Disk grown defects 66 is more than 50
disk17[spa1.encl7](HWID: 1937) (from catalog) – Disk grown defects 65 is more than 50
disk2[spa2.encl10](HWID: 1514) (from catalog) – Disk grown defects 65 is more than 50
disk2[spa1.encl2](HWID: 1697) (from catalog) – Disk grown defects 64 is more than 50
disk13[spa1.encl10](HWID: 2068) (from catalog) – Disk grown defects 64 is more than 50
disk21[spa1.encl11](HWID: 2121) (from catalog) – Disk grown defects 64 is more than 50
disk20[spa2.encl8](HWID: 1442) (from catalog) – Disk grown defects 64 is more than 50
disk2[spa2.encl12](HWID: 1604) (from catalog) – Disk grown defects 64 is more than 50
disk10[spa2.encl4](HWID: 1252) (from catalog) – Disk grown defects 63 is more than 50
disk15[spa2.encl3](HWID: 1212) (from catalog) – Disk grown defects 63 is more than 50
disk19[spa1.encl12](HWID: 2164) (from catalog) – Disk grown defects 62 is more than 50
disk6[spa2.encl12](HWID: 1608) (from catalog) – Disk grown defects 62 is more than 50
disk17[spa1.encl2](HWID: 1712) (from catalog) – Disk grown defects 60 is more than 50
disk13[spa1.encl5](HWID: 1843) (from catalog) – Disk grown defects 60 is more than 50
disk17[spa1.encl11](HWID: 2117) (from catalog) – Disk grown defects 59 is more than 50
disk18[spa2.encl7](HWID: 1395) (from catalog) – Disk grown defects 59 is more than 50
disk4[spa2.encl11](HWID: 1561) (from catalog) – Disk grown defects 58 is more than 50
disk14[spa1.encl4](HWID: 1799) (from catalog) – Disk grown defects 57 is more than 50
disk20[spa1.encl8](HWID: 1985) (from catalog) – Disk grown defects 57 is more than 50
disk14[spa2.encl3](HWID: 1211) (from catalog) – Disk grown defects 57 is more than 50
disk17[spa2.encl4](HWID: 1259) (from catalog) – Disk grown defects 57 is more than 50
disk22[spa2.encl4](HWID: 1264) (from catalog) – Disk grown defects 56 is more than 50
disk5[spa2.encl4](HWID: 1247) (from catalog) – Disk grown defects 55 is more than 50
disk15[spa2.encl6](HWID: 1347) (from catalog) – Disk grown defects 55 is more than 50
disk10[spa2.encl9](HWID: 1477) (from catalog) – Disk grown defects 55 is more than 50
disk23[spa1.encl2](HWID: 1718) (from catalog) – Disk grown defects 54 is more than 50
disk9[spa1.encl9](HWID: 2019) (from catalog) – Disk grown defects 54 is more than 50
disk20[spa1.encl10](HWID: 2075) (from catalog) – Disk grown defects 53 is more than 50
disk22[spa2.encl6](HWID: 1354) (from catalog) – Disk grown defects 53 is more than 50
disk7[spa2.encl4](HWID: 1249) (from catalog) – Disk grown defects 51 is more than 50

Expert’s Advice :

Disks may suffer from an increasing number of defective sectors during heavy load.

The reported disks are still in service; as an extra precaution, they should be proactively failed before any time-critical workload.

————————————————————————

Rule : DM043
Issue Detected : Disk uncorrectable read errors
Severity : Medium
Components : disk16[spa1.encl11](HWID: 2116) (from catalog) – Disk uncorrectable read errors 44 is more than threshold
disk14[spa2.encl2](HWID: 1166) (from catalog) – Disk uncorrectable read errors 34 is more than threshold
disk15[spa2.encl6](HWID: 1347) (from catalog) – Disk uncorrectable read errors 24 is more than threshold
disk21[spa2.encl5](HWID: 1308) (from catalog) – Disk uncorrectable read errors 20 is more than threshold
disk4[spa1.encl6](HWID: 1879) (from catalog) – Disk uncorrectable read errors 18 is more than threshold
disk9[spa1.encl3](HWID: 1749) (from catalog) – Disk uncorrectable read errors 14 is more than threshold
disk14[spa1.encl8](HWID: 1979) (from catalog) – Disk uncorrectable read errors 7 is more than threshold
disk22[spa2.encl4](HWID: 1264) (from catalog) – Disk uncorrectable read errors 7 is more than threshold
disk12[spa2.encl8](HWID: 1434) (from catalog) – Disk uncorrectable read errors 7 is more than threshold
disk3[spa2.encl7](HWID: 1380) (from catalog) – Disk uncorrectable read errors 6 is more than threshold

Related:


Volume Revert: Bad block encountered during revert operation

Details
Product: Windows Operating System
Event ID: 61
Source: VolSnap
Version: 5.2.3790.1830
Message: Volume Revert: Bad block encountered during revert operation
   
Explanation

During a shadow copy revert operation, the information stored in the shadow copy storage area (also referred to as the Diff Area) is copied back to the original volume. During this process, a bad block was detected. The revert process will continue, but the data contained in the bad blocks may be inaccessible or corrupt.

Cause

A bad block was encountered in the process of copying blocks of data from the Diff Area to the original volume.

   
User Action

When the revert operation has completed, run chkdsk /r on the original volume and on the volume where the Diff Area is stored to fix any file system metadata that may have been corrupted by the bad block.
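
For example, if the original volume were D: and the Diff Area were stored on E: (the drive letters here are placeholders only), the checks would look like this:

C:\> chkdsk D: /r

C:\> chkdsk E: /r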

Related:

LDM – bad block(s) found

Details
Product: Windows Operating System
Event ID: 23
Source: dmio
Version: 5.2.3790.1830
Message: LDM – bad block(s) found
   
Explanation

This event indicates that Logical Disk Manager has detected I/O failures on a disk. This error occurs when read and write operations fail.

Cause

Possible causes include:

  • A hardware failure that prevents communication with a disk (for example, a loose cable, a loose disk controller card, or a cable failure).
  • Uncorrectable bad sectors on a disk.
   
User Action

Do one or more of the following:

  • Check the status of your hardware for any failures (for example, a disk, controller card, or cable failure).
  • Check Event Viewer for additional events from lower-level storage drivers that might indicate the cause of the failure.
  • Use Chkdsk or a similar software tool to check if any disks have bad sectors and need to be replaced.
  • Restart the computer.
  • Contact Microsoft Customer Service and Support.

Related:

The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume %2.

Details
Product: Windows Operating System
Event ID: 55
Source: ntfs
Version: 5.2
Symbolic Name: IO_FILE_SYSTEM_CORRUPT_WITH_NAME
Message: The file system structure on the disk is corrupt and unusable.
Please run the chkdsk utility on the volume %2.
   
Explanation

The file system structure on the volume listed in the message might be corrupt because of one or more of the following reasons:

  • The disk might have bad sectors.
  • I/O requests issued by the file system to the disk subsystem might not have been completed successfully.
   
User Action

Check the state of the file system and repair it if necessary.

To check the state of the file system

  1. Click Start, click Run, and then, in the Open box, type cmd
  2. To determine whether the volume is corrupt, at the command prompt, type chkntfs Drive: (see the example below)
  • If the message “Drive_letter: is dirty” is displayed, the volume is corrupt. In this case, repair the file system.
  • If the message “Drive_letter: is not dirty” is displayed, the volume is not corrupt and no further action is required.
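
As an illustration of step 2, with a hypothetical D: volume the check and its indicative output might look like this (the exact output wording can vary by Windows version):

C:\> chkntfs D:
The type of the file system is NTFS.
D: is not dirty.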

To repair the file system

  1. Save any unsaved data and close any open programs.
  2. Restart the computer. The volume is automatically checked and repaired when you restart the computer.

Alternatively, you can run the Chkdsk tool from the command prompt without shutting down the computer first.

  1. Click Start, click Run, and then type cmd
  2. At the command prompt, type chkdsk /X Drive:
Chkdsk runs and automatically repairs the volume.

If the following message appears, type Y.

“Cannot lock current drive. Chkdsk cannot run because the volume is in use by another process. Would you like to schedule this volume to be checked the next time the system restarts?”

The next time the computer is started, Chkdsk will automatically run.

If the NTFS 55 message appears regularly, for example daily or weekly, run Chkdsk using the /R command-line option. This option allows Chkdsk to locate bad sectors on the hard disk.

Related:

Volume Snapshot Driver – Diff Area health issues

Details
Product: Windows Operating System
Event ID: 36
Source: VolSnap
Version: 5.2.3790.1830
Message: Volume Snapshot Driver – Diff Area health issues
   
Explanation

To maintain the consistency of shadow copies, the Volume Shadow Copy Service saves the original data to a shadow copy storage area (also referred to as a Diff Area). This event indicates that Volume Shadow Copy Service encountered an issue when writing to this area.

Cause

Possible causes include:

  • A disk I/O error occurred while performing the copy-on-write operation.
  • There is a bad block on the disk that contains the volume where the Diff Area is stored.
  • The Diff Area is not large enough to store the shadow copies.
  • The disk I/O or system load is so large that the Diff Area was not able to increase in size fast enough.
   
User Action

Do one or more of the following:

  • Using chkdsk /r, check for errors on the volume where the Diff Area is stored and on the original volume.
  • Check the system log in Event Viewer for errors associated with the volume where the Diff Area is stored or with the original volume.
  • Check for hardware errors on the disks that contain the volume where the Diff Area is stored and on the original volume.
  • Move the Diff Area to a different, dedicated volume.
  • Move the Diff Area to a larger volume.
  • Allocate more storage space for the Diff Area (an example command is shown after this list).
  • Increase the Diff Area initial size and growth rate by changing the MinDiffArea registry key. Caution: Incorrectly editing the registry may severely damage your system. Before making changes to the registry, you should back up any valued data on the computer.
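
For the items above about enlarging the Diff Area, a hypothetical example using the built-in vssadmin tool (the drive letters and size are placeholders; adjust them for the volume that actually hosts the Diff Area):

C:\> vssadmin Resize ShadowStorage /For=D: /On=D: /MaxSize=2GB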

Related:

NTFS – File system corrupt

Details
Product: Windows Operating System
Event ID: 55
Source: ntfs
Version: 5.2.3790.1830
Message: NTFS – File system corrupt
   
Explanation

The file system on the volume might be corrupt due to one or more of the following reasons:

  • The disk might have bad sectors.
  • I/O requests issued by the file system to the disk subsystem might not have been completed successfully.

Cause

  • The disk might have bad sectors.
  • I/O requests issued by the file system to the disk subsystem might not have been completed successfully.
   
User Action

Check the state of the file system and repair it if necessary.

To check the state of the file system

  1. At a command prompt, type chkntfs <drive letter>:
  2. Check the message from chkntfs.
  • If chkntfs displays the message “<drive letter>: is dirty”, the volume is corrupt. In this case, repair the file system using the chkdsk /r command.
  • If chkntfs displays the message “<drive letter>: is not dirty”, the volume is not corrupt and no further action is required.

To repair the file system

  1. Save any unsaved data, close any open programs, and restart the computer.
  2. Microsoft® Windows® automatically runs chkdsk /r on “dirty” (corrupt) volumes to check and repair them.

You can also run chkdsk manually using the following steps.

  • At a command prompt, type chkdsk /x <drive letter>: Chkdsk runs and automatically repairs the volume.
  • If chkdsk displays the following message, type Y: “Cannot lock current drive. Chkdsk cannot run because the volume is in use by another process. Would you like to schedule this volume to be checked the next time the system restarts?” Windows will automatically run chkdsk the next time the computer is started.

If you regularly see NTFS Event ID 41 or Event ID 55 in Event Viewer, run chkdsk using the /r option. This option allows chkdsk to locate bad sectors on the hard disk.

Related Resources

For more information about the chkdsk /c and /i options, see Knowledge Base article 187941 “An Explanation of CHKDSK and the New /C and /I Switches” at http://go.microsoft.com/fwlink/?LinkId=25770.

For more information about NTFS recoverability, see Knowledge Base article 101670 “Transaction Log Supports NTFS Recoverability,” at http://go.microsoft.com/fwlink/?LinkId=25981.

Related:

LDM – Internal dmboot Service error

Details
Product: Windows Operating System
Event ID: 5
Source: dmboot
Version: 5.2.3790.1830
Message: LDM – Internal dmboot Service error
   
Explanation

This event indicates that the Logical Disk Manager service detected an internal error during the startup process of the server.

Cause

Possible causes include:

  • A hardware failure that prevents communication with a disk (for example, a loose cable, a loose disk controller card, or a cable failure).
  • Unexpected removal of a disk.
  • Uncorrectable bad sectors on a disk.
  • A software error.
   
User Action

Do one or more of the following:

  • Check the status of your hardware for any failures (for example, a disk, controller card, or cable failure).
  • Check Event Viewer for additional events from lower-level storage drivers that might indicate the cause of the failure.
  • Restart the system.
  • Contact Microsoft Customer Service and Support.
