RecoverPoint: ScaleIO initialization not progressing; host crash or hang under high load

Article Number: 503004 Article Version: 3 Article Type: Break Fix



RecoverPoint, RecoverPoint CL, RecoverPoint EX, ScaleIO Product Family, ScaleIO Software

I/Os to offsets greater than 1 TB cause a short initialization on the consistency group (CG). The host can also crash or hang under high load.

Symptoms found in the logs:

ScaleIO host logs:

sdw1 kernel: attempt to access beyond end of device

sdw1 kernel: sdr: rw=33, want=2147483736, limit=1967000000


OR

sdw4 kernel: NMI watchdog: BUG: soft lockup - CPU#40 stuck for 22s! [splDataPathExec:97896]

localhost kernel: INFO: task dd:3540 blocked for more than 120 seconds.
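The "attempt to access beyond end of device" message above can be checked against the 1T boundary with a little arithmetic. The Linux block layer reports `want` and `limit` in 512-byte sectors, so multiplying by 512 gives byte offsets. The following is a minimal sketch for triage only; the function name `parse_beyond_end` and the returned field names are illustrative and not part of any RecoverPoint or ScaleIO tool.

```python
import re

SECTOR_BYTES = 512   # the Linux block layer reports want/limit in 512-byte sectors
TIB = 1 << 40        # 1 TiB in bytes

# Matches the "<dev>: rw=..., want=..., limit=..." kernel message shown in the host log.
BEYOND_EOD = re.compile(
    r"(?P<dev>\w+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def parse_beyond_end(line):
    """Return byte offsets derived from a want/limit kernel line, or None if no match."""
    m = BEYOND_EOD.search(line)
    if m is None:
        return None
    want = int(m.group("want"))
    limit = int(m.group("limit"))
    return {
        "device": m.group("dev"),
        "want_bytes": want * SECTOR_BYTES,      # end of the requested I/O
        "limit_bytes": limit * SECTOR_BYTES,    # size of the device
        "overshoot_sectors": want - limit,      # how far past the device end
        "past_1tib": want * SECTOR_BYTES > TIB,
    }


info = parse_beyond_end("sdw1 kernel: sdr: rw=33, want=2147483736, limit=1967000000")
print(info)
```

For the log line in this article, the requested end offset is 1,099,511,672,832 bytes, which is 45,056 bytes (88 sectors) past 1 TiB, while the device itself ends at 1,007,104,000,000 bytes.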

Splitter logs:

sdw1 kernel: 4967/4967: RPS:#0 - spl_kbox_end_io : offset = 2147483704, len = 16384, MajorMinor(65, 16), error status = -5

sdw1 kernel: 1351/1351: RPS:#1 - CommandIoSplit_KboxEndIo: Immediate MOH is true. Moving to Tracking. vol guid=0xe102395df73e4b67
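The splitter's `spl_kbox_end_io` line can be correlated with the host log's `want` value. The sketch below assumes that `offset` is counted in 512-byte sectors and `len` in bytes; the article does not state the units, so treat both as assumptions. It also assumes that `error status` follows Linux errno conventions, in which -5 corresponds to `EIO`. The function name `splitter_io_end` is illustrative.

```python
import errno
import re

SECTOR_BYTES = 512                            # assumption: offset is in 512-byte sectors
ONE_TIB_SECTORS = (1 << 40) // SECTOR_BYTES   # 2**31 sectors

# Matches the offset/len/error fields of the spl_kbox_end_io line shown in the splitter log.
SPL_END_IO = re.compile(
    r"offset = (?P<offset>\d+), len = (?P<len>\d+),.*error status = (?P<status>-?\d+)"
)


def splitter_io_end(line):
    """Return the end sector of the failed I/O and its distance past 1 TiB, or None."""
    m = SPL_END_IO.search(line)
    if m is None:
        return None
    offset = int(m.group("offset"))
    length = int(m.group("len"))              # assumption: len is in bytes
    status = int(m.group("status"))
    end_sector = offset + length // SECTOR_BYTES
    return {
        "end_sector": end_sector,
        "sectors_past_1tib": end_sector - ONE_TIB_SECTORS,
        "is_eio": -status == errno.EIO,       # assumption: Linux errno convention
    }


line = ("sdw1 kernel: 4967/4967: RPS:#0 - spl_kbox_end_io : offset = 2147483704, "
        "len = 16384, MajorMinor(65, 16), error status = -5")
print(splitter_io_end(line))
```

Under these assumptions, the failed I/O ends at sector 2147483736, the same value as `want` in the host log, which suggests that both messages describe the same I/O just past the 1 TiB mark.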

RPA (Storage):

st_handle_write_atio: huge write !!! len=1048576 max_chunk_len=524288 (in bytes), ox_id=0x16, cd_remote_entity_id=0x6bca7c6343a4ccca, vlun=0x2c2d8
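The RPA message above reports a write whose length exceeds the maximum chunk length. A minimal sketch for pulling the two values out of such a line and comparing them follows; the function name `huge_write_ratio` is illustrative, and the regular expression is based only on the single log line quoted in this article.

```python
import re

# Matches the len/max_chunk_len fields of the st_handle_write_atio message.
HUGE_WRITE = re.compile(r"huge write !!! len=(?P<len>\d+) max_chunk_len=(?P<max>\d+)")


def huge_write_ratio(line):
    """Return (len, max_chunk_len, len / max_chunk_len) in bytes, or None if no match."""
    m = HUGE_WRITE.search(line)
    if m is None:
        return None
    length = int(m.group("len"))
    max_chunk = int(m.group("max"))
    return length, max_chunk, length / max_chunk


line = ("st_handle_write_atio: huge write !!! len=1048576 max_chunk_len=524288 "
        "(in bytes), ox_id=0x16, cd_remote_entity_id=0x6bca7c6343a4ccca, vlun=0x2c2d8")
print(huge_write_ratio(line))
```

In the quoted line the write is 1 MiB against a 512 KiB maximum chunk, exactly twice the allowed length.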

Splitter type(s): ScaleIO Splitter

Affected versions: 5.0.1, 5.0.1.1, 5.0.1.2

The splitter places no limit on the number of in-flight I/Os, and its RPA I/O timeout flow is handled incorrectly. In addition, the RPA has historically supported I/Os only to addresses up to 1 TB.

Workaround:

None

Resolution:

Dell EMC engineering is currently investigating this issue. A permanent fix is still in progress. Contact the Dell EMC Customer Support Center or your service representative for assistance and reference this solution ID.
