Takeover cancelled due to a multi-disk error panic
Environment
Issue
- There are multiple points of failure in the SAS path
- Shelf modules A and B on shelf ID 22 are both unstable
Node 1
Tue Jul 12 03:46:38 GMT [node1:ses.status.ModuleError:CRITICAL]: DS2246 (S/N SHxxxxxxxxxx185) shelf 22 on channel 0d SAS expander error for SAS shelf electronics 1: status not available; status not available. This module is on the rear of the shelf at the top left, on shelf module A.
Tue Jul 12 03:46:52 GMT [node1:cf.fsm.takeover.mdp:ALERT]: Failover monitor: takeover attempted after multi-disk failure on partner
Tue Jul 12 03:46:52 GMT [node1:cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
Tue Jul 12 03:46:52 GMT [node1:cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
Tue Jul 12 03:46:53 GMT [node2:raid.fm.takeoverFail:error]: RAID takeover failed: Can't find partner root volume.
Tue Jul 12 03:46:53 GMT [node1:cf.rsrc.takeoverFail:ALERT]: Failover monitor: takeover during raid failed; takeover cancelled
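The messages above share a common layout: a timestamp, then `[source:event.name:severity]`, then free text. A minimal Python sketch for splitting such lines into fields, assuming every line follows that layout. The `parse_ems_line` helper, the `EmsEvent` type, and the regular expression are hypothetical names chosen for illustration and are not part of any ONTAP tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "<timestamp> [<source>:<event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<timestamp>.+?) "
    r"\[(?P<source>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: "
    r"(?P<message>.*)$"
)


class EmsEvent(NamedTuple):
    timestamp: str
    source: str
    event: str
    severity: str
    message: str


def parse_ems_line(line: str) -> Optional[EmsEvent]:
    """Return the parsed fields, or None if the line does not fit the assumed layout."""
    match = EMS_LINE.match(line.strip())
    return EmsEvent(**match.groupdict()) if match else None
```

Filtering the parsed events by `event` (for example `cf.fsm.takeover.mdp` followed by `cf.rsrc.takeoverFail`) makes the attempted and then cancelled takeover easier to spot in a longer log.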
Node 2
Tue Jul 12 03:00:34 GMT [acp-vfiler@node2:acp.exp.reset:info]: SAS expander reset issued to 0d.24.B(192.168.3.235), (disk shelf serial number: SHxxxxxxxxxx142).
Tue Jul 12 03:00:35 GMT [acp-vfiler@node2:acp.exp.reset.success:info]: SAS expander reset command sent to 0d.24.B (192.168.3.235) was successful, (disk shelf serial number: SHxxxxxxxxxx142).
Tue Jul 12 03:02:13 GMT [node2:ses.status.electronicsWarn:warning]: DS2246 (S/N SHxxxxxxxxx185) shelf 22 on channel 0d environmental monitoring warning for SES electronics 2: communication error. ; enclosure services hardware failed This module is on the rear of the shelf at the top right.
Tue Jul 12 03:02:13 GMT [node2:ses.status.connectorWarn:warning]: DS2246 (S/N SHxxxxxxxxxx185) shelf 22 on channel 0d SAS connector warning for SAS Connector 3: cannot communicate with connector. This module is on the rear of the shelf at the top right, on shelf module B.
Tue Jul 12 03:02:13 GMT [node2:ses.status.connectorWarn:warning]: DS2246 (S/N SHxxxxxxxxxx185) shelf 22 on channel 0d SAS connector warning for SAS Connector 4: cannot communicate with connector. This module is on the rear of the shelf at the top right, on shelf module B.
Tue Jul 12 03:04:04 GMT [node2:ses.status.connectorInfo:info]: DS2246 (S/N SHxxxxxxxxxx185) shelf 22 on channel 0d SAS connector information for SAS Connector 3: normal status.
Tue Jul 12 03:04:04 GMT [node2:ses.status.connectorInfo:info]: DS2246 (S/N SHxxxxxxxxxx185) shelf 22 on channel 0d SAS connector information for SAS Connector 4: normal status.
Tue Jul 12 03:45:40 GMT [acp-vfiler@node2:acp.exp.reset.success:info]: SAS expander reset command sent to 0d.22.A(192.168.3.185) was successful, (disk shelf serial number: SHxxxxxxxxxx185).
Tue Jul 12 03:46:52 GMT [node2:raid.config.filesystem.disk.missing:info]: File system Disk /aggr0/plex0/rg3/0d.22.21 Shelf 22 Bay 21 [NETAPP X423_TA14E900A10 NA01] S/N [X6U0A015xxxx] is missing.
Tue Jul 12 03:46:52 GMT [node2:raid.config.filesystem.disk.missing:info]: File system Disk /aggr0/plex0/rg1/0d.22.14 Shelf 22 Bay 14 [NETAPP X423_TA14E900A10 NA01] S/N [X6J0A007xxxx] is missing.
Tue Jul 12 03:46:52 GMT [node2:raid.config.filesystem.disk.missing:info]: File system Disk /aggr0/plex0/rg1/0d.22.15 Shelf 22 Bay 15 [NETAPP X423_TA14E900A10 NA01] S/N [X6U0A01Nxxxx] is missing.
Tue Jul 12 03:46:52 GMT [node2:raid.config.filesystem.disk.missing:info]: File system Disk /aggr0/plex0/rg1/0d.22.16 Shelf 22 Bay 16 [NETAPP X423_TA14E900A10 NA01] S/N [X6U0A01Dxxxx] is missing.
Tue Jul 12 03:46:52 GMT [node2:cf.multidisk.fatalProblem:info]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr0: raid volfsm, fatal multi-disk error. raid type raid_dp.
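The log above reports one missing disk in `rg3` and three in `rg1` of `aggr0`. RAID-DP protects a RAID group against at most two concurrent disk failures, so three missing disks in `rg1` is consistent with the `fatal multi-disk error` reported for `aggr0`. A rough Python sketch for counting missing disks per RAID group from `raid.config.filesystem.disk.missing` messages follows. The function names, the regular expression, and the assumed `/<aggregate>/<plex>/<raid group>/<disk>` path layout are illustrative assumptions, not an official parser.

```python
import re
from collections import Counter

# Assumed disk path layout: /<aggregate>/<plex>/<raid group>/<disk>
MISSING_DISK = re.compile(
    r"raid\.config\.filesystem\.disk\.missing.*?Disk "
    r"(/[^/\s]+/[^/\s]+/[^/\s]+)/(\S+)"
)

# RAID-DP tolerates up to two failed disks per RAID group.
RAID_DP_TOLERANCE = 2


def missing_per_raid_group(lines):
    """Count 'disk missing' messages per RAID group path."""
    counts = Counter()
    for line in lines:
        match = MISSING_DISK.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def groups_over_tolerance(counts, tolerance=RAID_DP_TOLERANCE):
    """Return RAID groups with more missing disks than the tolerance allows."""
    return sorted(group for group, n in counts.items() if n > tolerance)
```

Run against the four `disk.missing` lines above, this kind of check would single out `/aggr0/plex0/rg1`, while `rg3` with a single missing disk stays within what RAID-DP can absorb.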