HA node is shut down due to a multidisk error
Environment
- ONTAP 9
- DS460C
Issue
- When multiple disks fail at the same time, the node reboots. EMS shows the following error message:
12/7/2023 17:03:00 node-03 ERROR cf.multidisk.fatalProblem: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr1_n3: raid volfsm, fatal multi-disk error..
Raid type - raid_dp
Group name plex0/rg0 state NORMAL. 5 disks failed in the group.
Disk 0c.40.6 Shelf 40 Drawer 1 Slot 6 Bay 6 [NETAPP X375_WVELE04TA07 NA01] S/N [V1K65EJG] UID [5000CCA0:BCB45494:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist.
Disk 0c.40.7 Shelf 40 Drawer 1 Slot 7 Bay 7 [NETAPP X375_WVELE04TA07 NA01] S/N [V1K6AZMG] UID [5000CCA0:BCB4A7F0:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: no valid path to disk.
Disk 0c.40.8 Shelf 40 Drawer 1 Slot 8 Bay 8 [NETAPP X375_WVELE04TA07 NA01] S/N [V1K6B4PG] UID [5000CCA0:BCB4AA64:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist.
Disk /aggr1_n3/plex0/rg0/0c.40.9 Shelf 40 Drawer 1 Slot 9 Bay 9 [NETAPP X375_WVELE04TA07 NA01] S/N [V1K4J.
Because more than two disks in the RAID-DP group have failed (RAID-DP tolerates at most two), the aggregate fails.
- The affected aggregates are shown with status failed:
Aggregate aggr2_n3 (failed, raid_dp, partial, fast zeroed) (block checksums)
Plex /aggr2_n3/plex0 (offline, failed, inactive)
RAID group /aggr2_n3/plex0/rg0 (partial, block checksums)
RAID Disk  Device    HA  SHELF  BAY  CHAN  Pool  Type  RPM   Used (MB/blks)      Phys (MB/blks)
---------  --------  --  -----  ---  ----  ----  ----  ----  ------------------  ------------------
dparity    0a.30.27  0a  30     27   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
parity     0b.41.34  0b  41     34   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0d.31.34  0d  31     34   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       FAILED    N/A                                     3807816/            -
data       0a.30.28  0a  30     28   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.35  0b  41     35   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0d.31.35  0d  31     35   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       FAILED    N/A                                     3807816/            -
data       0a.30.29  0a  30     29   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.42  0b  41     42   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0d.31.42  0d  31     42   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       FAILED    N/A                                     3807816/            -
data       0a.30.36  0a  30     36   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.43  0b  41     43   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0d.31.43  0d  31     43   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0c.40.42  0c  40     42   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0a.30.37  0a  30     37   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
Raid group is missing 3 disks.
Aggregate aggr1_n4 (failed, raid_dp, partial) (block checksums)
Plex /aggr1_n4/plex0 (offline, failed, inactive)
RAID group /aggr1_n4/plex0/rg0 (partial, block checksums)
RAID Disk  Device    HA  SHELF  BAY  CHAN  Pool  Type  RPM   Used (MB/blks)      Phys (MB/blks)
---------  --------  --  -----  ---  ----  ----  ----  ----  ------------------  ------------------
dparity    FAILED    N/A                                     3807816/            -
parity     0d.31.25  0d  31     25   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.25  0b  41     25   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0a.30.33  0a  30     33   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       FAILED    N/A                                     3807816/            -
data       0d.31.26  0d  31     26   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.26  0b  41     26   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0a.30.34  0a  30     34   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       FAILED    N/A                                     3807816/            -
data       0d.31.27  0d  31     27   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.27  0b  41     27   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0a.30.35  0a  30     35   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0c.40.36  0c  40     36   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0d.31.28  0d  31     28   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.28  0b  41     28   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0a.30.42  0a  30     42   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0c.40.37  0c  40     37   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
Raid group is missing 3 disks.
Aggregate aggr2_n4 (failed, raid_dp, partial) (block checksums)
Plex /aggr2_n4/plex0 (offline, failed, inactive)
RAID group /aggr2_n4/plex0/rg0 (normal, block checksums)
RAID group /aggr2_n4/plex0/rg2 (partial, block checksums)
RAID Disk  Device    HA  SHELF  BAY  CHAN  Pool  Type  RPM   Used (MB/blks)      Phys (MB/blks)
---------  --------  --  -----  ---  ----  ----  ----  ----  ------------------  ------------------
dparity    0b.41.14  0b  41     14   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
parity     0a.30.22  0a  30     22   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0c.40.17  0c  40     17   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0d.31.15  0d  31     15   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.15  0b  41     15   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0a.30.23  0a  30     23   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       FAILED    N/A                                     3807816/            -
data       0d.31.16  0d  31     16   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.16  0b  41     16   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0a.30.30  0a  30     30   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       FAILED    N/A                                     3807816/            -
data       0d.31.17  0d  31     17   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.17  0b  41     17   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0a.30.31  0a  30     31   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       FAILED    N/A                                     3807816/            -
data       0d.31.24  0d  31     24   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0b.41.24  0b  41     24   SA:A  0     FSAS  7200  3807816/7798408704  3815447/7814037168
data       0c.40.53  0c  40     53   SA:B  0     FSAS  7200  3807816/7798408704  3815447/7814037168
Raid group is missing 3 disks.
- All of the failed disks are in the same drawer (the EMS message above reports Shelf 40, Drawer 1 for each disk it lists).
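The two observations above (more than two failed disks per RAID-DP group, and all failed disks sharing one drawer) can be checked against captured output with a short script. This is a minimal sketch, not a NetApp tool: the disk lines are abbreviated copies of the EMS message above (UID and model fields trimmed), and the helper names `group_by_drawer` and `raid_dp_survives` are hypothetical.

```python
import re
from collections import defaultdict

# Abbreviated copies of the per-disk lines from the cf.multidisk.fatalProblem
# EMS message (UID/model fields trimmed). The 0c.40.9 line is truncated in the
# source, but its Shelf/Drawer/Slot fields are still present.
EMS_DISK_LINES = [
    "Disk 0c.40.6 Shelf 40 Drawer 1 Slot 6 Bay 6 error: disk does not exist.",
    "Disk 0c.40.7 Shelf 40 Drawer 1 Slot 7 Bay 7 error: no valid path to disk.",
    "Disk 0c.40.8 Shelf 40 Drawer 1 Slot 8 Bay 8 error: disk does not exist.",
    "Disk /aggr1_n3/plex0/rg0/0c.40.9 Shelf 40 Drawer 1 Slot 9 Bay 9",
]

# Captures: disk name or path, shelf, drawer, slot.
DISK_RE = re.compile(r"Disk\s+(\S+)\s+Shelf\s+(\d+)\s+Drawer\s+(\d+)\s+Slot\s+(\d+)")

# RAID-DP keeps a RAID group online with at most two failed disks.
RAID_DP_MAX_FAILED = 2


def group_by_drawer(lines):
    """Map (shelf, drawer) -> list of disk names parsed from EMS lines."""
    groups = defaultdict(list)
    for line in lines:
        m = DISK_RE.search(line)
        if m is None:
            continue  # skip lines without location fields
        path, shelf, drawer, _slot = m.groups()
        name = path.rsplit("/", 1)[-1]  # drop an /aggr/plex/rg prefix if present
        groups[(int(shelf), int(drawer))].append(name)
    return dict(groups)


def raid_dp_survives(failed_disks):
    """True if a RAID-DP group with this many failed disks can stay online."""
    return failed_disks <= RAID_DP_MAX_FAILED


if __name__ == "__main__":
    by_drawer = group_by_drawer(EMS_DISK_LINES)
    for (shelf, drawer), disks in sorted(by_drawer.items()):
        print(f"shelf {shelf} drawer {drawer}: {len(disks)} disks -> {', '.join(disks)}")
    # Each RAID group listing above reports "Raid group is missing 3 disks."
    print("RAID-DP survives 3 failed disks:", raid_dp_survives(3))
```

A single (shelf, drawer) key holding every parsed disk is consistent with the observation that the failures share one drawer, and `raid_dp_survives(3)` returning `False` matches the failed aggregate state.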