CFBRIDGE-414: Access to shelf X is lost because a watchdog reset was detected on both NSMs
Issue
- scsi.cmd.checkCondition events are reported:
[Node: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5a.11.3.5L0: Check Condition: CDB 0x28:2183d84b:0001: Sense Data SCSI:aborted command - (0xb - 0x90 0x2 0xfc)(2520).
[Node: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5b.11.0.12L0: Check Condition: CDB 0x28:9ae4029e:0001: Sense Data SCSI:aborted command - (0xb - 0x90 0x2 0xfc)(2669).
[Node: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5a.11.1.3L0: request successful after retry #1/#0: cdb 0x28:de3dfca4:0001 (3538).
[Node: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5b.11.0.5L0: request successful after retry #1/#0: cdb 0x28:2183d84e:0001 (3371).
- The node rebooted for the following reason:
[Node: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /Node_SSD/plex0/rg2/e5b.11.2.1 Shelf 11 Bay 1 [NETAPP X4020S173A15TNQF NA55] S/N [XXXXXXXXXXXXXX] UID [36313230:57B16662:00253845:00000002:00000000:00000000:00000000:00000000:00000000:00000000] is missing.
[Node: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /Node_SSD/plex0/rg2/e5a.11.1.2 Shelf 11 Bay 2 [NETAPP X4020S173A15TNQF NA55] S/N [XXXXXXXXXXXXXX] UID [36313230:57B16664:00253845:00000002:00000000:00000000:00000000:00000000:00000000:00000000] is missing.
[Node: config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr Node_SSD: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg2 state NORMAL. 8 disks failed in the group. Disk e5b.11.2.0 Shelf 11 Bay 0 [NETAPP X4020S173A15TNQF NA55] S/N [XXXXXXXXXXXXX] UID [36313230:57B16661:00253845:00000002:00000000:00000000:00000000:00000000:00000000:00000000] error: no valid path to disk. Disk e5b.11.2.1 Shelf 11 Bay 1 [NETAPP X4020S173A15TNQF NA55] S/N [XXXXXXXXXXXXX] UID [36313230:57B16662:00253845:00000002:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist. Disk e5a.11.1.2 Shelf 11 Bay 2 [NETAPP X4020S173A15TNQF NA55] S/N
- The shelf log for both modules shows "software watchdog detected fault":
--------------------------------------------------------------
Shelflog start time: Sun Mar 9 09:15:18 GMT 2025
Controller Id: XXXXXXXXXXX
Channel: 0x Shelf: 11 Module type: NSM100 Firmware rev: 0305
Shelf product id: NS224NSM100
Shelf Serial Number: XXXXXXXXXXXXX
Module A Serial Number: XXXXXXXXXXX
Log ID: XXXXXXXXXXXXXX
Timestamp: Thu Mar 20 21:54:52 GMT 2025
--------------------------------------------------------------
EVENT LOGS
Timestamp Thu Mar 20 21:54:51 2025
(183+12:51:48.557)
Thu Mar 20 21:54:47 2025 ( 183+12:51:45.089); 02000228; M0; HAL; hal; 02; Failure: software watchdog detected fault.
Thu Mar 20 21:54:47 2025 ( 183+12:51:45.089); 02000229; M0; HAL; hal; 02; Failure info: Client "bridgeWdgClient" triggered wdg. tNow:3b364bc9h, tLast:3b35fa79h, interval:4e20h, failed:0h.
Thu Mar 20 21:54:47 2025 ( 183+12:51:45.089); 02000263; M0; HAL; hal; 04; HAL_ProductCrashAndCoreIt: prior system(pkill -6 bio) status:0 pid:0
Thu Mar 20 21:54:47 2025 ( 183+12:51:45.089); 02000263; M0; HAL; hal; 04; HAL_ProductCrashAndCoreIt: post system(pkill -6 bio) status:0 pid:3102921
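The "Failure info" entry in the shelf log can be read as a simple timing check: the gap between tNow and tLast exceeded the watchdog interval. The following is a minimal Python sketch of that arithmetic, not a NetApp tool. It assumes the trailing "h" marks a hexadecimal value and that all three fields share the same (unstated) time unit; the function name parse_watchdog_failure is hypothetical, and counter wraparound is not handled.

```python
import re

# Matches the "Failure info" payload as it appears in the shelf log above.
# Assumption: values suffixed with "h" are hexadecimal.
FAILURE_INFO = re.compile(
    r'Client "(?P<client>[^"]+)" triggered wdg\. '
    r"tNow:(?P<t_now>[0-9a-fA-F]+)h, "
    r"tLast:(?P<t_last>[0-9a-fA-F]+)h, "
    r"interval:(?P<interval>[0-9a-fA-F]+)h"
)


def parse_watchdog_failure(line):
    """Return the watchdog timing fields from one shelf log line, or None."""
    match = FAILURE_INFO.search(line)
    if match is None:
        return None
    t_now = int(match["t_now"], 16)
    t_last = int(match["t_last"], 16)
    interval = int(match["interval"], 16)
    elapsed = t_now - t_last  # time since the client last checked in
    return {
        "client": match["client"],
        "elapsed": elapsed,
        "interval": interval,
        "overdue": elapsed > interval,
    }


# Sample line copied from the shelf log excerpt in this article.
SAMPLE = (
    "Thu Mar 20 21:54:47 2025 ( 183+12:51:45.089); 02000229; M0; HAL; hal; 02; "
    'Failure info: Client "bridgeWdgClient" triggered wdg. '
    "tNow:3b364bc9h, tLast:3b35fa79h, interval:4e20h, failed:0h."
)

result = parse_watchdog_failure(SAMPLE)
print(result)
```

For the entry above, the elapsed value is 0x5150 (20816) against an interval of 0x4e20 (20000), so the client "bridgeWdgClient" missed its check-in window. Running the same check against the shelf log of each module helps confirm that both NSMs hit the watchdog at about the same time.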