ONTAP-170136: Multiple watchdog controller halts may occur on FAS8200 and AFF A300 systems when the CPU is unresponsive
Issue
- Many sensor readings cannot be read correctly.
> system sensors
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
CPU0_Temp_Margin | na | degrees C | na | na | na | -11.000 | -1.000
In_Flow_Temp | 20.000 | degrees C | ok | 0.000 | 5.000 | 50.000 | 55.000
Out_Flow_Temp | 27.000 | degrees C | ok | 0.000 | 5.000 | 65.000 | 75.000
PCI_Slot_Temp | 25.000 | degrees C | ok | 0.000 | 5.000 | 60.000 | 70.000
Smart_Bat_Temp | 22.000 | degrees C | ok | 0.000 | 5.000 | 60.000 | 70.000
CPU0_Error | 0x0 | discrete | Asserted | na | na | na | na
CPU0_Therm_Trip | 0x0 | discrete | Asserted | na | na | na | na
Wrench_Port_Up | 0x0 | discrete | Enabled | na | na | na | na
Attn_Sensor1 | 0x0 | discrete | Asserted | na | na | na | na
- On FAS8200 and AFF A300 storage systems, watchdog controller halts may occur when the CPU becomes unresponsive.
watchdog nmi on cpu 0, hang cpu is 0 in process idle: cpu0
Record 1108: Sat Apr 30 05:01:38 2022 [IPMI Event.critical]: NMI
Record 1109: Sat Apr 30 05:01:38 2022 [IPMI.notice]: e800 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1111: Sat Apr 30 05:01:39 2022 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 1112: Sat Apr 30 05:01:45 2022 [IPMI.notice]: e900 | 02 | EVT: 6fc104ff | System_Watchdog | Assertion Event, "Hard reset"
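To check a saved copy of the event log for this signature, the records can be filtered for watchdog-related entries. The following is a minimal sketch, not a NetApp tool: it assumes the log has been saved as plain text in the `Record N: <timestamp> [<source>]: <message>` layout shown above, and the names `RECORD_RE`, `find_watchdog_records`, and `SAMPLE` are hypothetical.

```python
import re

# Assumed record layout, taken from the log excerpt above:
#   Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
RECORD_RE = re.compile(
    r"^Record\s+(?P<id>\d+):\s+(?P<time>.+?)\s+\[(?P<source>[^\]]+)\]:\s+(?P<message>.*)$"
)


def find_watchdog_records(lines):
    """Return (id, time, source, message) for records whose message mentions 'watchdog'."""
    hits = []
    for line in lines:
        match = RECORD_RE.match(line.strip())
        if match is None:
            continue  # not a "Record N:" line, e.g. the console NMI message
        if "watchdog" in match.group("message").lower():
            hits.append(
                (
                    int(match.group("id")),
                    match.group("time"),
                    match.group("source"),
                    match.group("message"),
                )
            )
    return hits


# Sample input copied from the excerpt above.
SAMPLE = [
    "watchdog nmi on cpu 0, hang cpu is 0 in process idle: cpu0",
    "Record 1108: Sat Apr 30 05:01:38 2022 [IPMI Event.critical]: NMI",
    'Record 1109: Sat Apr 30 05:01:38 2022 [IPMI.notice]: e800 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"',
    "Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset",
    "Record 1111: Sat Apr 30 05:01:39 2022 [Trap Event.critical]: hwassist l2_watchdog_reset (29)",
    'Record 1112: Sat Apr 30 05:01:45 2022 [IPMI.notice]: e900 | 02 | EVT: 6fc104ff | System_Watchdog | Assertion Event, "Hard reset"',
]

if __name__ == "__main__":
    for record_id, time, source, message in find_watchdog_records(SAMPLE):
        print(record_id, time, source, message)
```

The match is a plain case-insensitive substring test on the message field, so it also picks up `System_Watchdog` and `l2_watchdog_reset`. Lines that do not start with `Record N:` are skipped rather than treated as errors.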