A250: BMC heartbeat lost and node unresponsive on BMC 15.10
Applies to
- AFF A250
- AFF C250
- ASA A250
- ASA C250
- FAS500f
- BMC 15.10
Issue
- Unexpected node reboot with the following EMS errors:
[?] Fri Nov 17 13:10:28 +0100 [NodeName: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
[?] Fri Nov 17 13:21:27 +0100 [NodeName: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
[?] Fri Nov 17 13:21:27 +0100 [NodeName: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
[?] Fri Nov 17 13:31:46 +0100 [NodeName: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
[?] Fri Nov 17 13:34:07 +0100 [NodeName: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
[?] Fri Nov 17 13:44:07 +0100 [NodeName: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
[?] Fri Nov 17 13:44:08 +0100 [NodeName: mgwd: mgwd.notify.halt.result:info]: MGWD able to notify CLAM on its HA partner node that this node is undergoing a planned shutdown (reason: E). Error: -
- The BMC event log shows a Pilot FPGA AC cycle, followed by numerous Bus Correctable errors:
1c5 | OEM record f2 | FPGA pull BMC whole reset
1c6 | OEM record f2 | Pilot FPGA AC cycle
1c7 | 11/17/2023 | 12:50:16 | Power Supply #0x20 | Presence detected | Asserted
1c8 | 11/17/2023 | 12:50:16 | Power Supply #0x25 | Presence detected | Asserted
1c9 | 11/17/2023 | 12:50:17 | Power Supply #0x72 | Presence detected | Asserted
1ca | 11/17/2023 | 12:50:17 | Power Supply #0x73 | Presence detected | Asserted
1cb | 11/17/2023 | 12:50:19 | Battery #0x4a | State Deasserted
1cc | 11/17/2023 | 12:50:19 | Battery #0x4b | State Asserted
1cd | 11/17/2023 | 12:50:19 | Battery #0x4c | State Asserted
1ce | 11/17/2023 | 12:50:19 | Battery #0x4d | State Deasserted
1cf | 11/17/2023 | 12:50:19 | Battery #0x4f | State Deasserted
1d0 | 11/17/2023 | 12:50:19 | Other FRU #0x50 |
1d1 | 11/17/2023 | 12:50:19 | Other FRU #0x50 |
1d2 | 11/17/2023 | 12:50:19 | Other FRU #0x50 |
1d3 | 11/17/2023 | 12:50:19 | Other FRU #0x50 |
1d4 | 11/17/2023 | 12:50:36 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
1d5 | 11/17/2023 | 12:50:36 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
1d6 | 11/17/2023 | 12:50:36 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
1d7 | 11/17/2023 | 12:50:36 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
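When triaging saved log excerpts, the two signatures above can be checked with a short script. The following is a minimal sketch and not a NetApp tool. It assumes the EMS and BMC event log lines have been saved as plain text in the same layout as the excerpts above. The function names and the choice of which EMS events make up the signature are illustrative assumptions.

```python
import re

# Assumption: EMS lines carry a "[Node: process: event.name:severity]" tag,
# as in the excerpt above. The leading "[?]" marker is skipped by the pattern.
EMS_EVENT = re.compile(r"\[[^\]:]+: [^\]:]+: ([\w.]+):\w+\]")

# Event names taken from the EMS excerpt, in the order they were logged.
# Treating this subset as the signature is an assumption of this sketch.
HEARTBEAT_SEQUENCE = [
    "sp.heartbeat.stopped",
    "callhome.sp.hbt.stopped",
    "sp.ipmi.lost.shutdown",
    "monitor.shutdown.emergency",
]


def ems_events(lines):
    """Return the EMS event names found in the given lines, in order."""
    return [m.group(1) for line in lines if (m := EMS_EVENT.search(line))]


def has_heartbeat_shutdown(lines):
    """True if every event in HEARTBEAT_SEQUENCE occurs in order (gaps allowed)."""
    events = iter(ems_events(lines))
    # "name in events" consumes the iterator, so this is a subsequence check.
    return all(name in events for name in HEARTBEAT_SEQUENCE)


def bus_errors_after_ac_cycle(sel_lines):
    """Count 'Bus Correctable error' records after the last 'Pilot FPGA AC cycle'.

    Returns None when no AC cycle record is present.
    """
    count = None
    for line in sel_lines:
        fields = [field.strip() for field in line.split("|")]
        if "Pilot FPGA AC cycle" in fields:
            count = 0  # restart the count at each AC cycle record
        elif count is not None and "Bus Correctable error" in fields:
            count += 1
    return count
```

A match from `has_heartbeat_shutdown` together with a nonzero result from `bus_errors_after_ac_cycle` only indicates that the logs resemble the excerpts in this article. It does not confirm the root cause on its own.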