SG1100 panic: CPUs not responding to MCE broadcast
Environment
- NetApp StorageGRID Admin Node
- SG1100
Issue
- The Admin Node (SG1100) rebooted unexpectedly and was temporarily unreachable.

- The BMC log records a CPU catastrophic error (CATERR):
331 Mar/13/2026 01:37:41 [Information] [Host Res Warning] [OEM] Host Partition Reset triggered 255 minutes - Asserted
330 Mar/13/2026 01:36:37 [Critical] [CATERR] [Processor] IERR - Asserted
329 Mar/13/2026 01:35:10 [Critical] [CATERR] [Processor] Machine Check Exception (MCERR) - Asserted
- storagegrid_crash_dmesg.log shows that the kernel panicked because CPUs did not respond to the MCE broadcast:
[5048608.845286] watchdog: BUG: soft lockup - CPU#75 stuck for 78s! [prometheus-node:46612]
...
[5048616.006133] mce: CPUs not responding to MCE broadcast (may include false positives): 10,58
[5048616.006138] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
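
To confirm the same signature in another crash log, the CPU list can be pulled from the mce line with a short script. The following is a minimal sketch, not a NetApp tool: the function name find_unresponsive_cpus is hypothetical, and the pattern assumes the message format is exactly as quoted in the excerpt above.

```python
import re

# Matches the kernel message quoted above. The "(may include false positives)"
# note is treated as optional; this is an assumption about other kernel versions.
MCE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+mce: CPUs not responding to MCE broadcast"
    r"(?: \(may include false positives\))?: (?P<cpus>[\d,]+)"
)


def find_unresponsive_cpus(dmesg_text):
    """Return a list of (timestamp, [cpu, ...]) for each MCE broadcast timeout line."""
    results = []
    for match in MCE_RE.finditer(dmesg_text):
        # The CPU list is comma-separated, e.g. "10,58".
        cpus = [int(c) for c in match.group("cpus").split(",") if c]
        results.append((match.group("ts"), cpus))
    return results


# Sample text copied from the log excerpt in this article.
sample = (
    "[5048608.845286] watchdog: BUG: soft lockup - CPU#75 stuck for 78s! [prometheus-node:46612]\n"
    "[5048616.006133] mce: CPUs not responding to MCE broadcast (may include false positives): 10,58\n"
    "[5048616.006138] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler\n"
)

for timestamp, cpus in find_unresponsive_cpus(sample):
    print(timestamp, cpus)
```

In practice the text would be read from storagegrid_crash_dmesg.log rather than from the inline sample. As the kernel message itself notes, the reported CPU list may include false positives.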