SP HBT MISSED triggers a node shutdown
Environment
- FAS / AFF systems
- ONTAP 9
- Service Processor (SP)
- Baseboard Management Controller (BMC)
Issue
A failed SP firmware upgrade triggers a node shutdown, and the following messages are recorded in the event log:
Mon Mar 08 15:19:06 CET [node_name: spsm_listener: sp.update.status:debug]: params: {'reason': 'sp_startup_notify_servprocd: SP startup handler has been called.\n'}
Mon Mar 08 15:34:33 CET [node_name: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'SP Firmware network update has been successfully scheduled from 3.9 to 3.10'}
Mon Mar 08 15:35:52 CET [node_name: servprocd: sp.servprocd.upd.error:error]: SP update error: SP firmware update failure has been detected.
Mon Mar 08 16:18:07 CET [node_name: servprocd: sp.servprocd.upd.error:error]: SP update error: SP firmware update failure has been detected.
Mon Mar 08 16:20:33 CET [node_name: spsm_listener: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
Mon Mar 08 16:30:02 CET [node-name: spsm_listener: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 20 seconds.
Mon Mar 08 16:30:57 CET [node-name: sp_config_0: sp.update.status:debug]: params: {'reason': 'sp_bootup_notify_servprocd: SP online handler has been called '}
Mon Mar 08 16:31:07 CET [node-name: sp_cluster_user_mgmt_wq_wq: sp.userlist.update.failed:error]: Error updating SP user information, Communication error (action 7).
Mon Mar 08 16:32:57 CET [node-name: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 2 minutes.
Mon Mar 08 16:34:57 CET [node-name: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the SP)
Mon Mar 08 16:35:12 CET [node-name: sp_config_0: spmgmt.driver.timeout:error]: The software driver for the Service Processor (SP) detected a problem: Unable to update SP network information at this time.
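When reviewing a longer event log for this pattern, it can help to extract the event names and timestamps programmatically. The following is a minimal Python sketch, not a NetApp tool. It assumes the line layout seen in the excerpt above (timestamp, time zone, then `[node: source: event:severity]: message`), which is inferred from these sample lines rather than taken from an official grammar. The names `EMS_LINE`, `parse_ems_line`, and `seconds_between` are hypothetical helpers introduced for illustration, and the timing calculation assumes all events occur on the same day.

```python
import re
from datetime import datetime

# Line layout inferred from the log excerpt in this article (an assumption,
# not an official ONTAP EMS grammar).
EMS_LINE = re.compile(
    r"^\w{3} \w{3} \d{2} (?P<time>\d{2}:\d{2}:\d{2}) \w+ "
    r"\[(?P<node>[^:\]]+): (?P<source>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)


def parse_ems_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def seconds_between(records, first_event, second_event):
    """Seconds from the first `first_event` to the next `second_event`.

    Assumes both events fall on the same calendar day.
    """
    start = None
    for rec in records:
        stamp = datetime.strptime(rec["time"], "%H:%M:%S")
        if start is None and rec["event"] == first_event:
            start = stamp
        elif start is not None and rec["event"] == second_event:
            return (stamp - start).total_seconds()
    return None


# Three lines copied from the excerpt above.
SAMPLE = [
    "Mon Mar 08 16:30:02 CET [node-name: spsm_listener: sp.heartbeat.stopped:error]: "
    "Have not received a IPMI heartbeat from the Service Processor (SP) in last 20 seconds.",
    "Mon Mar 08 16:32:57 CET [node-name: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: "
    "SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, "
    "the system will shut down in 2 minutes.",
    "Mon Mar 08 16:34:57 CET [node-name: env_mgr: monitor.shutdown.emergency:EMERGENCY]: "
    "Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the SP)",
]

records = [rec for rec in map(parse_ems_line, SAMPLE) if rec is not None]
emergencies = [rec["event"] for rec in records if rec["severity"] == "EMERGENCY"]
delay = seconds_between(records, "sp.ipmi.lost.shutdown", "monitor.shutdown.emergency")
print(emergencies, delay)
```

For the sample lines, the interval between `sp.ipmi.lost.shutdown` and `monitor.shutdown.emergency` is 120 seconds, which agrees with the "shut down in 2 minutes" warning in the log message itself.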