AFF A400, FAS8300, or FAS8700 system shuts down because the SP heartbeat stopped (BMC 13.5 and earlier)
Applies to
- AFF A400, FAS8300, FAS8700
- BMC 13.5 and earlier
Issue
- The node reboots when the SP heartbeat (SP HBT) stops
HA Group Notification (SP HBT STOPPED) ALERT
HA Group Notification (SP HBT MISSED) NOTICE
- The EMS log records IPMI heartbeat timeouts
07:35:26 -0500 [node-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
07:47:04 -0500 [node-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
07:47:04 -0500 [node-01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
07:58:42 -0500 [node-01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
- The SKTRACE log shows KCS errors
2021-02-02T12:24:13Z 51150633355156368 [17:0] IPMI_KCS_ERR: kcs_start_write: cmd 0x43 nf 0xa state 3 not write
2021-02-02T12:24:13Z 51150633355158704 [17:0] IPMI_KCS_ERR: KCS cmd 0x43 nf 0xa: Failed to start write
2021-02-02T12:24:14Z 51150635326015290 [5:0] IPMI_KCS_ERR: KCS cmd 0x43 nf 0xa retry 1
2021-02-02T12:24:16Z 51150641624866880 [15:0] IPMI_KCS_ERR: kcs_start_write: cmd 0x43 nf 0xa state 3 not write
2021-02-02T12:24:16Z 51150641624868402 [15:0] IPMI_KCS_ERR: KCS cmd 0x43 nf 0xa: Failed to start write
2021-02-02T12:24:17Z 51150642641019186 [13:0] IPMI_KCS_ERR: kcs_error: cmd 0x43 nf 0xa IBF not 0
2021-02-02T12:24:17Z 51150643643988838 [2:0] IPMI_KCS_ERR: kcs_error abort: cmd 0x43 nf 0xa IBF not 0
2021-02-02T12:24:18Z 51150644631633176 [15:0] IPMI_KCS_ERR: kcs_error cmd 0x43 nf 0xa not idle
2021-02-02T12:24:18Z 51150645623620448 [9:0] IPMI_KCS_ERR: kcs_error: cmd 0x43 nf 0xa IBF not 0
2021-02-02T12:24:19Z 51150646611276748 [8:0] IPMI_KCS_ERR: kcs_error abort: cmd 0x43 nf 0xa IBF not 0
2021-02-02T12:24:19Z 51150647612147226 [9:0] IPMI_KCS_ERR: kcs_error cmd 0x43 nf 0xa not idle
2021-02-02T12:24:19Z 51150647612158278 [9:0] IPMI_KCS_ERR: kcs_error cmd 0x43 nf 0xa retry exhausted
2021-02-02T12:24:20Z 51150648709514498 [17:0] IPMI_KCS_ERR: KCS cmd 0x43 nf 0xa retry 2
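
When triaging a collected log bundle, it can help to confirm quickly whether both symptoms above are present. The following is a minimal Python sketch, not a NetApp tool: the function name `summarize` and both regular expressions are assumptions derived only from the sample lines shown in this article, so real logs may need adjusted patterns.

```python
import re
from collections import Counter

# Assumed patterns, built only from the sample lines in this article.
# Matches EMS entries such as:
#   [node-01: spmgrd: sp.heartbeat.stopped:error]
#   [node-01: spmgrd: callhome.sp.hbt.missed:notice]
EMS_RE = re.compile(
    r"\[(?P<node>[^:\]]+): spmgrd: "
    r"(?P<event>(?:callhome\.)?sp\.(?:heartbeat|hbt)\.(?:stopped|missed))"
    r":(?P<severity>\w+)\]"
)

# Matches SKTRACE entries such as:
#   IPMI_KCS_ERR: kcs_start_write: cmd 0x43 nf 0xa state 3 not write
KCS_RE = re.compile(
    r"IPMI_KCS_ERR: .*?cmd (?P<cmd>0x[0-9a-fA-F]+) nf (?P<nf>0x[0-9a-fA-F]+)"
)


def summarize(lines):
    """Count SP heartbeat EMS events and IPMI_KCS_ERR entries.

    Returns two Counters: EMS event name -> count, and
    (cmd, nf) pair -> count of KCS error lines.
    """
    ems_events = Counter()
    kcs_errors = Counter()
    for line in lines:
        ems_match = EMS_RE.search(line)
        if ems_match:
            ems_events[ems_match.group("event")] += 1
            continue
        kcs_match = KCS_RE.search(line)
        if kcs_match:
            kcs_errors[(kcs_match.group("cmd"), kcs_match.group("nf"))] += 1
    return ems_events, kcs_errors


if __name__ == "__main__":
    # Lines copied from the excerpts above.
    sample = [
        "07:35:26 -0500 [node-01: spmgrd: sp.heartbeat.stopped:error]: "
        "Have not received a IPMI heartbeat from the Service Processor (SP) "
        "in last 600 seconds.",
        "07:58:42 -0500 [node-01: spmgrd: callhome.sp.hbt.stopped:alert]: "
        "Call home for SP HBT STOPPED",
        "2021-02-02T12:24:13Z 51150633355156368 [17:0] IPMI_KCS_ERR: "
        "kcs_start_write: cmd 0x43 nf 0xa state 3 not write",
    ]
    ems, kcs = summarize(sample)
    print("EMS heartbeat events:", dict(ems))
    print("KCS errors by (cmd, nf):", dict(kcs))
```

Seeing heartbeat events in the EMS counts together with non-zero KCS error counts suggests the log pattern described in this article. The sketch only counts matching lines and does not diagnose the cause.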