System unable to boot with "Failed to recover SP"
Applies to
- AFF A220 / FAS27x0 / AFF C190
- AFF A200 / FAS26x0
- FAS80x0
- AFF A300 / FAS8200
- AFF A700 / FAS9000
Issue
- The storage system reboots (for example, during an ONTAP upgrade) but fails to boot and drops to the LOADER prompt. Example:
...
Waiting for SP ...
SP failure. Resetting SP from primary FW. This can take a few minutes
Waiting for SP ...
SP failure. Resetting SP from backup FW. This can take a few minutes
Waiting for SP ...
Failed to recover SP
IPMI PCI Slot Control failed.
IPMI PCI Slot Configuration failed.
Configuring Devices ...
IPMI:Get controller FRU inventory:failed
IPMI:Get midplane FRU 0 inventory:failed
IPMI: Get NVRAM FRU inventory:failed
BIOS POST Failure(s) detected: SP IPMI failure. Abort AUTOBOOT
LOADER-A>
BIOS POST Failure(s) detected: Failed to get FRU data. Abort AUTOBOOT
- The Service Processor (SP) event log reports similar error messages. Example:
Record 1287: Tue Apr 14 14:34:05.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout - retrying
Record 1288: Tue Apr 14 14:34:10.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout
Record 1289: Tue Apr 14 14:34:13.000000 2020 [SysFW.notice]: Failed to recover SP
Record 1290: Tue Apr 14 14:34:13.000000 2020 [SysFW.critical]: IPMI:Read midplane FRU common header:failed
Record 1291: Sun Jan 01 00:02:58.340000 2017 [Trap Event.critical]: hwassist post_error (26)
Record 1292: Tue Apr 14 14:34:14.000000 2020 [SysFW.critical]: IPMI PCI Slot Control failed.
Record 1293: Sun Jan 01 00:02:59.310000 2017 [Trap Event.critical]: hwassist post_error (26)
Record 1294: Tue Apr 14 14:34:18.000000 2020 [CFE.notice]: Loader time adjust: Set BMC time. Old time: Sun Jan 1 00:03:03 2017. New time: Tue Apr 14 14:34:18 2020.
Record 1295: Tue Apr 14 14:34:18.000000 2020 [Boot Loader.notice]: Received time sync
Record 1296: Tue Apr 14 14:34:20.000000 2020 [Boot Loader.critical]: Abort Autoboot due to BIOS POST failure.
Record 1297: Tue Apr 14 14:34:20.280000 2020 [Trap Event.critical]: hwassist post_error (26)
Record 1298: Tue Apr 14 14:34:24.020000 2020 [IPMI.notice]: 001c | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
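- The issue persists after the controller is reseated, even with no cable connected to e0M/SP (disconnecting the cable rules out the excessive SP traffic issue described in "Node shutdown and unable to boot due to 'SP IPMI failure'").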
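When comparing a saved SP event log against the example above, a small script can pick out the matching records. The following is a minimal sketch, not a NetApp tool: the record layout is inferred only from the sample lines in this article, and the names `RECORD_RE`, `SIGNATURES`, `parse_record`, and `find_signature_records` are hypothetical. The signature list is limited to messages that appear in the sample and is not exhaustive.

```python
import re

# Record layout inferred from the sample SP event log lines in this article
# (assumption): "Record <id>: <timestamp> [<source>.<severity>]: <message>"
RECORD_RE = re.compile(
    r"^Record (?P<id>\d+): (?P<timestamp>.+?) "
    r"\[(?P<source>[^\]]+)\.(?P<severity>\w+)\]: (?P<message>.*)$"
)

# Messages taken from the sample log above; an illustrative list only.
SIGNATURES = (
    "Failed to recover SP",
    "IPMI PCI Slot Control failed",
    "Abort Autoboot due to BIOS POST failure",
)


def parse_record(line):
    """Split one event log line into its fields, or return None if it does not match."""
    match = RECORD_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["id"] = int(record["id"])
    return record


def find_signature_records(lines):
    """Return the parsed records whose message contains one of SIGNATURES."""
    hits = []
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue  # skip lines that are not event records
        if any(sig in record["message"] for sig in SIGNATURES):
            hits.append(record)
    return hits


# Lines copied from the example in this article.
SAMPLE_LOG = [
    "Record 1288: Tue Apr 14 14:34:10.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout",
    "Record 1289: Tue Apr 14 14:34:13.000000 2020 [SysFW.notice]: Failed to recover SP",
    "Record 1292: Tue Apr 14 14:34:14.000000 2020 [SysFW.critical]: IPMI PCI Slot Control failed.",
    "Record 1295: Tue Apr 14 14:34:18.000000 2020 [Boot Loader.notice]: Received time sync",
    "Record 1296: Tue Apr 14 14:34:20.000000 2020 [Boot Loader.critical]: Abort Autoboot due to BIOS POST failure.",
]

if __name__ == "__main__":
    for rec in find_signature_records(SAMPLE_LOG):
        print(rec["id"], rec["severity"], rec["message"])
```

Run against the sample lines, the script reports records 1289, 1292, and 1296. A match only indicates that the log resembles this article's example; it does not by itself confirm the root cause.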