System fails to boot and displays the message "Failed to recover SP"
Applies to
- AFF A220 / FAS27x0 / AFF C190
- AFF A200 / FAS26x0
- FAS80x0
- FAS9500
- AFF A300 / FAS8200
- AFF A700 / FAS9000
- AFF A50 / AFF A30 / AFF A20
- AFF C60 / AFF C30
Issue
- The storage system reboots (for example, during an ONTAP upgrade) but fails to boot and stops at the LOADER prompt.
Example:
...
Waiting for SP ...
SP failure. Resetting SP from primary FW. This can take a few minutes
Waiting for SP ...
SP failure. Resetting SP from backup FW. This can take a few minutes
Waiting for SP ...
Failed to recover SP
IPMI PCI Slot Control failed.
IPMI PCI Slot Configuration failed.
Configuring Devices ...
IPMI:Get controller FRU inventory:failed
IPMI:Get midplane FRU 0 inventory:failed
IPMI: Get NVRAM FRU inventory:failed
BIOS POST Failure(s) detected: SP IPMI failure. Abort AUTOBOOT
LOADER-A>
BIOS POST Failure(s) detected: Failed to get FRU data. Abort AUTOBOOT
- The Service Processor (SP) event log reports similar error messages.
Example:
Record 1287: Tue Apr 14 14:34:05.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout - retrying
Record 1288: Tue Apr 14 14:34:10.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout
Record 1289: Tue Apr 14 14:34:13.000000 2020 [SysFW.notice]: Failed to recover SP
Record 1290: Tue Apr 14 14:34:13.000000 2020 [SysFW.critical]: IPMI:Read midplane FRU common header:failed
Record 1291: Sun Jan 01 00:02:58.340000 2017 [Trap Event.critical]: hwassist post_error (26)
Record 1292: Tue Apr 14 14:34:14.000000 2020 [SysFW.critical]: IPMI PCI Slot Control failed.
Record 1293: Sun Jan 01 00:02:59.310000 2017 [Trap Event.critical]: hwassist post_error (26)
Record 1294: Tue Apr 14 14:34:18.000000 2020 [CFE.notice]: Loader time adjust: Set BMC time. Old time: Sun Jan 1 00:03:03 2017. New time: Tue Apr 14 14:34:18 2020.
Record 1295: Tue Apr 14 14:34:18.000000 2020 [Boot Loader.notice]: Received time sync
Record 1296: Tue Apr 14 14:34:20.000000 2020 [Boot Loader.critical]: Abort Autoboot due to BIOS POST failure.
Record 1297: Tue Apr 14 14:34:20.280000 2020 [Trap Event.critical]: hwassist post_error (26)
Record 1298: Tue Apr 14 14:34:24.020000 2020 [IPMI.notice]: 001c | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
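- The issue persists after the controller is reseated and even with no cable connected to e0M / SP (disconnecting the cable rules out the excessive SP traffic problem described in the article on node shutdown and failure to boot due to "SP IPMI failure").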
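When comparing a captured console session or SP event log against the examples above, a small script can help spot the same messages. The following is a minimal sketch, not a NetApp tool: the function names `find_signatures` and `critical_records` are hypothetical, and the signature list and the record layout are assumptions taken only from the example output in this article.

```python
import re

# Signature strings copied from the console and SP event log examples
# in this article (assumption: other releases may word them differently).
SIGNATURES = (
    "Failed to recover SP",
    "IPMI PCI Slot Control failed",
    "SP IPMI failure",
    "Failed to get FRU data",
    "Abort Autoboot due to BIOS POST failure",
)

# Matches SP event log lines shaped like the examples above:
# "Record 1289: <timestamp> [SysFW.notice]: <message>"
RECORD_RE = re.compile(r"^Record (\d+): (.+?) \[([^\]]+)\]: (.*)$")


def find_signatures(text):
    """Return the known signature strings that occur in the log text."""
    lowered = text.lower()
    return [s for s in SIGNATURES if s.lower() in lowered]


def critical_records(text):
    """Return (record number, source, message) for critical records."""
    hits = []
    for line in text.splitlines():
        m = RECORD_RE.match(line.strip())
        if m and m.group(3).endswith(".critical"):
            hits.append((int(m.group(1)), m.group(3), m.group(4)))
    return hits
```

For example, `find_signatures(open("console.log").read())` (the file name is a placeholder) lists which of the messages shown above appear in a saved capture. A match only indicates that the log resembles this article's symptom; it is not a diagnosis.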