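ShelfPSUFailure_Alert reported by Health Monitor
Applies to
- FAS / AFF systems
- Disk shelf
- Power supply unit (PSU)
- Health Monitor process schm: ShelfPSUFailure_Alert
Issue
- Sporadic call home messages are seen for power issues
- The event log reports the following alerts: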
[Node-02: schmd: hm.alert.raised:alert]: Alert Id = ShelfPSUFailure_Alert , Alerting Resource = 16350XXXXXXXX448 raised by monitor system-connect
[Node-02: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0b. Check fans, power supplies, disks, and temperature sensors.
- The output of storage show fault shows the following:
Enclosure Status: unrecoverable
Channel: 0b
Shelf: 11
Shelf Type: DS224-12
Product Serial Number: SHFFGXXXXXXXXX
Module Type: IOM12
Power Supplies:
Element  Status    Status Bytes  Status Descriptions
1:       CRITICAL  02,00,00,F3   DC FAIL, AC FAIL, OFF, RQSTED ON, FAIL
2:       OK        01,00,00,20   RQSTED ON
- The shelf log shows issues with the PSU:
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B004F; M0; ENC_MGT; power_manager; 02; HAL indicates PSU FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B0050; M0; ENC_MGT; power_manager; 02; HAL indicates PSU TURNED OFF fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B0053; M0; ENC_MGT; power_manager; 02; HAL indicates PSU AC FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 PCM FAILURE Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0070; M0; ENC_MGT; power_manager; 02; Re-asserting FAIL NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 TURNED OFF Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0072; M0; ENC_MGT; power_manager; 02; Setting AC MISSING NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 AC FAILURE Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0070; M0; ENC_MGT; power_manager; 02; Re-asserting AC MISSING NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B005D; M0; ENC_MGT; power_manager; 04; PCM 1 faults indicate loss of local fan power
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0060; M0; ENC_MGT; power_manager; 04; PCM 1 local fan power restored
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0084; M0; ENC_MGT; power_manager; 02; Clearing PSU AC Missing (non-redundant) alarm
- The issue persists even after reseating the PSU
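When triaging sporadic alerts like these, it can help to measure how long each fault actually lasted in the shelf log. The following is a minimal Python sketch, not a NetApp tool: the regular expression is inferred only from the layout of the shelf log excerpt above, the function names (parse_shelf_log, fault_window) are hypothetical, and treating messages that contain "fault" as the start and messages that begin with "Clearing" or contain "restored" as the end is an assumption based on this one excerpt.

```python
import re
from datetime import datetime

# Entry layout assumed from the excerpt:
# <timestamp> ( <uptime>); <code>; <module>; <subsystem>; <component>; <level>; <message>
TS = r"[A-Z][a-z]{2} [A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
ENTRY_RE = re.compile(
    rf"(?P<ts>{TS}) \(\s*(?P<uptime>[^)]+)\);"
    r" (?P<code>[0-9A-F]{8}); (?P<module>[^;]+); (?P<subsystem>[^;]+);"
    r" (?P<component>[^;]+); (?P<level>[^;]+);"
    # The message ends at the next timestamp or at end of line, so entries
    # that were pasted onto a single line are still split correctly.
    rf" (?P<message>.*?)(?=\s+{TS} \(|\s*$)",
    re.MULTILINE,
)

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_ts(ts):
    """Convert 'Tue Mar 5 02:05:34 2024' to a datetime without relying on locale."""
    _, month, day, hms, year = ts.split()
    hour, minute, second = (int(part) for part in hms.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


def parse_shelf_log(text):
    """Return one dict per shelf log entry found in text."""
    return [
        {
            "time": parse_ts(m["ts"]),
            "code": m["code"],
            "level": m["level"].strip(),
            "message": m["message"].strip(),
        }
        for m in ENTRY_RE.finditer(text)
    ]


def fault_window(entries):
    """Return (first fault time, last clear time); either may be None.

    The keyword matching is a heuristic assumed from the excerpt.
    """
    start = next(
        (e["time"] for e in entries if "fault" in e["message"].lower()), None)
    end = next(
        (e["time"] for e in reversed(entries)
         if e["message"].startswith("Clearing") or "restored" in e["message"]),
        None)
    return start, end


# Three entries copied from the shelf log excerpt, joined on one line.
SAMPLE = (
    "Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B004F; M0; ENC_MGT; "
    "power_manager; 02; HAL indicates PSU FAILURE fault on PCM 1 "
    "Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0060; M0; ENC_MGT; "
    "power_manager; 04; PCM 1 local fan power restored "
    "Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0084; M0; ENC_MGT; "
    "power_manager; 02; Clearing PSU AC Missing (non-redundant) alarm"
)

if __name__ == "__main__":
    start, end = fault_window(parse_shelf_log(SAMPLE))
    print(f"fault lasted {(end - start).total_seconds():.0f} s")
```

For the excerpt above, the first fault is logged at 02:05:34 and the alarm is cleared at 02:05:39. A fault that clears within seconds is consistent with the sporadic call home messages described in this article.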