Multiple fan failures are reported on one node
Environment
- FAS2650
- FAS2750
- FAS2720
- ONTAP 9
- Service Processor (SP)
- Baseboard Management Controller (BMC)
Issue
- Multiple fan failures are reported in the event log for one node of the HA pair.
[Node-02: dsa_worker2: ses.status.temperatureWarning:alert]: DS224-12 (S/N SHFGDXXXX000045) shelf 0 on channel 0b temperature warning for Temperature sensor 12: not installed or failed. Current temperature: 22 C (71 F). This module is on the rear of the shelf at the top left, on shelf module A.
[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.
[Node-02: dsa_worker1: ses.status.temperatureInfo:info]: DS224-12 (S/N SHFGDXXXX000045) shelf 0 on channel 0b temperature information for Temperature sensor 12: normal status.
[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.
[Node-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high..
[Node-02: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module B Expander Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module A Expander Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 4 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 3 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 2 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 1 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Ambient Temp) is not readable.
[Node-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Chassis temperature is too high..
[Node-02: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
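- The partner node does not trigger any of these alerts.
- All power supply units (PSUs) blink green, and a yellow LED is lit on the front of the node.
- The PSU and fan sensors on the node reporting the errors show the following: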
Sensor Name              State          Current   Critical  Warning   Warning   Critical
                                        Reading   Low       Low       High      High
----------------------------------------------------------------------------------------
SNMP Bad Fan Count       MULTI_FAILED
Chassis is Under Temp    invalid        --
Chassis is Over Temp     YES
PSU1 INFO                FAILED
PSU1 INFO                FRU_AVAIL
PSU1 FRU                 MULTIFAULT
PSU2 FRU                 MULTIFAULT
Module B Expander Temp   failed         -- C      0 C       5 C       80 C      90 C
Module A Expander Temp   failed         -- C      0 C       5 C       80 C      90 C
Midplane 4 Temp          failed         -- C      0 C       5 C       47 C      52 C
Midplane 3 Temp          failed         -- C      0 C       5 C       47 C      52 C
Midplane 2 Temp          failed         -- C      0 C       5 C       47 C      52 C
Midplane 1 Temp          failed         -- C      0 C       5 C       47 C      52 C
Ambient Temp             failed         -- C      0 C       5 C       47 C      52 C
Internal Shelf           not_available  --
CPU0 Temp Margin         init_failed    -- C      --        --        0 C       -1 C
PSU1 Present             PRESENT
PSU1 5V                  not_available  -- mV     --        --        --        --
PSU1 12V                 not_available  -- mV     --        --        --        --
PSU1 5V Curr             not_available  -- mA     --        --        --        --
PSU1 12V Curr            not_available  -- mA     --        --        --        --
PSU1 Fan 1               not_available  -- RPM    --        --        --        --
PSU1 Fan 2               not_available  -- RPM    --        --        --        --
PSU1 Inlet Temp          not_available  -- C      0 C       5 C       57 C      62 C
PSU1 Hotspot Temp        not_available  -- C      0 C       5 C       90 C      100 C
PSU2 Present             PRESENT
PSU2 5V                  not_available  -- mV     --        --        --        --
PSU2 12V                 not_available  -- mV     --        --        --        --
PSU2 5V Curr             not_available  -- mA     --        --        --        --
PSU2 12V Curr            not_available  -- mA     --        --        --        --
PSU2 Fan 1               not_available  -- RPM    --        --        --        --
PSU2 Fan 2               not_available  -- RPM    --        --        --        --
PSU2 Inlet Temp          not_available  -- C      0 C       5 C       57 C      62 C
PSU2 Hotspot Temp        not_available  -- C      0 C       5 C       90 C      100 C
PSU_FAN                  not_available  --
Module B Expander Temp   failed         -- C      0 C       5 C       80 C      90 C
Module A Expander Temp   failed         -- C      0 C       5 C       80 C      90 C
Midplane 4 Temp          failed         -- C      0 C       5 C       47 C      52 C
Midplane 3 Temp          failed         -- C      0 C       5 C       47 C      52 C
Midplane 2 Temp          failed         -- C      0 C       5 C       47 C      52 C
Midplane 1 Temp          failed         -- C      0 C       5 C       47 C      52 C
Ambient Temp             failed         -- C      0 C       5 C       47 C      50 C
Internal Shelf           not_available
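
To confirm that the fan and temperature events come from only one node of the HA pair, the event log lines shown in the Issue section can be tallied per node. The following is a minimal Python sketch, not a NetApp tool: the function names `parse_ems_line` and `events_per_node`, the regular expression, and the chosen event-name prefixes are illustrative assumptions based only on the line format seen in the log excerpt above (`[<node>: <process>: <event>:<severity>]: <message>`).

```python
import re
from collections import Counter

# Assumed line format, inferred from the log excerpt in this article:
#   [<node>: <process>: <event.name>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+):\s*(?P<process>[^:\]]+):\s*"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)$"
)

# Hypothetical default filter: event-name prefixes that appear in the excerpt.
DEFAULT_PREFIXES = ("monitor.fan.", "monitor.temp.", "callhome.c.fan.")


def parse_ems_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def events_per_node(lines, prefixes=DEFAULT_PREFIXES):
    """Count matching events per node; lines that do not parse are skipped."""
    counts = {}
    for line in lines:
        record = parse_ems_line(line)
        if record is None:
            continue
        if record["event"].startswith(prefixes):
            counts.setdefault(record["node"], Counter())[record["event"]] += 1
    return counts
```

If the returned dictionary contains a single node name, the events are isolated to that node, which matches the symptom described here. The sketch only summarizes log text and does not query the SP or BMC.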