Node reboots after a CHASSIS OVER TEMPERATURE SHUTDOWN emergency error
Applies to
- ONTAP 9
- AFF A400
- FAS8700 and FAS8300
- BMC FW 13.4 and earlier
Issue
- The node shuts down and then reboots.
- Multiple sensor readings are reported as na:
PVCCIN_CPU0 | na | Volts | na | na | 0.010 | 0.020 | 2.460 | 2.470 | na
PVCCIN_CPU1 | na | Volts | na | na | 0.010 | 0.020 | 2.460 | 2.470 | na
PVDDQ_ABC | na | Volts | na | na | 0.014 | 0.021 | 1.711 | 1.732 | na
PVDDQ_DEF | na | Volts | na | na | 0.014 | 0.021 | 1.711 | 1.732 | na
PVDDQ_GHI | na | Volts | na | na | 0.014 | 0.021 | 1.711 | 1.732 | na
PVDDQ_KLM | na | Volts | na | na | 0.014 | 0.021 | 1.711 | 1.732 | na
P1V05_PCH | na | Volts | na | na | 0.940 | 0.992 | 1.100 | 1.147 | na
...
CX5_Temp1 | na | degrees C | na | na | 0.000 | 5.000 | 80.000 | 85.000 | na
CX5_Temp2 | na | degrees C | na | na | 0.000 | 5.000 | 80.000 | 85.000 | na
...
RiserL_Temp1 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserL_Temp2 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserM_Temp1 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserM_Temp2 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserM_Temp3 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserM_Temp4 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserR_Temp1 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserR_Temp2 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserR_Temp3 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
RiserR_Temp4 | na | degrees C | na | na | 0.000 | 5.000 | 60.000 | 70.000 | na
CPU0_Temp | na | degrees C | na | na | na | na | 90.000 | 100.000 | na
CPU1_Temp | na | degrees C | na | na | na | na | 90.000 | 100.000 | na
Mezz_Temp1 | na | degrees C | na | na | 0.000 | 5.000 | 80.000 | 85.000 | na
Mezz_Temp2 | na | degrees C | na | na | 0.000 | 5.000 | 54.000 | 57.000 | na
- Multiple temperature sensors show the "NC" (non-critical) state in sel elist:
02/17/2022 | 16:57:53 | Temperature LED2_Temp | Lower Non-critical going low | Reading 4 < Threshold 3 degrees C
02/17/2022 | 16:58:00 | Temperature LED1_Temp | Lower Non-critical going low | Reading 4 < Threshold 3 degrees C
02/17/2022 | 17:48:22 | Temperature MP_Temp3 | Lower Non-critical going low | Reading 5 < Threshold 5 degrees C
02/17/2022 | 17:49:40 | Temperature System_Inlet | Lower Non-critical going low | Reading 5 < Threshold 5 degrees C
02/17/2022 | 17:49:55 | Temperature MP_Temp1 | Lower Non-critical going low | Reading 5 < Threshold 5 degrees C
02/17/2022 | 17:50:09 | Temperature MP_Temp1 | Lower Non-critical going low | Reading 6 < Threshold 5 degrees C
02/17/2022 | 17:50:15 | Temperature MP_Temp1 | Lower Non-critical going low | Reading 5 < Threshold 5 degrees C
02/17/2022 | 17:50:40 | Temperature MP_Temp1 | Lower Non-critical going low | Reading 6 < Threshold 5 degrees C
- Example of the events logged before the reboot:
Nov 11, 2020 07:00:41 0100 HA Group Notification (CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1, PSU2.) ERROR
Nov 11, 2020 07:34:52 0100 HA Group Notification (CHASSIS OVER TEMPERATURE SHUTDOWN) EMERGENCY
Nov 11, 2020 10:16:20 0100 HA Group Notification (BATTERY ('Bat Temp' unreadable)) EMERGENCY
Nov 11, 2020 10:16:55 0100 HA Group Notification (BATTERY ('Bat Volt' unreadable)) EMERGENCY
Nov 11, 2020 10:17:07 0100 HA Group Notification (BATTERY ('Bat Curr' unreadable)) EMERGENCY
Nov 11, 2020 10:17:18 0100 HA Group Notification (BATTERY ('Bat Full Cap' unreadable)) EMERGENCY
Nov 11, 2020 10:17:40 0100 HA Group Notification (CHASSIS FAN FRU FAILED: Fan1_1) ERROR
Nov 11, 2020 10:17:54 0100 HA Group Notification (CHASSIS FAN FRU FAILED: Fan2_1) ERROR
Nov 11, 2020 10:18:08 0100 HA Group Notification (CHASSIS FAN FRU FAILED: Fan2_2) ERROR
Nov 11, 2020 10:18:19 0100 HA Group Notification (CHASSIS FAN FRU FAILED: Fan3_1) ERROR
Nov 11, 2020 10:18:30 0100 HA Group Notification (CHASSIS FAN FRU FAILED: Fan3_2) ERROR
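The symptoms above can be checked against saved output with a short script. The following is a minimal sketch, not a NetApp tool: it assumes the sensor listing uses the ten pipe-separated columns shown above (name, reading, unit, status, then six thresholds) and that each event line ends with a parenthesized message followed by a severity word. The function names unreadable_sensors and events_with_severity are hypothetical.

```python
import re


def unreadable_sensors(sensor_output):
    """Return the names of sensors whose reading column is 'na'.

    Assumes ten pipe-separated columns per sensor row, as in the
    listing above; any other line (such as '...') is skipped.
    """
    names = []
    for line in sensor_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 10:
            continue
        name, reading = fields[0], fields[1]
        if reading.lower() == "na":
            names.append(name)
    return names


# Assumed event layout: "<timestamp> ... (<message>) <SEVERITY>".
# The greedy match keeps nested parentheses inside the message.
EVENT_RE = re.compile(r"\((?P<message>.*)\)\s+(?P<severity>[A-Z]+)\s*$")


def events_with_severity(event_log, severity="EMERGENCY"):
    """Return the messages of events logged with the given severity."""
    messages = []
    for line in event_log.splitlines():
        match = EVENT_RE.search(line)
        if match and match.group("severity") == severity:
            messages.append(match.group("message"))
    return messages


if __name__ == "__main__":
    sensors = "\n".join([
        "PVCCIN_CPU0 | na | Volts | na | na | 0.010 | 0.020 | 2.460 | 2.470 | na",
        "...",
        "CPU0_Temp | na | degrees C | na | na | na | na | 90.000 | 100.000 | na",
    ])
    events = "\n".join([
        "Nov 11, 2020 07:34:52 0100 HA Group Notification (CHASSIS OVER TEMPERATURE SHUTDOWN) EMERGENCY",
        "Nov 11, 2020 10:17:40 0100 HA Group Notification (CHASSIS FAN FRU FAILED: Fan1_1) ERROR",
    ])
    print(unreadable_sensors(sensors))
    print(events_with_severity(events))
```

A long list of sensors reading na together with a CHASSIS OVER TEMPERATURE SHUTDOWN event at EMERGENCY severity matches the pattern described in this article. The script only summarizes saved output and does not query the BMC.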