Node shuts down with "Battery PCT capacity critical" even though the battery sensors report no problem
Environment
- FAS2620
- FAS2720
- FAS2750
- Service Processor (SP) 5.9
Issue
- The node shuts down, and the following errors are reported in the system log:
Nov 11 06:54:26 [Node-02:nvmem.battery.fccLowCrit:EMERGENCY]: The NVMEM battery full-charge capacity is critically low (30 ). To prevent data loss, the system will shut down in 20 minutes.
Nov 11 06:55:24 [Node-02:callhome.battery.failure:EMERGENCY]: Call home for BATTERY (full charge capacity low) CRITICAL.
Nov 11 07:14:26 [Node-02:monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (Battery PCT capacity critical)
Waiting for PIDS: 858.
.
Uptime: 57d5h49m57s
System powering down...
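The fccLowCrit message says the system will shut down in 20 minutes, and the EMS timestamps agree (06:54:26 to 07:14:26). The following minimal Python sketch computes that interval from the log lines. The regular expression is inferred from these sample lines rather than from a documented EMS format, and the year 2024 is an assumption taken from the SP event log, because the EMS timestamps omit it.

```python
import re
from datetime import datetime

# Two EMS lines copied from the system log above (the fccLowCrit line is trimmed after "(30 ).").
LOG = """\
Nov 11 06:54:26 [Node-02:nvmem.battery.fccLowCrit:EMERGENCY]: The NVMEM battery full-charge capacity is critically low (30 ).
Nov 11 07:14:26 [Node-02:monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (Battery PCT capacity critical)
"""

# Assumed layout: "Mon DD HH:MM:SS [node:event.name:SEVERITY]: message"
EMS_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \[([^:\]]+):([^:\]]+):([A-Z]+)\]")

# The EMS timestamps carry no year; 2024 is assumed from the SP event log.
ASSUMED_YEAR = 2024


def first_seen(text):
    """Return {event name: datetime of its first occurrence} for each EMS line."""
    times = {}
    for line in text.splitlines():
        m = EMS_RE.match(line)
        if m:
            ts = datetime.strptime(f"{ASSUMED_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S")
            times.setdefault(m.group(3), ts)
    return times


times = first_seen(LOG)
delay = times["monitor.shutdown.emergency"] - times["nvmem.battery.fccLowCrit"]
print(delay)  # 0:20:00
```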
- In the ipmi_sensors section of the sp status -d output, all battery-related items are in good status:
ipmi_sensors
============
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
Bat_Temp | 29.000 | degrees C | ok | 0.000 | 5.000 | 60.000 | 75.000
Bat_Volt | 8.100 | Volts | ok | 5.500 | 5.600 | 8.500 | 8.600
Bat_Curr | 0.000 | Amps | ok | na | na | 1.200 | 1.520
Bat_Rem_Cap | 0.512 | Amps * hour | ok | na | na | na | na
Bat_Full_Cap | 0.512 | Amps * hour | ok | na | na | na | na
Bat_Charge_Curr | 0.000 | Amps | ok | na | na | 2.200 | 2.300
Bat_Charge_Volt | 8.200 | Volts | ok | na | na | 8.900 | 9.000
Bat_Design_Cap | 2.100 | Amps * hour | ok | na | na | na | na
Bat_Initial_Fcc | 1.700 | Amps * hour | ok | na | na | na | na
Bat_Dstg_Cycles | 10.000 | cycles | ok | 2.000 | 5.000 | na | na
Bat_Power_Fault | 0x0 | discrete | Deasserted | na | na | na | na
Bat_Dsg_FET_Flt | 0x0 | discrete | Deasserted | na | na | na | na
Bat_Chg_FET_Flt | 0x0 | discrete | Deasserted | na | na | na | na
Bat_Pack_Invalid | 0x0 | discrete | Deasserted | na | na | na | na
Bat_Cycle_Cnt | 37.000 | cycles | ok | na | na | na | na
Bat_Lrn_Active | 0x0 | discrete | Deasserted | na | na | na | na
- In the same ipmi_sensors output, several non-battery sensors report na as their value:
ipmi_sensors
============
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
CPU0_Temp_Margin | na | degrees C | na | na | na | -11.000 | -1.000
CPU0_Core_Temp | na | degrees C | na | 0.000 | 5.000 | 90.000 | 100.000
P5V | na | Volts | na | 4.246 | 4.343 | 5.661 | 5.807
P3V3 | na | Volts | na | 2.960 | 3.040 | 3.568 | 3.664
PVDDQ_DDR4_AB | na | Volts | na | 0.010 | 0.019 | 2.454 | 2.464
PVTT_DDR4_AB | na | Volts | na | 0.010 | 0.019 | 2.454 | 2.464
PVCCP_CPU0 | na | Volts | na | 0.010 | 0.019 | 2.454 | 2.464
P3V3_BATT | na | Volts | na | 0.016 | 0.032 | 4.048 | 4.064
P12V | na | Volts | na | 0.000 | 0.000 | 15.810 | 15.810
P12V_Curr | na | Amps | na | 0.000 | 0.000 | 15.810 | 15.810
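A longer ipmi_sensors dump can be reviewed by parsing its pipe-separated rows. The sketch below is hypothetical and works on a copied subset of the rows above: it lists sensors whose status is na, lists any battery rows that are not ok (analog) or Deasserted (discrete), and, for reference, computes Bat_Full_Cap as a percentage of Bat_Design_Cap and Bat_Initial_Fcc. The roughly 30% ratio to Bat_Initial_Fcc happens to match the value in the nvmem.battery.fccLowCrit message, but this article does not state how ONTAP calculates that figure, so treat the correspondence as an assumption.

```python
# Subset of the ipmi_sensors rows shown above, copied verbatim (other rows omitted).
TABLE = """\
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
Bat_Temp | 29.000 | degrees C | ok | 0.000 | 5.000 | 60.000 | 75.000
Bat_Full_Cap | 0.512 | Amps * hour | ok | na | na | na | na
Bat_Design_Cap | 2.100 | Amps * hour | ok | na | na | na | na
Bat_Initial_Fcc | 1.700 | Amps * hour | ok | na | na | na | na
Bat_Power_Fault | 0x0 | discrete | Deasserted | na | na | na | na
CPU0_Core_Temp | na | degrees C | na | 0.000 | 5.000 | 90.000 | 100.000
P12V | na | Volts | na | 0.000 | 0.000 | 15.810 | 15.810
"""

FIELDS = ("name", "current", "unit", "status", "lcr", "lnc", "unc", "ucr")


def parse_rows(text):
    """Split each 8-column row on '|'; skip the header, separators and blank lines."""
    rows = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) != len(FIELDS) or cells[0] == "Sensor Name":
            continue
        row = dict(zip(FIELDS, cells))
        rows[row["name"]] = row
    return rows


rows = parse_rows(TABLE)

# Sensors that returned no reading at all.
na_sensors = [n for n, r in rows.items() if r["status"] == "na"]

# Battery rows whose status is anything other than "ok" (analog) or "Deasserted" (discrete).
battery_flagged = [n for n, r in rows.items()
                   if n.startswith("Bat_") and r["status"] not in ("ok", "Deasserted")]

# Full-charge capacity relative to the design and initial capacities, in percent.
full = float(rows["Bat_Full_Cap"]["current"])
pct_of_design = round(100 * full / float(rows["Bat_Design_Cap"]["current"]))
pct_of_initial = round(100 * full / float(rows["Bat_Initial_Fcc"]["current"]))

print(na_sensors)                     # ['CPU0_Core_Temp', 'P12V']
print(battery_flagged)                # []
print(pct_of_design, pct_of_initial)  # 24 30
```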
- The events all log shows that the shutdown did not occur during a battery active learning cycle (Bat_Lrn_Active):
Record 2395: Wed Oct 16 12:00:14 2024 [IPMI.notice]: 2003 | 02 | EVT: 0301ffff | Bat_Lrn_Active | Assertion Event, "State Asserted"
Record 2396: Wed Oct 16 13:13:09 2024 [IPMI.notice]: 2103 | 02 | EVT: 0301ffff | Attn_Sensor1 | Assertion Event, "State Asserted"
Record 2397: Wed Oct 16 13:13:17 2024 [IPMI.notice]: 2203 | 02 | EVT: 0300ffff | Attn_Sensor1 | Assertion Event, "State Deasserted"
Record 2398: Wed Oct 16 13:24:07 2024 [IPMI.notice]: 2303 | 02 | EVT: 0301ffff | Attn_Sensor1 | Assertion Event, "State Asserted"
Record 2399: Wed Oct 16 13:24:13 2024 [IPMI.notice]: 2403 | 02 | EVT: 0300ffff | Attn_Sensor1 | Assertion Event, "State Deasserted"
Record 2400: Wed Oct 16 16:02:02 2024 [IPMI.notice]: 2503 | 02 | EVT: 0300ffff | Bat_Lrn_Active | Assertion Event, "State Deasserted"
Record 2401: Wed Oct 16 19:17:00 2024 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Wed Oct 16 19:16:58 2024. New time: Wed Oct 16 19:17:00 2024.
Record 2402: Mon Oct 21 08:45:00 2024 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Mon Oct 21 08:44:58 2024. New time: Mon Oct 21 08:45:00 2024.
Record 2403: Fri Oct 25 22:21:00 2024 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Fri Oct 25 22:20:58 2024. New time: Fri Oct 25 22:21:00 2024.
Record 2404: Wed Oct 30 10:28:00 2024 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Wed Oct 30 10:27:58 2024. New time: Wed Oct 30 10:28:00 2024.
Record 2405: Mon Nov 4 01:42:00 2024 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Mon Nov 4 01:41:58 2024. New time: Mon Nov 4 01:42:00 2024.
Record 2406: Fri Nov 8 16:59:00 2024 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Fri Nov 8 16:58:58 2024. New time: Fri Nov 8 16:59:00 2024.
Record 2407: Sun Nov 10 21:54:25 2024 [IPMI.notice]: 2603 | 02 | EVT: 0301ffff | Attn_Sensor1 | Assertion Event, "State Asserted"
Record 2408: Sun Nov 10 21:54:32 2024 [IPMI.notice]: 2703 | 02 | EVT: 0300ffff | Attn_Sensor1 | Assertion Event, "State Deasserted"
Record 2409: Sun Nov 10 21:54:53 2024 [IPMI.notice]: 2803 | 02 | EVT: 0301ffff | Attn_Sensor1 | Assertion Event, "State Asserted"
Record 2410: Sun Nov 10 21:54:59 2024 [IPMI.notice]: 2903 | 02 | EVT: 0300ffff | Attn_Sensor1 | Assertion Event, "State Deasserted"
Record 2411: Sun Nov 10 22:14:23 2024 [IPMI.emergency]: triggered OS halt: Battery PCT capacity critic
Record 2412: Sun Nov 10 22:14:25 2024 [IPMI.notice]: 2a03 | 02 | EVT: 0301ffff | Attn_Sensor1 | Assertion Event, "State Asserted"
Record 2413: Sun Nov 10 22:14:31 2024 [IPMI.notice]: 2b03 | 02 | EVT: 0300ffff | Attn_Sensor1 | Assertion Event, "State Deasserted"
Record 2414: Sun Nov 10 22:14:37 2024 [IPMI.notice]: 2c03 | 02 | EVT: 0301ffff | Attn_Sensor1 | Assertion Event, "State Asserted"
Record 2415: Sun Nov 10 22:14:50 2024 [IPMI.notice]: 2d03 | 02 | EVT: 6f406fff | Sensor 255 | Assertion Event, "Storage OS stop/shutdown"
Record 2416: Sun Nov 10 22:14:50 2024 [IPMI Event.critical]: System power down
Record 2417: Sun Nov 10 22:14:50 2024 [ONTAP.notice]: Appliance user command halt.
Record 2418: Sun Nov 10 22:14:50 2024 [IPMI.emergency]: Shutdown by Data ONTAP
Record 2419: Sun Nov 10 22:14:50 2024 [Trap Event.critical]: hwassist abnormal_reboot (28)
Record 2420: Sun Nov 10 22:14:53 2024 [IPMI.notice]: 2e03 | 02 | EVT: 0300ffff | Power_Good | Assertion Event, "State Deasserted"
Record 2421: Sun Nov 10 22:15:09 2024 [ASUP.notice]: First notification email | (REBOOT (abnormal)) WARNING | Sent
Record 2422: Sun Nov 10 22:26:09 2024 [SP.critical]: Heartbeat stopped
Record 2423: Sun Nov 10 22:29:52 2024 [ASUP.notice]: Reminder email | (REBOOT (abnormal)) WARNING | Sent
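The check in the last bullet can also be done mechanically on a longer event log. In this hypothetical sketch, each Bat_Lrn_Active "State Asserted" record is paired with the next "State Deasserted" record to form a learning-cycle window, and the "triggered OS halt" timestamp is tested against those windows. The record layout is inferred from the excerpt above, not from documented SP output.

```python
import re
from datetime import datetime

# Excerpt of the events all records above (the Heartbeat message is trimmed).
EVENTS = """\
Record 2395: Wed Oct 16 12:00:14 2024 [IPMI.notice]: 2003 | 02 | EVT: 0301ffff | Bat_Lrn_Active | Assertion Event, "State Asserted"
Record 2400: Wed Oct 16 16:02:02 2024 [IPMI.notice]: 2503 | 02 | EVT: 0300ffff | Bat_Lrn_Active | Assertion Event, "State Deasserted"
Record 2405: Mon Nov 4 01:42:00 2024 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time.
Record 2411: Sun Nov 10 22:14:23 2024 [IPMI.emergency]: triggered OS halt: Battery PCT capacity critic
"""

# Assumed layout: "Record N: Day Mon D HH:MM:SS YYYY [source.severity]: message"
REC_RE = re.compile(r"^Record \d+: (\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) \[[^\]]+\]: (.*)$")


def records(text):
    """Yield (timestamp, message) for every line that matches REC_RE."""
    for line in text.splitlines():
        m = REC_RE.match(line)
        if m:
            yield datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y"), m.group(2)


windows, start, halt = [], None, None
for ts, msg in records(EVENTS):
    if "Bat_Lrn_Active" in msg:
        if '"State Asserted"' in msg:
            start = ts                      # learning cycle begins
        elif '"State Deasserted"' in msg and start is not None:
            windows.append((start, ts))     # learning cycle ends
            start = None
    elif "triggered OS halt" in msg and halt is None:
        halt = ts
if start is not None:                       # cycle still running at the end of the log
    windows.append((start, datetime.max))

during_learning = halt is not None and any(s <= halt <= e for s, e in windows)
print(halt, during_learning)  # 2024-11-10 22:14:23 False
```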