HA Group Notification from node_name (CONTROLLER TAKEOVER COMPLETE AUTOMATIC - INTERNAL HALT) ALERT
Applies to
- Unexpected takeover event
- Automatic case alert triggered by "INTERNAL HALT"
Issue
- An automated case is opened with an AutoSupport alert message that contains "INTERNAL HALT". Example:
HA Group Notification from node_name (CONTROLLER TAKEOVER COMPLETE AUTOMATIC - INTERNAL HALT) ALERT
- The internal halt was initiated by an environmental shutdown after multiple fans failed. Example:
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan1 F1 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan1 F2 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan2 F1 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan2 F2 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan3 F1 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan3 F2 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan4 F1 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan4 F2 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan5 F1 Speed (failed)
... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan5 F2 Speed (failed)
... [node_name: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Multiple chassis fans have failed.
... [node_name: statd: monitor.fan.failed:alert]: Multiple fans has failed: Sysfan1 F1, Sysfan1 F2, Sysfan2 F1, Sysfan2 F2, Sysfan3 F1, Sysfan3 F2, Sysfan4 F1, Sysfan4 F2, Sysfan5 F1, Sysfan5 F2.
... [node_name: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed: Sysfan1 F1, Sysfan1 F2, Sysfan2 F1, Sysfan2 F2, Sysfan3 F1, Sysfan3 F2, Sysfan4 F1, Sysfan4 F2, Sysfan5 F1, Sysfan5 F2. Power Supply Status Critical: PSU1.
... [node_name: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (Multiple fans failed)
... [node_name: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to FALSE.
- The ONTAP event log reports "Chassis power is degraded: Power Supply Status Critical". Example:
::> event log show
... node EMERGENCY monitor.globalStatus.critical: Power Supply Status Critical: PSU1.
... node ERROR callhome.chassis.power: Call home for CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1.
... node ALERT monitor.chassisPower.degraded: Chassis power is degraded: Power Supply Status Critical: PSU1.
... node ERROR callhome.chassis.ps.degraded: Call home for CHASSIS POWER SUPPLY DEGRADED: PS 1
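- Reseating the power cord and the PSU in the chassis does not clear the issue.
- When the PSU is swapped with a known-good one, the issue follows the PSU.
- The SP/BMC may report events for the PSU. Example: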
BMC node_name> events all
...
Record 1303: Thu Sep 04 15:01:00.484212 2025 [IPMI.notice]: 0008 | 02 | EVT: 6f02ffff | PSU1_Status | Assertion Event, "Attention"
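
When many EMS lines need to be reviewed, the symptoms listed above (stopped fans, a critical PSU, and an emergency shutdown) can be summarized with a short script. The following is a minimal Python sketch, not a NetApp tool: the function name `summarize` and both regular expressions are hypothetical and are derived only from the sample messages quoted in this article, so other ONTAP releases or platforms may format these messages differently.

```python
import re

# Assumed pattern, based only on the fan-stop EMS samples quoted in this article.
FAN_STOP = re.compile(
    r"monitor\.chassisFan\.stop:error\]: "
    r"Chassis fan contains at least one stopped fan: "
    r"(Sysfan\d+ F\d+) Speed \(failed\)"
)
# Assumed pattern for the PSU status text seen in the EMS samples above.
PSU_CRITICAL = re.compile(r"Power Supply Status Critical: (PSU\d+)")


def summarize(lines):
    """Return (failed fans, critical PSUs, emergency shutdown seen)."""
    fans, psus, shutdown = set(), set(), False
    for line in lines:
        match = FAN_STOP.search(line)
        if match:
            fans.add(match.group(1))
        psus.update(PSU_CRITICAL.findall(line))
        if "monitor.shutdown.emergency" in line:
            shutdown = True
    return sorted(fans), sorted(psus), shutdown


# Lines copied verbatim from the examples in this article.
sample = [
    "... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan1 F1 Speed (failed)",
    "... [node_name: env_mgr: monitor.chassisFan.stop:error]: Chassis fan contains at least one stopped fan: Sysfan1 F2 Speed (failed)",
    "... [node_name: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Multiple chassis fans have failed.",
    "... node ERROR callhome.chassis.power: Call home for CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1.",
]

fans, psus, shutdown = summarize(sample)
print("Failed fans:", fans)
print("Critical PSUs:", psus)
print("Emergency shutdown seen:", shutdown)
```

A result that shows several stopped fans together with a single critical PSU matches the pattern described in this article, where the fault follows the PSU rather than the individual fans.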