
AFF A250/C250: HIC2 Temp0 has failed

Category:
aff-series
Specialty:
hw

Environment

  • AFF A250
  • AFF C250
  • X1152

Issue

  • After an ONTAP upgrade or during normal operation, the node reports that the chassis temperature is too high:
[Node-01:monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high..
  • The node may panic, reboot, and remain in the waiting for giveback state while NIC sensor errors are reported:

PANIC: Uncorrectable Machine Check Error at CPU10. SKL_IIO Error: STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x0000000064000000>(UCR_BUS_LOG(100),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))IIO Machine Check from device(s):RPT(100,0,0):ErrSrcID(CorrSrc(0x6670),UCorrSrc(0x66a0)), PLX PCIE 9797 switch on Controller, Br[9797](102,20,0): Link down, PLX PCIE 9797 switch on Controller, Br[9797](102,21,0): Link down. ,.  in process idle: cpu10 on release 9.13.1P6 (C)

Waiting for giveback...(Press Ctrl-C to abort wait)
Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp0) is not readable.
Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp1) is not readable.
Jul 04 10:26:12 [node1:callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE

  • SP-LATEST-IPMI shows unreadable sensors:

Working card:

HIC1_TEMP0       | 55.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC1_TEMP1       | 57.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000

 
 
Faulty card with TEMP0 failed:
HIC2_TEMP0       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC2_TEMP1       | 53.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
 
Faulty card with TEMP1 failed:
HIC2_TEMP0       | 52.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC2_TEMP1       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
  • The SP event log records that the NIC in slot 2 trained at a degraded link speed/width:
617 | 02/13/2024 | 19:42:41 | Temperature #0x10 | Lower Non-recoverable going low
618 | OEM record ee | Device Bus: 117 Dev: 0 Fun: 0 (slot 2) Failed to train at max link speed/width, retraining cycle 0
- Expected GEN1, actual GEN1
- Expected x16, actual x8
  • SYSCONFIG-A shows the NIC with missing ports:

slot 2: Quad 10G/25G Ethernet Controller CX5-Mezz
  e2a MAC Address:    d0:39:ea:52:c8:5f (auto-unknown-fd-down)
  e2b MAC Address:    d0:39:ea:52:c8:60 (auto-unknown-fd-down)
  Device Type:        CX5 PSID(NAP0000000014)
  Firmware Version:   16.26.4012
  Part Number:        111-04587
  Hardware Revision:  B0
  Serial Number:      032249003452
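
The symptoms above can be checked mechanically when a saved copy of the SP sensor listing and SP event log is available. The following Python is a minimal sketch, not a NetApp tool: the function names `unreadable_sensors` and `degraded_link_widths` are hypothetical, and the parsing assumes the text has the same layout as the excerpts in this article (ten pipe-separated columns per sensor row, with the reading in column 2 and the status in column 4, and link-training lines of the form "Expected xN, actual xM").

```python
import re
from typing import List, Tuple


def unreadable_sensors(sensor_output: str) -> List[str]:
    """Return the names of sensors whose reading or status is 'na'.

    Assumes rows shaped like the excerpts in this article:
    name | reading | unit | status | six further numeric columns.
    """
    flagged = []
    for line in sensor_output.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) != 10:
            continue  # not a sensor row in the assumed layout
        name, reading, _unit, status = cols[:4]
        if reading == "na" or status == "na":
            flagged.append(name)
    return flagged


# Matches the width line only; "Expected GEN1, actual GEN1" does not match.
LINK_WIDTH_RE = re.compile(r"Expected x(\d+), actual x(\d+)")


def degraded_link_widths(event_log: str) -> List[Tuple[int, int]]:
    """Return (expected, actual) PCIe lane counts where actual < expected."""
    pairs = [(int(e), int(a)) for e, a in LINK_WIDTH_RE.findall(event_log)]
    return [(e, a) for e, a in pairs if a < e]


if __name__ == "__main__":
    # Sample input copied from the excerpts in this article.
    sensors = (
        "HIC2_TEMP0       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000\n"
        "HIC2_TEMP1       | 53.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000\n"
    )
    events = "- Expected GEN1, actual GEN1\n- Expected x16, actual x8\n"
    print(unreadable_sensors(sensors))
    print(degraded_link_widths(events))
```

A flagged HIC2 sensor together with a reduced lane count on slot 2 matches the pattern described in this article; the sketch only detects the pattern and does not diagnose the cause.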

