
Node panic due to ECC errors caused by a faulty DIMM

Category:
ontap-9
Specialty:
hw

Environment

  • ONTAP 9
  • FAS systems
  • AFF systems

Issue

  • A node unexpectedly rebooted after a cluster alert:
HA Group Notification from Node13 (NODE(S) OUT OF CLUSTER QUORUM) EMERGENCY
HA Group Notification (PARTNER REBOOT (CONTROLLER TAKEOVER)) NOTICE

       Meanwhile, the EMS log shows the following:

ECC error at DIMM-7: 2C-02-1909-20F18D8D,ADDR 0x208455e900,(Node(0), Memory controller(1), CH(3), DIMM(0), Rank(1), Bank Group(0), Bank(0x1), Row(0x10045), Col(0x1d0)) SKL_IMC1 Error:
Fri Dec 20 16:26:31 2024 SRAM record type(CPU) from Data ONTAP: socket(0) core(4) bank(8)
Fri Dec 20 16:26:31 2024 SRAM record type(LOG) from Data ONTAP: UECC Addr 0x208455e900
Fri Dec 20 16:26:31 2024 SRAM record type(DIMM) from Data ONTAP: slot(7)
  • In some cases, the node may fail to boot with the following panic string:
 
PANIC: ECC error at DIMM-2: CE-03-2040-176B3357,ADDR 0x558b31e40,(Node(0), Memory controller(0), CH(1), DIMM(0), Rank(0), Bank Group(3), Bank(0x3), Row(0x9633), Col(0xf8)) Uncorrectable Machine Check Error at CPU9. BDWL_HA0 Error: STATUS<0xbe00000000010091>(Val,UnCor,Enable,MiscV,AddrV,PCC,CorrSts(0),CorrCnt(0),ExtErr(0x1),ErrCode(Channel 1, Read)ErrCode(0x91))MISC<0x000000044056d686>(HaDbBank(0),PE(0),ReqOpcode(0x22),RNID(0),RTID(0x2b),HTID(0x6b))ADDR<0x0000000558b31e40>((0x558b31e40)).  in process idle: cpu9 on release 9.7P10 (C) on Sun Nov 13 00:57:56 IST 2022
  • The output of the BMC command events all reports DIMM traps:

Record 1382: Tue Oct 21 10:00:02.423402 2025 [IPMI Event.critical]: DIMM UECC Fatal Error detected by Storage OS
Record 1383: Tue Oct 21 10:00:02.463052 2025 [Trap Event.critical]: hwassist dimm_uecc_error (32)
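When confirming which DIMM to replace, the locator fields in the ECC error strings and the machine-check STATUS value in the panic string can be decoded offline. The following is a minimal Python sketch, assuming the string layout matches the samples above; the function names are hypothetical, and the STATUS bit layout follows the IA32_MCi_STATUS definition in the Intel Software Developer's Manual rather than any NetApp tool. Model-specific error codes are left undecoded because their meaning varies by platform.

```python
import re

# Locator fields as printed in the "ECC error at DIMM-N: ..." strings above.
# Assumption: field names and order match these samples; other platforms or
# ONTAP releases may print a different layout.
ECC_LOCATOR_RE = re.compile(
    r"ECC error at DIMM-(?P<slot>\d+):.*?"
    r"Memory controller\((?P<imc>\d+)\), CH\((?P<channel>\d+)\), "
    r"DIMM\((?P<dimm>\d+)\), Rank\((?P<rank>\d+)\)"
)

# Raw machine-check status value as printed in the panic string: STATUS<0x...>
STATUS_RE = re.compile(r"STATUS<0x(?P<value>[0-9a-fA-F]+)>")

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM Vol. 3B).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # IA32_MCi_MISC holds extra information
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context may be corrupt
}

# Transaction type (MMM bits) of a memory-controller compound error code.
MEM_TRANSACTIONS = {
    0b000: "generic",
    0b001: "read",
    0b010: "write",
    0b011: "address/command",
    0b100: "scrubbing",
}


def parse_ecc_locator(line: str) -> dict:
    """Extract the DIMM slot and memory-controller locator from an ECC error line."""
    match = ECC_LOCATOR_RE.search(line)
    if match is None:
        raise ValueError("no ECC error locator found in line")
    return {name: int(value) for name, value in match.groupdict().items()}


def extract_status(line: str) -> int:
    """Return the STATUS<0x...> value from a panic string as an integer."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("no STATUS<0x...> value found in line")
    return int(match.group("value"), 16)


def decode_mci_status(status: int) -> dict:
    """Decode the architectural fields of an IA32_MCi_STATUS value."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    mcacod = status & 0xFFFF            # bits 15:0, MCA error code
    mscod = (status >> 16) & 0xFFFF     # bits 31:16, model-specific error code
    decoded = {"flags": flags, "mcacod": mcacod, "mscod": mscod}
    # Memory-controller errors follow the pattern 000F 0000 1MMM CCCC.
    if (mcacod & 0xEF80) == 0x0080:
        decoded["transaction"] = MEM_TRANSACTIONS.get((mcacod >> 4) & 0x7, "reserved")
        channel = mcacod & 0xF
        decoded["channel"] = None if channel == 0xF else channel  # 0xF = unspecified
    return decoded


if __name__ == "__main__":
    panic = ("PANIC: ECC error at DIMM-2: CE-03-2040-176B3357,ADDR 0x558b31e40,"
             "(Node(0), Memory controller(0), CH(1), DIMM(0), Rank(0), Bank Group(3), "
             "Bank(0x3), Row(0x9633), Col(0xf8)) Uncorrectable Machine Check Error at CPU9. "
             "BDWL_HA0 Error: STATUS<0xbe00000000010091>(Val,UnCor,Enable,MiscV,AddrV,PCC)")
    print(parse_ecc_locator(panic))
    print(decode_mci_status(extract_status(panic)))
```

For the panic string above, the sketch reports DIMM slot 2 on memory controller 0, channel 1, and decodes STATUS as VAL, UC, EN, MISCV, ADDRV, PCC with a memory read error on channel 1. This agrees with the CH(1) locator and with the flags ONTAP printed in the same string.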

    NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.