Node panic due to ECC errors caused by a faulty DIMM

Category:
ontap-9
Specialty:
HW
Environment

  • ONTAP 9
  • FAS systems
  • AFF systems

Issue

  • A node reboots unexpectedly after a cluster alert:
HA Group Notification from Node13 (NODE(S) OUT OF CLUSTER QUORUM) EMERGENCY
HA Group Notification (PARTNER REBOOT (CONTROLLER TAKEOVER)) NOTICE

       Meanwhile, the EMS log shows the following:

ECC error at DIMM-7: 2C-02-1909-20F18D8D,ADDR 0x208455e900,(Node(0), Memory controller(1), CH(3), DIMM(0), Rank(1), Bank Group(0), Bank(0x1), Row(0x10045), Col(0x1d0)) SKL_IMC1 Error:
Fri Dec 20 16:26:31 2024 SRAM record type(CPU) from Data ONTAP: socket(0) core(4) bank(8)
Fri Dec 20 16:26:31 2024 SRAM record type(LOG) from Data ONTAP: UECC Addr 0x208455e900
Fri Dec 20 16:26:31 2024 SRAM record type(DIMM) from Data ONTAP: slot(7)
  • The node may fail to boot and display the following panic string:
 
PANIC: ECC error at DIMM-2: CE-03-2040-176B3357,ADDR 0x558b31e40,(Node(0), Memory controller(0), CH(1), DIMM(0), Rank(0), Bank Group(3), Bank(0x3), Row(0x9633), Col(0xf8)) Uncorrectable Machine Check Error at CPU9. BDWL_HA0 Error: STATUS<0xbe00000000010091>(Val,UnCor,Enable,MiscV,AddrV,PCC,CorrSts(0),CorrCnt(0),ExtErr(0x1),ErrCode(Channel 1, Read)ErrCode(0x91))MISC<0x000000044056d686>(HaDbBank(0),PE(0),ReqOpcode(0x22),RNID(0),RTID(0x2b),HTID(0x6b))ADDR<0x0000000558b31e40>((0x558b31e40)).  in process idle: cpu9 on release 9.7P10 (C) on Sun Nov 13 00:57:56 IST 2022
  • The BMC events all output reports DIMM traps:

Record 1382: Tue Oct 21 10:00:02.423402 2025 [IPMI Event.critical]: DIMM UECC Fatal Error detected by Storage OS
Record 1383: Tue Oct 21 10:00:02.463052 2025 [Trap Event.critical]: hwassist dimm_uecc_error (32)
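
When collecting evidence, it can help to pull the DIMM slot and memory address out of the ECC error messages automatically. The following is a minimal Python sketch, not a NetApp tool: the function name parse_ecc_error is hypothetical, and the pattern assumes the "ECC error at DIMM-<slot>: <identifier>,ADDR 0x<address>" layout seen in the two sample messages above. Other ONTAP releases or platforms may format the message differently.

```python
import re

# Layout inferred only from the two sample messages in this article (assumption):
#   ECC error at DIMM-<slot>: <identifier>,ADDR 0x<hex address>,(...)
ECC_RE = re.compile(
    r"ECC error at DIMM-(?P<slot>\d+):\s*"
    r"(?P<dimm_id>[0-9A-Fa-f-]+),\s*"
    r"ADDR\s+(?P<addr>0x[0-9A-Fa-f]+)"
)


def parse_ecc_error(line):
    """Return the DIMM slot, identifier string, and address, or None if no match."""
    match = ECC_RE.search(line)
    if match is None:
        return None
    return {
        "slot": int(match.group("slot")),
        "dimm_id": match.group("dimm_id"),
        "addr": int(match.group("addr"), 16),
    }


if __name__ == "__main__":
    sample = (
        "ECC error at DIMM-7: 2C-02-1909-20F18D8D,ADDR 0x208455e900,"
        "(Node(0), Memory controller(1), CH(3), DIMM(0), Rank(1))"
    )
    info = parse_ecc_error(sample)
    print(f"DIMM slot {info['slot']}, id {info['dimm_id']}, address {info['addr']:#x}")
```

The same function works on the PANIC string because it searches for the pattern anywhere in the line rather than requiring it at the start.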