
Thread (ontap: cpu0) on cpu 0 hung for 300001 milliseconds

Category:
ontap-9
Specialty:
hw

Environment

  • ONTAP 9
  • FAS / AFF platforms

Issue

The following logs appear in the SP console log.

  • The node panics with the following message:
PANIC  : thread (ontap: cpu0) on cpu 0 hung for 300001 milliseconds
version: 9.14.1P11: Wed Jan 22 06:55:28 EST 2025
conf  : x86_64.optimize
cpuid = 0
KDB: stack backtrace:
vpanic() at vpanic+0x602/frame 0xfffffe00935fcdb0
panic() at panic+0x42/frame 0xfffffe00935fce10
check_starvation_internal() at check_starvation_internal+0xb5/frame 0xfffffe00935fce40
hardclock() at hardclock+0x45/frame 0xfffffe00935fce90
resumectx() at resumectx+0x427/frame 0xfffffe00935fcef0
lapic_handle_timer() at lapic_handle_timer+0xa2/frame 0xfffffe00935fcf20
Xtimerint() at Xtimerint+0x128/frame 0xfffffe00935fcf20
--- interrupt, rip = 0xffffffff80d3cd91, rsp = 0xfffffe01a6945a70, rbp = 0xfffffe01a6945a70 ---
bzero_sse2_nt() at bzero_sse2_nt+0x51/frame 0xfffffe01a6945a70
vm_hw_module_init() at vm_hw_module_init+0x6c1/frame 0xfffffe01a6945b40
sk_init_mem() at sk_init_mem+0x39/frame 0xfffffe01a6945b80
startup_boot_processor() at startup_boot_processor+0x56/frame 0xfffffe01a6945b90
psm_processor_start() at psm_processor_start+0x2e/frame 0xfffffe01a6945bb0
fork_exit() at fork_exit+0xb6/frame 0xfffffe01a6945bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01a6945bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 7m24s
ahcich0: AHCI reset done: devices=00000001
Dumper is not yet registered. A coredump will not be available at this time.
System halting...
  • A Pelog may also be dumped:
Found platform error in ONTAP region.
region 2 header
Sig(0x544f414e)
Size(28672)
Ver(1)
Tail(0)
DataLen(100)
DataCRC(0x6edc)
HdrCRC(0xe223)
Rec(1) @ 0x14, flag(0x0) len(24) tstamp(0x6832f1ea)
log(7) msg(UECC Addr 0x18da78340)
Rec(2) @ 0x38, flag(0x0) len(8) tstamp(0x6832f1ea)
node(0) chan(3) dimm(1)
rank(1) bank(0x2) row(0x864) col(0x308)
Rec(1) @ 0x4c, flag(0x0) len(32) tstamp(0x6832f1ea)
   log(7) msg(devtag(0x2), correrr(0x50ba))
  • An ECC error is reported:
ECC error at DIMM-12: CE-04-2002-42C12E13,ADDR 0x18da78340,(Node(0), Memory controller(1), CH(3), DIMM(1), Rank(1), Bank Group(3), Bank(0x2), Row(0x864), Col(0x308)), devtag(0x2), correrr(0x50ba) Uncorrectable Machine Check Error at CPU3. BDWL_HA1 Error: STATUS<0xfe0003c000010091>(Val,OverF,UnCor,Enable,MiscV,AddrV,PCC,CorrSts(0),CorrCnt(0xf),ExtErr(0x1),ErrCode(Channel 1, Read),ErrCode(0x91)),MISC<0x0000000140560e86>(HaDbBank(0),PE(0),ReqOpcode(0xa),RNID(0),RTID(0x2b),HTID(0x7))
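As a rough cross-check of the STATUS value in the ECC error line above, the following Python sketch splits the 64-bit value into its fields. It assumes the register follows the IA32_MCi_STATUS layout described in the Intel Software Developer's Manual (machine-check architecture) and that the low 16 bits use the memory-controller compound error code format. The function names are hypothetical and are not part of any NetApp tool; the flag labels simply mirror the ones printed in the log.

```python
# Flag bit positions, assuming the IA32_MCi_STATUS layout from the Intel SDM.
# Labels mirror the names printed in the ONTAP log line.
FLAG_BITS = {
    "Val": 63,     # register contents are valid
    "OverF": 62,   # an earlier error was overwritten
    "UnCor": 61,   # error was not corrected by hardware
    "Enable": 60,  # error reporting was enabled
    "MiscV": 59,   # MISC register holds valid data
    "AddrV": 58,   # ADDR register holds a valid address
    "PCC": 57,     # processor context may be corrupted
}

# MMM field of the memory-controller compound error code (assumed SDM encoding).
MEMORY_OPERATIONS = {
    0b000: "generic",
    0b001: "read",
    0b010: "write",
    0b011: "address/command",
    0b100: "scrub",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine-check STATUS value into its main fields."""
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "corrected_count": (status >> 38) & 0x7FFF,      # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_error_code": status & 0xFFFF,               # bits 15:0
    }


def decode_memory_controller_code(code: int):
    """Decode a code of the form 0000 0000 1MMM CCCC, or return None."""
    if code & 0xFF80 != 0x0080:
        return None  # not a memory-controller compound error code
    operation = MEMORY_OPERATIONS.get((code >> 4) & 0x7, "reserved")
    channel = code & 0xF  # 0xF means the channel is not specified
    return {"operation": operation, "channel": channel}


# Value copied from the STATUS<...> field of the log line above.
fields = decode_mci_status(0xFE0003C000010091)
memory_error = decode_memory_controller_code(fields["mca_error_code"])
```

Under these assumptions, the decoded fields agree with what the log already prints: all seven flags from Val through PCC are set, the corrected error count is 0xf, the extended error code is 0x1, and error code 0x91 is a read error on channel 1.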

 

