Performance degradation and high CPU utilization on one node caused by a DIMM

Applies to

  • ONTAP 9
  • AFF A400

Issue

  • High CPU utilization degrades performance on one node.
  • Write latency is high on the data aggregates.

Time                Node             Severity      Event
------------------- ---------------- ------------- ---------------------------
7/24/2023 18:33:25  node_name        ERROR         wafl.cp.toolong: Aggregate aggr_name experienced a long CP.
7/24/2023 18:15:22  node_name        ERROR         wafl.cp.toolong: Aggregate aggr_name experienced a long CP.

  • The node panics, generates a core dump file, and reboots.

"process on cpu17 hung (telnet_0) for 5001 milliseconds! in SK process telnet_0 on release 9.10.1P12 (C"

  • Correctable ECC errors are reported on a DIMM module.

Number of correctable ECC since boot 60362216: Information about Correctable ECC: ECC error at DIMM-xx: CE-03-2106-18AEE039,ADDR 0x5959b3100,(Node(1), Memory controller(0), CH(0), DIMM(0), Rank(0), Bank Group(2), Bank(0x0), Row(0x52ad), Col(0x2c0))
Correctable Machine Check Error at CPU17 McBank7. SKL_IMC0 Error: STATUS<0xcc10000001010090> (...)

Number of correctable ECC since boot 60427752: Information about Correctable ECC: ECC error at DIMM-xx: CE-03-2106-18AEE039,ADDR 0x8698e9d00,(Node(1), Memory controller(0), CH(0), DIMM(0), Rank(1), Bank Group(0), Bank(0x0), Row(0x7d3f), Col(0x70))
Correctable Machine Check Error at CPU13 McBank7. SKL_IMC0 Error: STATUS<0xcc10000001010090> (...)

  • A memory error alert is triggered for that DIMM.

[node_name: mgwd: callhome.hm.alert.critical:debug]: Call home for Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-xx].
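To see how quickly the correctable ECC counter is climbing, the "Number of correctable ECC since boot" messages shown above can be parsed and compared. The following is a minimal Python sketch, not a NetApp tool. The regular expression is derived only from the two sample messages in this article, so it is an assumption that other ONTAP releases or platforms use the same layout. The names ECC_RE, parse_ecc_events, and count_growth are hypothetical.

```python
import re

# Pattern inferred from the two sample messages in this article (assumption:
# the message layout is the same on the system being examined).
ECC_RE = re.compile(
    r"Number of correctable ECC since boot (?P<count>\d+):"
    r".*?ECC error at (?P<dimm>DIMM-\S+?):"
    r".*?ADDR (?P<addr>0x[0-9a-fA-F]+)"
    r".*?CH\((?P<channel>\d+)\), DIMM\((?P<slot>\d+)\), Rank\((?P<rank>\d+)\)"
)

SAMPLE_LOG = """\
Number of correctable ECC since boot 60362216: Information about Correctable ECC: ECC error at DIMM-xx: CE-03-2106-18AEE039,ADDR 0x5959b3100,(Node(1), Memory controller(0), CH(0), DIMM(0), Rank(0), Bank Group(2), Bank(0x0), Row(0x52ad), Col(0x2c0))
Number of correctable ECC since boot 60427752: Information about Correctable ECC: ECC error at DIMM-xx: CE-03-2106-18AEE039,ADDR 0x8698e9d00,(Node(1), Memory controller(0), CH(0), DIMM(0), Rank(1), Bank Group(0), Bank(0x0), Row(0x7d3f), Col(0x70))
"""


def parse_ecc_events(log_text):
    """Return one dict per correctable ECC message found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = ECC_RE.search(line)
        if m is None:
            continue  # skip lines that are not correctable ECC messages
        events.append({
            "count": int(m["count"]),
            "dimm": m["dimm"],
            "addr": int(m["addr"], 16),
            "channel": int(m["channel"]),
            "slot": int(m["slot"]),
            "rank": int(m["rank"]),
        })
    return events


def count_growth(events):
    """Per DIMM, return highest minus lowest 'since boot' counter seen."""
    bounds = {}
    for e in events:
        lo, hi = bounds.get(e["dimm"], (e["count"], e["count"]))
        bounds[e["dimm"]] = (min(lo, e["count"]), max(hi, e["count"]))
    return {dimm: hi - lo for dimm, (lo, hi) in bounds.items()}


events = parse_ecc_events(SAMPLE_LOG)
print(count_growth(events))  # {'DIMM-xx': 65536}
```

For the two sample messages, the counter grows by 65536 between reports, and the errors land on different ranks (0 and 1) of the same DIMM. The sketch only summarizes what the log already states and does not decide whether the DIMM needs replacement.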
