
Controller status is degraded due to CriticalCECCCountMemErrAlert

Category:
fas-systems
Specialty:
hw

Environment

  • FAS9000
  • FAS8300

Issue

  1. The controller status is reported as degraded:
cluster::> system controller show
Controller Name           System ID     Serial Number     Model    Status
------------------------- ------------- ----------------- -------- -----------
cluster-01                512345785     701234567892      FAS9000  degraded
cluster-02                512345787     701234567893      FAS9000  ok
2 entries were displayed.
  2. The following health alert is logged:
cluster::> system health alert show
Node: cluster-01
Resource: DIMM-15
Severity: Critical
Indication Time: Thu Aug 26 04:20:47 2021
Suppress: false
Acknowledge: false
Probable Cause: The DIMM has degraded, leading to memory errors.
Possible Effect: Memory issues can lead to a catastrophic system panic,
which can lead to data downtime on the node.
Corrective Actions: 1. Contact technical support to obtain a new DIMM of the same specification.
2. If possible, perform a takeover of this node and bring the node down for maintenance.
3. Refer to the DIMM replacement guide for your given hardware platform to replace the DIMM.
4. Bring the storage system online.
 
cluster::> system health alert show -instance
Node: cluster-01
Monitor: controller
Alert ID: CriticalCECCCountMemErrAlert
Alerting Resource: DIMM-15
Subsystem: Memory
Indication Time: Thu Aug 26 04:20:47 2021
Perceived Severity: Critical
Probable Cause: DIMM_Degraded
Description: The DIMM has degraded, leading to memory errors.
Corrective Actions: 1. Contact technical support to obtain a new DIMM of the same specification.
2. If possible, perform a takeover of this node and bring the node down for maintenance.
3. Refer to the DIMM replacement guide for your given hardware platform to replace the DIMM.
4. Bring the storage system online.
Possible Effect: Memory issues can lead to a catastrophic system panic, which can lead to data downtime on the node.
Acknowledge: false
Suppress: false
Policy: CriticalCECCCountMemErrAlertPolicy
Acknowledger: -
Suppressor: -
Additional Information: Slot Name: DIMM-15
CPU Socket: 0
Channel: 0
Slot number on a channel: 1
Correctable ECC error count: 1075
Uncorrectable ECC error count: 0
Correctable ECC error Limit: 500
Node Serial Number: 701234567892
Node Model: FAS9000
Alerting Resource Name: DIMM-15
Additional Alert Tags: device
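
The counters in the Additional Information block can be checked programmatically. The following is a minimal Python sketch, assuming the alert text has been captured as plain "Key: value" lines; the helper names parse_fields and correctable_ecc_over_limit are hypothetical and are not part of any NetApp tooling. It also assumes, based only on the sample output above (count 1075 against a limit of 500), that the alert corresponds to the correctable ECC count exceeding the reported limit.

```python
# Excerpt copied from the `system health alert show -instance` output above.
ALERT_TEXT = """\
Alert ID: CriticalCECCCountMemErrAlert
Alerting Resource: DIMM-15
Correctable ECC error count: 1075
Uncorrectable ECC error count: 0
Correctable ECC error Limit: 500
"""


def parse_fields(text):
    """Split 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def correctable_ecc_over_limit(fields):
    """Return True when the correctable ECC count exceeds the reported limit.

    Assumption: the comparison is a strict 'greater than'; the article
    does not state the exact rule.
    """
    count = int(fields["Correctable ECC error count"])
    limit = int(fields["Correctable ECC error Limit"])
    return count > limit


fields = parse_fields(ALERT_TEXT)
print(fields["Alerting Resource"], correctable_ecc_over_limit(fields))
```

For the sample alert, this prints the resource name DIMM-15 followed by True, which matches the degraded state reported for cluster-01.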


