
Multiple DIMMs are disabled because of a single DIMM failure, and ONTAP boots with reduced memory

Applies to

  • AFF A1K, AFF A90, AFF A70, AFF C80, AFX 1K
  • ASA A1K, ASA A90, ASA A70
  • FAS90, FAS70
  • AFF A900, ASA A900
  • FAS9500
  • platform.reducedMemory event

Issue

  • system health alert show reports "BootDimmDisableAlert" for multiple DIMMs:

Node: node-01
Alert ID: BootDimmDisableAlert
Resource: DIMM-10
Severity: Major
Indication Time: XXX XXX XX XX:XX:XX XXXX

Suppress: false
Acknowledge: false
Probable Cause: "DIMM-10" has been disabled to preserve memory
interleaving. The system has booted with less memory.
Possible Effect: System memory has been reduced, which can impact
performance of the node.
Corrective Actions: Repair failed or mapped-out DIMMs in the system. Then,
reset the system to re-enable disabled DIMMs.
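
When many of these alerts are raised at once, it can help to list the affected DIMMs from saved command output. The following is a minimal sketch, not a NetApp tool: it assumes the output was saved as plain text in the "Key: value" layout shown above, and the function name `disabled_dimm_resources` is hypothetical.

```python
# Sketch: collect the Resource of every BootDimmDisableAlert record from
# saved "system health alert show" output (layout assumed as shown above).

SAMPLE = """\
Node: node-01
Alert ID: BootDimmDisableAlert
Resource: DIMM-10
Severity: Major
"""


def disabled_dimm_resources(text):
    """Return the Resource values of all BootDimmDisableAlert records."""
    resources = []
    current_alert = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # wrapped continuation lines and blank lines
        key, value = key.strip(), value.strip()
        if key == "Alert ID":
            current_alert = value
        elif key == "Resource" and current_alert == "BootDimmDisableAlert":
            # Assumes "Alert ID" precedes "Resource" within each record.
            resources.append(value)
    return resources


if __name__ == "__main__":
    print(disabled_dimm_resources(SAMPLE))
```

The sketch only reports which DIMMs carry the alert. It does not tell a failed DIMM apart from one that was disabled to preserve interleaving.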

  • ECC errors, other POST memory errors, and the platform.reducedMemory event can be seen in the serial console log:

login: ECC error at DIMM-23: 2204-0278F989,ADDR 0x6296b70340,(Node(0), memory controller(3), CH(6), DIMM(0), Rank(1), Bank Group(2), Bank(0x3), Row(0x1885e), Col(0x208)) Uncorrectable Machine Check Error at CPU29. ICL_IMC3_C0 Error: STATUS<0xbe00ffc2001000c0>(VALID,UC,EN,MISCV,ADDRV,PCC,CORR_ERR_STAT(0),CORR_ERR_CNT(0x3ff),OTHER_INFO(0x2),MSCOD(0x10),MCACOD(0xc0)),MISC<0x09000e0c42f10486>(EXTRA_ERR_INFO(0x4800706217882),ADDR_MODE(0x2),REC_ERR_LSB(0x6)),ADDR<0x0000006296b70340>(ADDRESS(0x6296b70340))Node(0), Memory controller(3), CH(0), DIMM(0), Rank(1), Bank Group(2), Bank(0x3), Row(0x1885e), Col(0x208),

MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
MEMORY WARNING: Major Code 0x30 Minor Code 0x2D Dimm 23
DIMM:23 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x0B / 0x1C
Complete channel mapped out.
Or 
DIMM:23 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x30 / 0x2D
Complete channel mapped out.

DIMM in slot 3 is disabled
DIMM in slot 7 is disabled
DIMM in slot 10 is disabled
DIMM in slot 14 is disabled
DIMM in slot 19 is disabled
DIMM in slot 23 failed
DIMM in slot 26 is disabled
DIMM in slot 30 is disabled
Jan 23 23:58:06 [node-01:platform.reducedMemory:ALERT]: System memory (511 GB) is less than expected (1024 GB). Check DIMMs slots 3, 7, 10, 14, 19, 23, 26, 30.
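
In the excerpt above, only one slot is reported as "failed" while the others are "disabled". A small script can separate the two groups and pull the memory figures out of the platform.reducedMemory message. This is a sketch under stated assumptions: the regular expressions are written against the exact wording of the sample lines above and may not match other ONTAP or BIOS releases, and the function name `summarize_console` is hypothetical.

```python
import re

# Sample lines copied from the console excerpt above.
CONSOLE = """\
DIMM in slot 3 is disabled
DIMM in slot 7 is disabled
DIMM in slot 10 is disabled
DIMM in slot 14 is disabled
DIMM in slot 19 is disabled
DIMM in slot 23 failed
DIMM in slot 26 is disabled
DIMM in slot 30 is disabled
Jan 23 23:58:06 [node-01:platform.reducedMemory:ALERT]: System memory (511 GB) is less than expected (1024 GB). Check DIMMs slots 3, 7, 10, 14, 19, 23, 26, 30.
"""

# Patterns assumed from the sample wording only.
SLOT_RE = re.compile(r"DIMM in slot (\d+) (is disabled|failed)")
EVENT_RE = re.compile(
    r"platform\.reducedMemory:\w+\]: System memory \((\d+) GB\) "
    r"is less than expected \((\d+) GB\)\. Check DIMMs slots ([\d, ]+)\."
)


def summarize_console(log):
    """Split slots into failed/disabled and extract the memory sizes."""
    failed, disabled = [], []
    for match in SLOT_RE.finditer(log):
        slot = int(match.group(1))
        if match.group(2) == "failed":
            failed.append(slot)
        else:
            disabled.append(slot)
    summary = {"failed": failed, "disabled": disabled}
    event = EVENT_RE.search(log)
    if event:
        summary["actual_gb"] = int(event.group(1))
        summary["expected_gb"] = int(event.group(2))
        summary["event_slots"] = [int(s) for s in event.group(3).split(",")]
    return summary


if __name__ == "__main__":
    print(summarize_console(CONSOLE))
```

For the sample, the sketch reports slot 23 as the only failed DIMM and the seven remaining slots as disabled, with 511 GB present against 1024 GB expected.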

  • In the DIMM-INFO.XML log in the ASUP, the output looks like the following:

Example (AFF-C80):
2 DIMM-2 0 2 0 Samsung-13ABFD3C M321R4GA3BB6-CQKET 0 0 controller unknown DIMM-2 DIMM-2 bucket false disabled
4 DIMM-4 0 0 0 Samsung-13ABD70D M321R4GA3BB6-CQKET 0 0 controller ok DIMM-4 DIMM-4 bucket false none
5 DIMM-5 0 4 0 Samsung-13AC01D4 M321R4GA3BB6-CQKET 0 0 controller unknown DIMM-5 DIMM-5 bucket false disabled
7 DIMM-7 0 6 0 Samsung-13ABFD3D M321R4GA3BB6-CQKET 0 0 controller ok DIMM-7 DIMM-7 bucket false none
10 DIMM-10 1 2 0 Samsung-13AC0269 M321R4GA3BB6-CQKET 0 0 controller unknown DIMM-10 DIMM-10 bucket false fault
12 DIMM-12 1 0 0 Samsung-13AC0232 M321R4GA3BB6-CQKET 0 0 controller ok DIMM-12 DIMM-12 bucket false none
13 DIMM-13 1 4 0 Samsung-13ABF48B M321R4GA3BB6-CQKET 0 0 controller unknown DIMM-13 DIMM-13 bucket false disabled
15 DIMM-15 1 6 0 Samsung-13ABFCBA M321R4GA3BB6-CQKET 0 0 controller ok DIMM-15 DIMM-15 bucket false none
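
The rows above can be grouped by their final field to spot the faulted DIMM among the disabled ones. The sketch below assumes the XML has already been flattened into whitespace-separated rows as shown, that the second field is the DIMM name, and that the last field is its state (none, disabled or fault). The column headings are not shown in this excerpt, so these positions are an assumption, and the function name `dimm_states` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the AFF-C80 example above.
ROWS = """\
2 DIMM-2 0 2 0 Samsung-13ABFD3C M321R4GA3BB6-CQKET 0 0 controller unknown DIMM-2 DIMM-2 bucket false disabled
4 DIMM-4 0 0 0 Samsung-13ABD70D M321R4GA3BB6-CQKET 0 0 controller ok DIMM-4 DIMM-4 bucket false none
5 DIMM-5 0 4 0 Samsung-13AC01D4 M321R4GA3BB6-CQKET 0 0 controller unknown DIMM-5 DIMM-5 bucket false disabled
7 DIMM-7 0 6 0 Samsung-13ABFD3D M321R4GA3BB6-CQKET 0 0 controller ok DIMM-7 DIMM-7 bucket false none
10 DIMM-10 1 2 0 Samsung-13AC0269 M321R4GA3BB6-CQKET 0 0 controller unknown DIMM-10 DIMM-10 bucket false fault
12 DIMM-12 1 0 0 Samsung-13AC0232 M321R4GA3BB6-CQKET 0 0 controller ok DIMM-12 DIMM-12 bucket false none
13 DIMM-13 1 4 0 Samsung-13ABF48B M321R4GA3BB6-CQKET 0 0 controller unknown DIMM-13 DIMM-13 bucket false disabled
15 DIMM-15 1 6 0 Samsung-13ABFCBA M321R4GA3BB6-CQKET 0 0 controller ok DIMM-15 DIMM-15 bucket false none
"""


def dimm_states(text):
    """Group DIMM names by the last field of each row (assumed to be the state)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like a DIMM row.
        if len(fields) < 2 or not fields[1].startswith("DIMM-"):
            continue
        groups[fields[-1]].append(fields[1])
    return dict(groups)


if __name__ == "__main__":
    for state, names in dimm_states(ROWS).items():
        print(state, names)
```

In this example, DIMM-10 is the only row that ends in "fault", while DIMM-2, DIMM-5 and DIMM-13 end in "disabled".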

