
CX6 NIC X91153A repeatedly logs Link Resetting messages

Category:
fas-systems
Specialty:
hw

Environment

  • AFF-A900
  • ONTAP 9
  • CX6 PSID card

Issue

  • Since June 30, 2024, Link Resetting messages have been logged repeatedly for the NIC in slot 2 of node node-01
SYSCONFIG -A
slot 2: Dual 40G/100G/200G Ethernet Controller CX6
 
SYSCONFIG -AC
sysconfig: slot 2 OK: X91153A: 2p 40G/100G RoCE QSFP28
 
EMS
(June 2024)
Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.
Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.
Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.
Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) failed to generate a register dump with error = 17 : Link Resetting.
 
(2025...)
Thu Sep 25 20:00:55 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.
Thu Sep 25 20:08:50 +0900 [node-01: CCMA-Worker: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.
Thu Sep 25 20:11:05 +0900 [node-01: CCMA-Worker: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.
Thu Sep 25 20:15:27 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) failed to generate a register dump with error = 17 : Link Resetting.
Thu Sep 25 20:17:42 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.
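The EMS excerpts above can be tallied per port to see how often each adapter reports a Link Resetting register dump. The following Python is a minimal sketch, assuming the log lines have been saved as plain text; the regular expression is inferred only from the sample lines quoted here, and the names `EMS_LINE` and `count_link_resets` are hypothetical helpers, not part of ONTAP.

```python
import re
from collections import Counter

# Field layout inferred from the netif.linkInfo samples in this article only;
# it is an assumption, not an official EMS format definition.
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<source>[^:]+): netif\.linkInfo:info\]: "
    r"Ethernet adapter (?P<port>\w+)\((?P<pci>[^)]+)\) "
    r"(?P<result>has generated|failed to generate) a register dump"
)

def count_link_resets(lines):
    """Count Link Resetting register-dump events per (port, outcome)."""
    counts = Counter()
    for line in lines:
        m = EMS_LINE.search(line)
        if m and "Link Resetting" in line:
            outcome = "dumped" if m.group("result") == "has generated" else "dump_failed"
            counts[(m.group("port"), outcome)] += 1
    return counts

# The five 2025 lines quoted above, copied verbatim.
sample = [
    "Thu Sep 25 20:00:55 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.",
    "Thu Sep 25 20:08:50 +0900 [node-01: CCMA-Worker: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.",
    "Thu Sep 25 20:11:05 +0900 [node-01: CCMA-Worker: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.",
    "Thu Sep 25 20:15:27 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) failed to generate a register dump with error = 17 : Link Resetting.",
    "Thu Sep 25 20:17:42 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.",
]

counts = count_link_resets(sample)
for (port, outcome), n in sorted(counts.items()):
    print(port, outcome, n)
```

Both ports of the slot 2 card (e2a and e2b) appear in the tally, which matches the observation that the whole adapter, not a single port, is affected.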
  • During an NDU upgrade from ONTAP 9.12.1P7 to 9.15.1P14, node node-01, which has this unstable CX6 NIC installed, panicked
cluster::*> storage failover takeover -ofnode node-01
cluster::*> Files /cfcard/x86_64/freebsd/image1/VERSION and /var/VERSION differ
ERROR: /var cannot be downgraded.
Waiting for PIDS:  1392.
Terminated
.
Setting default boot image to image1...
done.
Uptime: 722d2h54m27s
PANIC  : peg_nvmeof_qpair_flush_request: Failed to move RDMA qp (0xfffff804eac60c00) to error state: -60
 
version: 9.12.1P7: Fri Sep 15 02:00:51 EDT 2023
conf  : x86_64.optimize
cpuid = 3
KDB: stack backtrace:
vpanic() at vpanic+0x429/frame 0xfffffe121d094210
panic() at panic+0x42/frame 0xfffffe121d094270
peg_nvmeof_qpair_flush_request() at peg_nvmeof_qpair_flush_request+0x74a/frame 0xfffffe121d094360
peg_nvmeof_ctrlr_fail_task() at peg_nvmeof_ctrlr_fail_task+0xa8/frame 0xfffffe121d094390
stack_zero() at stack_zero+0x137/frame 0xfffffe121d0943f0
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfffffe121d094430
fork_exit() at fork_exit+0xb2/frame 0xfffffe121d094470
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe121d094470
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 722d2h56m51s
 
PANIC: peg_nvmeof_qpair_flush_request: Failed to move RDMA qp (0xfffff804eac60c00) to error state: -60 in process peg nvmeof taskq_31 on release 9.12.1P7 (C) on Thu Sep 25 20:19:51 KST 2025
version: 9.12.1P7: Fri Sep 15 02:00:51 EDT 2023
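When matching this panic against other cores or console logs, it can help to pull the function name, queue-pair address, and error code out of the panic string. This is a minimal Python sketch, assuming the panic line has the shape shown above; `PANIC_RE` and `parse_panic` are hypothetical names, and the sketch does not interpret what the code -60 means.

```python
import re

# Shape inferred from the two PANIC lines quoted in this article (assumption).
PANIC_RE = re.compile(
    r"PANIC\s*:\s*(?P<func>\w+): .*?"
    r"\((?P<qp>0x[0-9a-fA-F]+)\) to error state: (?P<code>-?\d+)"
)

def parse_panic(line):
    """Return function, RDMA qp address, and error code, or None if no match."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "function": m.group("func"),
        "qp": m.group("qp"),
        "code": int(m.group("code")),
    }

panic_line = (
    "PANIC  : peg_nvmeof_qpair_flush_request: Failed to move RDMA qp "
    "(0xfffff804eac60c00) to error state: -60"
)
info = parse_panic(panic_line)
print(info)
```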
 
  • After the panic and reboot, the CX6 NIC on node node-01 was no longer recognized in the sysconfig -a output
Before NDU:
slot 1: Dual 40G/100G/200G Ethernet Controller CX6
slot 2: Dual 40G/100G/200G Ethernet Controller CX6
e2a MAC Address:   xx:xx:xx:xx:xx:90 (auto-100g_cr4-fd-up)
e2b MAC Address:   xx:xx:xx:xx:xx:91 (auto-100g_cr4-fd-up)
slot 3: Quad 10G/25G Ethernet Controller CX5
 
 
After NDU:
slot 1: Dual 40G/100G/200G Ethernet Controller CX6
slot 3: Quad 10G/25G Ethernet Controller CX5
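The before/after comparison above can be automated by diffing the `slot N:` lines of two saved sysconfig -a captures. The Python below is a minimal sketch under that assumption; `slots` and `missing_slots` are hypothetical helper names, and the line format is taken only from the excerpts in this article.

```python
import re

# Matches lines such as "slot 2: Dual 40G/100G/200G Ethernet Controller CX6".
SLOT_RE = re.compile(r"^slot (\d+): (.+)$")

def slots(listing):
    """Map slot number to its description for every 'slot N:' line."""
    found = {}
    for line in listing.splitlines():
        m = SLOT_RE.match(line.strip())
        if m:
            found[int(m.group(1))] = m.group(2)
    return found

def missing_slots(before, after):
    """Return slots present in the first capture but absent from the second."""
    b, a = slots(before), slots(after)
    return {n: desc for n, desc in b.items() if n not in a}

before_ndu = """\
slot 1: Dual 40G/100G/200G Ethernet Controller CX6
slot 2: Dual 40G/100G/200G Ethernet Controller CX6
e2a MAC Address:   xx:xx:xx:xx:xx:90 (auto-100g_cr4-fd-up)
e2b MAC Address:   xx:xx:xx:xx:xx:91 (auto-100g_cr4-fd-up)
slot 3: Quad 10G/25G Ethernet Controller CX5
"""

after_ndu = """\
slot 1: Dual 40G/100G/200G Ethernet Controller CX6
slot 3: Quad 10G/25G Ethernet Controller CX5
"""

lost = missing_slots(before_ndu, after_ndu)
print(lost)
```

With the captures from this case, only slot 2 is reported, the same card that logged the Link Resetting messages.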

