Cluster network degraded: single cluster port link flapping on FAS2750

Category:
fas-systems
Specialty:
hw

Applies to

  • ONTAP 9
  • FAS2750
  • Switchless cluster

Issue

  • A CLUSTER NETWORK DEGRADED error is detected because the link on a single cluster port is flapping:

Tue May 13 02:04:26 +0000 [Node1B: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.
Tue May 13 02:04:26 +0000 [Node1B: vifmgr: vifmgr.portdown:notice]: A link down event was received on node Node1B, port e0a.
Tue May 13 02:04:26 +0000 [Node1B: vifmgr: vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node1B has gone down unexpectedly.
Tue May 13 02:04:26 +0000 [Node1B: vifmgr: vifmgr.lifmoved.linkdown:notice]: LIF nrtxsz04-02_clus1 (on virtual server 4294967293), IP address 169.254.208.163, is being moved to node Node1B, port e0b.
Tue May 13 02:04:28 +0000 [Node1B: kernel: netif.linkUp:info]: Ethernet e0a: Link up.
Tue May 13 02:04:28 +0000 [Node1B: vifmgr: vifmgr.portup:notice]: A link up event was received on node Node1B, port e0a.
Tue May 13 02:06:53 +0000 [Node1B: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.
Tue May 13 02:06:53 +0000 [Node1B: vifmgr: vifmgr.portdown:notice]: A link down event was received on node Node1B, port e0a.
Tue May 13 02:06:53 +0000 [Node1B: vifmgr: vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node1B has gone down unexpectedly.
Tue May 13 02:06:53 +0000 [Node1B: vifmgr: vifmgr.lifmoved.linkdown:notice]: LIF nrtxsz04-02_clus1 (on virtual server 4294967293), IP address 169.254.208.163, is being moved to node Node1B, port e0b.
Tue May 13 02:06:56 +0000 [Node1B: vifmgr: vifmgr.lifsuccessfullymoved:notice]: LIF nrtxsz04-02_clus1 (on virtual server 4294967293), IP address 169.254.208.163, is now hosted on node Node1B, port e0b.
Tue May 13 02:06:56 +0000 [Node1B: kernel: netif.linkUp:info]: Ethernet e0a: Link up.
Tue May 13 02:06:56 +0000 [Node1B: vifmgr: vifmgr.portup:notice]: A link up event was received on node Node1B, port e0a.
Tue May 13 02:06:56 +0000 [Node1B: vifmgr: vifmgr.port.monitor.failed:error]: The "link_flapping" health check for port e0a (node Node1B) has failed. The port is operating in a degraded state.
Tue May 13 02:06:56 +0000 [Node1B: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Frequent Link Flapping - Cluster port e0a on node Node1B has experienced multiple link down notifications.
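
The EMS excerpt above shows two link-down/link-up pairs on e0a within a few minutes. As a rough triage aid, the following Python sketch counts link-down events per port and measures how long each outage lasted. The regular expressions, the function names `link_events` and `flap_summary`, and the `year` parameter are assumptions made for illustration (these EMS lines carry no year, so 2025 is taken from the VIFMGR log timestamps in this article). It is not a NetApp tool.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout of an EMS line, based only on the excerpt in this article:
# Tue May 13 02:04:26 +0000 [Node1B: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2}) [+-]\d{4} "
    r"\[(?P<node>[^:]+): (?P<source>[^:]+): (?P<event>[\w.]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)
PORT_STATE = re.compile(r"Ethernet (\w+): Link (down|up)")


def link_events(lines, year=2025):
    """Yield (timestamp, node, port, state) for netif.linkDown/linkUp lines."""
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if not m or not m["event"].startswith("netif.link"):
            continue
        p = PORT_STATE.search(m["message"])
        if p:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %a %b %d %H:%M:%S")
            yield ts, m["node"], p.group(1), p.group(2)


def flap_summary(lines, year=2025):
    """Return (link-down count, list of outage seconds), both keyed by (node, port)."""
    downs = Counter()
    outages = {}
    last_down = {}
    for ts, node, port, state in link_events(lines, year):
        key = (node, port)
        if state == "down":
            downs[key] += 1
            last_down[key] = ts
        elif key in last_down:
            # Pair each link-up with the most recent unmatched link-down.
            seconds = (ts - last_down.pop(key)).total_seconds()
            outages.setdefault(key, []).append(seconds)
    return downs, outages


SAMPLE = [
    "Tue May 13 02:04:26 +0000 [Node1B: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.",
    "Tue May 13 02:04:28 +0000 [Node1B: kernel: netif.linkUp:info]: Ethernet e0a: Link up.",
    "Tue May 13 02:06:53 +0000 [Node1B: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.",
    "Tue May 13 02:06:56 +0000 [Node1B: kernel: netif.linkUp:info]: Ethernet e0a: Link up.",
]

if __name__ == "__main__":
    downs, outages = flap_summary(SAMPLE)
    print(dict(downs), outages)
```

Fed the four `netif` lines from the excerpt, the sketch reports two link-down events for Node1B port e0a, with outages of 2 and 3 seconds.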

  • CRC errors are reported on the port:

-- interface  e0a  (17 days, 23 hours, 36 minutes, 28 seconds) --

RECEIVE
 Total frames:    1450m | Frames/second:     0  | Total bytes:     412g
 Bytes/second:     0  | Total errors:    260k | Errors/minute:     0
 Total discards:    0  | Discards/minute:    0  | Multi/broadcast:   130k
 Non-primary u/c:    0  | CRC errors:     259k | Runt frames:      0
 Fragment:       0  | Long frames:      0  | Jabber:       420
 Length errors:    914  | No buffer:       0  | Xon:          0
 Xoff:         0  | Jumbo:       5550k | Noproto:        0
 Error symbol:   26258  | Illegal symbol:  22600  | Bus overruns:     0
 Queue drops:      0  | LRO segments:    1446m | LRO bytes:      405g
 LRO6 segments:     0  | LRO6 bytes:      0  | Bad UDP cksum:     0
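
To put the CRC count in proportion to the traffic received, the counters above can be parsed and compared. The sketch below is a minimal Python example. It assumes the `k`, `m`, and `g` suffixes are decimal multipliers (10^3, 10^6, 10^9), which is a reading of the output format rather than something stated in this article. The helper names `parse_count` and `parse_counters` are likewise hypothetical.

```python
# Assumption: suffixes are decimal multipliers; the displayed values are rounded.
SUFFIX = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_count(text):
    """Convert a counter such as '259k' or '26258' to an integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIX:
        return int(float(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def parse_counters(block):
    """Parse 'Label: value | Label: value' lines into a dict of integers."""
    counters = {}
    for line in block.splitlines():
        for field in line.split("|"):
            label, sep, value = field.partition(":")
            if not sep or not value.strip():
                continue  # header lines and empty fields carry no counter
            try:
                counters[label.strip()] = parse_count(value)
            except ValueError:
                continue  # skip anything that is not a plain counter
    return counters


SAMPLE = """\
 Total frames:    1450m | Frames/second:     0  | Total bytes:     412g
 Non-primary u/c:    0  | CRC errors:     259k | Runt frames:      0
 Error symbol:   26258  | Illegal symbol:  22600  | Bus overruns:     0
"""

if __name__ == "__main__":
    c = parse_counters(SAMPLE)
    ratio = c["CRC errors"] / c["Total frames"]
    print(f"CRC errors per received frame: {ratio:.6f}")
```

Under that suffix assumption, 259k CRC errors against 1450m received frames works out to roughly 0.018% of frames over the 17-day counter period.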

  • From the VIFMGR log: port e0a went down and the RDB unit went offline.
    - The node reported OOQ (out of quorum) and VIFMGR went offline.

    Tue May 13 2025 00:40:38 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [EventMgr::executeLegacyEvent] Periodic Cluster Network Verification: ping-cluster and MTU verifications successful
    Tue May 13 2025 01:42:35 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [EventMgr::executeLegacyEvent] Periodic Cluster Network Verification: ping-cluster and MTU verifications successful
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [FailoverMgr::setPortHealthUnknown] Port 9 health status is currently Unknown
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [FailoverMgr::linkDown] Cluster Link Down: Port 9 has gone down unexpectedly
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [FailoverMgr::linkDown] cluster port (e0a) is now link down, dispatching switchless update
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [Net::RdbLifHandle::avoidDownPorts] LIF lif:rdb:4294967293:nrtxsz04-02_clus1 (1011) is assigned to a down port (nrtxsz04p01b:e0a). Attempting to reassign.
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] W [src/rdb/TM.cc 5329 (0x80c3ff100)]: handleCoordBeginTranStatus: beginTran status UNIT_OFFLINE, txn request epoch 64
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [May 13 02:04:26]: 0x80c3ff100: 0: ERR: rdb_tran_glue: create: tid=0x80c3ff100 failed to create transaction for label='Net::RdbLifHandle::commitConfig': UNIT_OFFLINE
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [Net::TransactionHelper::create] Failed to create transaction Net::RdbLifHandle::commitConfig: Node "nrtxsz04p01b" on ring "VifMgr" is offline. Check the health of the cluster using the "cluster show" command. For further assistance, contact technical support.
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [LinkFlappingHealthMonitor::linkDown] linkdown event received on port 9 at time 1461140s, ignore flap status: false
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] W [src/rdb/TM.cc 5329 (0x80c3ff100)]: handleCoordBeginTranStatus: beginTran status UNIT_OFFLINE, txn request epoch 64
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [May 13 02:04:26]: 0x80c3ff100: 0: ERR: rdb_tran_glue: create: tid=0x80c3ff100 failed to create transaction for label='Net::RdbLifHandle::commitConfig': UNIT_OFFLINE
    Tue May 13 2025 02:04:26 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [Net::TransactionHelper::create] Failed to create transaction Net::RdbLifHandle::commitConfig: Node "nrtxsz04p01b" on ring "VifMgr" is offline. Check the health of the cluster using the "cluster show" command. For further assistance, contact technical support.
    Tue May 13 2025 02:04:28 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [FailoverMgr::setPortHealthUnknown] Port 9 health status is currently Unknown
    Tue May 13 2025 02:04:28 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [FailoverMgr::linkUp] cluster port (e0a) is now link up, dispatching switchless update
    Tue May 13 2025 02:04:28 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [FailoverMgr::linkUp] clearing arp cache for cluster ports
    Tue May 13 2025 02:04:28 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [LinkFlappingHealthMonitor::linkUp] linkup event received on port 9 at time 1461142s, ignore flap status: false

    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] ******* OOQ QM mtrace dump END *********
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] A [src/rdb/TM.cc 1883 (0x80ec32f00)]: _changeRole: TM 1001: change role at epoch 0 to 0x40 recovery transaction number 0
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] A [src/rdb/TM.cc 1620 (0x80ec32f00)]: _triggerOnlineStatusCallback: TM 1001: Report UNIT_IS_OFFLINE  (epoch 0, master 0).
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] A [src/rdb/TM.cc 1624 (0x80ec32f00)]: _triggerOnlineStatusCallback: FAILOVER rdb: Local unit VifMgr offline
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] A [src/rdb/HAM.cc 1183 (0x80ec32f00)]: reportLocalOffline: HAM: new goal HAM_GOAL_ACTIVATE
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] A [src/rdb/cluster_events.cc 88 (0x80ec32f00)]: Report: Cluster event: node-event, epoch 0, site 1001 [local node offline].
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] A [src/rdb/HAM.cc 1624 (0x80ec32800)]: _hamThreadFunc: HAM: daemon goal change from HAM_GOAL_NONE to HAM_GOAL_ACTIVATE or shutdown 0
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] Notice: online_status_callback: RDB unit is offline
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] [May 13 02:04:29]: 0x80ec33600: 0: INFO: RDB::callback::registrar: callback:src/rdb/rdb_online_registrar.cc:80 rdb_callbacks::ONLINE::BEGIN:: 1461143820
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] [May 13 02:04:29]: 0x80ec33600: 0: INFO: RDB::callback::registrar: callback:src/rdb/rdb_online_registrar.cc:105 rdb_callbacks::ONLINE::END:: 1461143820
    Tue May 13 2025 02:04:29 +00:00 [kern_vifmgr:info:7661] [0x80c3fce00] [EventMgr::onlineCallback] received UNIT_IS_OFFLINE from RDB
    Tue May 13 2025 02:04:30 +00:00 [kern_vifmgr:info:7661] [0x80c3fd500] [FailoverMgr::cluster_check] Starting cluster ping test...
    Tue May 13 2025 02:04:31 +00:00 [kern_vifmgr:info:7661] [May 13 02:04:31]: 0x80c3ff100: 0: ERR: rdb_tran_glue: create: tid=0x80c3ff100 failed to create transaction for label='Net::RdbLifHandle::commitConfig': UNIT_OFFLINE
    Tue May 13 2025 02:04:31 +00:00 [kern_vifmgr:info:7661] [0x80c3ff100] [Net::TransactionHelper::create] Failed to create transaction Net::RdbLifHandle::commitConfig: Node "nrtxsz04p01b" on ring "VifMgr" is offline. Check the health of the cluster using the "cluster show" command. For further assistance, contact technical support.
    Tue May 13 2025 02:04:32 +00:00 [kern_vifmgr:info:7661] [0x80c3fd500] [FailoverMgr::cluster_check] large pkt : 0% packet loss when pinging from nrtxsz04-02_clus1 ( 169.254.208.163 ) on nrtxsz04p01b -> nrtxsz04-02_clus2 ( 169.254.91.200 ) on nrtxsz04p01b
    Tue May 13 2025 02:04:32 +00:00 [kern_vifmgr:info:7661] A [src/rdb/quorum/qm_states/ooq/FailedState.cc 51 (0x80ec32100)]: state2: WS_Failed -> WS_WaitingForVotes
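
The VIFMGR excerpt above interleaves port events with RDB state changes, which makes the order of events hard to read at a glance. The following Python sketch pulls out a small timeline and measures the gap between the first "Cluster Link Down" line and the first later "Report UNIT_IS_OFFLINE" line. The marker labels, the substrings chosen, and the function names are assumptions based only on this excerpt. The result shows ordering in the log, not causality.

```python
import re
from datetime import datetime

# Leading timestamp of a vifmgr log line, e.g. "Tue May 13 2025 02:04:26 +00:00".
STAMP = re.compile(r"^\s*(\w{3} \w{3} +\d+ \d{4} \d{2}:\d{2}:\d{2}) [+-]\d{2}:\d{2} ")

# Hypothetical labels mapped to substrings copied from the excerpt.
MARKERS = {
    "link_down": "[FailoverMgr::linkDown] Cluster Link Down",
    "link_up": "[FailoverMgr::linkUp] cluster port",
    "rdb_offline": "Report UNIT_IS_OFFLINE",
}


def timeline(lines):
    """Return a list of (timestamp, label) for lines containing a known marker."""
    events = []
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%a %b %d %Y %H:%M:%S")
        for label, needle in MARKERS.items():
            if needle in line:
                events.append((ts, label))
    return events


def seconds_to_rdb_offline(events):
    """Seconds from the first link_down to the first later rdb_offline, or None."""
    down = next((ts for ts, label in events if label == "link_down"), None)
    if down is None:
        return None
    offline = next(
        (ts for ts, label in events if label == "rdb_offline" and ts >= down), None
    )
    return None if offline is None else (offline - down).total_seconds()
```

Applied to the excerpt, the timeline is link down at 02:04:26, link up at 02:04:28, and the RDB unit reported offline at 02:04:29, which is 3 seconds after the link-down line.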

  • Reseating the cable does not resolve the issue.
