
Continued packet loss when pinging from cluster LIFs after a cluster switch upgrade

Environment

  • Cisco NX3232C cluster network switch (CNS)
  • NX-OS firmware update
  • RCF update from version 1.8 or earlier to version 1.10 or later

Issue

  • All nodes continuously report the following when pinging each other's cluster LIFs:
[vifmgr: vifmgr.cluscheck.ctdpktloss:debug]: Continued packet loss when pinging from cluster lif node-01_clus-1 (node node-01) to cluster lif node-02_clus2 (node node-02).
  • Half of the paths fail when running cluster ping-cluster:

::*> cluster ping-cluster -node node-01
...
 Basic connectivity succeeds on 14 path(s)
 Basic connectivity fails on 14 path(s)
 ...
 Larger than PMTU communication succeeds on 14 path(s)
 RPC status:
 14 paths up, 0 paths down (tcp check)
 14 paths up, 0 paths down (udp check)

  • Each time LIFs are reverted between the cluster ports connected to switch 1 and those connected to switch 2, the following occurs:
    • EMS reports messages similar to the following:
[vifmgr: vifmgr.dbase.checkerror:alert]: VIFMgr experienced an error verifying cluster database consistency. Some LIFs might not be hosted properly as a result.
[vifmgr: vifmgr.startup.failover.err:alert]: VIFMgr encountered errors during startup.
  • vifmgr reports messages similar to the following:
[kern_vifmgr:info:6537] rdb::qm:...:src/rdb/quorum/qm_states/inq/SecondaryState.cc:222 (thr_id:0x80c138500) SecondaryState::receivePoll Leaving quorum at 21170636s apparent starvation or RPC failure at sender 1003. Sender expected VS_Unknown, actual WS_QuorumMember.
  • mgwd reports messages similar to the following:
[kern_mgwd:info:2343] A [src/rdb/quorum/qm_states/inq/SecondaryState.cc 217 (0x823d60300)]: receivePoll: Leaving quorum at 9068946s apparent starvation or RPC failure at sender 1003. Sender expected VS_Unknown, actual WS_QuorumMember.
[kern_mgwd:info:2343] A [src/rdb/cluster_events.cc 88 (0x823d60300)]: Report: Cluster event: node-event, epoch 31, site 1004 [apparent starvation detected in voting protocol].
[kern_mgwd:info:2325] W [src/rdb/TM.cc 3923 (0x821377f00)]: _coord_commit: TM 1003: Transaction TID <31,277502,277502> commit failed: UNIT_OFFLINE; declaring unstable quorum in epoch 31.  Total participating sites: 3, number of sites committed: 3, epsilon commit: true
[kern_mgwd:info:2325] rdb::TM:Mon Nov 06 11:06:47 2023:src/rdb/TM.cc:3933 (thr_id:0x821377f00) TM 1003: Transaction TID <31,277502,277502> commit failed: UNIT_OFFLINE; declaring unstable quorum in epoch 31.  Total participating sites: 3, number of sites committed: 3, epsilon commit: true
  • The issue persists regardless of whether the ISL is enabled or disabled (it is disabled to isolate traffic on each switch).
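
When many nodes log these symptoms, it can help to summarize them from saved text. The following is a minimal Python sketch, not a NetApp tool: it assumes the EMS message and the ping-cluster summary lines look exactly like the samples shown above, and the function names (parse_packet_loss, summarize_basic_connectivity, failed_fraction) are hypothetical helpers introduced only for illustration.

```python
import re

# Sample text copied from the symptoms above; real output may differ.
EMS_LINE = (
    "[vifmgr: vifmgr.cluscheck.ctdpktloss:debug]: Continued packet loss when "
    "pinging from cluster lif node-01_clus-1 (node node-01) to cluster lif "
    "node-02_clus2 (node node-02)."
)
PING_OUTPUT = """\
 Basic connectivity succeeds on 14 path(s)
 Basic connectivity fails on 14 path(s)
 Larger than PMTU communication succeeds on 14 path(s)
"""

# Assumed message layout: "... from cluster lif <lif> (node <node>) to cluster lif <lif> (node <node>)"
PKTLOSS_RE = re.compile(
    r"vifmgr\.cluscheck\.ctdpktloss.*?"
    r"from cluster lif (?P<src_lif>\S+) \(node (?P<src_node>[^)]+)\) "
    r"to cluster lif (?P<dst_lif>\S+) \(node (?P<dst_node>[^)]+)\)"
)
# Only the "Basic connectivity" summary lines are counted here.
PATHS_RE = re.compile(r"Basic connectivity (succeeds|fails) on (\d+) path\(s\)")


def parse_packet_loss(line):
    """Return source/destination LIF and node names, or None if the line does not match."""
    match = PKTLOSS_RE.search(line)
    return match.groupdict() if match else None


def summarize_basic_connectivity(output):
    """Return the path counts reported by the 'Basic connectivity' summary lines."""
    counts = {"succeeds": 0, "fails": 0}
    for outcome, number in PATHS_RE.findall(output):
        counts[outcome] = int(number)
    return counts


def failed_fraction(counts):
    """Return the share of paths that failed basic connectivity (0.0 if no paths)."""
    total = counts["succeeds"] + counts["fails"]
    return counts["fails"] / total if total else 0.0


if __name__ == "__main__":
    print(parse_packet_loss(EMS_LINE))
    counts = summarize_basic_connectivity(PING_OUTPUT)
    print(counts, failed_fraction(counts))
```

With the sample output above, exactly half of the paths fail basic connectivity, which matches the symptom described in this article. Grouping the parsed source and destination LIFs by cluster port could then show whether the failing paths all cross between the two switches, but that grouping is left out because the port-to-switch mapping is not part of the sample data.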

 
