vifmgr.cluscheck.hwerrors caused by an incorrect RCF configuration on 100G ports
Environment
- BES-53248 cluster network switch
- RCF (Reference Configuration File)
- 100Gb cluster network ports, for example:
slot 0: 40G/100G Ethernet Controller CX5
e0c MAC Address: d0:39:ea:34:xx:yy (auto-100g_cr4-fd-up)
QSFP Vendor: Molex
QSFP Part Number: 112-00576
Issue
Multiple network hardware errors are reported on all nodes and ports. For example:
vifmgr.cluscheck.hwerrors: Port e0c on node node_name-1 is reporting a high number (at least 1 per 1000 packets) of observed hardware errors (CRC, length, alignment, dropped).
callhome.clus.net.degraded:alert: Call home for CLUSTER NETWORK DEGRADED: CRC Errors Detected - High CRC errors detected on port e4a node node_name
hm.alert.raised: Alert Id = NodeIfInErrorsWarnAlert , Alerting Resource = node_name-1/e0c raised by monitor controller
vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-2_clus2 (node node_name-2) to cluster lif node_name-1_clus1 (node node_name-1).
vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-2_clus2 (node node_name-2) to cluster lif node_name-1_clus2 (node node_name-1).
vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-1_clus1 (node node_name-1) to cluster lif node_name-2_clus2 (node node_name-2).
vifmgr.cluscheck.ctdpktloss: Continued packet loss when pinging from cluster lif node_name-2_clus1 (node node_name-2) to cluster lif node_name-1_clus1 (node node_name-1).
Or
Tue Jul 11 15:58:49 +0800 [nodeb: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Large MTU Packet Loss - Ping failures detected between nodeb-clus1 ( 169.254.2.202 ) on nodeb and node-2b_clus1 ( 169.254.148.103 ) on node-2b
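To confirm that the errors really span all nodes and ports, the messages can be tallied per event and per port. The following is a minimal Python sketch, not a NetApp tool: the function name `summarize` and the regular expression are assumptions made for illustration, and the event names are taken only from the examples above. It expects the EMS messages as plain text lines that have already been collected.

```python
import re
from collections import Counter

# Event names copied from the example messages in this article.
EVENTS = (
    "vifmgr.cluscheck.hwerrors",
    "vifmgr.cluscheck.ctdpktloss",
    "callhome.clus.net.degraded",
    "hm.alert.raised",
)

# Assumed pattern: matches "Port e0c on node <name>" and "port e4a node <name>".
PORT_RE = re.compile(r"[Pp]ort (e\d+[a-z])(?: on)? node (\S+)")


def summarize(lines):
    """Count matching events and (node, port) pairs in an iterable of log lines."""
    events = Counter()
    ports = Counter()
    for line in lines:
        # Count each line once, under the first known event name it contains.
        for name in EVENTS:
            if name in line:
                events[name] += 1
                break
        match = PORT_RE.search(line)
        if match:
            ports[(match.group(2), match.group(1))] += 1
    return events, ports


if __name__ == "__main__":
    sample = [
        "vifmgr.cluscheck.hwerrors: Port e0c on node node_name-1 is reporting "
        "a high number (at least 1 per 1000 packets) of observed hardware errors",
        "callhome.clus.net.degraded:alert: Call home for CLUSTER NETWORK DEGRADED: "
        "CRC Errors Detected - High CRC errors detected on port e4a node node_name",
    ]
    events, ports = summarize(sample)
    for name, count in events.items():
        print(name, count)
    for (node, port), count in ports.items():
        print(node, port, count)
```

If the same events appear for every cluster port on every node, a common cause such as the switch configuration is more likely than a single faulty cable or transceiver.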