A300/FAS8200, A200/FAS2600, A220/FAS2700, C190: link flaps on e0a/e0b may cause a takeover

Applies to

AFF A300, FAS8200
AFF A200, FAS2650, FAS2620
AFF A220, AFF C190, FAS2750, FAS2720
ONTAP 9

Issue

Cluster port e0a or e0b (or both) experiences link flaps, or both ports go down at the same time.

Tue Oct 03 11:08:31 CEST [node1: ixgbe/e0b: snmp.link.down:info]: Interface 2 is down.
Tue Oct 03 11:08:31 CEST [node1: ixgbe/e0b: netif.linkDown:info]: Ethernet e0b: Link down, check cable.
Tue Oct 03 11:08:31 CEST [node1: ixgbe/e0a: snmp.link.down:info]: Interface 1 is down.
Tue Oct 03 11:08:31 CEST [node1: ixgbe/e0a: netif.linkDown:info]: Ethernet e0a: Link down, check cable.

Tue Oct 03 11:08:32 CEST [node2: ixgbe/e0b: snmp.link.down:info]: Interface 2 is down.
Tue Oct 03 11:08:32 CEST [node2: ixgbe/e0b: netif.linkDown:info]: Ethernet e0b: Link down, check cable.
Tue Oct 03 11:08:32 CEST [node2: ixgbe/e0a: snmp.link.down:info]: Interface 1 is down.
Tue Oct 03 11:08:32 CEST [node2: ixgbe/e0a: netif.linkDown:info]: Ethernet e0a: Link down, check cable.

Check the status of the cluster ports and of storage failover:

cluster1::> network port show -ipspace Cluster

Node: cluster1-01
                                                  Speed(Mbps) Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status
--------- ------------ ---------------- ---- ---- ----------- --------
e0a       Cluster      Cluster          down 9000 1000/-      -
e0b       Cluster      Cluster          down 9000 1000/-      -

Node: cluster1-02
                                                  Speed(Mbps) Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status
--------- ------------ ---------------- ---- ---- ----------- --------
e0a       Cluster      Cluster          down 9000 1000/-      -
e0b       Cluster      Cluster          down 9000 1000/-      -
4 entries were displayed.

cluster1::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cluster1-01    cluster1-02    false    Connected to cluster-02, Partial
                                       giveback, Takeover is not possible:
                                       The version of software running on
                                       each node of the SFO pair is
                                       incompatible, NVRAM log not
                                       synchronized
cluster1-02    cluster1-01    -        Waiting for cluster applications to
                                       come online on the local node
                                       Offline applications: mgmt, vldb,
                                       vifmgr, bcomd, crs.
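
If the `network port show -ipspace Cluster` output has been captured to a text file, the down cluster ports can be listed per node with a few lines of Python. This is only a sketch: it assumes the column order shown above and that the broadcast domain name contains no spaces, and the function name is hypothetical.

```python
import re

# A port row looks like: "e0a Cluster Cluster down 9000 1000/- -"
# (port, IPspace, broadcast domain, link state, ...). Layout assumed from this article.
PORT_ROW = re.compile(r"\b(e0[a-z])\s+Cluster\s+\S+\s+(up|down)\b")

def down_cluster_ports(output):
    """Return {node: [ports whose Link column is 'down']} from captured CLI output."""
    result = {}
    # Everything before the first "Node:" header (for example the prompt) is discarded.
    for section in re.split(r"Node:\s+", output)[1:]:
        node = section.split()[0]
        down = [port for port, link in PORT_ROW.findall(section) if link == "down"]
        if down:
            result[node] = down
    return result

captured = (
    "cluster1::> network port show -ipspace Cluster "
    "Node: cluster1-01 Speed(Mbps) Health Port IPspace Broadcast Domain Link MTU "
    "Admin/Oper Status e0a Cluster Cluster down 9000 1000/- - "
    "e0b Cluster Cluster down 9000 1000/- - "
    "Node: cluster1-02 Speed(Mbps) Health Port IPspace Broadcast Domain Link MTU "
    "Admin/Oper Status e0a Cluster Cluster down 9000 1000/- - "
    "e0b Cluster Cluster down 9000 1000/- -"
)

print(down_cluster_ports(captured))
```

The parser does not depend on line breaks, so it tolerates output whose columns have been flattened by copy and paste.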

If the ports do not recover and the Connectivity, Liveliness and Availability Monitor (CLAM) is enabled:

One of the nodes panics with an "out of quorum" panic.

PANIC : Received PANIC packet from partner, receiving message is (Coredump and takeover initiated because Connectivity, Liveliness and Availability Monitor (CLAM) has determined this node is out of quorum.

The node that panicked is taken over, and the surviving node serves all data.

If the ports do not recover and the Connectivity, Liveliness and Availability Monitor (CLAM) is not enabled:

No storage takeover occurs, and both nodes lose quorum. Neither node serves data.
Reference: SU436: [Impact: Critical] CLAM TAKEOVER default configuration changed
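
The two outcomes described above can be summarized in a small lookup. This is only a restatement of the article's text for a two-node HA pair whose cluster ports stay down; the class and function names are hypothetical and nothing here queries ONTAP.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    takeover: bool           # does a storage takeover occur?
    nodes_serving_data: int  # how many of the two nodes keep serving data

def outcome_when_cluster_ports_stay_down(clam_enabled: bool) -> Outcome:
    """Expected result, per this article, when e0a/e0b do not recover."""
    if clam_enabled:
        # One node panics as out of quorum and is taken over by its partner.
        return Outcome(takeover=True, nodes_serving_data=1)
    # Without CLAM, both nodes lose quorum and no takeover happens.
    return Outcome(takeover=False, nodes_serving_data=0)

print(outcome_when_cluster_ports_stay_down(clam_enabled=True))
print(outcome_when_cluster_ports_stay_down(clam_enabled=False))
```
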
Similar messages also appear in the EMS log:

Jun 08 12:30:09 [xxx-02:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0b on node naptp06c-02 has gone down unexpectedly.
Jun 08 12:30:10 [xxxc-02:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node naptp06c-02 has gone down unexpectedly.
Jun 08 12:31:00 [xxx-02:monitor.globalStatus.critical:EMERGENCY]: Controller failover of xxx-01 is not possible: partner mailbox disks not accessible or invalid. One or more mirrored aggregates are degraded.
Jun 08 12:31:02 [xxx:callhome.clam.node.ooq:EMERGENCY]: Call home for NODE(S) OUT OF CLUSTER QUORUM.
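
To confirm that an exported EMS log shows this pattern, one option is to check whether a `vifmgr.clus.linkdown` event precedes the `callhome.clam.node.ooq` event. The sketch below assumes the `[node:event:SEVERITY]` bracket format shown above; the helper names are illustrative only.

```python
import re

# Extracts the "[node:event.name:SEVERITY]" header of each EMS message.
EMS_EVENT = re.compile(r"\[(?P<node>[^:\]]+):(?P<event>[\w.]+):(?P<severity>[A-Z]+)\]")

def ems_events(log_text):
    """Return (node, event, severity) tuples in the order they appear."""
    return [(m["node"], m["event"], m["severity"]) for m in EMS_EVENT.finditer(log_text)]

def link_down_before_out_of_quorum(log_text):
    """True if a cluster-port link-down event appears before the out-of-quorum call home."""
    names = [event for _, event, _ in ems_events(log_text)]
    if "callhome.clam.node.ooq" not in names:
        return False
    return "vifmgr.clus.linkdown" in names[: names.index("callhome.clam.node.ooq")]

ems_sample = """
Jun 08 12:30:09 [xxx-02:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0b on node naptp06c-02 has gone down unexpectedly.
Jun 08 12:30:10 [xxxc-02:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node naptp06c-02 has gone down unexpectedly.
Jun 08 12:31:02 [xxx:callhome.clam.node.ooq:EMERGENCY]: Call home for NODE(S) OUT OF CLUSTER QUORUM.
"""

print(link_down_before_out_of_quorum(ems_sample))
```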