Switchless cluster hits bug 1253791, causing quorum issues for cluster applications

Category:: ontap-9

Specialty:: core

Environment

FAS2720
ONTAP 9
2-node switchless cluster

Issue

One node panics after losing quorum because of bug 1253791 (the e0a / e0b cluster ports go link-down).
A partial giveback occurs because the cluster applications cannot come online and the cluster ports are down, as reported by storage failover show:

Waiting for cluster applications to come online on the local node

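Once the cluster is in this state, both nodes are power cycled.
After booting, the node that was taken over, or that was previously the cluster master, has its cluster applications offline and displays the following error: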

Internal error: Cannot open corrupt replicated database. Automatic recovery
attempt has failed or is disabled. Check the event logs for details. This node
is not fully operational. Contact support personnel for the root volume recovery
procedures.

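When the bootarg.rdb_corrupt recovery procedure is used to try to clear this state, the node that was taken over becomes master for mgwd, but "-" is reported for its other applications. The previous master is secondary for mgwd, and its other applications are offline.
Example: node cluster1-01 is the node that panicked after losing quorum because of bug 1253791; node 02 was taken over and was the master before the power loss / RDB recovery.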

cluster ring show on node 01 after RDB recovery:

Node        UnitName Epoch    DB Epoch DB Trnxs Master      Online
----------- -------- -------- -------- -------- ----------- ---------
cluster1-01 mgmt     21       21       107      cluster1-01 master
cluster1-01 vldb     -        -        -        -           -
cluster1-01 vifmgr   -        -        -        -           -
cluster1-01 bcomd    -        -        -        -           -
cluster1-01 crs      -        -        -        -           -
cluster1-02 mgmt     21       21       107      cluster1-01 secondary
cluster1-02 vldb     0        18       3295     -           offline
cluster1-02 vifmgr   0        20       50       -           offline
cluster1-02 bcomd    0        19       6        -           offline
cluster1-02 crs      0        18       1        -           offline

cluster ring show on node 02 after RDB recovery:

Node        UnitName Epoch    DB Epoch DB Trnxs Master      Online
----------- -------- -------- -------- -------- ----------- ---------
cluster1-01 crs      -        -        -        -           -
cluster1-02 mgmt     21       21       109      cluster1-01 secondary
cluster1-02 vldb     0        18       3295     -           offline
cluster1-02 vifmgr   0        20       50       -           offline
cluster1-02 bcomd    0        19       6        -           offline
cluster1-02 crs      0        18       1        -           offline
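
When comparing captured cluster ring show output from both nodes, it can help to list the RDB units that are neither master nor secondary. The following is a minimal Python sketch, not a NetApp tool: the names RingRow, parse_ring_rows and unhealthy_units are hypothetical, and it assumes the output has been saved as plain text with one row per line and seven whitespace-separated columns, as in the node 01 example above.

```python
from typing import NamedTuple


class RingRow(NamedTuple):
    """One data row of saved 'cluster ring show' output (hypothetical helper type)."""
    node: str
    unit: str
    epoch: str
    db_epoch: str
    db_trnxs: str
    master: str
    online: str


def parse_ring_rows(text: str) -> list[RingRow]:
    """Parse data rows; assumes seven whitespace-separated fields per row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header (splits into nine tokens) and the dashed separator.
        if len(fields) != 7 or fields[0].startswith("-"):
            continue
        rows.append(RingRow(*fields))
    return rows


def unhealthy_units(rows: list[RingRow]) -> list[tuple[str, str, str]]:
    """Return (node, unit, state) for units that are neither master nor secondary."""
    return [(r.node, r.unit, r.online) for r in rows
            if r.online not in ("master", "secondary")]


# Sample copied from the node 01 output shown in this article.
SAMPLE = """\
Node        UnitName Epoch    DB Epoch DB Trnxs Master      Online
----------- -------- -------- -------- -------- ----------- ---------
cluster1-01 mgmt     21       21       107      cluster1-01 master
cluster1-01 vldb     -        -        -        -           -
cluster1-01 vifmgr   -        -        -        -           -
cluster1-01 bcomd    -        -        -        -           -
cluster1-01 crs      -        -        -        -           -
cluster1-02 mgmt     21       21       107      cluster1-01 secondary
cluster1-02 vldb     0        18       3295     -           offline
cluster1-02 vifmgr   0        20       50       -           offline
cluster1-02 bcomd    0        19       6        -           offline
cluster1-02 crs      0        18       1        -           offline
"""

if __name__ == "__main__":
    for node, unit, state in unhealthy_units(parse_ring_rows(SAMPLE)):
        print(f"{node} {unit}: {state}")
```

For the sample, only the two mgmt rows are treated as healthy; every other unit is listed as "-" (not reported) or offline, which matches the symptom described in this article.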