Root aggregate WAFL inconsistent during hardware replacement
Environment
- ONTAP 9
- AFF A320
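Issue
- During maintenance that involves a failover, the node is booted to the "Waiting for giveback" prompt. At that prompt, Ctrl-C is pressed and the two follow-up prompts are answered "no" and then "yes":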
Waiting for giveback...(Press Ctrl-C to abort wait)
This node was previously declared dead.
Pausing to check HA partner status ...
partner is operational and in takeover mode.
You must initiate a giveback or shutdown on the HA
partner in order to bring this node online.
The HA partner is currently operational and in takeover mode. This node cannot continue unless you initiate a giveback on the partner.
Once this is done this node will reboot automatically.
waiting for giveback...
Do you wish to halt this node rather than wait [y/n]? n
The HA partner appears to be either not operational or not in takeover
mode. You will be asked whether you want to continue. If you answer "yes", the
existing failover monitor disk state will be overwritten and this node will be
rebooted. Answering "no" will halt this node with no modification to the failover
monitor disk state.
WARNING: Answering "yes" while the HA partner is operational and in
takeover mode will have unexpected and potentially catastrophic results:
YOUR FILESYSTEMS MAY BE DESTROYED
Do you wish to continue [y/n]? y
Oct 01 12:07:31 [cluster-02:cf.fm.overwriteState:notice]: System continuing after overwriting failover monitor state!
- The taken-over node then reboots and may panic:
Warning: previous shutdown was dirty, there is a possible loss of data.
Oct 01 12:11:04 [cluster-02:wafl.root.content.changed:error]: Contents of the root volume '' might have changed. Verify that all recent configuration changes are still in effect.
PANIC : NVRAM contents are invalid...
- After the panic, the node reboots and returns to the ONTAP login prompt, but halts repeatedly:
SP-login: login: HALT: HA partner has taken over (ic) on Sun Oct 1 12:35:34 CDT 2023
- The surviving node then panics because of a WAFL metadata inconsistency in the root volume of the taken-over node:
Sun Oct 01 13:27:50 -0500 [cluster-02: wafl_exempt17: sk.panic:alert]: Panic String: Unrecoverable metadata block (file xxxx, block xxxxxxx, fbn xxxxxxx, level 1, file type 16) in aggregate partner:cluster02_root. WAFL inconsistent. Contact NetApp technical support.
- The taken-over node (the node that was halting during boot) then panics at boot:
PANIC : Msg execution failed during replay, vol=vol0, msg=0xfffff70067600100, type=WAFL_WRITE, errno=192, replay_idx=1, coalesced=0 coalesced_cnt=63
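When comparing a saved console capture against the symptoms above, a small script can help confirm which of these messages appear and in what order. The following is a minimal sketch, not a NetApp tool: the `SIGNATURES` table and the `find_signatures` function are hypothetical names introduced here for illustration, and the patterns are assumptions derived only from the message strings quoted in this article.

```python
import re

# Hypothetical signature table. Each pattern is built from a message string
# quoted in this article; it is not an official or exhaustive list.
SIGNATURES = {
    "failover_state_overwritten": re.compile(r"cf\.fm\.overwriteState"),
    "root_content_changed": re.compile(r"wafl\.root\.content\.changed"),
    "nvram_invalid": re.compile(r"PANIC\s*:\s*NVRAM contents are invalid"),
    "wafl_inconsistent": re.compile(
        r"Unrecoverable metadata block .*WAFL inconsistent"
    ),
    "replay_failed": re.compile(
        r"PANIC\s*:\s*Msg execution failed during replay"
    ),
}


def find_signatures(log_text):
    """Return the names of matched signatures, in order of first appearance."""
    found = []
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if name not in found and pattern.search(line):
                found.append(name)
    return found


if __name__ == "__main__":
    # Two lines copied from the console output shown in this article.
    sample = (
        "Oct 01 12:07:31 [cluster-02:cf.fm.overwriteState:notice]: "
        "System continuing after overwriting failover monitor state!\n"
        "PANIC : NVRAM contents are invalid...\n"
    )
    print(find_signatures(sample))
```

A match only indicates that a console log contains the same message text as this article. It does not by itself confirm the root cause, so the full sequence of events should still be reviewed.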