Takeover caused by multiple disk failures
Applies to
- Data ONTAP 8.2.5P5 7-Mode
- FAS6250
- Two-node fabric-attached MetroCluster
Issue
The system log messages report multiple disk-related errors:
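- "Invalid checksum entry ... during write operation" on multiple disks
- Multiple disks orphaned because they are "not in consistent label set (CLS)"
- Multiple disks orphaned because they are "more recent ... than the calculated plex consistent label set"
- An AutoSupport (call home) message triggered for a failed SyncMirror plex
- "diskown.ownerReservationMismatch" errors
Example: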
Sat May 15 04:50:41 UTC [Node01:raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr_Node01_data/plex1/rg1/Site01-sw1:2.126L36 Shelf 31 Bay 9 [NETAPP X422_SLTNG600A10 NA02] S/N [SerialNumber], block #60799576, during write operation.
Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.notInCls:error]: Orphaning disk Site02-sw1:2.126L14 in plex aggr_Node01_data/1, because not in consistent label set (CLS).
Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.moreRecent:error]: Orphaning disk Site01-sw2:2.126L14 in plex aggr_Node01_data/0, because it is more recent (146175/1789746823, 146175/1789746823) than the calculated plex consistent label set (146174/1789745659).
Sat May 15 04:51:16 UTC [Node01:raid.assim.rg.missingChild:error]: Aggregate aggr_Node01_data, rgobj_verify: RAID object 0 has only 18 valid children, expected 22.
Sat May 15 04:51:16 UTC [Node01:raid.assim.plex.missingChild:error]: Aggregate aggr_Node01_data, plexobj_verify: Plex 1 only has 1 working RAID groups (2 total) and is being taken offline
Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED
Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.
Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk /aggr_Node01_data/plex1/rg0/Site01-sw1:2.126L54 Shelf 32 Bay 1 [NETAPP X422_SLTNG600A10 NA02] S/N [SerialNumber] has failed. The system will correct the problem.
Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk Site01-sw1:2.126L14 Shelf 30 Bay 13 [NETAPP X422_SCOMP600A10 NA03] S/N [SerialNumber] has failed. The system will correct the problem.
Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.
Sat May 15 04:51:39 UTC [Node01:diskown.ownerReservationMismatch:warning]: disk Site01-sw2:2.126L12 (S/N SerialNumber) is supposed to be owned by this node but has a persistent reservation placed by node ?? (ID 28600)
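When many of these messages arrive within a minute, it can help to group them by event name and severity to see which failures dominate. The following is a minimal sketch, not a NetApp tool: it assumes the log lines follow the "timestamp [node:event:severity]: message" layout shown in the example above, and the function names parse_ems_line and summarize are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the example lines above:
#   <timestamp> [<node>:<event.name>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^(?P<timestamp>.+?) "
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: "
    r"(?P<message>.*)$"
)


def parse_ems_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count occurrences of each (event, severity) pair in the given lines."""
    counts = Counter()
    for line in lines:
        record = parse_ems_line(line)
        if record is not None:
            counts[(record["event"], record["severity"])] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the example above.
    sample = [
        "Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: "
        "Call home for SYNCMIRROR PLEX FAILED",
        "Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: "
        "Plex /aggr_Node01_data/plex1 has failed.",
    ]
    for (event, severity), count in summarize(sample).most_common():
        print(f"{count:3d}  {severity:<8}  {event}")
```

Lines that do not match the assumed layout, such as console output printed during a halt, are skipped rather than reported as errors.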
Shortly after these errors first appear, the partner takes over the node because the node has entered a degraded state.
Example:
A disk reservation was detected on disk Site01-sw1:2.126L8 at DDMMMYYYY 04:53:51
Ordinarily, this will only occur if the partner node has taken over.
This node will be shutdown.
HALT: HA partner has taken over disk reservations
Uptime: ddhhmmss
System rebooting...