Takeover caused by multiple disk failures

Environment

Data ONTAP 8.2.5P5 (7-Mode)
FAS6250
Two-node fabric-attached MetroCluster

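Issue

The messages log shows multiple disk-related errors:

"Invalid checksum entry ... during write operation" on multiple disks
"Orphaning disk ... because not in consistent label set (CLS)" on multiple disks
"Orphaning disk ... because it is more recent ... than the calculated plex consistent label set" on multiple disks
AutoSupport triggered for a failed SyncMirror plex ("Call home for SYNCMIRROR PLEX FAILED")
"diskown.ownerReservationMismatch" warnings

Example: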

Sat May 15 04:50:41 UTC [Node01:raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr_Node01_data/plex1/rg1/Site01-sw1:2.126L36 Shelf 31 Bay 9 [NETAPP   X422_SLTNG600A10 NA02] S/N [SerialNumber], block #60799576, during write operation.  
 Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.notInCls:error]: Orphaning disk Site02-sw1:2.126L14 in plex aggr_Node01_data/1, because not in consistent label set (CLS). 
 Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.moreRecent:error]: Orphaning disk Site01-sw2:2.126L14 in plex aggr_Node01_data/0, because it is more recent (146175/1789746823, 146175/1789746823) than the calculated plex consistent label set (146174/1789745659).
 Sat May 15 04:51:16 UTC [Node01:raid.assim.rg.missingChild:error]: Aggregate aggr_Node01_data, rgobj_verify: RAID object 0 has only 18 valid children, expected 22.  
 Sat May 15 04:51:16 UTC [Node01:raid.assim.plex.missingChild:error]: Aggregate aggr_Node01_data, plexobj_verify: Plex 1 only has 1 working RAID groups (2 total) and is being taken offline  
 Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED 
 Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  
 Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk /aggr_Node01_data/plex1/rg0/Site01-sw1:2.126L54 Shelf 32 Bay 1 [NETAPP   X422_SLTNG600A10 NA02] S/N [SerialNumber] has failed. The system will correct the problem.  
 Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk Site01-sw1:2.126L14 Shelf 30 Bay 13 [NETAPP   X422_SCOMP600A10 NA03] S/N [SerialNumber] has failed. The system will correct the problem.  
 Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  
 Sat May 15 04:51:39 UTC [Node01:diskown.ownerReservationMismatch:warning]: disk Site01-sw2:2.126L12 (S/N SerialNumber) is supposed to be owned by this node but has a persistent reservation placed by node ?? (ID 28600)
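
When many such lines are present, it can help to tally them by event name to see which symptoms dominate. The following is a minimal Python sketch, not a NetApp tool: the regular expression is inferred only from the sample lines above, and the names `EMS_LINE`, `parse_ems_line`, and `count_events` are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines above (an assumption, not an official format):
#   <weekday> <month> <day> <hh:mm:ss> <tz> [<node>:<event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: "
    r"(?P<message>.*)$"
)


def parse_ems_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_events(lines):
    """Count how many times each event name appears in the given lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_ems_line(line)
        if parsed is not None:
            counts[parsed["event"]] += 1
    return counts


# Three lines copied from the example above.
sample = [
    " Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED ",
    " Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  ",
    " Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  ",
]

for event, count in count_events(sample).most_common():
    print(event, count)
```

Lines that do not follow the bracketed pattern, such as console output, are skipped rather than raising an error.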

Shortly after these errors first appear, the partner takes over the node because the node has entered a degraded state.

Example:

 A disk reservation was detected on disk Site01-sw1:2.126L8 at DDMMMYYYY 04:53:51
 Ordinarily, this will only occur if the partner node has taken over.
 This node will be shutdown.
 HALT: HA partner has taken over disk reservations
 Uptime: ddhhmmss
 System rebooting...
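
To confirm this sequence in a saved console log, one could search for the reservation message and the HALT line together. The sketch below assumes the console text matches the example above exactly. The helper name `find_takeover_halt` and both patterns are hypothetical and derived only from that example.

```python
import re

# Both patterns are taken from the console example above (assumed wording).
RESERVATION = re.compile(r"A disk reservation was detected on disk (?P<disk>\S+) at ")
HALT_MARKER = "HALT: HA partner has taken over disk reservations"


def find_takeover_halt(console_lines):
    """Report whether the halt marker appears and which disk was named last."""
    disk = None
    halted = False
    for line in console_lines:
        match = RESERVATION.search(line)
        if match:
            disk = match.group("disk")
        if HALT_MARKER in line:
            halted = True
    return {"halted": halted, "disk": disk}


# Console lines copied from the example above.
console = [
    " A disk reservation was detected on disk Site01-sw1:2.126L8 at DDMMMYYYY 04:53:51",
    " Ordinarily, this will only occur if the partner node has taken over.",
    " This node will be shutdown.",
    " HALT: HA partner has taken over disk reservations",
]

print(find_takeover_halt(console))
```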