Dual-carrier disks get stuck in an endless SDC loop after upgrading to 9.8

Category:: fas-systems

Specialty:: HW

Applies to

DS4486 storage shelf
ONTAP 9.8

Issue

After an upgrade to 9.8, multiple disks in DS4486 shelves report shm_setup_for_failure without a specific cause being given.

Wed Feb 10 09:40:25 -0800 [nodeb: api_dpool_17: scsi.debug:debug]: shm_setup_for_failure disk 3a.20.7L1 (S/N ZC1xxxxx) error 40000000h
Wed Feb 10 09:40:25 -0800 [nodeb: api_dpool_18: scsi.debug:debug]: shm_setup_for_failure disk 3a.20.17L1 (S/N K7Hxxxxx) error 40000000h
Wed Feb 10 09:40:26 -0800 [nodeb: api_dpool_20: scsi.debug:debug]: shm_setup_for_failure disk 3a.21.20L2 (S/N ZC1xxxxx) error 40000000h
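
To see which carriers are affected, the messages above can be grouped by shelf and bay. The following is a minimal Python sketch, not a NetApp tool. The regular expression is inferred only from the three sample lines, and `group_by_carrier` is a hypothetical helper name. It assumes disk names of the form adapter.shelf.bayL<n>, where the L suffix distinguishes the disks that share a carrier.

```python
import re
from collections import defaultdict

# Assumed message layout, derived only from the sample lines in this article.
SHM_RE = re.compile(
    r"shm_setup_for_failure disk "
    r"(?P<adapter>\w+)\.(?P<shelf>\d+)\.(?P<bay>\d+)L(?P<pos>\d+)"
    r" \(S/N (?P<serial>\S+)\) error (?P<error>[0-9A-Fa-f]+)h"
)


def group_by_carrier(lines):
    """Map (shelf, bay) to a list of (disk_name, serial, error_code) tuples."""
    carriers = defaultdict(list)
    for line in lines:
        m = SHM_RE.search(line)
        if m is None:
            continue  # not a shm_setup_for_failure message
        disk = "{adapter}.{shelf}.{bay}L{pos}".format(**m.groupdict())
        key = (int(m["shelf"]), int(m["bay"]))
        carriers[key].append((disk, m["serial"], m["error"]))
    return dict(carriers)
```

A carrier that appears here and later shows an evacuation or disk copy is a candidate for the behavior described in this article.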

Subsequently, when one disk in a carrier is evacuated or fails, the other disk in the same carrier enters a sick disk copy (SDC) loop. The copy never progresses past 0% and eventually cancels itself.

RAID Disk Device     HA SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)     Phys (MB/blks)
--------- ---------- -- ----- --- ---- ---- ----- ---- ------------------ ------------------
dparity   0d.23.2L2  0d 23    2   SA:A 0    MSATA 7200 3748319/7676558720 3815447/7814037168
parity    3b.13.22L1 3b 13    22  SA:A 0    MSATA 7200 3748319/7676558720 3815447/7814037168
data      3a.20.19L1 3a 20    19  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168
data      3a.21.20L1 3a 21    20  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168 (evacuating, copy in progress)
-> copy   3a.21.16L2 3a 21    16  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168 (copy 0% completed)
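
When checking several RAID groups, the evacuating disk and its copy target can be paired programmatically. This is a rough Python sketch that assumes the row layout and the status strings "(evacuating, copy in progress)" and "(copy N% completed)" exactly as they appear in the output above. `find_disk_copies` is a hypothetical helper name, and other ONTAP releases may word these rows differently.

```python
import re

# Row patterns assumed from the sample output in this article.
SOURCE_RE = re.compile(
    r"^\s*\S+\s+(?P<source>\S+).*\(evacuating, copy in progress\)"
)
COPY_RE = re.compile(
    r"^\s*->\s*copy\s+(?P<target>\S+).*\(copy (?P<pct>\d+)% completed\)"
)


def find_disk_copies(rows):
    """Return (source_disk, target_disk, percent_complete) for each copy row."""
    copies = []
    source = None
    for row in rows:
        m = SOURCE_RE.match(row)
        if m:
            source = m["source"]  # remember the disk being evacuated
            continue
        m = COPY_RE.match(row)
        if m:
            copies.append((source, m["target"], int(m["pct"])))
            source = None
    return copies
```

A copy that still reports 0 on repeated checks matches the symptom described here. A single reading of 0 right after the copy starts does not prove a stall.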

The EMS log shows the following sequence of events:

raid_lm: raidlm.carrier.evac.start
config_thread: raid.rg.diskcopy.start
config_thread: raid.rg.diskcopy.progress
raid_lm: raidlm.carrier.evac.abort
config_thread: raid.rg.diskcopy.aborted

Example:

[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from (S/N PCJHxxxx) to (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '3:13.16', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_vol0/plex0/rg0', 'owner': '', 'aggregate_uuid': 'f0f3c156-b7f6-4344-adce-249752a6fcf4', 'blockNum': '2156224'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_02/plex0/rg1: starting disk copy from 3a.21.20L1 (S/N [K4G5xxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from 2.21.20.1 (S/N K4G5xxxx) to 2.21.16.2 (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '2:53.00', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_02/plex0/rg1', 'owner': '', 'aggregate_uuid': 'cd3fa773-6ba5-48f6-9872-8c4a7ed5ff6f', 'blockNum': '793024'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from 2.21.20.1 (S/N PCJHxxxx) to 2.21.16.2 (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
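
The repeating start/abort pattern in the EMS messages above can be counted per source and target disk to confirm the loop. The sketch below is illustrative Python only. The patterns are inferred from the sample messages in this article, `count_copy_cycles` is a hypothetical helper name, and the `params` payload is assumed to be a Python-style dictionary literal as shown.

```python
import ast
import re
from collections import Counter

# Patterns assumed from the sample EMS messages in this article.
START_RE = re.compile(
    r"\[raid\.rg\.diskcopy\.start:notice\]: (?P<rg>\S+): "
    r"starting disk copy from (?P<source>\S+) \(S/N \S+\) to (?P<target>\S+)"
)
ABORT_RE = re.compile(
    r"\[raid_rg_diskcopy_aborted_1:notice\]: params: (?P<params>\{[^}]*\})"
)


def count_copy_cycles(ems_text):
    """Map (source, target) to (number of starts, number of aborts)."""
    starts = Counter(
        (m["source"], m["target"]) for m in START_RE.finditer(ems_text)
    )
    aborts = Counter()
    for m in ABORT_RE.finditer(ems_text):
        params = ast.literal_eval(m["params"])  # safe parse of the dict literal
        aborts[(params["source"], params["target"])] += 1
    pairs = starts.keys() | aborts.keys()
    return {pair: (starts[pair], aborts[pair]) for pair in pairs}
```

Several starts and aborts for the same disk pair, with progress messages that stay at block 0, are consistent with the loop described in this article. Because the patterns search the whole text, the function works whether the events arrive one per line or run together.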