
Aggregate is offline due to multiple disk failures

Environment

  • ONTAP 8
  • ONTAP 9
  • FAS / AFF systems

Issue

  • An aggregate or plex goes offline because of multiple disk failures:

Cluster::> system node run -node <node-name> sysconfig -r

Aggregate aggr1 (failed, raid_dp, partial) (block checksums)
  Plex /aggr1/plex0 (offline, failed, inactive)
  RAID group /aggr1/plex0/rg1 (partial, block checksums)

      RAID Disk    Device      HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      ---------    ------      ------------- ---- ---- ---- ----- --------------    --------------
      dparity     0a.01.0     0a    1   0   SA:A   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      parity      FAILED          N/A                        3807816/ -
      data        0b.01.2     0b    1   2   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.3     0b    1   3   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.4     0b    1   4   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.5     0b    1   5   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.6     0b    1   6   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.7     0b    1   7   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.8     0b    1   8   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.9     0b    1   9   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        FAILED          N/A                        3807816/ -
      data        FAILED          N/A                        3807816/ -
      data        FAILED          N/A                        3807816/ -
      data        0b.01.13    0b    1   13  SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.14    0b    1   14  SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        0b.01.15    0b    1   15  SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168 
      data        FAILED          N/A                        3807816/ -
      Raid group is missing 5 disks.
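
When the output is long, the FAILED rows can be tallied per RAID group and cross-checked against the "Raid group is missing N disks" summary line. The following is a minimal Python sketch, not a NetApp tool: the function name `summarize_raid_groups` and the regular expressions are assumptions derived only from the sample output above, and other ONTAP releases may format these lines differently.

```python
import re

# Assumed patterns, based only on the sample output above.
RG_HEADER = re.compile(r"^\s*RAID group (\S+)")
FAILED_ROW = re.compile(r"^\s*(?:dparity|parity|data)\s+FAILED\b")
MISSING_SUMMARY = re.compile(r"Raid group is missing (\d+) disks?")


def summarize_raid_groups(sysconfig_text):
    """Map each RAID group path to its FAILED row count and reported missing count."""
    summary = {}
    current = None
    for line in sysconfig_text.splitlines():
        header = RG_HEADER.match(line)
        if header:
            # Start (or resume) the record for this RAID group.
            current = summary.setdefault(
                header.group(1), {"failed_rows": 0, "reported_missing": None}
            )
            continue
        if current is None:
            continue  # rows before the first RAID group header are ignored
        if FAILED_ROW.match(line):
            current["failed_rows"] += 1
            continue
        reported = MISSING_SUMMARY.search(line)
        if reported:
            current["reported_missing"] = int(reported.group(1))
    return summary


# Abbreviated excerpt in the same layout as the output above (illustrative only).
SAMPLE = """\
Aggregate aggr1 (failed, raid_dp, partial) (block checksums)
  Plex /aggr1/plex0 (offline, failed, inactive)
  RAID group /aggr1/plex0/rg1 (partial, block checksums)
      dparity     0a.01.0     0a    1   0   SA:A   0  FSAS  7200 3807816/7798408704 3815447/7814037168
      parity      FAILED          N/A                        3807816/ -
      data        0b.01.2     0b    1   2   SA:B   0  FSAS  7200 3807816/7798408704 3815447/7814037168
      data        FAILED          N/A                        3807816/ -
      data        FAILED          N/A                        3807816/ -
      Raid group is missing 3 disks.
"""

for rg, info in summarize_raid_groups(SAMPLE).items():
    print(rg, info["failed_rows"], info["reported_missing"])
```

If the two numbers for a RAID group differ, the capture is probably truncated or the layout differs from what the patterns assume.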

  • Disk failure alerts appear in the event log, similar to the following:

[Node-01:scsi.cmd.checkCondition:debug]: Disk device 0b.01.10: Check Condition: CDB 0x1b: Sense Data SCSI:not ready -  (0x2 - 0x4 0x0 0x0)(0).
[Node-01:disk.init.failure.spinup:error]: Disk 0b.01.10 has failed to spin up and cannot be used. Please replace it with a new drive.
[Node-01:callhome.dsk.no.spin:ALERT]: Call home for DISK NOT SPINNING 
[Node-01:disk.init.failure.error:warning]: Disk 0b.01.10 failed initialization due to error 5.
[Node-01:disk.readReservationFailed:error]: Disk read reservation failed on 0b.01.10 CDB 0x5e:01 - SCSI:not ready (2 4 0)
[Node-01:diskown.errorDuringIO:error]: error 19 (disk not ready for requested operation) on disk 0b.01.10 (S/N ) while reading reservation state 
[Node-01:disk.ioFailed:error]: I/O operation failed despite several retries.  

[Node-01:raid.config.disk.failed:error]: Disk 0b.01.16 Shelf 1 Bay 16 [NETAPP   X477_SMEGX04TA07 NA02] S/N [XXXXXXXX] failed. 
[Node-01:callhome.dsk.fault:error]: Call home for DISK FAILED 
[Node-01:raid.fdr.reminder:warning]: Failed Disk 0b.01.16 Shelf 1 Bay 16 [NETAPP   X477_SMEGX04TA07 NA02] S/N [XXXXXXXX] is still present in the system and should be removed. 
[Node-01:diskown.errorReadingOwnership:warning]: error 3 (disk failed) while reading ownership on disk 0b.01.16 (S/N XXXXXXX) 

[Node-02:disk.init.failureBytes:warning]: Failed disk 0b.01.17 detected during disk initialization
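
To see which disks are affected across many such entries, the log lines can be split into node, event name, severity, and message. The sketch below is a hypothetical Python helper, not part of ONTAP: the `[node:event:severity]: message` layout and the disk-name pattern (for example `0b.01.10`) are assumptions taken from the samples above, and `EmsRecord` and `parse_ems_line` are illustrative names.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: [node:event.name:severity]: message
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]:\s*(?P<message>.*)$"
)
# Assumed disk naming as seen in the samples, e.g. 0b.01.10
DISK_NAME = re.compile(r"\b\d+[a-z]\.\d+\.\d+\b")


class EmsRecord(NamedTuple):
    node: str
    event: str
    severity: str
    message: str
    disk: Optional[str]


def parse_ems_line(line):
    """Return an EmsRecord, or None when the line does not match the assumed layout."""
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    disk = DISK_NAME.search(match["message"])
    return EmsRecord(
        node=match["node"],
        event=match["event"],
        severity=match["severity"],
        message=match["message"].strip(),
        disk=disk.group(0) if disk else None,
    )


LINES = [
    "[Node-01:disk.init.failure.error:warning]: Disk 0b.01.10 failed initialization due to error 5.",
    "[Node-01:callhome.dsk.no.spin:ALERT]: Call home for DISK NOT SPINNING",
    "[Node-02:disk.init.failureBytes:warning]: Failed disk 0b.01.17 detected during disk initialization",
]

records = [parse_ems_line(line) for line in LINES]
# Unique disk names mentioned in any parsed entry.
affected_disks = sorted({r.disk for r in records if r and r.disk})
print(affected_disks)
```

Only the first disk name in each message is captured, which is enough for the samples shown here.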

  • The event log reports the following plex failure events for the aggregate:

[Node-01:raid.assim.disk.brokenPreAssim:error]: Broken Disk 0b.01.1 Shelf 1 Bay 1 [NETAPP   X477_SMEGX04TA07 NA02] S/N [XXXXXXXX] detected prior to assimilation. 
[Node-01:raid.assim.rg.missingChild:error]: Aggregate aggr1, rgobj_verify: RAID object 1 has only 13 valid children, expected 16. 
[Node-01:raid.assim.plex.missingChild:error]: Aggregate aggr1, plexobj_verify: Plex 0 only has 1 working RAID groups (2 total) and is being taken offline 
[Node-01:raid.assim.mirror.noChild:ALERT]: Aggregate aggr1, mirrorobj_verify: No operable plexes found. 

[Node-01:raid.rg.recons.missing:notice]: RAID group /agg2/plex0/rg0 is missing 1 disk(s). 
[Node-01:raid.rg.recons.cantStart:warning]: The reconstruction cannot start in RAID group /agg2/plex0/rg0: No matching disks available in spare pool, targeting any spare pool
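
The `raid.assim.rg.missingChild` message carries the arithmetic that explains the offline plex: expected children minus valid children gives the number of missing disks, and a RAID group fails once that number exceeds what its RAID type tolerates (one disk for RAID4, two for RAID-DP, three for RAID-TEC). The following Python sketch works through that check. The function names are hypothetical, the message pattern is an assumption based on the sample above, and the RAID type has to be supplied by the caller because the message does not include it.

```python
import re

# Assumed wording, taken from the raid.assim.rg.missingChild sample above.
MISSING_CHILD = re.compile(
    r"RAID object (?P<rg>\d+) has only (?P<valid>\d+) valid children, "
    r"expected (?P<expected>\d+)"
)

# Concurrent disk failures each RAID type is designed to tolerate per RAID group.
TOLERATED_FAILURES = {"raid4": 1, "raid_dp": 2, "raid_tec": 3}


def missing_children(message):
    """Return expected minus valid children, or None if the message does not match."""
    match = MISSING_CHILD.search(message)
    if match is None:
        return None
    return int(match["expected"]) - int(match["valid"])


def exceeds_tolerance(missing, raid_type):
    """True when more disks are missing than the RAID type tolerates."""
    return missing > TOLERATED_FAILURES[raid_type]


message = (
    "Aggregate aggr1, rgobj_verify: RAID object 1 has only 13 valid children, "
    "expected 16."
)
missing = missing_children(message)
print(missing, exceeds_tolerance(missing, "raid_dp"))
```

For the sample message, 16 expected minus 13 valid leaves 3 missing disks, which is more than RAID-DP tolerates, so the RAID group cannot be assembled and its plex is taken offline.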


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.