
Active IQ Unified Manager alert: Aggregate RAID is reconstructing because of broken disks

Environment

  • Active IQ Unified Manager
  • OnCommand Unified Manager (UM)
  • Data ONTAP 8

Issue

  • OnCommand Unified Manager reports an aggregate RAID reconstructing status; the alert text and a CLI cross-check are shown below.

-------------------------------------
Alert from OnCommand Unified Manager: Aggregate Reconstructing
A risk was generated by XXXXXXXXXX that requires your attention.
Risk              - Aggregate Reconstructing
Impact Area       - Availability
Severity          - Warning
Source            - node-1:aggr01
Trigger Condition - Aggregate aggr01's RAID status is reconstructing because of broken disks.
-------------------------------------
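
The reconstruction reported by the alert can be cross-checked on the controller itself. A minimal sketch, assuming the clustered Data ONTAP CLI and the aggregate name aggr01 from the alert above; on a 7-Mode system the equivalent check is aggr status -r aggr01, which also shows the per-disk reconstruction percentage:

cluster::> storage aggregate show -aggregate aggr01
(the RAID status field reads "reconstruct" while the rebuild is in progress)
cluster::> storage aggregate show-status -aggregate aggr01
(lists each RAID group and the state of every member disk)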

  • The EMS logs show command timeouts and high I/O latency on the disk, after which the disk is sent for maintenance testing (see the CLI check after the log excerpt).

Thu Dec 15 09:51:06 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:51:32 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:52:18 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:59:14 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 10:29:53 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 10:31:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 60 msecs and average utilization of 37 percent. Highest average IO latency: 0d.14.9: 60 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:31:32 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 77 msecs and average utilization of 46 percent. Highest average IO latency: 0d.14.9: 77 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:32:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 88 msecs and average utilization of 54 percent. Highest average IO latency: 0d.14.9: 88 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:32:32 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 94 msecs and average utilization of 62 percent. Highest average IO latency: 0d.14.9: 94 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:33:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 103 msecs and average utilization of 68 percent. Highest average IO latency: 0d.14.9: 103 msecs; next highest IO latency: 1d.22.5: 9 msecs
Thu Dec 15 10:33:02 JST [node-1: disk_latency_monito: shm.threshold.highIOLatency:error]: Disk 0d.14.9 exceeds the average IO latency threshold and will be recommended for failure.
Thu Dec 15 10:33:03 JST [node-1: config_thread: raid.disk.maint.start:notice]: Disk /aggr_sas_01/plex0/rg2/0d.14.9 Shelf 14 Bay 9 [NETAPP   X422_HCOBE600A10 NA00] S/N [XXXXXXXX] will be tested.
Thu Dec 15 10:33:03 JST [node-1: disk_admin: disk.failmsg:error]: Disk 0d.14.9 (XXXXXXXX): exceeded latency threshold.
Thu Dec 15 10:33:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:34:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:35:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:36:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
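
The disk that storage health monitoring recommended for failure can be inspected directly. A minimal sketch, assuming the clustered Data ONTAP CLI and the disk name 0d.14.9 from the log above; on 7-Mode, vol status -f lists failed disks and sysconfig -r shows the RAID layout:

cluster::> storage disk show -disk 0d.14.9
(reports the disk's container type, for example "maintenance" while it is being tested or "broken" if it has failed)
cluster::> event log show -message-name disk.failmsg
(retrieves the disk failure events corresponding to the EMS entries above)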

  • After testing, the disk was unfailed by the system and moved back to the spare pool (see the CLI check after the log excerpt).

Thu Dec 15 12:21:54 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 0d.14.9 invalidate debounce - 40', 'adapterName': '0c'}
Thu Dec 15 12:21:54 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '0c'}
Thu Dec 15 12:21:55 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Device 1a.14.9 invalidate debounce - 40', 'adapterName': '1a'}
Thu Dec 15 12:21:55 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '1a'}
Thu Dec 15 12:21:59 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 0d.14.9 came back.', 'adapterName': '0c'}
Thu Dec 15 12:21:59 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '0c'}
Thu Dec 15 12:22:01 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Device 1a.14.9 came back.', 'adapterName': '1a'}
Thu Dec 15 12:22:01 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '1a'}
Thu Dec 15 12:25:22 JST [midst03-02: api_dpool_04: ems.engine.suppressed:debug]: Event 'od.rdb.mbox.debug' suppressed 4 times in last 897 seconds.
Thu Dec 15 12:25:22 JST [midst03-02: api_dpool_04: od.rdb.mbox.debug:debug]: params: {'message': 'RDB-HA readPSlot: Read blob_type 3, (pslot 0), instance 0.'}
Thu Dec 15 12:25:22 JST [midst03-02: mgwd: rdb.ha.verified:notice]: Verified that cluster high availability (HA) is configured correctly, and that on-disk mailboxes are intact.
Thu Dec 15 12:26:09 JST [midst03-02: mgwd: rdb.ha.verified:notice]: Verified that cluster high availability (HA) is configured correctly, and that on-disk mailboxes are intact.
Thu Dec 15 12:27:01 JST [midst03-02: disk_admin: disk.partner.diskUnfail:info]: The partner has unfailed 0d.14.9.
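
Once the partner node unfails the disk, it should reappear as a spare. A minimal sketch, assuming the clustered Data ONTAP CLI; on 7-Mode, aggr status -s lists the spare disks:

cluster::> storage disk show -container-type spare
(the unfailed disk 0d.14.9 should now be listed with container type "spare")
cluster::> storage aggregate show-spare-disks
(shows spare disks per owning node; available in later clustered releases)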


