Active IQ Unified Manager alert: Aggregate RAID status is reconstructing because of broken disks
Environment
- Active IQ Unified Manager
- OnCommand Unified Manager (UM)
- Data ONTAP 8
Issue
- OnCommand Unified Manager reports that the aggregate's RAID status is reconstructing (a CLI sketch for confirming this on the cluster follows the alert text below).
-------------------------------------
Alert from OnCommand Unified Manager: Aggregate Reconstructing
A risk was generated by XXXXXXXXXX that requires your attention.
Risk - Aggregate Reconstructing
Impact Area - Availability
Severity - Warning
Source - node-1:aggr01
Trigger Condition - Aggregate aggr01's RAID status is reconstructing because of broken disks - .
-------------------------------------
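As a minimal sketch (these commands are not part of the original alert, and availability of commands and fields can vary by Data ONTAP 8 release), the RAID state of the aggregate and any broken disks can typically be checked from the clustered Data ONTAP CLI; aggr01 is the aggregate name taken from the alert above:
cluster::> storage aggregate show -aggregate aggr01 -fields raidstatus
cluster::> storage aggregate show-status -aggregate aggr01
cluster::> storage disk show -broken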
- The EMS logs show repeated command timeouts and high I/O latency on the disk, after which the disk is sent to be tested (a CLI sketch for reviewing these events follows the log excerpt):
Thu Dec 15 09:51:06 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:51:32 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:52:18 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:59:14 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 10:29:53 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 10:31:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 60 msecs and average utilization of 37 percent. Highest average IO latency: 0d.14.9: 60 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:31:32 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 77 msecs and average utilization of 46 percent. Highest average IO latency: 0d.14.9: 77 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:32:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 88 msecs and average utilization of 54 percent. Highest average IO latency: 0d.14.9: 88 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:32:32 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 94 msecs and average utilization of 62 percent. Highest average IO latency: 0d.14.9: 94 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:33:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 103 msecs and average utilization of 68 percent. Highest average IO latency: 0d.14.9: 103 msecs; next highest IO latency: 1d.22.5: 9 msecs
Thu Dec 15 10:33:02 JST [node-1: disk_latency_monito: shm.threshold.highIOLatency:error]: Disk 0d.14.9 exceeds the average IO latency threshold and will be recommended for failure.
Thu Dec 15 10:33:03 JST [node-1: config_thread: raid.disk.maint.start:notice]: Disk /aggr_sas_01/plex0/rg2/0d.14.9 Shelf 14 Bay 9 [NETAPP X422_HCOBE600A10 NA00] S/N [XXXXXXXX] will be tested.
Thu Dec 15 10:33:03 JST [node-1: disk_admin: disk.failmsg:error]: Disk 0d.14.9 (XXXXXXXX): exceeded latency threshold.
Thu Dec 15 10:33:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:34:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:35:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:36:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
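As a hedged sketch (not part of the original logs; exact parameter names can vary by release), the same latency and disk-failure events can usually be pulled from the cluster shell instead of the raw EMS files. The node name and message names below are taken from the excerpt above:
cluster::> event log show -node node-1 -messagename shm.threshold.highIOLatency
cluster::> event log show -node node-1 -messagename disk.failmsg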
- After testing, the disk was unfailed by the system and moved back to the spare pool (a sketch for confirming the spare state follows the log excerpt):
Thu Dec 15 12:21:54 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 0d.14.9 invalidate debounce - 40', 'adapterName': '0c'}
Thu Dec 15 12:21:54 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '0c'}
Thu Dec 15 12:21:55 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Device 1a.14.9 invalidate debounce - 40', 'adapterName': '1a'}
Thu Dec 15 12:21:55 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '1a'}
Thu Dec 15 12:21:59 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 0d.14.9 came back.', 'adapterName': '0c'}
Thu Dec 15 12:21:59 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '0c'}
Thu Dec 15 12:22:01 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Device 1a.14.9 came back.', 'adapterName': '1a'}
Thu Dec 15 12:22:01 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '1a'}
Thu Dec 15 12:25:22 JST [midst03-02: api_dpool_04: ems.engine.suppressed:debug]: Event 'od.rdb.mbox.debug' suppressed 4 times in last 897 seconds.
Thu Dec 15 12:25:22 JST [midst03-02: api_dpool_04: od.rdb.mbox.debug:debug]: params: {'message': 'RDB-HA readPSlot: Read blob_type 3, (pslot 0), instance 0.'}
Thu Dec 15 12:25:22 JST [midst03-02: mgwd: rdb.ha.verified:notice]: Verified that cluster high availability (HA) is configured correctly, and that on-disk mailboxes are intact.
Thu Dec 15 12:26:09 JST [midst03-02: mgwd: rdb.ha.verified:notice]: Verified that cluster high availability (HA) is configured correctly, and that on-disk mailboxes are intact.
Thu Dec 15 12:27:01 JST [midst03-02: disk_admin: disk.partner.diskUnfail:info]: The partner has unfailed 0d.14.9.
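As a minimal sketch (not from the original article), the current state of disk 0d.14.9 can be checked afterwards to verify that it is back in the spare pool rather than broken; a container-type of "spare" would be consistent with the disk having been unfailed as shown above:
cluster::> storage disk show -disk 0d.14.9 -fields container-type,owner
cluster::> storage aggregate show-spare-disks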