Active IQ Unified Manager alert: Aggregate RAID status is reconstructing because of broken disks
Environment
- Active IQ Unified Manager
- OnCommand Unified Manager (UM)
- Data ONTAP 8
Issue
- OnCommand Unified Manager reports that the aggregate's RAID status is reconstructing (a CLI sketch for confirming this on the cluster follows the alert text below).
-------------------------------------
Alert from OnCommand Unified Manager: Aggregate Reconstructing
A risk was generated by XXXXXXXXXX that requires your attention.
Risk - Aggregate Reconstructing
Impact Area - Availability
Severity - Warning
Source - node-1:aggr01
Trigger Condition - Aggregate aggr01's RAID status is reconstructing because of broken disks - .
-------------------------------------
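As a minimal sketch (these commands are not part of the original alert, and availability of commands and fields can vary by Data ONTAP 8 release), the RAID state of the aggregate and any broken disks can typically be checked from the clustered Data ONTAP CLI; aggr01 is the aggregate name taken from the alert above:
cluster::> storage aggregate show -aggregate aggr01 -fields raidstatus
cluster::> storage aggregate show-status -aggregate aggr01
cluster::> storage disk show -broken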
- The EMS logs show repeated command timeouts and high I/O latency on the disk, after which the disk is sent to be tested (a CLI sketch for reviewing these events follows the log excerpt):
Thu Dec 15 09:51:06 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:51:32 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:52:18 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:59:14 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 10:29:53 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 10:31:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 60 msecs and average utilization of 37 percent. Highest average IO latency: 0d.14.9: 60 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:31:32 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 77 msecs and average utilization of 46 percent. Highest average IO latency: 0d.14.9: 77 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:32:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 88 msecs and average utilization of 54 percent. Highest average IO latency: 0d.14.9: 88 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:32:32 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 94 msecs and average utilization of 62 percent. Highest average IO latency: 0d.14.9: 94 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:33:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 103 msecs and average utilization of 68 percent. Highest average IO latency: 0d.14.9: 103 msecs; next highest IO latency: 1d.22.5: 9 msecs
Thu Dec 15 10:33:02 JST [node-1: disk_latency_monito: shm.threshold.highIOLatency:error]: Disk 0d.14.9 exceeds the average IO latency threshold and will be recommended for failure.
Thu Dec 15 10:33:03 JST [node-1: config_thread: raid.disk.maint.start:notice]: Disk /aggr_sas_01/plex0/rg2/0d.14.9 Shelf 14 Bay 9 [NETAPP X422_HCOBE600A10 NA00] S/N [XXXXXXXX] will be tested.
Thu Dec 15 10:33:03 JST [node-1: disk_admin: disk.failmsg:error]: Disk 0d.14.9 (XXXXXXXX): exceeded latency threshold.
Thu Dec 15 10:33:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:34:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:35:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:36:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
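As a hedged sketch (not part of the original logs; exact parameter names can vary by release), the same latency and disk-failure events can usually be pulled from the cluster shell instead of the raw EMS files. The node name and message names below are taken from the excerpt above:
cluster::> event log show -node node-1 -messagename shm.threshold.highIOLatency
cluster::> event log show -node node-1 -messagename disk.failmsg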
- After testing, the disk was unfailed by the system and moved back to the spare pool (a sketch for confirming the spare state follows the log excerpt):
Thu Dec 15 12:21:54 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 0d.14.9 invalidate debounce - 40', 'adapterName': '0c'}
Thu Dec 15 12:21:54 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '0c'}
Thu Dec 15 12:21:55 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Device 1a.14.9 invalidate debounce - 40', 'adapterName': '1a'}
Thu Dec 15 12:21:55 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '1a'}
Thu Dec 15 12:21:59 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 0d.14.9 came back.', 'adapterName': '0c'}
Thu Dec 15 12:21:59 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '0c'}
Thu Dec 15 12:22:01 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Device 1a.14.9 came back.', 'adapterName': '1a'}
Thu Dec 15 12:22:01 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '1a'}
Thu Dec 15 12:25:22 JST [midst03-02: api_dpool_04: ems.engine.suppressed:debug]: Event 'od.rdb.mbox.debug' suppressed 4 times in last 897 seconds.
Thu Dec 15 12:25:22 JST [midst03-02: api_dpool_04: od.rdb.mbox.debug:debug]: params: {'message': 'RDB-HA readPSlot: Read blob_type 3, (pslot 0), instance 0.'}
Thu Dec 15 12:25:22 JST [midst03-02: mgwd: rdb.ha.verified:notice]: Verified that cluster high availability (HA) is configured correctly, and that on-disk mailboxes are intact.
Thu Dec 15 12:26:09 JST [midst03-02: mgwd: rdb.ha.verified:notice]: Verified that cluster high availability (HA) is configured correctly, and that on-disk mailboxes are intact.
Thu Dec 15 12:27:01 JST [midst03-02: disk_admin: disk.partner.diskUnfail:info]: The partner has unfailed 0d.14.9.
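As a minimal sketch (not from the original article), the current state of disk 0d.14.9 can be checked afterwards to verify that it is back in the spare pool rather than broken; a container-type of "spare" would be consistent with the disk having been unfailed as shown above:
cluster::> storage disk show -disk 0d.14.9 -fields container-type,owner
cluster::> storage aggregate show-spare-disks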