Disks fail when HDD IO latency is calculated incorrectly
Environment
- FAS2720
- ONTAP 9.7P10
Issue
- Multiple disks fail within a short period of time.
- EMS reports that a disk's IO latency exceeds the average, so ONTAP recommends the disk for failure:
Wed Nov 10 01:33:37 +0900 [node_name: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0b.00.6: Check Condition: CDB 0x8a:00000006aad4c800:00000200: Sense Data SCSI:aborted command - (0xb - 0x4b 0x6 0x0)(2031).
Wed Nov 10 01:33:37 +0900 [node_name: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0b.00.6: request successful after retry #1/#0: cdb 0x8a:00000006aad4c800:00000200 (2147).
Wed Nov 10 01:33:48 +0900 [node_name: disk_latency_monitor: shm.threshold.ioLatency:debug]: Disk 0b.00.6 has exceeded the expected IO latency in the current window with average latency of 15122 msecs and average utilization of 47 percent. Highest average IO latency: 0b.00.6: 15122 msecs; next highest IO latency: 0b.00.10: 11 msecs. Disk 0b.00.6 Shelf 0 Bay 6 [NETAPP X388_SEVNE16TA07 NA00] S/N [ZL25117M] UID [5000C500:CA04E0EF:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
Wed Nov 10 01:34:18 +0900 [node_name: disk_latency_monitor: shm.threshold.ioLatency:debug]: Disk 0b.00.6 has exceeded the expected IO latency in the current window with average latency of 15037 msecs and average utilization of 47 percent. Highest average IO latency: 0b.00.6: 15037 msecs; next highest IO latency: 0b.00.10: 11 msecs. Disk 0b.00.6 Shelf 0 Bay 6 [NETAPP X388_SEVNE16TA07 NA00] S/N [ZL25117M] UID [5000C500:CA04E0EF:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
Wed Nov 10 01:34:48 +0900 [node_name: disk_latency_monitor: shm.threshold.ioLatency:debug]: Disk 0b.00.6 has exceeded the expected IO latency in the current window with average latency of 14792 msecs and average utilization of 50 percent. Highest average IO latency: 0b.00.6: 14792 msecs; next highest IO latency: 0b.00.10: 11 msecs. Disk 0b.00.6 Shelf 0 Bay 6 [NETAPP X388_SEVNE16TA07 NA00] S/N [ZL25117M] UID [5000C500:CA04E0EF:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
Wed Nov 10 01:34:48 +0900 [node_name: disk_latency_monitor: shm.threshold.highIOLatency:error]: Disk 0b.00.6 exceeds the average IO latency threshold and will be recommended for failure.
Wed Nov 10 01:34:48 +0900 [node_name: disk_latency_monitor: scsi.debug:debug]: shm_setup_for_failure disk 0b.00.6 (S/N ZL25117M) error 200000h
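When reviewing many shm.threshold.ioLatency events, it can help to pull the disk name, its reported average latency, and the next-highest disk's latency out of each line so the gap between them is easy to see. The following is a minimal Python sketch under stated assumptions: it assumes the message text matches the excerpt above exactly, and the names `IO_LATENCY_RE`, `parse_io_latency`, and `SAMPLE` are hypothetical helpers for illustration, not part of ONTAP or any NetApp tool.

```python
import re

# Regex built from the shm.threshold.ioLatency message text in the excerpt above.
# Assumption: other ONTAP releases may word this message differently.
IO_LATENCY_RE = re.compile(
    r"shm\.threshold\.ioLatency:\w+\]: Disk (?P<disk>\S+) has exceeded the expected "
    r"IO latency in the current window with average latency of (?P<latency>\d+) msecs "
    r"and average utilization of (?P<util>\d+) percent\. "
    r"Highest average IO latency: \S+: \d+ msecs; "
    r"next highest IO latency: (?P<next_disk>\S+): (?P<next_latency>\d+) msecs"
)


def parse_io_latency(line):
    """Return the latency fields of one EMS line, or None if the line does not match."""
    match = IO_LATENCY_RE.search(line)
    if match is None:
        return None
    return {
        "disk": match.group("disk"),
        "latency_ms": int(match.group("latency")),
        "utilization_pct": int(match.group("util")),
        "next_disk": match.group("next_disk"),
        "next_latency_ms": int(match.group("next_latency")),
    }


# Shortened copy of the first ioLatency event from the excerpt (trailing disk details omitted).
SAMPLE = (
    "Wed Nov 10 01:33:48 +0900 [node_name: disk_latency_monitor: "
    "shm.threshold.ioLatency:debug]: Disk 0b.00.6 has exceeded the expected IO latency "
    "in the current window with average latency of 15122 msecs and average utilization "
    "of 47 percent. Highest average IO latency: 0b.00.6: 15122 msecs; "
    "next highest IO latency: 0b.00.10: 11 msecs."
)

if __name__ == "__main__":
    event = parse_io_latency(SAMPLE)
    print(event["disk"], event["latency_ms"], event["next_disk"], event["next_latency_ms"])
```

In the excerpt, disk 0b.00.6 is reported at roughly 15 seconds of average latency while the next-highest disk, 0b.00.10, is at 11 msecs. Extracting the fields this way makes it easier to list such outliers across a longer EMS log.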