CONTAP-403485: Failed drive in a 2n MetroCluster causes a node panic and switchover (AUSO) fails
Issue
- A node panics because of scsi.cmd errors reported on a single disk:
[?] Fri Feb 21 06:00:15 +0100 [Cluster1-01: slifc_intrd: scsi.cmd.checkCondition.bridge.SAS.event:debug]: Disk device switchA1:9.126L1524: Check Condition: CDB 0x2f:b8192800:0400: Sense Data SCSI:aborted command - (0xb - 0x88 0x4 0x81)(2041).
[?] Fri Feb 21 06:00:17 +0100 [Cluster1-01: slifc_intrd: scsi.cmd.checkCondition.bridge.SAS.topoChange:debug]: Disk device switchA2:8.126L1524: Check Condition: CDB 0x2f:b8192000:0400: Sense Data SCSI:aborted command - (0xb - 0x88 0x3 0x1)(4276).
[?] Fri Feb 21 06:20:26 +0100 [Cluster1-01: drsom_watch: sk.panic:alert]: Panic String: DR DRSOM_SSTBL_OP_SWITCHOVER Operation is Hung in drdom_SO in SK process drsom_watch on release 9.13.1P9 (C)
- As a result, AUSO (automated switchover) fails and the DR partner also panics:
[?] Fri Feb 21 06:52:37 +0100 [Cluster2-01: send_boot_msg_thread: mgr.stack.string:notice]: Panic string: DR DRSOM_SSTBL_OP_SWITCHOVER Operation is Hung in drdom_SO in SK process drsom_watch on release 9.13.1P9 (C)
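
To check a saved EMS log for the same signature, the lines above can be parsed with a short script. This is a minimal sketch, not a NetApp tool: the regular expressions and the names `parse_ems_line` and `summarize` are hypothetical and are based only on the log format shown in this article, so they may need adjusting for other event layouts.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed layout, inferred from the excerpts in this article:
# [?] <timestamp> [<node>: <process>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"^(?:\[\?\]\s*)?"
    r"(?P<timestamp>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)
# The article shows both "Panic String:" and "Panic string:", so ignore case.
PANIC = re.compile(r"panic string:\s*(?P<panic>.+)$", re.IGNORECASE)
DEVICE = re.compile(r"Disk device (?P<device>\S+): ")


class EmsEvent(NamedTuple):
    timestamp: str
    node: str
    process: str
    event: str
    severity: str
    message: str


def parse_ems_line(line: str) -> Optional[EmsEvent]:
    """Parse one EMS line; return None if it does not match the assumed layout."""
    match = EMS_LINE.match(line.strip())
    return EmsEvent(**match.groupdict()) if match else None


def summarize(lines: Iterable[str]):
    """Count scsi.cmd events per device name and collect panic strings per node."""
    scsi_errors = Counter()
    panics = {}
    for line in lines:
        event = parse_ems_line(line)
        if event is None:
            continue  # skip lines that are not in the assumed format
        if event.event.startswith("scsi.cmd."):
            device = DEVICE.search(event.message)
            if device:
                scsi_errors[device.group("device")] += 1
        panic = PANIC.search(event.message)
        if panic:
            panics[event.node] = panic.group("panic")
    return scsi_errors, panics
```

Calling `summarize(open("ems.log"))` on a log file (the file name is a placeholder) returns a per-device count of `scsi.cmd` events and a mapping from node name to panic string. If both DR partners report the same `DRSOM_SSTBL_OP_SWITCHOVER` panic string after `scsi.cmd` errors on one device, the log matches the pattern described here.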