Node reboots with "reboot (internal reboot)" after "mediator down" and "SyncMirror plex failed" alerts
Environment
- IP MetroCluster
- ONTAP 9
- MetroCluster-compliant switches (shared switches for MCC back-end and other traffic)
Issue
- The following AutoSupport messages are triggered and the node reboots:
HA Group Notification (MEDIATOR DOWN, AUSO DISABLED) ALERT
HA Group Notification (SYNCMIRROR PLEX FAILED) ALERT
HA Group Notification (REBOOT (internal reboot)) NOTICE
- The EMS log shows notifications of mediator disconnection, transport errors, disk read reservation errors on remote drives, and takeover-disabled errors:
Thu Mar 02 02:35:13 +1100 [node01: geom: geom.ontap.orphan.removing:notice]: Removing unit 0 type 5.
Thu Mar 02 02:35:13 +1100 [node01: geom: geom.ontap.orphan.removing:notice]: Removing unit 1 type 5.
Thu Mar 02 02:35:13 +1100 [node01: pha_remove000: mlm.array.lun.removed:notice]: Array LUN '0f.1' (3337633537306161) is no longer being presented to this node.
Thu Mar 02 12:17:43 +1100 [node01: disk_admin: disk.readReservationFailed:error]: Disk read reservation failed on 0m.i1.0L14 CDB 0x5e:01 - SCSI:no sense (0 0 0)
Thu Mar 02 12:17:43 +1100 [node01: disk_admin: disk.readReservationFailed:error]: Disk read reservation failed on 0m.i1.0L15 CDB 0x5e:01 - SCSI:no sense (0 0 0)
Thu Mar 02 12:17:42 +1100 [node01: cf_main: cf.fsm.backupMailboxError:error]: Failover monitor: partner mailbox error detected.
Thu Mar 02 12:17:42 +1100 [node01: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node02 disabled (partner mailbox disks not accessible or invalid).
Thu Mar 02 12:17:42 +1100 [node01: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node01 by node02 disabled (unsynchronized log).
Thu Mar 02 12:17:43 +1100 [node01: svc_queue_thread: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.
Thu Mar 02 12:17:43 +1100 [node01: cf_firmware: cf.fm.partnerFwTransition:info]: params: {'progresscounter': '0', 'newstate': 'SF_UNKNOWN', 'prevstate': 'SF_UP'}
Thu Mar 02 12:17:42 +1100 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.transportErrorEMSOnly:error]: Disk device 0v.i1.1L42: Transport error during execution of command: HA status 0x9: cdb 0x28:00000008:0008.
Thu Mar 02 12:17:42 +1100 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.transportErrorEMSOnly:error]: Disk device 0v.i1.1L79: Transport error during execution of command: HA status 0x9: cdb 0x28:00000008:0008.
- The following event is reported in the EMS log as the reason for the reboot:
Thu Mar 02 12:17:42 +1100 [node01: fmmbx_instanceWorker: kern.shutdown.initiator:debug]: SK reboot was initiated by "maytag.ko::mia_mccip_local_write_partial_fail+568".
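
When checking whether a saved EMS log matches this signature, it can help to scan the text for the event names listed above. The following is a minimal sketch, not a NetApp tool: the function names `count_events` and `reboot_initiator` are hypothetical, and the regular expression assumes the `[node: process: event.name:severity]` header layout seen in the excerpts in this article. The sample text is copied from those excerpts.

```python
import re
from collections import Counter

# Header of an EMS entry as it appears in the excerpts in this article
# (assumed layout): [<node>: <process>: <event.name>:<severity>]
EMS_HEADER = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[A-Za-z0-9_.]+):(?P<severity>[a-z]+)\]"
)

# Event names taken from the log excerpts in this article.
EVENTS_OF_INTEREST = {
    "mlm.array.lun.removed",
    "disk.readReservationFailed",
    "cf.fsm.backupMailboxError",
    "cf.fsm.takeoverOfPartnerDisabled",
    "cf.fsm.takeoverByPartnerDisabled",
    "cf.ic.xferTimedOut",
    "scsi.cmd.transportErrorEMSOnly",
    "kern.shutdown.initiator",
}

# Three entries copied from the excerpts in this article.
SAMPLE = """\
Thu Mar 02 12:17:43 +1100 [node01: disk_admin: disk.readReservationFailed:error]: Disk read reservation failed on 0m.i1.0L14 CDB 0x5e:01 - SCSI:no sense (0 0 0)
Thu Mar 02 12:17:43 +1100 [node01: svc_queue_thread: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.
Thu Mar 02 12:17:42 +1100 [node01: fmmbx_instanceWorker: kern.shutdown.initiator:debug]: SK reboot was initiated by "maytag.ko::mia_mccip_local_write_partial_fail+568".
"""


def count_events(log_text):
    """Count how often each event of interest appears in raw EMS log text."""
    counts = Counter()
    for match in EMS_HEADER.finditer(log_text):
        event = match.group("event")
        if event in EVENTS_OF_INTEREST:
            counts[event] += 1
    return counts


def reboot_initiator(log_text):
    """Return the quoted initiator from a kern.shutdown.initiator entry, or None."""
    match = re.search(
        r'kern\.shutdown\.initiator:\w+\]: .*?initiated by "([^"]+)"',
        log_text,
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    for event, count in sorted(count_events(SAMPLE).items()):
        print(event, count)
    print("reboot initiator:", reboot_initiator(SAMPLE))
```

Matching on the header pattern rather than on line boundaries means the scan still works when several entries have been pasted onto a single line.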