AFF A320: Frequent "takeover disabled (unsynchronized log)" alerts are generated in a switched cluster
Environment
- ONTAP 9
- AFF A320
- Switched cluster
- Takeover disabled (unsynchronized log)
Issue
- No hardware errors occur on the end-to-end connections between the cluster ports and the switch ports.
- The EMS logs report the following:
node-01:
Mon Aug 22 11:05:41 -0600 [node-01: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node-02 disabled (unsynchronized log).
Mon Aug 22 11:05:43 -0600 [node-01: ib_cm_14: rdma.rlib.connected:debug]: misc:HA:P QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-01: ib_cm_6: rdma.rlib.connected:debug]: wafl:HA:P QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-01: ib_cm_3: rdma.rlib.connected:debug]: raid:HA:P QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-01: ib_cm_8: rdma.rlib.connected:debug]: misc:HA:P QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-01: nvmm_helper: nvpm.state.changed:debug]: Node 1's NVPM state changed from "2" to "2".
Mon Aug 22 11:05:45 -0600 [node-01: ib_cm_0: rdma.rlib.connected:debug]: wafl:HA:P QP is now connected.
Mon Aug 22 11:05:45 -0600 [node-01: ib_cm_10: rdma.rlib.connected:debug]: raid:HA:P QP is now connected.
Mon Aug 22 11:05:46 -0600 [node-01: cf_main: cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of node-02 enabled
node-02:
Mon Aug 22 11:05:41 -0600 [node-02: raidio_thread: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_ONLINE is aborted because of reason NVMM_ERR_NO_REQS.
Mon Aug 22 11:05:41 -0600 [node-02: raidio_thread: mirror.stream.qp.error:debug]: params: {'error': 'NVMM_ERR_NO_REQS', 'qp_name': 'RAID', 'mirror': 'HA Partner'}
Mon Aug 22 11:05:41 -0600 [node-02: mcc_cfd_rnic: mirror.stream.qp.error:debug]: params: {'error': 'NVMM_ERR_STREAM', 'qp_name': 'MISC', 'mirror': 'HA Partner'}
Mon Aug 22 11:05:41 -0600 [node-02: nvmm_error: rdma.rlib.event.error:debug]: QP wafl event error: client disconnect.
Mon Aug 22 11:05:41 -0600 [node-02: nvmm_error: nvmm.mirror.offlined:debug]: params: {'mirror': 'HA_PARTNER'}
Mon Aug 22 11:05:41 -0600 [node-02: rastrace_dump: rastrace.dump.saved:debug]: A RAS trace dump for module IC instance 0 was stored in /etc/log/rastrace/IC_0_20220822_11:05:41:084487.dmp.
Mon Aug 22 11:05:41 -0600 [node-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node-02 by node-01 disabled (unsynchronized log).
Mon Aug 22 11:05:43 -0600 [node-02: ib_cm_18: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-02: ib_cm_10: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-02: ib_cm_9: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-02: ib_cm_13: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCING to NVMM_MIRROR_LAYOUT_SYNCED and took 1 msecs.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCED to NVMM_MIRROR_SYNCING_START and took 0 msecs.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_START to NVMM_MIRROR_CP1_START and took 25 msecs.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP1_START to NVMM_MIRROR_WAFL_INIT and took 464 msecs.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_INIT to NVMM_MIRROR_CP2_FINISH and took 24 msecs.
Mon Aug 22 11:05:45 -0600 [node-02: ib_cm_15: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Mon Aug 22 11:05:45 -0600 [node-02: ib_cm_8: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Mon Aug 22 11:05:46 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP2_FINISH to NVMM_MIRROR_WAFL_HEADER and took 2339 msecs.
Mon Aug 22 11:05:46 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_HEADER to NVMM_MIRROR_SYNCING_OTHER and took 12 msecs.
Mon Aug 22 11:05:46 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_OTHER to NVMM_MIRROR_ONLINE and took 288 msecs.
Mon Aug 22 11:05:46 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.onlined:debug]: params: {'mirror': 'HA_PARTNER'}
Mon Aug 22 11:05:46 -0600 [node-02: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of node-02 by node-01 enabled
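- One node has a problem with interconnect (IC) communication with its HA partner, which forces the IC connection to be torn down and re-established. To preserve data integrity, the nodes enter the "unsynchronized log" state, and takeover stays disabled until the NVMM mirror to the HA partner is back online (about 5 seconds in the logs above, from 11:05:41 to 11:05:46).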
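In the log excerpts above, each takeover-disabled window opens with a `cf.fsm.takeover...Disabled` event and closes with the matching `...Enabled` event. When these alerts are frequent, a small script can measure how often the windows occur and how long they last. The following is a minimal Python sketch, not a NetApp tool. It assumes the EMS line layout shown above. It also assumes a year of 2022, taken from the RAS trace dump filename, because the EMS lines omit the year. The function names `parse_line` and `takeover_disabled_windows` are hypothetical.

```python
import re
from datetime import datetime

# Assumed EMS line layout, inferred from the excerpts above:
# "<Dow> <Mon> <DD> <HH:MM:SS> <TZ> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^\w{3} (\w{3}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}) ([+-]\d{4}) "
    r"\[([^:\]]+): ([^:\]]+): ([\w.]+):(\w+)\]: (.*)$"
)

# Each "disabled" event maps to the event that closes its window.
DISABLED_TO_ENABLED = {
    "cf.fsm.takeoverOfPartnerDisabled": "cf.fsm.takeoverOfPartnerEnabled",
    "cf.fsm.takeoverByPartnerDisabled": "cf.fsm.takeoverByPartnerEnabled",
}


def parse_line(line, year=2022):
    """Return (timestamp, node, event, message), or None if the line does not match.

    The year is an assumption: EMS lines in this format carry no year.
    """
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    mon, day, hms, tz, node, _proc, event, _sev, msg = m.groups()
    ts = datetime.strptime(f"{year} {mon} {day} {hms} {tz}", "%Y %b %d %H:%M:%S %z")
    return ts, node, event, msg


def takeover_disabled_windows(lines, year=2022):
    """Pair each Disabled event with the next matching Enabled event on the same node.

    Returns a list of (node, disabled_event, start, end, seconds).
    Windows that are never closed are not reported.
    """
    open_windows = {}  # (node, closing_event) -> (disabled_event, start_ts)
    windows = []
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        ts, node, event, _msg = parsed
        if event in DISABLED_TO_ENABLED:
            # Keep the earliest start if "disabled" repeats before "enabled".
            open_windows.setdefault((node, DISABLED_TO_ENABLED[event]), (event, ts))
        elif (node, event) in open_windows:
            disabled_event, start = open_windows.pop((node, event))
            windows.append((node, disabled_event, start, ts, (ts - start).total_seconds()))
    return windows


# Four lines copied from the excerpts above.
SAMPLE = [
    "Mon Aug 22 11:05:41 -0600 [node-01: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node-02 disabled (unsynchronized log).",
    "Mon Aug 22 11:05:46 -0600 [node-01: cf_main: cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of node-02 enabled",
    "Mon Aug 22 11:05:41 -0600 [node-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node-02 by node-01 disabled (unsynchronized log).",
    "Mon Aug 22 11:05:46 -0600 [node-02: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of node-02 by node-01 enabled",
]

if __name__ == "__main__":
    for node, event, start, end, secs in takeover_disabled_windows(SAMPLE):
        print(f"{node}: {event} from {start:%H:%M:%S} to {end:%H:%M:%S} ({secs:.0f} s)")
```

For the sample lines, the sketch reports one 5-second window per node. To see how frequent the alerts are, run it over a longer EMS log export and count the windows per node.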