HA down and IRDMA messages are reported repeatedly on FAS2820
Environment
- FAS2820
- Switchless cluster
Issue
- The HA (High Availability) interconnect is down, and the following messages are flooding the logs:
EMS log
nvmm_mirror_sync: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_LAYOUT_SYNCING is aborted because of reason NVPM_ERR_MSG_SEND_FAILED.
nvmm_error: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_OFFLINE is aborted because of reason NVMM_ABORT_SYNCING_MIRROR.
cfdisk_config: cf.diskinventory.sendFailed:debug]: params: {'reason': 'HA Interconnect down', 'errorCode': '0'}
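For triage, each EMS line above can be reduced to an (event, severity, reason) tuple, which makes it easy to see that all three point back to the HA interconnect. The following is a minimal Python sketch. The function name `summarize_ems` and both regular expressions are hypothetical and are inferred only from these three excerpts, not from a documented EMS format.

```python
import re

# Hypothetical patterns inferred from the EMS excerpts only.
# Event name and severity appear as "<event.name>:<severity>]:".
EMS_RE = re.compile(r"(?P<event>[\w.]+):(?P<severity>\w+)\]:")
# The reason is either "because of reason X" or a "'reason': '...'" parameter.
REASON_RE = re.compile(r"because of reason (\w+)|'reason': '([^']*)'")


def summarize_ems(line):
    """Return (event, severity, reason) for an EMS line, or None if it does not match."""
    m = EMS_RE.search(line)
    if m is None:
        return None
    r = REASON_RE.search(line)
    reason = (r.group(1) or r.group(2)) if r else None
    return m.group("event"), m.group("severity"), reason


if __name__ == "__main__":
    sample = (
        "cfdisk_config: cf.diskinventory.sendFailed:debug]: params: "
        "{'reason': 'HA Interconnect down', 'errorCode': '0'}"
    )
    print(summarize_ems(sample))
    # ('cf.diskinventory.sendFailed', 'debug', 'HA Interconnect down')
```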
- Input
SP-LATEST-CONSOLE-LOGS
e0b:irdma_process_aeq:390 ERR AEQ: abnormal ae_id = 0x50a (Connection error: The max number of retries has been reached), is_qp = 1, qp_id = 4431, ae_source = 5
e0a:irdma_process_aeq:390 ERR AEQ: abnormal ae_id = 0x103 (Invalid memory key (L-Key/R-Key)), is_qp = 1, qp_id = 4241, ae_source = 9
e0b:irdma_process_aeq:390 ERR AEQ: abnormal ae_id = 0x208 (QP error: Invalid operation detected by the remote peer), is_qp = 1
e0a:irdma_process_aeq:390 ERR AEQ: abnormal ae_id = 0x208 (QP error: Invalid operation detected by the remote peer), is_qp = 1, qp_id = 4549, ae_source = 5
e0a:irdma_process_aeq:380 ERR abnormal ae_id = 0x50a bool qp=1 qp_id = 36 ae_source=5
e0b:irdma_process_aeq:380 ERR abnormal ae_id = 0x50a bool qp=1 qp_id = 47 ae_source=5
e0a:irdma_process_aeq:380 ERR abnormal ae_id = 0x50a bool qp=1 qp_id = 46 ae_source=5
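The console excerpt mixes two line shapes (`irdma_process_aeq:390` and `irdma_process_aeq:380`), and the same `ae_id` (for example `0x50a`, described in the log as "Connection error: The max number of retries has been reached") recurs on both `e0a` and `e0b`. A minimal Python sketch for tallying these events per port follows. The pattern and the function name `count_aeq_errors` are hypothetical and assume only the two line formats shown here; other driver or ONTAP versions may log differently.

```python
import re
from collections import Counter

# Assumed pattern covering both shapes seen in SP-LATEST-CONSOLE-LOGS:
#   "<port>:irdma_process_aeq:390 ERR AEQ: abnormal ae_id = 0x..., ... qp_id = N ..."
#   "<port>:irdma_process_aeq:380 ERR abnormal ae_id = 0x... bool qp=1 qp_id = N ..."
AEQ_RE = re.compile(
    r"^(?P<port>e\w+):irdma_process_aeq:\d+ ERR .*?"
    r"ae_id = (?P<ae_id>0x[0-9a-fA-F]+).*?qp_id = (?P<qp_id>\d+)"
)


def count_aeq_errors(lines):
    """Count abnormal AEQ events per (port, ae_id); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = AEQ_RE.match(line.strip())
        if m:
            counts[(m.group("port"), m.group("ae_id").lower())] += 1
    return counts
```

Counting per port makes it easy to see whether the errors affect only one interconnect port or both.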
HA-INTERCONNECT-STATUS
Link Status
Link 0 Status up
Link 1 Status up
IC RDMA Connection down
Is Link 0 Active true
Is Link 1 Active true
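The HA-INTERCONNECT-STATUS output reports both links up and active while the IC RDMA connection is down. A minimal Python sketch that checks for exactly that combination is shown below. The function names are hypothetical, and the parser assumes the plain "key value" layout shown above, with the value as the last whitespace-separated token.

```python
def parse_ic_status(text):
    """Parse 'key value' lines; keep only lines whose last token is up/down/true/false."""
    status = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2 and parts[1] in ("up", "down", "true", "false"):
            status[parts[0]] = parts[1]
    return status


def rdma_down_with_links_up(status):
    """True when every 'Link N Status' is up but 'IC RDMA Connection' is down."""
    links = [v for k, v in status.items() if k.startswith("Link ") and k.endswith(" Status")]
    return bool(links) and all(v == "up" for v in links) and status.get("IC RDMA Connection") == "down"
```

The header line `Link Status` is skipped because its last token is not a status value.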