High latency due to FCVI errors
Applies to
- ONTAP 9
- MCC-FC
Issue
- Nodes at both sites experience high latency.
- EMS on node SITEB-NODE-A reports PCIe errors.
Sat Oct 07 00:30:15 +0800 [SITEB-NODE-A: HSWL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'LVMR,1_0.PLX PCIE 8780 switch on Controller, PLX PCIE 8780 switch on Controller, PLX PCIE 8764 switch in slot 6 on Controller. IIO0:RPT(0,3,0): Br[8780](56,16,0): RcvErr(P17(255)), Br[8780](56,17,0): BadTLP(262804), BadDLLP(860367); Br[8780](56,17,0): DevStatus(Corr), CorrErr(Rcvr,BTLP,BDLLP,RNRov,RpTim); Br[8764](94,0,0) in slot 6: DevStatus(Corr), CorrErr(BTLP,RNRov,RpTim), BadTLP(1). '}
Sat Oct 07 00:32:15 +0800 [SITEB-NODE-A: HSWL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'LVMR,1_0.PLX PCIE 8780 switch on Controller, PLX PCIE 8780 switch on Controller, PLX PCIE 8764 switch in slot 6 on Controller. IIO0:RPT(0,3,0): Br[8780](56,16,0): RcvErr(P17(255)), Br[8780](56,17,0): BadTLP(295570), BadDLLP(926266); Br[8780](56,17,0): DevStatus(Corr), CorrErr(Rcvr,BTLP,BDLLP,RNRov,RpTim); Br[8764](94,0,0) in slot 6: DevStatus(Corr), CorrErr(RNRov,RpTim). '}
- EMS on node SITEB-NODE-A reports FCVI disconnect errors.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: ispfcvi2500_main1: fcvi.qlgc.received.disconnect:debug]: FC-VI adapter: Disconnect request received on port 5a. QP name = WAFL, QP index = 9, Remote node's system id = 537415743.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: ispfcvi2500_main3: fcvi.qlgc.received.disconnect:debug]: FC-VI adapter: Disconnect request received on port 5c. QP name = WAFL, QP index = 3, Remote node's system id = 537415743.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: fcvi_cm: ic.rdma.qpDisconnected:debug]: WAFL is disconnected.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: fcvi_cm: ic.rdma.qpConnected:debug]: WAFL is connected.
Sat Oct 07 10:43:52 +0800 [SITEB-NODE-A: ispfcvi2500_main1: fcvi.qlgc.ioErr:debug]: FC-VI adapter: FCVI driver on port 5a received IO error. Status = FW detected response error(status code = 0x121), FCVI opcode = Write Request(0x1), QP name = WAFL, QP index = 9, Remote node's system id = 537415743.
- EMS on the nodes in the MCC reports QP errors.
07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"
07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main3 fcvi qlgc ioErr: port="5c" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"
07 Oct 2023 10:44:18 [SITEA-NODE-A: error] wafl_exempt03 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="3" systemID="537416369"
07 Oct 2023 10:43:37 [SITEA-NODE-B: error] ispfcvi2500_main3 fcvi qlgc ioErr: port="5c" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416379"
07 Oct 2023 10:43:51 [SITEA-NODE-B: error] ispfcvi2500_main1 fcvi qlgc qpErr: port="5a" qpname="WAFL" qpnum="0x3" state_str="Error" state="0x3" suberror="Mismatch in data relative offset" code="0x13" system_id="537416379" errcnt="2183" info=""
07 Oct 2023 10:45:28 [SITEA-NODE-B: error] ispfcvi2500_main3 fcvi qlgc qpErr: port="5c" qpname="WAFL" qpnum="0x4" state_str="Error" state="0x3" suberror="Mismatch in data relative offset" code="0x13" system_id="537416379" errcnt="5" info=""
07 Oct 2023 10:43:52 [SITEB-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="FW detected response error" status="0x121" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="9" systemID="537415743"
07 Oct 2023 10:44:00 [SITEB-NODE-A: error] ispfcvi2500_main4 fcvi qlgc qpErr: port="5d" qpname="MISC" qpnum="0x4" state_str="Error" state="0x3" suberror="Transport error on transmit path" code="0x5" system_id="537415743" errcnt="202" info=""
07 Oct 2023 10:44:00 [SITEB-NODE-A: error] wafl_exempt04 fcvi qlgc ioErr: port="5d" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="2" systemID="537415743"
07 Oct 2023 10:44:08 [SITEB-NODE-A: error] ispfcvi2500_main2 fcvi qlgc qpErr: port="5b" qpname="RAID" qpnum="0x3" state_str="Error" state="0x3" suberror="Timeout occured on the QP exchange" code="0xe" system_id="537415743" errcnt="199" info=""
07 Oct 2023 10:44:08 [SITEB-NODE-A: error] wafl_exempt06 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="2" systemID="537415743"
- EMS on the nodes in the MCC also reports NVMM mirror errors.
07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] fcvi_cm nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_SYNCING_OTHER" error="NVMM_ERR_LINK_DOWN"
07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] nvmm_error nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_LAYOUT_SYNCING" error="NVMM_ABORT_SYNCING_MIRROR"
07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] nvmm_error nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_OFFLINE" error="NVMM_ABORT_SYNCING_MIRROR"
07 Oct 2023 10:44:06 [SITEA-NODE-A: notice] ispfcvi2500_main3 fcvi qlgc received disconnect: port="5c" qpname="RAID" qpnum="5" system_id="537416369" info=""
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] fcvi_cm rdma rlib connected: qp_name="RAID" port="3" client_addr="23.0.1.7" server_addr="23.0.1.5"
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] nvmm_mirror_sync nvmm mirror state change: partner_sysid="2" partner_type="DR PARTNER" prev_mirror_state="NVMM_MIRROR_LAYOUT_SYNCING" new_mirror_state="NVMM_MIRROR_LAYOUT_SYNCED" state_time="27"
07 Oct 2023 10:44:06 [SITEA-NODE-A: notice] ispfcvi2500_main3 fcvi qlgc received disconnect: port="5c" qpname="WAFL" qpnum="4" system_id="537416369" info=""
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] fcvi_cm nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_CP2_FINISH" error="NVMM_ERR_LINK_DOWN"
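- The DWDM links are healthy.
- When only one FCVI port is enabled and the other FCVI ports are disabled, EMS still reports the same errors and the high latency persists.
- When all FCVI ports are disabled, the errors stop and the high latency is resolved.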
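When comparing the EMS output of several nodes, it can help to tally the FCVI errors per node and port, and to check whether the PCIe BadTLP/BadDLLP counters keep growing between two samples. The following is a minimal Python sketch, not a NetApp tool. The function names and regular expressions are hypothetical and assume only the two line formats shown in the excerpts above (the `key="value"` EMS format and the `BadTLP(n), BadDLLP(n)` counter format); other EMS formats would need their own patterns.

```python
import re
from collections import Counter

# Assumed header format: ... [NODE: severity] process event words: key="value" ...
HEAD_RE = re.compile(r'\[(?P<node>[\w-]+): (?P<sev>\w+)\] (?P<rest>.*)')
KV_RE = re.compile(r'(\w+)="([^"]*)"')
# Assumed counter format inside pcie.stealth.errors messages.
PCIE_RE = re.compile(r'BadTLP\((\d+)\), BadDLLP\((\d+)\)')


def parse_ems_line(line):
    """Parse one key="value" style EMS line; return None if it does not match."""
    m = HEAD_RE.search(line)
    if m is None:
        return None
    rest = m.group("rest")
    head = rest.partition(": ")[0].split()
    if not head:
        return None
    return {
        "node": m.group("node"),
        "severity": m.group("sev"),
        "process": head[0],
        "event": " ".join(head[1:]),
        "fields": dict(KV_RE.findall(rest)),
    }


def tally_fcvi_errors(lines):
    """Count error-severity events per (node, port, reason)."""
    counts = Counter()
    for line in lines:
        rec = parse_ems_line(line)
        if rec is None or rec["severity"] != "error":
            continue
        fields = rec["fields"]
        # ioErr events carry status_str, qpErr events carry suberror.
        reason = fields.get("status_str") or fields.get("suberror")
        counts[(rec["node"], fields.get("port"), reason)] += 1
    return counts


def pcie_counter_delta(earlier, later):
    """Return (BadTLP delta, BadDLLP delta) using the first counter pair on each line."""
    a = PCIE_RE.search(earlier)
    b = PCIE_RE.search(later)
    if a is None or b is None:
        return None
    return (int(b.group(1)) - int(a.group(1)),
            int(b.group(2)) - int(a.group(2)))


# Lines copied from the excerpts above.
EMS_SAMPLE = [
    '07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"',
    '07 Oct 2023 10:44:18 [SITEA-NODE-A: error] wafl_exempt03 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="3" systemID="537416369"',
    '07 Oct 2023 10:43:51 [SITEA-NODE-B: error] ispfcvi2500_main1 fcvi qlgc qpErr: port="5a" qpname="WAFL" qpnum="0x3" state_str="Error" state="0x3" suberror="Mismatch in data relative offset" code="0x13" system_id="537416379" errcnt="2183" info=""',
]

# Fragments of the 00:30:15 and 00:32:15 pcie.stealth.errors messages above.
PCIE_EARLIER = 'Br[8780](56,17,0): BadTLP(262804), BadDLLP(860367);'
PCIE_LATER = 'Br[8780](56,17,0): BadTLP(295570), BadDLLP(926266);'

if __name__ == "__main__":
    for key, count in sorted(tally_fcvi_errors(EMS_SAMPLE).items()):
        print(key, count)
    print(pcie_counter_delta(PCIE_EARLIER, PCIE_LATER))
```

For the two PCIe messages above, which are two minutes apart, the sketch yields a delta of 32766 BadTLP and 65899 BadDLLP on bridge `Br[8780](56,17,0)`, so the correctable error counters were still increasing at the time the messages were logged.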