A400 HA interconnect ports are down
Environment
- AFF A400
- The HA interconnect connection uses ports e0a and e0b.
Issue
- The HA interconnect ports are offline on both nodes. Example:
[NODE-A: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.
[NODE-A: kernel: netif.linkDown:info]: Ethernet e0b: Link down, check cable.
[NODE-A: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of NODE-A by NODE-B disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
[NODE-A: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of NODE-B disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
[NODE-A: monitor: monitor.globalStatus.critical:EMERGENCY]: Controller failover of NODE-B is not possible: HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support. There are not enough spare disks.
[NODE-A: statd: cf.takeover.disabled:alert]: HA mode, but takeover of partner is disabled due to reason : HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support.
[NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 15 minutes: all links are down
[NODE-A: statd: callhome.hainterconnect.down:alert]: Call home for HA INTERCONNECT DOWN due to all links are down.
[NODE-B: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.
[NODE-B: kernel: netif.linkDown:info]: Ethernet e0b: Link down, check cable.
[NODE-B: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of NODE-B by NODE-A disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
[NODE-B: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of NODE-A disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
[NODE-B: monitor: monitor.globalStatus.critical:EMERGENCY]: Controller failover of NODE-A is not possible: HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support. There are not enough spare disks.
[NODE-B: statd: cf.takeover.disabled:alert]: HA mode, but takeover of partner is disabled due to reason : HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support.
[NODE-B: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 15 minutes: all links are down
[NODE-B: statd: callhome.hainterconnect.down:alert]: Call home for HA INTERCONNECT DOWN due to all links are down.
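The EMS messages above can be grouped per node to confirm that both HA interconnect ports report link down on both controllers. The following is a minimal sketch, assuming the log lines have the bracketed `[node: source: event:severity]: message` layout shown in the example; the function name `link_down_ports` is hypothetical and is not part of any NetApp tool.

```python
import re

# Layout assumed from the example lines: [NODE: source: event.name:severity]: message
EMS_RE = re.compile(
    r"^\[(?P<node>[^:\]]+): (?P<source>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: (?P<message>.*)$"
)
# Message text of netif.linkDown as it appears in the example.
PORT_RE = re.compile(r"Ethernet (?P<port>\w+): Link down")


def link_down_ports(lines):
    """Return {node: sorted list of ports} for every netif.linkDown event."""
    result = {}
    for line in lines:
        match = EMS_RE.match(line.strip())
        if not match or match.group("event") != "netif.linkDown":
            continue
        port = PORT_RE.search(match.group("message"))
        if port:
            result.setdefault(match.group("node"), set()).add(port.group("port"))
    return {node: sorted(ports) for node, ports in result.items()}


if __name__ == "__main__":
    sample = [
        "[NODE-A: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.",
        "[NODE-A: kernel: netif.linkDown:info]: Ethernet e0b: Link down, check cable.",
        "[NODE-B: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.",
        "[NODE-B: kernel: netif.linkDown:info]: Ethernet e0b: Link down, check cable.",
    ]
    print(link_down_ports(sample))
```

If both nodes list e0a and e0b, the symptom matches the one described in this article.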
- The SFP information for both nodes appears in the output of the following command, but the ports are down:
::> system node run -node * -command sysconfig -a
Node A
slot 0: Dual 10G/25G Ethernet Controller CX5
e0a MAC Address: d0:39:ea:2b:c3:31 (auto-unknown-fd-down)
SFP Vendor: Molex
SFP Part Number: 1111455002
SFP Serial Number: 93A2023020111
e0b MAC Address: d0:39:ea:2b:c3:32 (auto-unknown-fd-down)
SFP Vendor: Molex
SFP Part Number: 1111455002
SFP Serial Number: 93A2012020222
Device Type: CX5 PSID(NAP0000000006)
Firmware Version: 16.26.4012
Node B
slot 0: Dual 10G/25G Ethernet Controller CX5
e0a MAC Address: d0:39:ea:2b:c2:c1 (auto-unknown-fd-down)
SFP Vendor: Molex
SFP Part Number: 1111455002
SFP Serial Number: 93A2023020111
e0b MAC Address: d0:39:ea:2b:c2:c2 (auto-unknown-fd-down)
SFP Vendor: Molex
SFP Part Number: 1111455002
SFP Serial Number: 93A2012020222
Device Type: CX5 PSID(NAP0000000006)
Firmware Version: 16.26.4012
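The symptom in the sysconfig output is that the SFP is detected (vendor, part number, and serial number are listed) while the port status ends in `down`. The following is a minimal sketch that flags such ports in saved `sysconfig -a` text. It assumes the line layout shown in the example and assumes that a healthy port's status string ends in `-up`; the function names are hypothetical.

```python
import re

# Port line as shown in the example: "e0a MAC Address: <mac> (<status>)"
PORT_RE = re.compile(
    r"^(?P<port>e\d+[a-z])\s+MAC Address:\s+"
    r"(?P<mac>[0-9A-Fa-f:]{17})\s+\((?P<status>[^)]*)\)"
)
SFP_RE = re.compile(r"^SFP (?P<field>Vendor|Part Number|Serial Number):\s*(?P<value>.+)$")


def parse_ports(text):
    """Map each port name to its MAC, status string, link state and SFP fields."""
    ports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        port = PORT_RE.match(line)
        if port:
            status = port.group("status")
            current = {
                "mac": port.group("mac"),
                "status": status,
                # Assumption: the status string ends in "-up" when the link is up.
                "link_up": status.endswith("-up"),
                "sfp": {},
            }
            ports[port.group("port")] = current
            continue
        sfp = SFP_RE.match(line)
        if sfp and current is not None:
            current["sfp"][sfp.group("field")] = sfp.group("value").strip()
    return ports


def sfp_seen_but_link_down(ports):
    """Ports where SFP details were read but the link is not up."""
    return sorted(name for name, info in ports.items() if info["sfp"] and not info["link_up"])
```

Running `sfp_seen_but_link_down(parse_ports(text))` on the Node A output above would list both e0a and e0b, which suggests the modules are readable even though the links are down.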
- The system ha interconnect status show command reports link 0 and link 1 as down. Example:
::> set advanced
::*> system ha interconnect status show
Node:node-1
Link 0 Status: down
Link 1 Status: down
Is Link 0 Active: false
Is Link 1 Active: false
IC RDMA Connection: up
Node:node-2
Link 0 Status: down
Link 1 Status: down
Is Link 0 Active: false
Is Link 1 Active: false
IC RDMA Connection: up
2 entries were displayed.
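The per-node output of `system ha interconnect status show` can be checked for the condition described here, where every link on a node reports down. The following is a minimal sketch, assuming the `key: value` layout shown in the example; the function names are hypothetical, and the parser handles only the fields that appear above.

```python
def parse_ic_status(text):
    """Parse 'key: value' lines into {node_name: {field: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # e.g. the "2 entries were displayed." footer
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Node":
            current = nodes.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return nodes


def nodes_with_all_links_down(nodes):
    """Return the nodes whose 'Link N Status' fields are all 'down'."""
    affected = []
    for name, fields in nodes.items():
        states = [
            value
            for key, value in fields.items()
            if key.startswith("Link ") and key.endswith(" Status")
        ]
        if states and all(state == "down" for state in states):
            affected.append(name)
    return sorted(affected)
```

For the example output above, `nodes_with_all_links_down(parse_ic_status(text))` would return both node-1 and node-2.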
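- The issue persists after the respective HA interconnect SFPs and cables are reseated.
- Loopback test: when the two HA interconnect ports on the same node are cabled to each other, both ports come back online.
- Both nodes remain online, but takeover is not possible.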
::*> storage failover show -node *
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node-1         node-2         false    Waiting for node-2, Takeover is
                                       not possible: Storage failover
                                       interconnect error, NVRAM log not
                                       synchronized
node-2         node-1         false    Waiting for node-1, Takeover is
                                       not possible: Storage failover
                                       interconnect error, NVRAM log not
                                       synchronized
2 entries were displayed.