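System outage caused by a broadcast storm that affected cluster services
Environment
- AFF A800
- ONTAP 9
- Broadcast or multicast traffic
Issue
- Vifmgr intermittently goes out of quorum (OOQ), that is, it is no longer a member of the quorum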
VIFMGR.GZ
[kern_vifmgr:info:7439] [0x81093ad00] [NbladeWriter::nitroPcpRpcCall] long-running operation: procNum=35; time=3101 ms
[kern_vifmgr:info:7439] [0x80e591d00] [NbladeWriter::nitroPcpRpcCall] long-running operation: procNum=32; time=3109 ms
[kern_vifmgr:info:7439] A [src/rdb/quorum/qm_states/inq/SecondaryState.cc 146 (0x80c12f100)]: doWork: Leaving Quorum at 5364558s; membership expired at 5364558s - no poll received from Master since 5364540s [membershipDisabled: false]
[kern_vifmgr:info:7439] A [src/rdb/quorum/qm_states/inq/SecondaryState.cc 306 (0x80c12f100)]: secondaryFailed: FastPathDefault 1, Membership terminated by secondaryFailed call at 5364558s, _failedTillTime 5364561s
[kern_vifmgr:info:7439] A [src/rdb/quorum/qm_states/inq/QuorumMemberState.cc 65 (0x80c12f100)]: state2: WS_QuorumMember -> WS_Failed
[kern_vifmgr:info:7439] A [src/rdb/quorum/qm_states/inq/SecondaryState.cc 326 (0x80c12f100)]: stateUp2Secondary: WS_QuorumMember -> WS_Failed
[kern_vifmgr:info:7439] A [src/rdb/quorum/qm_states/qm_state.cc 301 (0x80c12f100)]: qmsPreferredCandidate_set: QmState::qmsPreferredCandidate_set till: 5364561s who: 1006.
[kern_vifmgr:info:7439] A [src/rdb/quorum/qm_states/inq/InQuorumState.cc 50 (0x80c12f100)]: stateUp2InQuorum: WS_QuorumMember -> WS_Failed
[kern_vifmgr:info:7439] A [src/rdb/quorum/quorumimpl.cc 1990 (0x80c12f100)]: local_offlineUpcall: local_offlineUpcall QM Upcall status: Secondary ==> Offline Epoch: 253 => 253 isFastPath 1 isFastPathOverride 0 membershipDisabled: 0
[kern_vifmgr:info:7439] A [src/rdb/quorum/qm_states/qm_state.cc 545 (0x80c12f100)]: stateTrans: QmState::stateTrans: WS_QuorumMember -> WS_Failed at: 5364558s
[kern_vifmgr:info:7439] ******* OOQ mtrace dump BEGIN *********
- Access to the internal SES (SCSI Enclosure Services) is interrupted
EMS-LOG-FILE.GZ
node-01
[node-01: dsa_worker3: ses.status.electronicsWarn:error]: FS4483PSM3E (S/N SHFNC2211000123) shelf 0 on channel 0s environmental monitoring warning for SES electronics 2: communication error. ; enclosure services hardware failed This element is on the rear of the shelf at the bottom, on shelf module (B).
[node-01: dsa_worker3: ses.status.ModuleError:alert]: FS4483PSM3E (S/N SHFNC2211000123) shelf 0 on channel 0s PCI switch error for PCI Switch 2: status not available; status not available. This element is on the rear of the shelf at the bottom, on shelf module (B).
[node-01: dsa_worker3: ses.status.electronicsInfo:info]: FS4483PSM3E (S/N SHFNC2211000123) shelf 0 on channel 0s environmental monitoring information for SES electronics 2: normal status.
[node-01: dsa_worker3: ses.status.ModuleInfo:info]: FS4483PSM3E (S/N SHFNC2211000123) shelf 0 on channel 0s PCI switch information for PCI Switch 2: normal status.
Partner node-02
[node-02: scsi_cmdblk_strthr_admin: scsi.cmd.abortedByHost:error]: Unknown device 0s.0: Command aborted by host adapter: HA status 0x4: cdb 0x12.
[node-02: scsi_cmdblk_strthr_admin: scsi.cmd.selectionTimeout:error]: Unknown device 0s.0: Adapter/target error: HA status 0x7: cdb 0x12. Targeted device did not respond to requested I/O. I/O will be retried.
[node-02: scsi_cmdblk_strthr_admin: scsi.cmd.abortedByHost:error]: Unknown device 0s.0: Command aborted by host adapter: HA status 0x4: cdb 0x12.
[node-02: scsi_cmdblk_strthr_admin: scsi.cmd.selectionTimeout:error]: Unknown device 0s.0: Adapter/target error: HA status 0x7: cdb 0x12. Targeted device did not respond to requested I/O. I/O will be retried.
[node-02: scsi_cmdblk_strthr_admin: scsi.cmd.selectionTimeout:error]: Unknown device 0s.0: Adapter/target error: HA status 0x7: cdb 0x12. Targeted device did not respond to requested I/O. I/O will be retried.
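When checking whether a system shows the same OOQ symptom, it can help to pull the "Leaving Quorum" entries out of the vifmgr log and see how long the node went without a poll from the master. The following is a minimal sketch, not a NetApp tool: the regular expression is derived only from the single log line shown in the VIFMGR.GZ excerpt above, and the function names (`find_quorum_exits`, `scan_gz`) and the file path passed to `scan_gz` are hypothetical.

```python
import gzip
import re

# Assumption: the message format matches the "Leaving Quorum" line quoted
# in the VIFMGR.GZ excerpt; other ONTAP releases may word it differently.
LEAVE_RE = re.compile(
    r"Leaving Quorum at (?P<left>\d+)s; "
    r"membership expired at (?P<expired>\d+)s - "
    r"no poll received from Master since (?P<last_poll>\d+)s"
)


def find_quorum_exits(lines):
    """Return one dict per 'Leaving Quorum' line found in an iterable of log lines."""
    events = []
    for line in lines:
        match = LEAVE_RE.search(line)
        if match is None:
            continue
        left = int(match.group("left"))
        last_poll = int(match.group("last_poll"))
        events.append(
            {
                "left": left,
                "expired": int(match.group("expired")),
                "last_poll": last_poll,
                # Seconds between the last poll from the master and leaving quorum.
                "silence_s": left - last_poll,
            }
        )
    return events


def scan_gz(path):
    """Hypothetical helper: scan a gzip-compressed log file for quorum exits."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return find_quorum_exits(handle)


# The line below is copied from the excerpt above.
sample = (
    "[kern_vifmgr:info:7439] A [src/rdb/quorum/qm_states/inq/SecondaryState.cc 146 "
    "(0x80c12f100)]: doWork: Leaving Quorum at 5364558s; membership expired at "
    "5364558s - no poll received from Master since 5364540s [membershipDisabled: false]"
)

for event in find_quorum_exits([sample]):
    print(f"left quorum at {event['left']}s after {event['silence_s']}s without a poll")
```

For the quoted line, the gap between the last poll (5364540s) and leaving quorum (5364558s) is 18 seconds. Repeated entries of this kind alongside the SES and SCSI errors shown above would be consistent with the intermittent OOQ symptom described in this article.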