AFF A250 / FAS500f controller reboots unexpectedly, followed by an automatic partner takeover and giveback
Applies to
- FAS500f
- AFF A250
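- BMC firmware 15.3 and earlier
Issue
- A node reboots unexpectedly, followed by an automatic takeover by the partner and a giveback.
- No suspicious EMS messages are logged around the time of the node reboot. Example: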
Sun Jan 02 01:25:45 +0200 [node_name-01: config_thread: raid.rg.scrub.summary.lw:notice]: Scrub found 0 RAID write signature inconsistencies in /aggregate/plex0/rg0.
Sun Jan 02 01:43:35 +0200 [node_name-01: kernel: netif.linkUp:info]: Ethernet lo0: Link up.
- The BMC event log shows BMC events and a BMC reboot. Example:
35d | 01/01/2022 | 10:39:55 | System Event #0xff | Timestamp Clock Sync | Asserted
35e | 01/01/2000 | 00:00:20 | System Event | Timestamp Clock Sync | Asserted
35f | 01/01/2000 | 00:00:20 | System Event #0xff | Timestamp Clock Sync | Asserted
360 | 01/01/2022 | 23:42:54 | System Event #0xff | Timestamp Clock Sync | Asserted
361 | 01/01/2022 | 23:42:54 | System Event | Timestamp Clock Sync | Asserted
362 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
363 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
364 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
365 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
366 | 01/01/2022 | 23:43:10 | Power Supply #0x20 | Presence detected | Asserted
367 | 01/01/2022 | 23:43:10 | Power Supply #0x25 | Presence detected | Asserted
368 | 01/01/2022 | 23:43:14 | Battery #0x4f | State Deasserted
369 | 01/01/2022 | 23:45:00 | System Event #0xff | Timestamp Clock Sync | Asserted
- The partner node logs takeover messages. Example:
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not responding
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverCountdown:info]: Failover monitor: takeover scheduled in 10 seconds
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node_name-02 by netapp03-06 disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
Sun Jan 02 01:41:49 +0200 [node_name-02: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
Sun Jan 02 01:41:49 +0200 [node_name-02: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
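When checking collected logs for this pattern, the excerpts above can be scanned with a short script. The following is a minimal sketch, not a NetApp tool: the function names are hypothetical, the line formats are inferred only from the samples in this article, and two heuristics are assumptions. The first is that BMC entries stamped in the year 2000 mark the BMC clock running from its default before re-sync. The second is that failover-related EMS events are those whose names start with `cf.`.

```python
import re
from datetime import datetime

# Sample lines copied from the excerpts in this article.
SEL_LINES = [
    "35d | 01/01/2022 | 10:39:55 | System Event #0xff | Timestamp Clock Sync | Asserted",
    "35e | 01/01/2000 | 00:00:20 | System Event | Timestamp Clock Sync | Asserted",
    "35f | 01/01/2000 | 00:00:20 | System Event #0xff | Timestamp Clock Sync | Asserted",
    "360 | 01/01/2022 | 23:42:54 | System Event #0xff | Timestamp Clock Sync | Asserted",
    "362 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |",
]
EMS_LINES = [
    "Sun Jan 02 01:43:35 +0200 [node_name-01: kernel: netif.linkUp:info]: Ethernet lo0: Link up.",
    "Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not responding",
    "Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverCountdown:info]: Failover monitor: takeover scheduled in 10 seconds",
    "Sun Jan 02 01:41:49 +0200 [node_name-02: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.",
    "Sun Jan 02 01:41:49 +0200 [node_name-02: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started",
]

# EMS line layout as seen in the samples:
# <weekday> <month> <day> <time> <tz> [<node>: <process>: <event>:<severity>]: <message>
EMS_PATTERN = re.compile(
    r"^\w{3} \w{3} \d{2} (?P<time>\d{2}:\d{2}:\d{2}) [+-]\d{4} "
    r"\[(?P<node>[^:]+): (?P<proc>[^:]+): (?P<event>[\w.]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)


def parse_sel_line(line):
    """Split one pipe-delimited BMC event line; return None if it does not fit."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 4:
        return None
    try:
        stamp = datetime.strptime(f"{fields[1]} {fields[2]}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {"id": fields[0], "time": stamp, "sensor": fields[3]}


def find_clock_resets(lines, before_year=2001):
    """Return IDs of BMC entries dated before `before_year` (assumed reset marker)."""
    entries = (parse_sel_line(line) for line in lines)
    return [e["id"] for e in entries if e and e["time"].year < before_year]


def failover_events(lines):
    """Return (time, node, event) for EMS events whose name starts with 'cf.'."""
    found = []
    for line in lines:
        match = EMS_PATTERN.match(line)
        if match and match["event"].startswith("cf."):
            found.append((match["time"], match["node"], match["event"]))
    return found


if __name__ == "__main__":
    print("BMC entries with default-epoch timestamps:", find_clock_resets(SEL_LINES))
    for time_of_day, node, event in failover_events(EMS_LINES):
        print(time_of_day, node, event)
```

The BMC timestamps carry no time zone in these samples, while the EMS timestamps do (`+0200`), so the sketch lists the two event sets side by side instead of merging them on one timeline.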