AFF A250 / FAS500f controller reboots unexpectedly, and the partner performs an automatic takeover and giveback
Environment
- FAS500f
- AFF A250
- BMC firmware 15.3 or earlier
Issue
- A node reboots unexpectedly, and the partner automatically takes over and then gives back.
- Although the node reboots, no suspicious EMS messages are logged on it. Example:
Sun Jan 02 01:25:45 +0200 [node_name-01: config_thread: raid.rg.scrub.summary.lw:notice]: Scrub found 0 RAID write signature inconsistencies in /aggregate/plex0/rg0.
Sun Jan 02 01:43:35 +0200 [node_name-01: kernel: netif.linkUp:info]: Ethernet lo0: Link up.
- BMC events are logged, and the BMC itself reboots. Example:
35d | 01/01/2022 | 10:39:55 | System Event #0xff | Timestamp Clock Sync | Asserted
35e | 01/01/2000 | 00:00:20 | System Event | Timestamp Clock Sync | Asserted
35f | 01/01/2000 | 00:00:20 | System Event #0xff | Timestamp Clock Sync | Asserted
360 | 01/01/2022 | 23:42:54 | System Event #0xff | Timestamp Clock Sync | Asserted
361 | 01/01/2022 | 23:42:54 | System Event | Timestamp Clock Sync | Asserted
362 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
363 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
364 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
365 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
366 | 01/01/2022 | 23:43:10 | Power Supply #0x20 | Presence detected | Asserted
367 | 01/01/2022 | 23:43:10 | Power Supply #0x25 | Presence detected | Asserted
368 | 01/01/2022 | 23:43:14 | Battery #0x4f | State Deasserted
369 | 01/01/2022 | 23:45:00 | System Event #0xff | Timestamp Clock Sync | Asserted
- The partner logs takeover messages. Example:
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not responding
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverCountdown:info]: Failover monitor: takeover scheduled in 10 seconds
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node_name-02 by netapp03-06 disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
Sun Jan 02 01:41:49 +0200 [node_name-02: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
Sun Jan 02 01:41:49 +0200 [node_name-02: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
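One way to spot the BMC reboot in an SEL dump like the BMC excerpt above is to look for entries whose timestamp jumps backwards, as entry 35e does when it drops to 01/01/2000. The following Python code is a minimal sketch of a hypothetical triage helper, not a NetApp tool. The field layout (id | date | time | sensor | ...) and the MM/DD/YYYY date order are assumed from the excerpt. Treating a backwards jump as a BMC clock reset is an interpretation that is consistent with a BMC reboot but does not prove one.

```python
from datetime import datetime


def parse_sel_line(line):
    """Parse one 'id | date | time | sensor | ...' SEL row.

    The layout and the MM/DD/YYYY date order are assumptions taken from the
    excerpt. Returns None for lines that do not look like an SEL row.
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 4:
        return None
    entry_id, date_str, time_str, sensor = fields[:4]
    try:
        timestamp = datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {"id": entry_id, "timestamp": timestamp, "sensor": sensor}


def find_clock_resets(lines):
    """Return the ids of SEL entries whose timestamp is earlier than the previous entry's.

    Assumption: a backwards jump (for example to 01/01/2000) means the BMC clock
    restarted before it was re-synchronized, which is consistent with a BMC reboot.
    """
    resets = []
    previous = None
    for line in lines:
        entry = parse_sel_line(line)
        if entry is None:
            continue
        if previous is not None and entry["timestamp"] < previous["timestamp"]:
            resets.append(entry["id"])
        previous = entry
    return resets


if __name__ == "__main__":
    sel = """\
35d | 01/01/2022 | 10:39:55 | System Event #0xff | Timestamp Clock Sync | Asserted
35e | 01/01/2000 | 00:00:20 | System Event | Timestamp Clock Sync | Asserted
35f | 01/01/2000 | 00:00:20 | System Event #0xff | Timestamp Clock Sync | Asserted
360 | 01/01/2022 | 23:42:54 | System Event #0xff | Timestamp Clock Sync | Asserted"""
    print(find_clock_resets(sel.splitlines()))  # ['35e']
```

Run against the full SEL excerpt, this flags only entry 35e. The later 2022 entries move forward in time and are not reported.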
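The partner's takeover messages can also be checked with a script. The following Python code is a hypothetical sketch. The regular expression assumes the bracketed [node: thread: message.name:severity] layout seen in the EMS excerpts above. The three message names used as a takeover "signature" come from the partner takeover excerpt, not from an official list. Because the code scans with re.finditer, it still works when several EMS messages run together on one line.

```python
import re

# Assumed layout of the bracketed tag in each EMS message in the excerpts above:
# [node: thread: message.name:severity]
EMS_TAG = re.compile(r"\[([^:\]]+): ([^:\]]+): ([\w.]+):(\w+)\]")

# Hypothetical "signature": message names copied from the partner takeover excerpt.
TAKEOVER_SEQUENCE = (
    "cf.fsm.partnerNotResponding",
    "cf.fsm.takeover.noHeartbeat",
    "cf.fm.takeoverStarted",
)


def ems_events(log_text):
    """Yield (node, message_name, severity) for every EMS tag in the text.

    finditer scans the whole string, so messages that were run together on a
    single line are still found.
    """
    for match in EMS_TAG.finditer(log_text):
        node, _thread, name, severity = match.groups()
        yield node, name, severity


def has_takeover_sequence(log_text, sequence=TAKEOVER_SEQUENCE):
    """Return True if the names in `sequence` occur in this order (other messages may sit in between)."""
    names = (name for _node, name, _severity in ems_events(log_text))
    # Each any() call consumes the shared generator up to the next match,
    # which turns all() into an ordered-subsequence check.
    return all(any(name == wanted for name in names) for wanted in sequence)


if __name__ == "__main__":
    partner_log = (
        "Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.partnerNotResponding:notice]: "
        "Failover monitor: partner not responding"
        "Sun Jan 02 01:41:49 +0200 [node_name-02: cf_main: cf.fsm.takeover.noHeartbeat:alert]: "
        "Failover monitor: Takeover initiated after no heartbeat was detected from the partner node."
        "Sun Jan 02 01:41:49 +0200 [node_name-02: cf_takeover: cf.fm.takeoverStarted:notice]: "
        "Failover monitor: takeover started"
    )
    print(has_takeover_sequence(partner_log))  # True
```

This only confirms that the partner logged a takeover. It does not explain why the node rebooted.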