CVO system panics because a disk is missing


Applies to

Cloud Volumes ONTAP (CVO)
BlueXP (formerly Cloud Manager)
Microsoft Azure
Amazon Web Services (AWS)
Google Cloud Platform (GCP)
Single node or HA pair

Issue

A problem in the underlying infrastructure makes one or more disks inaccessible, and the system panics:

[Cluster-01: pha_remove000: mlm.array.lun.removed:notice]: Array LUN '0b.29' (00000000i3g268fHE60S) is no longer being presented to this node.

[Cluster-01: dmgr_thread: raid.disk.missing:info]: Disk /aggr04/plex0/rg0/0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] is missing from the system

[Cluster-01: config_thread: sk.panic:alert]: Panic String: aggr aggr04: raid volfsm, fatal disk error in RAID group with no parity disk.. Raid type - raid0 Group name plex0/rg0 state NORMAL. 1 disk failed in the group. Disk 0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] error: disk does not exist. in SK process config_thread on release 9.7P7 (C)

[Cluster-01: config_thread: sk.panic:alert]: params: {'reason': 'aggr aggr04: raid volfsm, fatal disk error in RAID group with no parity disk.. Raid type - raid0 Group name plex0/rg0 state NORMAL. 1 disk failed in the group. Disk 0b.29 S/N [00000000i3g268fHE60S] UID [00000000i3g268fHE60S] error: adapter error prevents command from being sent to device. in SK process config_thread on release 9.7P7 (C)'}
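When reviewing a longer log, it can help to pull out only the events shown above. The following is a minimal sketch, not a NetApp tool: it assumes each event carries the bracketed header format seen in these excerpts (`[node: process: event.name:severity]: message`), and the names `EMS_LINE`, `DISK_LOSS_EVENTS`, `parse_ems_line`, and `find_disk_loss_events` are hypothetical helpers introduced for illustration.

```python
import re

# Assumed header layout, taken from the excerpts in this article:
# [<node>: <process>: <event.name>:<severity>]: <message>
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+):\s*(?P<process>[^:\]]+):\s*"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)"
)

# Event names that appear in this article's disk-loss sequence.
DISK_LOSS_EVENTS = {"mlm.array.lun.removed", "raid.disk.missing", "sk.panic"}


def parse_ems_line(line):
    """Return the header fields and message as a dict, or None if no match."""
    match = EMS_LINE.search(line)
    return match.groupdict() if match else None


def find_disk_loss_events(lines):
    """Yield parsed events whose name is listed in DISK_LOSS_EVENTS."""
    for line in lines:
        parsed = parse_ems_line(line)
        if parsed and parsed["event"] in DISK_LOSS_EVENTS:
            yield parsed
```

Because the pattern is applied with `search`, a leading timestamp in front of the bracketed header does not prevent a match. Lines in other formats are skipped rather than raising an error.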

In some circumstances, the system may instead panic with a WAFL hung panic:
Panic String: WAFL hung for aggr1. in SK process wafl_exempt02 on release 9.9.0 (C)
In AWS/GCP, a plex failure may occur and the node may return to an "unknown" status.

SYMPFA:HA Group Notification from Node-02 (SYNCMIRROR PLEX FAILED) ALERT

In Azure, a panic can occur when a disk (a page blob in the case of Azure HA root/data aggregates) becomes inaccessible.

Thu Nov 20 22:06:40 -0500 [Cluster-01: rc: sk.panic:alert]: Panic String: DIAGNOSTIC PANIC Disk deleted or missing on cloud shared HA in SK process rc on release 9.16.1P8 (C)

A support case may be created automatically by the HA Group Notification (PARTNER DOWN, TAKEOVER IMPOSSIBLE) EMERGENCY alert.
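The three panic strings quoted in this article can be told apart by a fixed phrase in each. The sketch below shows one way to sort a panic string into those scenarios. The labels and the names `PANIC_SIGNATURES` and `classify_panic` are hypothetical and chosen for illustration only, and the phrases are copied from the excerpts in this article, so other ONTAP releases may word them differently.

```python
import re

# (label, pattern) pairs; the phrases come from the panic strings in this article.
PANIC_SIGNATURES = [
    ("raid0_disk_missing",
     re.compile(r"fatal disk error in RAID group with no parity disk")),
    ("wafl_hung",
     re.compile(r"WAFL hung for \S+")),
    ("cloud_shared_ha_disk_missing",
     re.compile(r"Disk deleted or missing on cloud shared HA")),
]


def classify_panic(panic_string):
    """Return the label of the first matching signature, or None."""
    for label, pattern in PANIC_SIGNATURES:
        if pattern.search(panic_string):
            return label
    return None


if __name__ == "__main__":
    example = ("Panic String: WAFL hung for aggr1. in SK process "
               "wafl_exempt02 on release 9.9.0 (C)")
    print(classify_panic(example))
```

A panic string that matches none of the phrases returns `None`, which indicates that it is not one of the signatures described here.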