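Unable to migrate a node's root volume/aggregate to new disks due to an internal error
Applies to
Issue
- The node's root volume/aggregate cannot be migrated to new disks because of an internal error
Example 1: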
Wed Jan 05 10:46:28 +0800 [Node_name: mgwd: migrate.root.failed:error]: Root aggregate migration failed on node Node_name. Reason: Internal error. Failed to offline the volume "vol0". Reason: ..
Wed Jan 05 10:46:28 +0800 [Node_name: mgwd: mgmtgwd.jobmgr.jobcomplete.failure:info]: Job "Migrate root aggregate" [id 4315] (Root aggregate migration job for node Node_name) completed unsuccessfully: Internal error. Failed to offline the volume "vol0". Reason: . (1).
Example 2:
Execution Progress: Complete: Internal error. Failed to verify the new root aggregate status.
Example 3:
Execution Progress: Complete: Internal error. Failed to copy contents from old root to new root volume.
Example 4:
8/19/2024 10:54:37 Cluster-01 INFORMATIONAL mgmtgwd.jobmgr.jobcomplete.failure: Job "Migrate root aggregate" [id 589591] (Root aggregate migration job for node Node-01) completed unsuccessfully: Internal error. Failed to destroy the volume "vol0". Reason: . (1).
8/19/2024 10:54:37 Cluster-01 ERROR migrate.root.failed: Root aggregate migration failed on node Node-01. Reason: Internal error. Failed to destroy the volume "vol0". Reason: ..
Execution Progress: Complete: Internal error. Failed to rename the new root aggregate. Reason: . [1]
Execution Progress: Complete: Timeout: Operation "copy_root_volume_contents_iterator::create_imp()" took longer than 600 seconds to complete
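- The new root aggregate has been created and the node is operating normally, but the root migration job cannot be resumed
- Attempting to resume the migration job fails with the following error: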
Internal error. Failed to verify the new root aggregate status.
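
The failure messages quoted above share a common shape ("Internal error. Failed to ..."), so collected log text can be checked for them with a short script. The following is a minimal sketch in Python, not part of any NetApp tooling: the function name `extract_failure_reasons` is hypothetical, and the regular expression assumes the messages look like the examples in this article.

```python
import re

# Assumption: the reason text follows "Internal error. " and ends at the next period,
# as in the example messages quoted in this article.
INTERNAL_ERROR = re.compile(r"Internal error\. (Failed to [^.]+)\.")


def extract_failure_reasons(log_text):
    """Return the distinct "Failed to ..." reasons in log_text, in order of first appearance.

    Hypothetical helper for illustration only.
    """
    reasons = []
    for line in log_text.splitlines():
        match = INTERNAL_ERROR.search(line)
        # Record each reason once, even if several log lines repeat it.
        if match and match.group(1) not in reasons:
            reasons.append(match.group(1))
    return reasons
```

Passing the text of an event log or job output to `extract_failure_reasons` returns a list such as the "Failed to offline the volume" or "Failed to verify the new root aggregate status" reasons shown in the examples. Timeout messages do not contain "Internal error." and are not matched by this sketch.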