
Node reboots abnormally due to multiple disk failures

Category:
fas-systems
Specialty:
hw

Applies to

  • SAS adapters

 

Issue

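  • The system reboots unexpectedly without displaying a panic string
  • Takeover and giveback complete without additional intervention
  • The system loses access to multiple disks, which triggers the reboot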

================ Log #1 start time Tue Jul 18 06:07:53 2023
mbx_inst_header_marshal:Error writing to all mailbox disk. mbx_sequencNo= 84496746
================ Log #1 end time Tue Jul 18 06:07:53 2023
================ Log #2 start time Tue Jul 18 06:08:13 2023
BIOS Version: 11.

  • The partner node reports missing disks:

[node_name: cf_main: cf.fsm.takeover.mdp:debug]: Failover monitor: takeover attempted after multi-disk failure on partner

  • The node reports a multidisk error during the takeover event:

Mon Oct 09 00:08:35 +0000 [node-name-1: fmmbx_instanceWorker: cf.multidisk.fatalProblem:debug]: Node encountered a multidisk error or other fatal error while waiting to be taken over. Permanent errors on all HA mailbox disks (while marshalling header).

  • No panic string is displayed during takeover or giveback
  • A SAS adapter reset is detected, which leaves shelves and disks in a "missing" state:

[node_name: pmcsas_asyncd_0: sas.adapter.reset:debug]: Resetting SAS adapter 0a.
[node_name: pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'PORT UP -- 0a', 'adapterName': '0a'}
[node_name: pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'PORT UP -- 0b', 'adapterName': '0a'}
[node_name: pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'PORT UP -- 0c', 'adapterName': '0a'}
[node_name: pmcsas_admin_0: sas.adapter.debug:info]: params: {'debug_string': 'PORT UP -- 0d', 'adapterName': '0a'}
[node_name: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Port 0: disabled 0, up 4, down 0: old state 3 --> new state 3', 'adapterName': '0a'}
[node_name: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Port 1: disabled 0, up 4, down 0: old state 3 --> new state 3', 'adapterName': '0a'}
[node_name: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Port 2: disabled 0, up 4, down 0: old state 3 --> new state 3', 'adapterName': '0a'}
[node_name: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'debug_string': 'Port 3: disabled 0, up 4, down 0: old state 3 --> new state 3', 'adapterName': '0a'}
[node_name: fmmbx_instanceWorker: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. Permanent errors on all HA mailbox disks (while marshalling header).

  • Service Processor events for the reboot:

Record 705: Mon Oct 09 00:08:55.226699 2023 [BMC.critical]: Filer Reboots
Record 706: Mon Oct 09 00:08:55.247621 2023 [Trap Event.critical]: hwassist abnormal_reboot (28)
Record 707: Mon Oct 09 00:08:58.159727 2023 [IPMI.notice]: 0388 | 02 | EVT: 6fc200ff | System_FW_Status | Assertion Event, "System software has cleanly shut down"

  • NFS requests are not served correctly before the panic and failover
  • A core file is generated at the time of the panic
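
The symptoms above can be checked against a saved EMS log with a short script. The following is a minimal sketch in Python, not a NetApp tool: the function names `find_signature_events` and `matches_symptom_pattern` are hypothetical. It assumes that each log line carries the `[node: process: event:severity]:` prefix seen in the excerpts above, and that a SAS adapter reset logged before the fatal multidisk error is the ordering of interest. The event names are copied from the excerpts in this article.

```python
import re

# Event names copied from the log excerpts in this article.
SIGNATURE_EVENTS = (
    "sas.adapter.reset",
    "cf.multidisk.fatalProblem",
    "cf.fsm.takeover.mdp",
)

# Assumed line prefix: "[node: process: event.name:severity]:"
EMS_PREFIX = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:"
)


def find_signature_events(lines):
    """Return (line_number, event, severity) for each signature event found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = EMS_PREFIX.search(line)
        if match and match.group("event") in SIGNATURE_EVENTS:
            hits.append((number, match.group("event"), match.group("severity")))
    return hits


def matches_symptom_pattern(lines):
    """Return True if an adapter reset is later followed by a fatal multidisk error."""
    events = [event for _, event, _ in find_signature_events(lines)]
    if "sas.adapter.reset" not in events:
        return False
    first_reset = events.index("sas.adapter.reset")
    return "cf.multidisk.fatalProblem" in events[first_reset + 1:]
```

To use it, read the log into a list of lines (for example with `open(path).read().splitlines()`) and pass that list to `matches_symptom_pattern`. A `True` result only indicates that the log resembles the excerpts shown here. It does not confirm the root cause.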

 

