
Node halted with "scsi.cmd.pastTimeToLive:error" on multiple disks

Category: fas-systems
Specialty: HW

Environment

  • FAS 2820
  • ONTAP 9
  • Internal shelf

Issue

  • The node halts, and multiple disks log scsi.cmd.pastTimeToLive:error messages:

[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000046cd85e00:00000200.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000047237f760:00000008.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8f:000000046c3c7e00:00000400.
...
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.8: request failed after try #1: cdb 0x88:000000047237ef90:00000008.

  • The partner node sends an HA Group Notification (CONTROLLER TAKEOVER COMPLETE AUTOMATIC - Communication Error) ALERT.
    • The following EMS message is logged:

[?] Sat Dec 28 08:48:01 +0900 [node02: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner

  • Ports on the shelf IOM are in the NO SIGNAL state:

Timestamp: Sat Jan 4 08:33:20 JST 2025
Shelf name: 0c.shelf0
Channel: 0c
Module: A
Shelf id: 0
Shelf UUID: 50:0a:09:80:08:6f:fb:24
Shelf S/N: SHJSG2418000037
Term switch: N/A
Shelf state: ONLINE
Module state: OK

                         Partial Path  Link    Invalid  Running    Loss   Phy      CRC    Phy
Disk         Port        Timeout       Rate    DWord    Disparity  Dword  Reset    Error  Change
Id           State       Value (ms)    (Gb/s)  Count    Count      Count  Problem  Count  Count
--------------------------------------------------------------------------------------------
[HST0/P0:0]  NO SIGNAL   7             NA      0        0          0      0        0      974
[HST1/P0:1]  NO SIGNAL   7             NA      1299     1298       0      0        0      974
[HST2/P0:2]  NO SIGNAL   7             NA      310      307        0      0        0      974
[HST3/P0:3]  NO SIGNAL   7             NA      85       81         0      0        0      974
[HST4/P1:0]  OK          7             12.0    0        0          0      0        0      3
[HST5/P1:1]  OK          7             12.0    0        0          0      0        0      3
[HST6/P1:2]  OK          7             12.0    0        0          0      0        0      3

  • The node cannot read multiple drives, and the aggregate fails with a multi-disk error:
    Mon Jun 02 10:17:22 +0700 [node-02: config_thread: raid.vol.failed:notice]: Aggregate aggr1_n2: Failed due to multi-disk error.
    Mon Jun 02 10:17:23 +0700 [node-02: config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr1_n2: raid volfsm, fatal multi-disk error..  Raid type - raid_dp Group name plex0/rg0 state DOUBLEDEGRADED. 1 disk failed in the group. Disk 0a.00.2P1 Shelf 0 Bay 2 [NETAPP   X336_TTCRE04TA07 NA04] S/N [Y3F0A2XXXXXX] UID [6000039C:E82AC314:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] error: disk failed..

  • The node goes down because of a multi-disk failure:
    Mon Jun 02 10:17:23 +0700 [node-02: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner
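
When triaging a case like this, it can help to count how many scsi.cmd.pastTimeToLive errors each disk device logged and which commands timed out. It can also help to pull the aggregate, RAID group and failed disk out of the cf.multidisk.fatalProblem message. The Python below is a minimal sketch that assumes the message wording matches the examples in this article. `summarize_timeouts`, `parse_fatal` and the field names are hypothetical helpers, not NetApp tools. The sketch also assumes that the cdb field is `opcode:LBA:transfer length` in hex. It decodes only the three standard 16-byte SCSI opcodes seen here: 0x88 READ(16), 0x8A WRITE(16) and 0x8F VERIFY(16).

```python
import re
from collections import Counter

# Sample EMS lines copied from this article (the "..." elision is not reproduced).
EMS_LINES = [
    "[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: "
    "Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000046cd85e00:00000200.",
    "[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: "
    "Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000047237f760:00000008.",
    "[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: "
    "Disk device 0b.00.0: request failed after try #1: cdb 0x8f:000000046c3c7e00:00000400.",
    "[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: "
    "Disk device 0b.00.8: request failed after try #1: cdb 0x88:000000047237ef90:00000008.",
    "Mon Jun 02 10:17:23 +0700 [node-02: config_thread: cf.multidisk.fatalProblem:error]: "
    "Node encountered a multidisk error or other fatal error while waiting to be taken over. "
    "aggr aggr1_n2: raid volfsm, fatal multi-disk error..  Raid type - raid_dp "
    "Group name plex0/rg0 state DOUBLEDEGRADED. 1 disk failed in the group. "
    "Disk 0a.00.2P1 Shelf 0 Bay 2 [NETAPP   X336_TTCRE04TA07 NA04] S/N [Y3F0A2XXXXXX] "
    "UID [6000039C:E82AC314:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] "
    "error: disk failed..",
]

# Assumption: the cdb field is opcode:LBA:transfer-length, all in hex.
TIMEOUT_RE = re.compile(
    r"scsi\.cmd\.pastTimeToLive:error\]: Disk device (?P<disk>[\w.]+): "
    r"request failed after try #(?P<attempt>\d+): "
    r"cdb 0x(?P<op>[0-9a-fA-F]{2}):(?P<lba>[0-9a-fA-F]+):(?P<length>[0-9a-fA-F]+)"
)

# Assumption: the fatal message wording matches the example above.
FATAL_RE = re.compile(
    r"\[(?P<node>[^:\]]+): \S+: cf\.multidisk\.fatalProblem:error\].*?"
    r"aggr (?P<aggr>\S+): .*?"
    r"Raid type - (?P<raid_type>\S+) Group name (?P<group>\S+) state (?P<state>[A-Z]+)\. "
    r"(?P<failed>\d+) disk failed in the group\. "
    r"Disk (?P<disk>\S+) Shelf (?P<shelf>\d+) Bay (?P<bay>\d+)"
)

# Standard SCSI block-command opcodes for the 16-byte CDBs seen in this article.
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)", 0x8F: "VERIFY(16)"}


def summarize_timeouts(lines):
    """Count timed-out commands per disk device and decode each CDB."""
    per_disk = Counter()
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m is None:
            continue
        op = int(m["op"], 16)
        per_disk[m["disk"]] += 1
        events.append({
            "disk": m["disk"],
            "command": OPCODES.get(op, f"0x{op:02x}"),
            "lba": int(m["lba"], 16),
            "blocks": int(m["length"], 16),
        })
    return per_disk, events


def parse_fatal(lines):
    """Return the extracted fields of every cf.multidisk.fatalProblem message."""
    return [m.groupdict() for m in map(FATAL_RE.search, lines) if m is not None]


if __name__ == "__main__":
    per_disk, events = summarize_timeouts(EMS_LINES)
    for disk, count in per_disk.most_common():
        print(disk, count)
    for info in parse_fatal(EMS_LINES):
        print(info["aggr"], info["group"], info["state"], info["disk"])
```

Regular expressions are used instead of positional splitting because the EMS text contains variable-length free-form sections. If the message wording differs on your ONTAP release, adjust the patterns before you rely on the counts.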

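To see at a glance which shelf IOM ports need attention, a minimal sketch like the following can parse the port rows shown in this article. It flags any port whose state is not OK or whose link error counters are non-zero. `parse_ports`, `flag_ports` and the field names are hypothetical. The column order comes from the table header. Treating any non-zero cumulative counter as suspicious is an assumption, not a NetApp threshold. Phy Change Count is parsed but not flagged, because no baseline value is given here.

```python
# Data rows copied from the shelf port output in this article.
PORT_ROWS = """\
[HST0/P0:0] NO SIGNAL 7 NA 0 0 0 0 0 974
[HST1/P0:1] NO SIGNAL 7 NA 1299 1298 0 0 0 974
[HST2/P0:2] NO SIGNAL 7 NA 310 307 0 0 0 974
[HST3/P0:3] NO SIGNAL 7 NA 85 81 0 0 0 974
[HST4/P1:0] OK 7 12.0 0 0 0 0 0 3
[HST5/P1:1] OK 7 12.0 0 0 0 0 0 3
[HST6/P1:2] OK 7 12.0 0 0 0 0 0 3
"""

# The eight columns after "Port State", in header order (names are hypothetical).
FIELDS = (
    "timeout_ms", "link_rate_gbps", "invalid_dword", "running_disparity",
    "loss_dword", "phy_reset_problem", "crc_error", "phy_change",
)

# Assumption: any non-zero value in these cumulative counters is worth a look.
ERROR_COUNTERS = (
    "invalid_dword", "running_disparity", "loss_dword",
    "phy_reset_problem", "crc_error",
)


def parse_ports(text):
    """Split each "[port] STATE v1 ... v8" row into a dict."""
    ports = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue
        port_id, rest = line.split("]", 1)
        tokens = rest.split()
        # The state can be one or two words ("OK", "NO SIGNAL"); the last
        # eight tokens are always the numeric columns.
        state = " ".join(tokens[:-len(FIELDS)])
        row = dict(zip(FIELDS, tokens[-len(FIELDS):]))
        ports.append({"port": port_id + "]", "state": state, **row})
    return ports


def flag_ports(ports):
    """Map each suspicious port to the reasons it was flagged."""
    suspects = {}
    for p in ports:
        reasons = []
        if p["state"] != "OK":
            reasons.append(f"state={p['state']}")
        for name in ERROR_COUNTERS:
            value = p[name]
            # Skip "NA" and similar non-numeric placeholders.
            if value.isdigit() and int(value) > 0:
                reasons.append(f"{name}={value}")
        if reasons:
            suspects[p["port"]] = reasons
    return suspects


if __name__ == "__main__":
    for port, reasons in flag_ports(parse_ports(PORT_ROWS)).items():
        print(port, ", ".join(reasons))
```

Taking the last eight tokens as the numeric columns lets the parser handle multi-word states such as NO SIGNAL without a fixed-width layout. This matters because whitespace in pasted output is often collapsed.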

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.