
NFS sessions hang and high latency is reported

Environment

  • ONTAP 9
  • FabricPool

Issue

  • NFS sessions are reported to hang on all volumes configured with FabricPool in the cluster.
  • The issue starts when high latency is reported on one of the nodes in the cluster.
  • The EMS log shows nblade_execsOverLimit_1 and nblade.nfsLongRunningOp errors:
    [Node1: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation. The client IP address:port is 92.X.X.66:694. The local IP address:port is 10.X.X.82:2049. The protocol requesting the operation is NFS3. The RPC program number for the operation is 100003. The protocol procedure for the operation is Read (6). The disk process UUID is 05238d4dXXXXXXXXXXXXX160cedebc32. The Vserver associated with the operation is XXXX. The UID of the user is 23068. The MSID for the volume is 2161146647. The inode number of the file is 12955.

    [Node1: kernel: nblade_execsOverLimit_1:debug]: params: {'clientIpAddress': '10.X.X.58', 'lifIpAddress': '10.X.X.64', 'vserverId': '4', 'execsLimit': '128'}
  • Attempting a takeover/giveback of the affected node may result in a panic with the following string:
    RPANIC:giveback or arl hung in wafl while doing SENDHOME_DOING_COMMIT in SK process sendhome_hang_detector on release 9.8P19 (C)
  • The issue does not recur after a takeover/giveback (TO/GB) of the affected node.
  • Sktrace logs show cloud IO errors:

    [5:0] CLOUD_BIN_ERR:  cio_error_to_raid_error: Cloud-bin read block 35286487738791  data unavailable cloud io error 9 btid: 8969343 btuuid: cab5f25b-3425-476f-a361-11a69e7db847, seq_num: 1241209
    [13:0] CLOUD_BIN_ERR:  cio_error_to_raid_error: Cloud-bin read block 35277266573844  data unavailable cloud io error 9 btid: 40852388 btuuid: f23e6bae-2ef0-4168-b611-3e3d87274447, seq_num: 1637591
    [13:0] CLOUD_BIN_ERR:  cio_error_to_raid_error: Cloud-bin read block 35284330697390  data unavailable cloud io error 9 btid: 40183670 btuuid: d06a9d55-46c3-473c-b06f-0c6091fa3b02, seq_num: 171567
  • The storage aggregate object-store show command reports the object store as available:

    cluster::> storage aggregate object-store show
    Aggregate      Object Store Name Availability
    -------------- ----------------- -------------
    aggr1          s3_bucket         available
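  • The storage aggregate object-store profiler start command reports 0 failures for PUT operations, but for the 8KB, 32KB, and 256KB GET operations the failed count equals the total count (the 4KB GET shows 270 failures out of 77095):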

    Object store config name: s3_bucket
    Node name: Node1
    Status: Done
    Start time: 8/2/2023 15:08:38
    Op      Size       Total     Failed             Latency(ms)          Throughput
                                            min       max       avg
    -------------------------------------------------------------------------------
    PUT     4MB        1041      0         91        17799     2891      66.98MB
    GET     4KB        77095     270       5         35501     94        4.28MB
    GET     8KB        284       284       10003     35502     23920     0B
    GET     32KB       297       297       10000     35000     22532     0B
    GET     256KB      285       285       9999      33006     22843     0B
    5 entries were displayed.
  • StorageGRID is configured as the capacity tier.
  • No issues are observed on the StorageGRID nodes.
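
The profiler symptom above (PUTs succeed while larger GETs fail completely) can be checked mechanically when the output is long. The following is a minimal sketch in Python, not a NetApp tool: it assumes the profiler rows have been saved as plain text in the column layout shown above, and the names PROFILER_OUTPUT, failure_rates, and fully_failing are hypothetical helpers introduced only for this illustration.

```python
import re

# Profiler rows copied from the sample output in this article.
PROFILER_OUTPUT = """\
PUT     4MB        1041      0         91        17799     2891      66.98MB
GET     4KB        77095     270       5         35501     94        4.28MB
GET     8KB        284       284       10003     35502     23920     0B
GET     32KB       297       297       10000     35000     22532     0B
GET     256KB      285       285       9999      33006     22843     0B
"""

# Assumed row layout: Op, Size, Total, Failed, then latency/throughput columns.
ROW = re.compile(r"^\s*(PUT|GET)\s+(\S+)\s+(\d+)\s+(\d+)\s")


def failure_rates(text):
    """Return {(op, size): failed / total} for every profiler row found."""
    rates = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip headers, separators, and summary lines
        op, size = match.group(1), match.group(2)
        total, failed = int(match.group(3)), int(match.group(4))
        rates[(op, size)] = failed / total if total else 0.0
    return rates


def fully_failing(text):
    """List the (op, size) pairs where every operation failed."""
    return [key for key, rate in failure_rates(text).items() if rate == 1.0]


if __name__ == "__main__":
    for (op, size), rate in failure_rates(PROFILER_OUTPUT).items():
        print(f"{op} {size}: {rate:.1%} failed")
    print("Fully failing:", fully_failing(PROFILER_OUTPUT))
```

With the sample data, fully_failing flags the 8KB, 32KB, and 256KB GET rows, which matches the pattern described in the bullet above: writes to the capacity tier succeed while most reads time out.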
