Extreme client latency or hangs due to a shortage of long-term replay cache buckets

Category: ontap-9

Specialty: nfs

Applies to

ONTAP 8.3.x and later
NFS

Issue

Based on ONTAP statistics, high latency can be seen in the latency breakdown section of OPM, Grafana, or Perfstat / PerfArchive. Depending on the tool used for performance monitoring, most of the latency is attributed to "cpu_network" or "cluster_interconnect".

The following errors and warnings may be found in various logs:

CSM timeouts detected by the requesting blade (Nblade) in the perfstat section 'sysctl sysvar.csm'. The same output can be collected manually from the systemshell by running 'sysctl sysvar.csm'.

Example output:

SpinNPSessionInt::timeout): this=0xffffff80085e1028, sessionId=(req=cluster_n01:nblade, rsp=cluster_n02:dblade, uniquifier=00053816b2747090): In last 3974071360 ms, 104 of 2168524218 Ops timed out, 2171533701 started, 0 Ops timed out unsent. 4289664640/0/0 Ops await replies, 0 segs sent, 0 await ACKs
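When many of these lines have to be reviewed, it can help to extract the session endpoints and the timeout counts programmatically. The following is a minimal sketch, not a NetApp tool: the function name `parse_csm_timeout` and the regular expression are assumptions based solely on the single example line above, so other ONTAP releases may format the line differently.

```python
import re

# Sample line copied from the example output above.
SAMPLE = (
    "SpinNPSessionInt::timeout): this=0xffffff80085e1028, "
    "sessionId=(req=cluster_n01:nblade, rsp=cluster_n02:dblade, "
    "uniquifier=00053816b2747090): In last 3974071360 ms, "
    "104 of 2168524218 Ops timed out, 2171533701 started, "
    "0 Ops timed out unsent. 4289664640/0/0 Ops await replies, "
    "0 segs sent, 0 await ACKs"
)

# Assumed layout: "req=<src>, rsp=<dst>, ... <N> of <M> Ops timed out".
TIMEOUT_RE = re.compile(
    r"req=(?P<req>[^,]+), rsp=(?P<rsp>[^,]+),.*?"
    r"(?P<timed_out>\d+) of (?P<total>\d+) Ops timed out"
)


def parse_csm_timeout(line):
    """Return endpoints and timeout counts, or None if the line does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    return {
        "req": match["req"],              # requesting blade (Nblade side)
        "rsp": match["rsp"],              # responding blade (Dblade side)
        "timed_out": int(match["timed_out"]),
        "total": int(match["total"]),
    }


if __name__ == "__main__":
    print(parse_csm_timeout(SAMPLE))
```

Grouping the parsed results by the `rsp` field is one way to see whether the timeouts concentrate on a single responding node.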

CSM flow control on the receiving node (Dblade) in the perfstat section 'sysctl sysvar.csm'. The same output can be collected manually from the systemshell by running 'sysctl sysvar.csm'.

Example output:

SpinNPSessionInt::processSessionFlowcontrolQueue): sess = 0xffffff8007bdf028, sessionId = (req=c55f68b8-7cc0-11e4-84e6-098b9834504d, rsp=cluster_n02:dblade, uniquifier=00053816b2747090), iface = 1, delivered REQUEST pkt = 0xffffff05931fa271 to flow control list

Nblade.nfsConnResetAndClose events with the reason "Maximum number of rewind attempts has been exceeded" are found in the EMS log.

Example output:

Nblade.nfsConnResetAndClose: Shutting down connection with the client. Vserver ID is xx; network data protocol is NFS; client IP address:port is xx.xx.xx.xx:xxx. local IP address is xx.xx.xx.xx; reason is CSM error - Maximum number of rewind attempts has been exceeded.

High SpinNP latency is observed in the perfstat section 'stats spinnp' and increases between iterations. The same output can be collected manually from the clustershell (diagnostic mode) by running 'statistics show -object spinnp -raw'.

Example output:

spinnp:spinnp:latency_hist.<1s:2577819
spinnp:spinnp:latency_hist.<2s:7878237
spinnp:spinnp:latency_hist.<4s:6262884
spinnp:spinnp:latency_hist.<6s:1629240
spinnp:spinnp:latency_hist.<8s:307280
spinnp:spinnp:latency_hist.<10s:85273
spinnp:spinnp:latency_hist.<20s:145299
spinnp:spinnp:latency_hist.<30s:51447
spinnp:spinnp:latency_hist.<60s:30
spinnp:spinnp:latency_hist.<90s:10
spinnp:spinnp:latency_hist.<120s:6
spinnp:spinnp:latency_hist.>120s:50
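Because these counters are cumulative, the increase between two iterations is more telling than the raw values. The sketch below parses the histogram into a dictionary and computes per-bucket deltas. The helper names `parse_latency_hist` and `bucket_deltas` are hypothetical, and the parsing assumes the `object:instance:latency_hist.<bucket>:<count>` layout of the example output.

```python
# Histogram copied from the example output above (whitespace separated).
SAMPLE = (
    "spinnp:spinnp:latency_hist.<1s:2577819 "
    "spinnp:spinnp:latency_hist.<2s:7878237 "
    "spinnp:spinnp:latency_hist.<4s:6262884 "
    "spinnp:spinnp:latency_hist.<6s:1629240 "
    "spinnp:spinnp:latency_hist.<8s:307280 "
    "spinnp:spinnp:latency_hist.<10s:85273 "
    "spinnp:spinnp:latency_hist.<20s:145299 "
    "spinnp:spinnp:latency_hist.<30s:51447 "
    "spinnp:spinnp:latency_hist.<60s:30 "
    "spinnp:spinnp:latency_hist.<90s:10 "
    "spinnp:spinnp:latency_hist.<120s:6 "
    "spinnp:spinnp:latency_hist.>120s:50"
)

MARKER = "latency_hist."


def parse_latency_hist(text):
    """Map each bucket label (for example '<1s') to its cumulative count."""
    hist = {}
    for token in text.split():
        name, _, value = token.rpartition(":")
        if MARKER not in name or not value.isdigit():
            continue  # skip anything that is not a histogram counter
        bucket = name.split(MARKER, 1)[1]
        hist[bucket] = int(value)
    return hist


def bucket_deltas(previous, current):
    """Per-bucket growth between two iterations of the same counter set."""
    return {b: current[b] - previous.get(b, 0) for b in current}


if __name__ == "__main__":
    hist = parse_latency_hist(SAMPLE)
    for bucket, count in hist.items():
        print(f"{bucket:>6} {count}")
```

Calling `bucket_deltas` with the parsed output of two consecutive perfstat iterations shows which latency buckets are still growing.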

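Spinhi statistics show that most Spinhi requests are sitting in the defer queue. This is visible in the perfstat section 'spinhi_stats', or the output can be collected manually from the nodeshell by running 'spinhi_stats' in diagnostic mode.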

Example output:

(spinhi_stats) size=39502 total_req=421874001827 cur_req=25780 max_req=26702 total_resp=421873962781 total_replay_resp=289138 defer_req=55765 cur_defer=25780 max_defer=25780 hipri=15603269 unmarshal_errs=0 marshal_errs=0 fastpath_null_resps=0 cur_nogrow_filecb_bulk=0, cur_nogrow_filecb_op=0 redo=131995, max_nogrow_filecb_bulk=0 max_nogrow_filecb_fileop=0 Access: count=44862084546 hipri=0 errs=77411717 elapsed: max=14087030.76 avg=280.45

cur_req: Current number of requests in SpinHi
cur_defer: Current number of requests in SpinHi Defer Queue

If cur_defer == cur_req, all of the current requests at SpinHi are in the Defer Queue.

The counter "spinnp_replay_max_long_term_hit" increments across iterations in the perfstat section 'stats spinnp_replay_cache', for example:

spinnp_replay_cache:spinnp_replay_cache:spinnp_replay_max_long_term_hit:20467472

spinnp_replay_max_long_term_hit: Total number of times max long term limit was hit
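The two checks described here (cur_defer equal to cur_req, and a growing spinnp_replay_max_long_term_hit counter) can be sketched in a few lines. This is an illustration under stated assumptions only: the function names are hypothetical, the spinhi_stats text is an excerpt of the example output, and the second replay cache value is invented to stand in for a later iteration.

```python
import re

# Excerpt of the spinhi_stats example output above.
SPINHI_SAMPLE = (
    "(spinhi_stats) size=39502 total_req=421874001827 cur_req=25780 "
    "max_req=26702 total_resp=421873962781 total_replay_resp=289138 "
    "defer_req=55765 cur_defer=25780 max_defer=25780 hipri=15603269"
)

# Counter line from the 'stats spinnp_replay_cache' example above.
REPLAY_SAMPLE = (
    "spinnp_replay_cache:spinnp_replay_cache:"
    "spinnp_replay_max_long_term_hit:20467472"
)
# Hypothetical value for a later iteration, used only to show the delta.
REPLAY_LATER = (
    "spinnp_replay_cache:spinnp_replay_cache:"
    "spinnp_replay_max_long_term_hit:20467500"
)


def read_counter(text, name):
    """Return the integer after 'name=' in spinhi_stats text, or None."""
    match = re.search(rf"\b{re.escape(name)}=(\d+)", text)
    return int(match.group(1)) if match else None


def all_requests_deferred(text):
    """True when every current SpinHi request sits in the defer queue."""
    cur_req = read_counter(text, "cur_req")
    cur_defer = read_counter(text, "cur_defer")
    if cur_req is None or cur_defer is None:
        return False
    return cur_req > 0 and cur_req == cur_defer


def replay_hits(line):
    """Value of a colon-separated counter line (last field)."""
    return int(line.strip().rpartition(":")[2])


def long_term_hit_delta(previous_line, current_line):
    """Growth of the counter between two iterations."""
    return replay_hits(current_line) - replay_hits(previous_line)


if __name__ == "__main__":
    print("all deferred:", all_requests_deferred(SPINHI_SAMPLE))
    print("hit delta:", long_term_hit_delta(REPLAY_SAMPLE, REPLAY_LATER))
```

A positive delta together with cur_defer == cur_req matches the pattern described in this article; either signal alone is weaker evidence.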