NFS server stops responding and widespread VM BSODs due to CPU starvation
Applies to
- NFS
- 9.15.1 CLUSTER-MODE
Issue
- NFS clients (RHEL/OpenShift) reported "nfs server not responding, still trying".
- Windows VMs experienced BSODs.
- The EMS/system logs showed the following:
Sun Nov 02 01:34:14-0500 [dc1h20502:mgwd:rdb.node.starvation:error]: CPU starvation detected in the RDB.
Sun Nov 02 01:29:37-0500 [dc1h20502:ksmf_timeout_thread:ksmf.svc.watchdog:debug]: "kSMF service thread held >25(sec) by application for table ksmfRawZapi"
Sun Nov 02 01:30:25-0500 [dc1h20502:kernel:Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation. The client IP address:port is 19.14.190.123:719...
Sun Nov 02 01:33:40-0500 [dc1h20502:CCMA-Scheduler:perf.ccma.workQ.overrun:debug]: Performance archiver cannot collect objects in a timely manner, for the 1 seconds period.
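The EMS entries above are not in chronological order, so the sequence leading up to the starvation event is easier to follow once they are split into fields and sorted. The following is a minimal sketch of one way to do that, not a NetApp tool. It assumes every entry has the shape `timestamp [node:process:event:severity]: message` seen in the excerpt. The names `parse_ems`, `ENTRY_RE`, and `PLACEHOLDER_YEAR` are hypothetical. The log timestamps carry no year, so a placeholder year is used purely so the entries can be ordered.

```python
import re
from datetime import datetime

# Timestamp shape seen in the excerpt, e.g. "Sun Nov 02 01:34:14-0500".
_TS = r"[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}"

# One entry: timestamp, [node:process:event:severity], then the message,
# which runs until the next timestamp or the end of the text. This also
# handles entries that were pasted together on a single line.
ENTRY_RE = re.compile(
    r"(?P<ts>" + _TS + r") "
    r"\[(?P<node>[^:\]]+):(?P<process>[^:\]]+):"
    r"(?P<event>[^:\]]+):(?P<severity>[^:\]]+)\]: "
    r"(?P<message>.*?)(?=(?:" + _TS + r" \[)|\Z)",
    re.DOTALL,
)

# Assumption: the log lines carry no year, so this value is only a
# placeholder that makes the timestamps comparable within one year.
PLACEHOLDER_YEAR = 2000


def parse_ems(text):
    """Return the EMS entries found in text, oldest first."""
    entries = []
    for match in ENTRY_RE.finditer(text):
        entry = match.groupdict()
        entry["message"] = entry["message"].strip()
        # Drop the weekday name (first four characters) before parsing.
        entry["when"] = datetime.strptime(
            f"{PLACEHOLDER_YEAR} {entry['ts'][4:]}", "%Y %b %d %H:%M:%S%z"
        )
        entries.append(entry)
    return sorted(entries, key=lambda e: e["when"])


# The four entries from this article, including the original truncation.
SAMPLE = (
    'Sun Nov 02 01:34:14-0500 [dc1h20502:mgwd:rdb.node.starvation:error]: '
    'CPU starvation detected in the RDB.'
    'Sun Nov 02 01:29:37-0500 [dc1h20502:ksmf_timeout_thread:ksmf.svc.watchdog:debug]: '
    '"kSMF service thread held >25(sec) by application for table ksmfRawZapi"'
    'Sun Nov 02 01:30:25-0500 [dc1h20502:kernel:Nblade.nfsLongRunningOp:debug]: '
    'Detected a long running network process operation. '
    'The client IP address:port is 19.14.190.123:719...'
    'Sun Nov 02 01:33:40-0500 [dc1h20502:CCMA-Scheduler:perf.ccma.workQ.overrun:debug]: '
    'Performance archiver cannot collect objects in a timely manner, '
    'for the 1 seconds period.'
)

if __name__ == "__main__":
    for e in parse_ems(SAMPLE):
        print(e["ts"], e["severity"], e["event"])
```

Sorted this way, the debug-level watchdog, long-running-operation, and archiver overrun events all precede the `rdb.node.starvation` error, which is the only error-severity entry in the excerpt.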