How to troubleshoot NFS APD issues in VMware ESXi
Environment
- ONTAP 9
- VMware ESXi
- NFS
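Overview
- The All Paths Down (APD) timer starts after 5 seconds with no communication on a given TCP stream.
- If the APD state then persists for 140 seconds, the connection is considered lost and the APD timeout is reached.
- Errors that may appear in the VMware logs include, but are not limited to, the following: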
YYYY-MM-DDT00:26:51.504Z: [APDCorrelator] xxxxxxxxxxxxxus: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [xxxxxxxx-xxxxxxxx] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
NFSLock: 608: Stop accessing fd 0x410011446d28 3
NFS: 133: Lost connection to the server 192.168.0.1 mount point /vol/datastore, mounted as xxxxxxxxx-xxxxxxxx-0000-000000000000 ("datastore")
StorageApdHandler: 248: APD Timer started for ident [xxxxxxxx-xxxxxxxx]
StorageApdHandler: 846: APD Start for ident [xxxxxxxx-xxxxxxxx]!
StorageApdHandler: 277: APD Timer killed for ident [xxxxxxxx-xxxxxxxx]
StorageApdHandler: 902: APD Exit
[vob.storage.apd.start] Device or filesystem with identifier [xxxxxx-xxxxxxx] has entered the All Paths Down state
WARNING: NFS: xxx: Lost connection to the server xxxx mount point /xxxx, mounted as xxxxxx-xxxxxx-0000-000000000000 ("xxxxxxxx")
- ONTAP may log messages such as the following at the time of the APD:
kernel: Nblade.nfsConnResetAndClose:error]: Shutting down connection with the client. Vserver ID is xx; network data protocol is NFS, Rpc Xid xxxxx; client IP address:port is x.x.x.x:xxxx. local IP address is x.x.x.x; reason is CSM error - Maximum number of rewind attempts has been exceeded
ERROR Nblade.CallbackTimedOut: SM NOTIFY: Vserver xx, Vif xxx: PORTMAP program (Program number:100000 Program version:2) on client x.x.x.x is not responding
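The following Python sketch roughly illustrates the APD timeline described in the overview. It classifies a connection from the time elapsed since its last observed traffic. It is a simplified model that uses only the two values given in this article: 5 seconds of silence to start the APD timer, and 140 seconds in the APD state before the timeout. The names `apd_state` and `ApdState` are hypothetical. ESXi does not implement APD handling this way.

```python
from enum import Enum

# Values taken from this article; treated here as fixed constants (assumption).
SILENCE_BEFORE_APD_S = 5.0  # no traffic on the TCP stream for this long -> APD timer starts
APD_TIMEOUT_S = 140.0       # time spent in APD before the APD timeout is reached


class ApdState(Enum):
    NORMAL = "normal"
    APD = "all paths down"
    APD_TIMEOUT = "apd timeout (I/Os fast failed)"


def apd_state(last_activity_s: float, now_s: float) -> ApdState:
    """Classify a connection given the time (in seconds) of its last observed traffic.

    Simplified model: APD begins after SILENCE_BEFORE_APD_S of silence, and the
    APD timeout is reached APD_TIMEOUT_S after APD begins.
    """
    silence = now_s - last_activity_s
    if silence < SILENCE_BEFORE_APD_S:
        return ApdState.NORMAL
    if silence < SILENCE_BEFORE_APD_S + APD_TIMEOUT_S:
        return ApdState.APD
    return ApdState.APD_TIMEOUT


if __name__ == "__main__":
    # Walk through a few points on the timeline for a connection last active at t=0.
    for t in (3.0, 5.0, 60.0, 144.9, 145.0):
        print(f"{t:>6}s of silence -> {apd_state(0.0, t).value}")
```

Under this model, the timeout is reached after 145 seconds of total silence (5 + 140). If traffic resumes before then, the connection returns to normal. That case corresponds to the `APD Timer killed` and `APD Exit` messages in the sample logs.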
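When you review a large vmkernel log or ONTAP event log, it can help to extract only the messages listed in this article. The following Python sketch shows one way to do that. It assumes the log lines contain the message text exactly as quoted here. The event names and the `classify` and `summarize` functions are hypothetical, and real messages may differ between ESXi and ONTAP versions, so treat the regular expressions as a starting point only.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# (event name, pattern) pairs derived only from the sample ESXi and ONTAP messages
# in this article (assumption). The first matching pattern wins.
PATTERNS = [
    ("apd_timeout", re.compile(r"esx\.problem\.storage\.apd\.timeout\].*identifier \[([^\]]+)\]")),
    ("apd_start", re.compile(r"vob\.storage\.apd\.start\].*identifier \[([^\]]+)\]")),
    ("apd_start", re.compile(r"APD Start for ident \[([^\]]+)\]")),
    ("apd_timer_started", re.compile(r"APD Timer started for ident \[([^\]]+)\]")),
    ("apd_timer_killed", re.compile(r"APD Timer killed for ident \[([^\]]+)\]")),
    ("apd_exit", re.compile(r"APD Exit")),
    # ESXi side: NFS server address and mount point (trailing comma excluded).
    ("nfs_lost_connection", re.compile(r"Lost connection to the server (\S+) mount point ([^\s,]+)")),
    # ONTAP side: client address:port whose connection was reset, and the stated reason.
    ("ontap_conn_reset", re.compile(
        r"nfsConnResetAndClose.*client IP address:port is (\S+?)\.\s.*reason is (.+)$")),
    ("ontap_callback_timeout", re.compile(r"CallbackTimedOut.*on client (\S+) is not responding")),
]


def classify(line: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (event name, captured fields) for a recognised line, else None."""
    for name, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return name, match.groups()
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count how many lines of each recognised event type appear."""
    counts: Counter = Counter()
    for line in lines:
        event = classify(line)
        if event is not None:
            counts[event[0]] += 1
    return counts


if __name__ == "__main__":
    # A few of the sample ESXi messages quoted in this article.
    sample = [
        "StorageApdHandler: 248: APD Timer started for ident [xxxxxxxx-xxxxxxxx]",
        "StorageApdHandler: 846: APD Start for ident [xxxxxxxx-xxxxxxxx]!",
        "StorageApdHandler: 277: APD Timer killed for ident [xxxxxxxx-xxxxxxxx]",
        "StorageApdHandler: 902: APD Exit",
    ]
    for line in sample:
        print(classify(line))
    print(summarize(sample))
```

A line classified as `apd_timeout` corresponds to the 140-second timeout described in the overview. By contrast, `apd_timer_killed` followed by `apd_exit` indicates that the connection recovered before the timeout. On the ONTAP side, the client addresses captured from `ontap_conn_reset` and `ontap_callback_timeout` can be matched against the affected ESXi hosts.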