Increased data flow on the PVS server after NAS server failover

This issue occurs due to the oplocks mechanism and an SMB protocol limitation.

The PVS service differs from other SMB applications: it streams vDisks to all target devices as disk I/O. High-throughput, low-latency NAS storage is required to prevent target devices from hanging due to a pending disk queue.

PVS uses the SMB protocol (the variant used depends on the OS and on where the file share is hosted) to open a file from a CIFS share. SMB 2.1 introduced a feature called leasing, which allows multiple handles from the same client to the same file.

When we attempt to stream a vDisk, we send an oplock create request, which creates a handle to the disk and allows the client to cache reads locally on the PVS server.

During streaming without a NAS server switch, we see the following behavior:

  1. The PVS target sends a read request to the PVS Server.
  2. The Stream process then performs a read operation on the vDisk on the NAS.
    • The PVS Server (SMB client) sends an SMB read request to the NAS.
    • The NAS responds with an SMB read response.
  3. The PVS Server then sends the read response to the target.
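The read path above can be sketched as a small relay function. This is a hypothetical toy model, not the actual Stream process: the path and sizes are assumptions, and the real service issues Windows file API calls that travel over SMB to the NAS.

```python
import os

# Hypothetical location of a vDisk on the NAS-backed share (assumption).
VDISK_PATH = "/mnt/nas/store/demo.vhdx"

def handle_target_read(vdisk_path, offset, length):
    """Serve one target-device read request: fetch the requested byte
    range from the vDisk on the NAS (an SMB read under the hood) and
    return it so it can be relayed back to the target."""
    with open(vdisk_path, "rb") as vdisk:  # step 2: read from the NAS
        vdisk.seek(offset)
        data = vdisk.read(length)          # SMB read request/response pair
    return data                            # step 3: response to the target
```

The key point the sketch captures is that every target read turns into a synchronous read against the NAS, which is why NAS latency directly gates target-device responsiveness.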


Windows oplocks are a lightweight performance-enhancing feature, not a robust and reliable protocol. Every implementation of oplocks should be evaluated as a trade-off between perceived performance and reliability; reliability decreases with each implementation rule that is not enforced. Consider a share with oplocks enabled, over a wide-area network, to a client on a South Pacific atoll, on a high-availability server, serving a mission-critical multiuser corporate database during a tropical storm. This configuration will likely encounter problems with oplocks.

Oplocks can be beneficial to perceived client performance when treated as a configuration toggle for client-side data caching. If the data caching is likely to be interrupted, then oplock usage should be reviewed. Samba enables oplocks by default on all shares. Careful attention should be given to the client usage of shared data on the server, the server network reliability, and the oplocks configuration of each share. In mission-critical, high-availability environments, data integrity is often a priority. Complex and expensive configurations are implemented to ensure that if a client loses connectivity with a file server, a failover replacement will be available immediately to provide continuous data availability.
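Where cached client access is undesirable, such as a Samba share hosting vDisks, oplocks can be disabled per share. A minimal smb.conf fragment (the share name and path are hypothetical; `oplocks` and `level2 oplocks` are standard Samba share parameters):

```ini
[vdisks]
    path = /srv/vdisks
    read only = no
    # Do not grant clients oplocks on this share, so they cannot
    # cache reads or writes locally across a server failover.
    oplocks = no
    level2 oplocks = no
```

This trades the perceived performance benefit of client-side caching for predictable behavior when connectivity to the server is interrupted.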


Windows client failover behavior is more at risk of application interruption than on other platforms because it depends on an established TCP transport connection. If the connection is interrupted, as in a file server failover, a new session must be established. It is rare for Windows client applications to be coded to recover correctly from a transport connection loss; therefore, most applications will experience some sort of interruption or, at worst, abort and require restarting.


If a client session has been caching writes and reads locally due to oplocks, it is likely that the data will be lost when the application restarts or recovers from the TCP interruption. When the TCP connection drops, the client state is lost, and when the file server recovers, no oplock break is sent to the client. In this case, the work from the prior session is lost. By contrast, with oplocks disabled and the client writing data to the file server in real time, the failover will provide the data on disk as it existed at the time of the disconnect.
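The two outcomes described above can be illustrated with a toy simulation. This is not real SMB semantics, just a hypothetical model contrasting a client that buffers writes locally (as an oplock permits) with one that writes through to the server:

```python
class FileServer:
    """Toy file server holding whatever data has actually reached disk."""
    def __init__(self):
        self.on_disk = b""

class Client:
    def __init__(self, server, cache_writes):
        self.server = server
        self.cache_writes = cache_writes  # True ~ oplock granted
        self.local_cache = b""

    def write(self, data):
        if self.cache_writes:
            self.local_cache += data      # buffered client-side only
        else:
            self.server.on_disk += data   # written through immediately

    def tcp_drop(self):
        # Connection lost before the cache is flushed; no oplock break
        # is ever delivered, so the buffered writes simply vanish.
        self.local_cache = b""

# Caching client: work done before the failover never reaches the server.
srv1 = FileServer()
cached = Client(srv1, cache_writes=True)
cached.write(b"report-v2")
cached.tcp_drop()
print(srv1.on_disk)   # b''

# Write-through client: the data on disk survives the disconnect.
srv2 = FileServer()
direct = Client(srv2, cache_writes=False)
direct.write(b"report-v2")
direct.tcp_drop()
print(srv2.on_disk)   # b'report-v2'
```

The model makes the trade-off concrete: the cached client was faster while connected, but only the write-through client's data exists on the server at the moment of failover.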
