This article describes a generic approach to delivering softphones and voice chat applications with Citrix Virtual Apps and Desktops (CVAD) 7.x.
1. Alternatives for Delivering Softphones
CVAD supports several alternatives for delivering softphones.
- Control mode, where the hosted (published) softphone in XenApp or XenDesktop simply controls a physical telephone set. In this mode, no audio traffic goes through the XenApp or XenDesktop server. TCP/IP connectivity between the softphone in the virtual desktop and the phone device must exist, either directly or through a Communications Manager server (e.g. CUCM).
- HDX optimized softphone support (recommended), where the media engine runs on the user device, and VoIP (voice over Internet Protocol) traffic flows peer-to-peer. Examples of this include:
- HDX Optimization for Microsoft Teams
- HDX RealTime Optimization Pack which optimizes the delivery of Microsoft Skype for Business
- Cisco Jabber Softphone for VDI (formerly known as VXME)
- Cisco Webex Meetings for VDI
- Avaya VDI Communicator for one-X Communicator and one-X Agent.
- Zoom VDI Plugin
- Genesys PureEngage Cloud
- Nuance Dragon PowerMic dictation device
- Local App Access, a XenApp and XenDesktop feature that allows an application such as a softphone to run locally on the user’s Windows device yet appear seamlessly integrated with their virtual/published desktop. This offloads all audio processing to the user device. This may not be viable if the softphone needs to share information with an app that is hosted on XenApp/XenDesktop (e.g. Outlook), depending on how they communicate.
- HDX generic softphone support (VoIP-over-ICA).
This article focuses on generic softphone support, where an unmodified softphone is hosted on Citrix Virtual Apps and Desktops in the data center and the audio traffic goes over the Citrix ICA protocol (preferably using UDP/RTP) to the user device running the Citrix Workspace app. Generic softphone support is a feature of HDX, which also includes technologies for real-time video (see CTX124516 – How to Optimize HDX MediaStream Server-Rendered Video and Citrix Documentation – Improve video conference performance).
This approach to softphone delivery is especially valuable when:
- An optimized solution for delivering the softphone is not available and the user is not on a Windows device where Local App Access could be used;
- The media engine needed for optimized delivery of the softphone has not been installed on the user device or is not available for the operating system version running on the user device; in this scenario, HDX technologies provide a valuable fallback solution.
2. Generic Softphone Support
There are two aspects to softphone delivery using Citrix Virtual Apps and Desktops:
- How the softphone application is delivered to the virtual/published desktop
- How the audio is delivered to and from the user’s headset, microphone, and speakers, or USB telephone set
XenApp and XenDesktop 7.6 and higher include numerous technologies to support generic softphone delivery:
- Optimized-for-Speech codec for fast encode of real-time audio and bandwidth efficiency
- Low latency audio stack
- Server-side jitter buffer to smooth out the audio when network latency fluctuates
- Packet tagging (DSCP and WMM) for QoS
- DSCP tagging for RTP packets (Layer 3)
- WMM tagging for Wi-Fi
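To illustrate what Layer 3 DSCP tagging means at the packet level, the following is a minimal Python sketch. It is not how HDX itself applies tags; the choice of the Expedited Forwarding class (DSCP 46) and the helper names are assumptions made for this example.

```python
import socket

# DSCP "Expedited Forwarding" (EF) is a class commonly used for voice.
# Using it here is an assumption for illustration, not a documented Citrix default.
DSCP_EF = 46


def dscp_to_tos(dscp: int) -> int:
    """Return the IP TOS byte for a 6-bit DSCP value (DSCP occupies the top 6 bits)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def open_tagged_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing packets request the given DSCP marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS is available on Linux and macOS; on platforms without it the
    # socket is returned untagged.
    if hasattr(socket, "IP_TOS"):
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    print(dscp_to_tos(DSCP_EF))  # 184 (0xB8)
```

Routers and switches only honor the marking if their QoS configuration trusts it, which is why the network recommendations later in this article matter.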
The Citrix Workspace app (formerly Citrix Receiver) versions for Windows, Linux, Mac, Android and Chrome are also VoIP capable.
Citrix Workspace app for Windows offers:
- Client-side jitter buffer – Ensures smooth audio even when network latency fluctuates
- Echo cancellation – Allows for greater variation in the distance between microphone and speakers for workers who do not use a headset
- Audio plug-and-play – Audio devices do not need to be plugged in before starting a session; they can be plugged in at any time
- Audio device routing – Users can direct ringtone to speakers but the voice path to their headset
- Multi-stream ICA – Enables flexible Quality of Service (QoS)-based routing over the network
- ICA supports 4 TCP and 2 UDP streams; one of the UDP streams supports real-time audio over RTP
2.1. Delivering Softphone Applications to the Virtual Desktop
There are three methods by which a softphone can be delivered to the virtual desktop:
- The application can be installed in the virtual desktop image.
- The application can be streamed to the virtual desktop using Microsoft App‑V. This approach has manageability advantages because the virtual desktop image is kept uncluttered. Once streamed to the virtual desktop, the application executes in that environment just as if it had been installed in the usual manner. [Note that not all applications are compatible with App-V.]
- In addition, Citrix App Layering can be used.
2.2. Delivering Audio to and from the User Device
HDX RealTime supports three methods of delivering audio to and from the user device:
- The optimized Citrix Audio Virtual Channel
- Generic USB Redirection
- Composite USB Redirection, a hybrid mode (see 2.2.3)
The Citrix Audio Virtual Channel is generally recommended as it is designed specifically for audio transport. Generic USB Redirection is useful for audio devices that have buttons and/or a display (that is, HID devices), provided the user device is on a LAN or LAN-like connection to the XenApp/XenDesktop VDA server.
2.2.1 Citrix Audio Virtual Channel
The bidirectional Citrix Audio Virtual Channel (CTXCAM) enables audio to be delivered very efficiently over the network.
HDX takes the audio from the user’s headset or microphone, compresses it, and sends it over ICA to the softphone application on the virtual desktop. Likewise, the audio output of the softphone is compressed and sent in the other direction to the user’s headset or speakers.
This compression is independent of the compression used by the softphone itself (such as Opus, G.729 or G.711). It is done using the Optimized-for-Speech codec (“Medium Quality”). Its characteristics are ideal for voice-over-IP (VoIP). It features quick encode time and it consumes only [approximately] 56 Kilobits per second of network bandwidth (28 Kbps in each direction). Note that this codec must be explicitly selected in the Studio console as it is not the default audio codec; the default is the HD Audio codec (“High Quality”) which is excellent for high fidelity stereophonic soundtracks but is slower to encode compared to the Optimized-for-Speech codec.
Bandwidth guidelines for audio playback and recording:
- High quality (default) ~100 kbps [min 75 ; max 175 kbps]
- Medium quality (recommended for VoIP) ~ 28 kbps [min 20 ; max 40 kbps]
- Low quality ~ 12 kbps [min 10 ; max 25 kbps]
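The guideline figures above can be turned into a rough capacity estimate. The sketch below assumes the figures apply per direction (consistent with the 28 Kbps in each direction quoted for the Optimized-for-Speech codec) and that every session carries audio in both directions at the same time; the function and table names are illustrative only.

```python
# Per-direction guideline figures in kbps: (typical, minimum, maximum).
AUDIO_QUALITY_KBPS = {
    "high": (100, 75, 175),
    "medium": (28, 20, 40),
    "low": (12, 10, 25),
}


def estimate_audio_kbps(quality: str, sessions: int, bidirectional: bool = True) -> dict:
    """Estimate aggregate audio bandwidth for a number of concurrent sessions."""
    typical, minimum, maximum = AUDIO_QUALITY_KBPS[quality]
    # Playback and recording each consume the per-direction figure.
    factor = sessions * (2 if bidirectional else 1)
    return {
        "typical": typical * factor,
        "min": minimum * factor,
        "max": maximum * factor,
    }


if __name__ == "__main__":
    # 50 concurrent calls at Medium quality, both directions.
    print(estimate_audio_kbps("medium", 50))
    # {'typical': 2800, 'min': 2000, 'max': 4000}
```

This covers only the ICA audio leg between the user device and the VDA; the softphone's own traffic to the IP-PBX is sized separately (see section 3.6).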
Audio over UDP provides excellent tolerance of network congestion and packet loss, and is preferred over TCP when available. XenDesktop 5.5 introduced the Audio over UDP Real-time Transport user policy setting. This capability is also available in XenApp 7.6 or higher.
When this policy is configured in Studio, the audio stream is effectively pulled out of ICA TCP and delivered out-of-band: Workspace app communicates directly with the VDA over ICA RTP/UDP.
This is still preferred over EDT, since EDT is a reliable protocol (despite using UDP as the underlying transport), and retransmitting lost packets delays real-time audio.
For this policy to be enforced, the Audio Quality policy in Studio must be configured to use Medium Quality.
By default, the Audio virtual channel starts on a regular ICA TCP connection, and an ICA module then initializes the corresponding ICA RTP sessions. The VDA and Workspace app perform a handshake over UDP before they start sending data over RTP; this handshake confirms that data can flow over UDP in each direction. If the UDP handshake fails, the Audio virtual channel continues to use ICA TCP to send and receive data.
Windows Workspace app
For UDP Audio to work, GPOs must also be configured on the client side. See CTX121613.
If you also have unmanaged endpoints (BYOD), you will need to edit the default.ica file in Storefront, as described here.
EnableUDPThroughGateway=true [if you have a gateway, otherwise false]
Linux Workspace app
1. In the ClientAudio section of module.ini, set EnableUDPAudio to True.
By default, this is set to False, which disables UDP audio.
2. Specify the minimum and maximum port numbers for UDP audio traffic using UDPAudioPortLow and UDPAudioPortHigh respectively. By default, ports 16500 to 16509 are used.
Note: Linux Workspace app does not support DTLS encryption for RTP Audio, hence if a Gateway is in place the Audio virtual channel will fall back to TCP.
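As a quick sanity check of the Linux settings described above, the following Python sketch parses a ClientAudio fragment and reports the UDP audio settings. The sample text and the function name are assumptions made for this example; a real module.ini contains many other sections and may need more tolerant parsing than shown here.

```python
import configparser

# Hypothetical fragment using only the three keys named in this article.
SAMPLE_MODULE_INI = """\
[ClientAudio]
EnableUDPAudio=True
UDPAudioPortLow=16500
UDPAudioPortHigh=16509
"""


def read_udp_audio_settings(ini_text: str) -> dict:
    """Extract the UDP audio settings from the ClientAudio section."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, allow_no_value=True
    )
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(ini_text)
    section = parser["ClientAudio"]  # raises KeyError if the section is missing

    # Defaults mirror the ones stated in this article.
    enabled = section.get("EnableUDPAudio", fallback="False").strip().lower() == "true"
    low = int(section.get("UDPAudioPortLow", fallback="16500"))
    high = int(section.get("UDPAudioPortHigh", fallback="16509"))
    if low > high:
        raise ValueError("UDPAudioPortLow must not exceed UDPAudioPortHigh")
    return {"enabled": enabled, "port_low": low, "port_high": high}


if __name__ == "__main__":
    print(read_udp_audio_settings(SAMPLE_MODULE_INI))
```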
Resultant Set of Policies
When you set the Audio quality on the Client, the Server (or default.ica) cannot upgrade it.
The Server (or default.ica) can, however, downgrade the resulting quality.
If UDP audio is enabled but the resultant quality is not medium, audio transmission will use TCP not UDP.
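The rules above can be summarized as a small decision function. This Python sketch is one interpretation of the text, not Citrix code: it models the resultant quality as the lower of the client and server settings, and selects UDP only when UDP audio is enabled, the resultant quality is Medium, and the UDP handshake succeeded. All names are illustrative.

```python
QUALITY_RANK = {"low": 0, "medium": 1, "high": 2}


def resultant_quality(client: str, server: str) -> str:
    """The server (or default.ica) can lower, but never raise, the client setting."""
    return min(client, server, key=QUALITY_RANK.__getitem__)


def audio_transport(client_quality: str, server_quality: str,
                    udp_enabled: bool, handshake_ok: bool) -> str:
    """Return "udp" only when every condition for RTP audio is met, else "tcp"."""
    quality = resultant_quality(client_quality, server_quality)
    if udp_enabled and quality == "medium" and handshake_ok:
        return "udp"
    return "tcp"


if __name__ == "__main__":
    print(audio_transport("high", "medium", True, True))  # udp
    print(audio_transport("high", "high", True, True))    # tcp
    print(audio_transport("medium", "low", True, True))   # tcp
```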
UDP Audio and ICA Encryption
When no Citrix Gateway is in place, if Workspace app detects ICA encryption other than “Basic” or “RC5 (128) bit Logon Only” in use, it will not respond to the first handshake message sent by the VDA. No UDP packets will be sent over the network. The VDA will eventually time out and will not enable RTP audio.
If a Citrix Gateway is in place, only the leg between Workspace app and the front-end virtual server can be encrypted with DTLS. The leg between the back-end SNIP and the VDA is subject to the same limitation described above.
By default this feature uses UDP port range 16500-16509 and picks up the first available port pair. During VDA installation, it opens up the UDP port range on the VDA side. RTP is used in conjunction with RTCP.
In a Windows 10 VDA, RTP will use an even port (e.g. 16500) and RTCP will use the odd port (16501).
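The even/odd pairing within the default range can be enumerated as follows. This is a sketch of the port arithmetic only; how the VDA actually probes for a free pair is not described in this article, so the `in_use` set and the function names are assumptions.

```python
def rtp_rtcp_port_pairs(low: int = 16500, high: int = 16509) -> list:
    """List (RTP, RTCP) pairs in the range: RTP on an even port, RTCP on the next odd port."""
    first_even = low if low % 2 == 0 else low + 1
    # Stopping before `high` guarantees that port + 1 is still inside the range.
    return [(port, port + 1) for port in range(first_even, high, 2)]


def first_available_pair(in_use: set, low: int = 16500, high: int = 16509):
    """Return the first pair with both ports free, or None if the range is exhausted."""
    for rtp, rtcp in rtp_rtcp_port_pairs(low, high):
        if rtp not in in_use and rtcp not in in_use:
            return (rtp, rtcp)
    return None


if __name__ == "__main__":
    print(rtp_rtcp_port_pairs())
    # [(16500, 16501), (16502, 16503), (16504, 16505), (16506, 16507), (16508, 16509)]
    print(first_available_pair({16500}))  # (16502, 16503)
```

The default range therefore yields five RTP/RTCP pairs.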
In a Windows Server VDA, Audio is handled by the Generic Virtual Channel service CtxSvcHost.exe [-g AudioSvcs] using a single multiplexed UDP port.
On the client side, installing Citrix Workspace app for Windows does not explicitly open any UDP ports. During connection setup, Citrix Workspace app uses UDP hole punching to open the UDP port automatically. Corporate firewalls also need to open the necessary port range for Audio-over-UDP to work.
Important: When Citrix Gateway is not in the path, audio data transmitted with UDP is not encrypted. If Citrix Gateway is configured to access Citrix Virtual Apps and Desktops resources, then audio traffic between the endpoint device and Citrix Gateway is secured using the DTLS protocol. See section 3.5 below.
By default, Windows Firewall blocks inbound UDP traffic. During server/VDA installation, the default UDP port range is opened for both inbound and outbound traffic for the specific processes on the VDA. If a different range is specified for the server using a machine-level policy, an administrator must explicitly open that range.
External corporate firewalls need to be explicitly opened up to handle UDP/RTP traffic.
On most client endpoint devices, there is normally no need to open the UDP ports explicitly. When the client sends the first UDP packet during the handshake, the firewall creates a rule for the tuple (source IP, source port, destination IP, destination port) and then allows UDP packets coming back from that destination IP and port, effectively opening the UDP port dynamically.
Some thin client devices running Windows 7 Embedded have been shown to close their UDP “hole punched” firewall port very quickly, on the order of 1-2 minutes. This behavior causes RTP audio to be blocked unless audio is playing continuously, since it is the resulting UDP packets that keep the port open. If this behavior is observed, open the firewall ports on the client device to mitigate the problem.
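The effect of a short idle timeout can be seen in a toy model of a stateful firewall entry. This is a simplified simulation, not firewall code: the 90-second timeout is an assumed value within the 1-2 minute range mentioned above, and the model assumes any packet of the flow refreshes the entry.

```python
class UdpPinhole:
    """Toy model of a dynamically opened ("hole punched") UDP firewall entry."""

    def __init__(self, idle_timeout_s: float):
        self.idle_timeout_s = idle_timeout_s
        self.last_seen_s = None  # None until the client sends its first packet

    def _is_open(self, now_s: float) -> bool:
        return (self.last_seen_s is not None
                and now_s - self.last_seen_s <= self.idle_timeout_s)

    def outbound(self, now_s: float) -> None:
        """A packet sent by the client creates or refreshes the entry."""
        self.last_seen_s = now_s

    def inbound(self, now_s: float) -> bool:
        """A packet from the VDA passes only while the entry is still open."""
        if not self._is_open(now_s):
            return False
        self.last_seen_s = now_s  # accepted traffic keeps the entry alive
        return True


if __name__ == "__main__":
    hole = UdpPinhole(idle_timeout_s=90)
    hole.outbound(0)          # UDP handshake opens the port
    print(hole.inbound(60))   # True: audio arrives within the timeout
    print(hole.inbound(120))  # True: only 60 s since the last packet
    print(hole.inbound(300))  # False: 180 s of silence closed the port
```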
2.2.2 Generic USB Redirection
Citrix Generic USB Redirection technology (CTXGUSB virtual channel) provides a generic means of remoting USB devices, including composite devices (audio plus HID) and isochronous USB devices. This approach is generally limited to LAN-connected users because the USB protocol tends to be sensitive to network latency and requires considerable network bandwidth. Isochronous USB redirection has been found to work very well with some softphones, providing excellent voice quality and low latency, but it is generally preferred to use the Citrix Audio Virtual Channel which is optimized for audio traffic.
The primary exception is when using an audio device with HID buttons such as a USB telephone attached to the user device that is LAN-connected to the data center. In this case, Generic USB Redirection offers the advantage of supporting buttons on the phone set or headset that control features by sending a signal back to the softphone (not an issue with buttons that work locally on the device).
Generic USB is not enabled by default and requires configuration.
2.2.3 Composite USB Redirection (hybrid mode)
In Citrix Receiver for Windows 4.7 and earlier, HDX allowed generic redirection of USB devices, where all the interfaces of the device were redirected as a single device. This resulted in sub-optimal audio performance. The alternative was to configure the Audio virtual channel, which gave good sound quality but lost the functionality of the HID buttons.
Starting with Citrix Receiver for Windows 4.8, composite USB device redirection can be split. A composite USB device consists of multiple interfaces, each having its own functionality. Examples of composite USB devices include HID devices that consist of audio and buttons, like a Plantronics headset.
Composite USB redirection is available both in XenDesktop and XenApp sessions. In a desktop session, split devices are displayed in the desktop viewer. In an application session, split devices are displayed in the Connection Center.
You can configure composite USB redirection using the Group Policy Object administrative template and the registry.
See here for detailed info.
3. System Configuration Recommendations
3.1 Client Hardware and Software
For optimal audio quality, Citrix recommends the latest version of Citrix Workspace app and a good quality headset with built-in acoustic echo cancellation (AEC). Bi-directional audio VoIP is currently supported by the Citrix Workspace app versions for Windows, Linux, Mac, Android and Chrome. See here for more details.
In addition, Dell Wyse offers VoIP support for ThinOS (WTOS).
3.2 CPU Considerations
Monitor CPU utilization on the VDA to determine if it is necessary to assign (at least) two virtual CPUs to each virtual machine. Real-time voice and video are data intensive and configuring two virtual CPUs reduces the thread switching latency. Therefore, it is generally recommended to configure (at least) two vCPUs in a XenDesktop VDI environment.
Note: Having two virtual CPUs does not necessarily mean doubling the number of physical CPUs, because physical CPUs can be shared across sessions.
It is possible to configure the Citrix Audio Service with CPU priority “high”, which can dramatically reduce audio choppiness. The command line to change the CPU priority can be found here. Because the priority is reset after a reboot, the command should be run as a start-up script. Do not change the priority to “Realtime”.
Note: the HDX Audio service in Windows Server is CtxSvcHost.exe, but this service is leveraged by multiple virtual channels (Smart Card, Flash, BCR, Teams, NSAP and others). Each virtual channel will have its own PID. More info here.
Citrix Gateway Protocol (CGP), which is used for the Session Reliability feature, also increases CPU consumption. Improvements in XenDesktop 5.5 greatly reduced the CPU impact of Session Reliability / Citrix Gateway Protocol (CGP). Nevertheless, on high quality network connections, this feature could be disabled to further reduce CPU consumption on the VDA.
On a powerful server, neither of the preceding steps may be necessary.
Important: If you are using a Citrix Gateway to encapsulate Audio RTP with DTLS, CGP and Session Reliability must be turned on.
3.3 LAN/WAN Configuration
Proper configuration of the network is critical for good real-time audio quality. VLANs typically need to be configured because excessive broadcast packets can introduce jitter. IPv6-enabled devices may generate a lot of broadcast packets; IPv6 can be disabled on those devices if IPv6 support is not needed. Routers need to be configured to support QoS.
3.4 Settings for WAN Connections
Voice chat can be used over both LAN and WAN connections. On a WAN connection, audio quality depends on the latency, packet loss, and jitter on the connection. If delivering softphones to users on a Wide Area Network (WAN) connection, Citrix recommends using Citrix SD-WAN between the data center and the remote office to maintain a high Quality-of-Service (QoS). SD-WAN supports Multi-Stream ICA, including UDP. Also, in the case of a single TCP stream, it is possible to distinguish the priorities of various ICA virtual channels to ensure that high priority real-time Audio data gets preferential treatment. Citrix SD-WAN can noticeably improve MOS scores.
Use Director or the HDX Monitor to validate your HDX configuration.
3.5 Remote User Connections
NetScaler Gateway 11 supports DTLS to deliver UDP/RTP traffic natively (without encapsulation in ICA TCP). Traffic is encrypted using Basic ICA encryption (on by default) between the VDA and the Citrix Gateway back-end virtual server, and is encrypted with DTLS between the front-end virtual server and Workspace app for Windows (the only supported platform). CGP/Session Reliability must be enabled on the VDA and in StoreFront.
Firewalls must be opened bidirectionally for UDP traffic over Port 443 from Workspace app to the Gateway’s front-end virtual server.
The Gateway back-end SNIP will communicate with the VDA using RTP/UDP ports 16500-16509.
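The firewall requirements described in this article can be collected into a small lookup, which may be handy when drafting a change request. The sketch below only restates the ports given in the text; the `UdpRule` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdpRule:
    source: str
    destination: str
    ports: str
    note: str


def required_udp_rules(via_gateway: bool, low: int = 16500, high: int = 16509) -> list:
    """Return the UDP paths that must be open for RTP audio, per this article."""
    vda_ports = f"{low}-{high}"
    if via_gateway:
        return [
            UdpRule("Workspace app", "Gateway front-end virtual server", "443",
                    "DTLS, bidirectional"),
            UdpRule("Gateway back-end SNIP", "VDA", vda_ports, "RTP/UDP"),
        ]
    # Without a Gateway the client talks to the VDA directly and the audio is not encrypted.
    return [UdpRule("Workspace app", "VDA", vda_ports, "RTP/UDP, unencrypted")]


if __name__ == "__main__":
    for rule in required_udp_rules(via_gateway=True):
        print(f"{rule.source} -> {rule.destination}: UDP {rule.ports} ({rule.note})")
```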
More info can be found here:
- Citrix Blog – Audio through a NetScaler Gateway
- Citrix Documentation – Support for DTLS in NetScaler
3.6 Codec Selection and Bandwidth Consumption
Between the user device and the Virtual Delivery Agent (VDA) in the data center, Citrix recommends using the Optimized-for-Speech codec setting, also known as Medium Quality audio.
Between the VDA platform and the IP-PBX, the softphone uses whatever codec is configured or negotiated. For example:
- G.711 provides very good voice quality but has a bandwidth requirement of 80 to 100 kilobits per second per call (depending on network Layer 2 overheads).
- G.729 provides good voice quality and has a low bandwidth requirement of 30 to 40 kilobits per second per call (depending on network Layer 2 overheads).
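The per-call figures above follow from the codec bit rate plus packet overhead. The Python sketch below works through the arithmetic under stated assumptions: 20 ms of audio per packet, IPv4/UDP/RTP headers of 40 bytes, and, as one example of Layer 2 overhead, 18 bytes for an Ethernet header and frame check sequence. Other packetization intervals or link types give different results.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers
ETHERNET_OVERHEAD_BYTES = 18           # 14-byte header + 4-byte FCS (assumed Layer 2)


def call_kbps(codec_kbps: float, packet_ms: float = 20, layer2_bytes: int = 0) -> float:
    """Bandwidth of one call direction in kbps, including per-packet overhead."""
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_s = 1000 / packet_ms
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + layer2_bytes
    return packet_bytes * 8 * packets_per_s / 1000


if __name__ == "__main__":
    # G.711 encodes at 64 kbps, G.729 at 8 kbps.
    print(call_kbps(64))                                        # 80.0
    print(call_kbps(64, layer2_bytes=ETHERNET_OVERHEAD_BYTES))  # 87.2
    print(call_kbps(8))                                         # 24.0
    print(call_kbps(8, layer2_bytes=ETHERNET_OVERHEAD_BYTES))   # 31.2
```

With these assumptions the results fall inside the 80 to 100 kbps and 30 to 40 kbps ranges once Layer 2 overhead is included.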