How to Use host-cpu-tune to Fine-Tune XenServer 6.2.0 Performance

Pinning Strategies

  • No Pinning (default): When no pinning is in effect, the Xen hypervisor is free to schedule a domain's vCPUs on any pCPU.
    • Pros: Greater flexibility and better overall utilization of available pCPUs.
    • Cons: Possibly longer memory access times, particularly on NUMA-based hosts. Possibly lower I/O throughput and slower control plane operations when pCPUs are overcommitted.
    • Explanation: When vCPUs are free to run on any pCPU, they may allocate memory in various regions of the host's memory address space. At a later stage, a vCPU may run on a different NUMA node and require access to that previously allocated data. This makes poor use of pCPU caches and incurs higher access times to that data. Another aspect is the impact on I/O throughput and control plane operations. When more vCPUs are runnable than there are pCPUs available, the Xen hypervisor might not be able to schedule dom0's vCPUs when they require execution time. This has a negative effect on all operations that depend on dom0, including I/O throughput and control plane operations.
  • Exclusive Pinning: When exclusive pinning is in effect, the Xen hypervisor pins dom0 vCPUs to pCPUs in a one-to-one mapping. That is, dom0 vCPU 0 runs on pCPU 0, dom0 vCPU 1 runs on pCPU 1, and so on. Any VM running on that host is pinned to the remaining set of pCPUs.
    • Pros: Possibly shorter memory access times, particularly on NUMA-based hosts. Possibly higher I/O throughput and faster control plane operations when pCPUs are overcommitted.
    • Cons: Lower flexibility and possible poor utilization of available pCPUs.
    • Explanation: If exclusive pinning is on and VMs are running CPU-intensive applications, they might under-perform by not being able to run on pCPUs allocated to dom0 (even when dom0 is not actively using them).

Note: The exclusive pinning functionality provided by host-cpu-tune will honor specific VM vCPU affinity configured using the VM parameter vCPU-params:mask. For more information, refer to the VM Parameters section in the appendix of the XenServer 6.2.0 Administrator’s Guide.
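For illustration, per-VM affinity of this kind is normally configured from dom0 with the xe CLI; a minimal sketch follows (the UUID and mask values are placeholders, and the xe CLI spells the parameter VCPUs-params:mask):

[root@host ~]# xe vm-param-set uuid=<vm-uuid> VCPUs-params:mask=2,3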

Using host-cpu-tune

The tool can be found in /usr/lib/xen/bin/host-cpu-tune. When executed with no parameters, it displays help:

[root@host ~]# /usr/lib/xen/bin/host-cpu-tune

Usage: /usr/lib/xen/bin/host-cpu-tune { show | advise | set <dom0_vcpus> <pinning> [--force] }

show Shows current running configuration

advise Advise on a configuration for current host

set Set host’s configuration for next reboot

<dom0_vcpus> specifies how many vCPUs to give dom0

<pinning> specifies the host’s pinning strategy

allowed values are 'nopin' or 'xpin'

[--force] forces xpin even if VMs conflict

Examples: /usr/lib/xen/bin/host-cpu-tune show

/usr/lib/xen/bin/host-cpu-tune advise

/usr/lib/xen/bin/host-cpu-tune set 4 nopin

/usr/lib/xen/bin/host-cpu-tune set 8 xpin

/usr/lib/xen/bin/host-cpu-tune set 8 xpin --force

[root@host ~]#

Recommendations

The advise command looks at the total number of pCPUs in the host and recommends as follows (a shell sketch of the same heuristic follows the list):

  • fewer than 4 pCPUs: the same number of dom0 vCPUs as pCPUs, and no pinning
  • fewer than 24 pCPUs: 4 vCPUs for dom0 and no pinning
  • fewer than 32 pCPUs: 6 vCPUs for dom0 and no pinning
  • fewer than 48 pCPUs: 8 vCPUs for dom0 and no pinning
  • 48 or more pCPUs: 8 vCPUs for dom0 and exclusive pinning
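The same heuristic can be written out as a small shell sketch (illustrative only; it is not part of host-cpu-tune and assumes `xl info` reports the host pCPU count as nr_cpus):

pcpus=$(xl info | awk '/^nr_cpus/ {print $3}')
if   [ "$pcpus" -lt 4 ];  then echo "dom0 vCPUs: $pcpus, pinning: nopin"
elif [ "$pcpus" -lt 24 ]; then echo "dom0 vCPUs: 4, pinning: nopin"
elif [ "$pcpus" -lt 32 ]; then echo "dom0 vCPUs: 6, pinning: nopin"
elif [ "$pcpus" -lt 48 ]; then echo "dom0 vCPUs: 8, pinning: nopin"
else                           echo "dom0 vCPUs: 8, pinning: xpin"
fi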

The utility works in three distinct modes:

  1. Show: This mode displays the current dom0 vCPU count and infers the current pinning strategy.

    Note: This functionality will only examine the current state of the host. If configurations are changed (for example, with the set command) and the host has not yet been rebooted, the output may be inaccurate.

  2. Advise: This recommends a dom0 vCPU count and a pinning strategy for this host.

    Note: This functionality takes into account the number of pCPUs available in the host and makes a recommendation based on heuristics determined by Citrix. System administrators are encouraged to experiment with different settings and find the one that best suits their workloads.

  3. Set: This functionality changes the host configuration to the specified number of dom0 vCPUs and pinning strategy.

    Note: This functionality may change parameters in the host boot configuration files. It is highly recommended to reboot the host as soon as possible after using this command.

    Warning: Setting zero vCPUs to dom0 (with set 0 nopin) will cause the host not to boot.

Resetting to Default

The host-cpu-tune tool uses the same heuristics as the XenServer Installer to determine the number of dom0 vCPUs. The installer, however, never activates exclusive pinning because of race conditions with Rolling Pool Upgrades (RPUs). During RPU, VMs with manual pinning settings can fail to start if exclusive pinning is activated on a newly upgraded host.

To reset the dom0 vCPU pinning strategy to default:

  1. Run the following command to find out the number of recommended dom0 vCPUs:

    [root@host ~]# /usr/lib/xen/bin/host-cpu-tune advise

  2. Configure the host accordingly, without any pinning:
    [root@host ~]# /usr/lib/xen/bin/host-cpu-tune set <count> nopin

    where <count> is the recommended number of dom0 vCPUs indicated by the advise command.
  3. Reboot the host. The host will now have the same settings as it did when XenServer 6.2.0 was installed.

Usage in XenServer Pools

Settings configured with this tool only affect a single host. If the intent is to configure an entire pool, this tool must be used on each host separately.

When one or more hosts in the pool are configured with exclusive pinning, migrating VMs between hosts may change a VM's pinning characteristics. For example, if a VM is manually pinned with the vCPU-params:mask parameter, migrating it to a host configured with exclusive pinning may fail. This can happen if one or more of that VM's vCPUs are pinned to a pCPU index exclusively allocated to dom0 on the destination host.

Additional commands to obtain information concerning CPU topology:

xenpm get-cpu-topology

xl vcpu-list


7022546: Updating microcode in Xen environments.

SLES12SP2 and newer Xen environments:

Beginning with SLES12SP2, Dom0 is a PVOPS-based kernel (kernel-default), which has no interface for microcode updates while running as a Dom0. However, if the initrd contains an updated microcode, and Xen is made aware of its existence, the update will be applied during the Xen early boot process. Updates using this method require a host reboot after correctly adding the microcode to the initrd.

Installing a microcode update in SLES12SP2 and newer environments:

1. Determine current microcode level:

# grep -m1 microcode /proc/cpuinfo

microcode : 0x2000011

2. Install updated microcode package (ucode-intel, or ucode-amd).

3. Rebuild initrd using `mkinitrd`.
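For reference, the rebuild itself is just the following (a minimal sketch; on newer SLES releases mkinitrd is a compatibility wrapper around dracut, so `dracut -f` is equivalent):

# mkinitrd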

NOTE – The `lsinitrd` command can be used to verify the microcode is correctly inserted into the initrd.

# lsinitrd /boot/initrd-4.12.14-23-default

Image: /boot/initrd-4.12.14-23-default: 11M
================================================================
Early CPIO image
================================================================
drwxr-xr-x 1 root root     0 Jul 13 13:05 .
-rw-r--r-- 1 root root     2 Jul 13 13:05 early_cpio
drwxr-xr-x 1 root root     0 Jul 13 13:05 kernel
drwxr-xr-x 1 root root     0 Jul 13 13:05 kernel/x86
drwxr-xr-x 1 root root     0 Jul 13 13:05 kernel/x86/microcode
-rw-r--r-- 1 root root 31744 Jul 13 13:05 kernel/x86/microcode/GenuineIntel.bin
================================================================

4. Edit /etc/default/grub, and add "ucode=scan" to the Xen hypervisor command line:

GRUB_CMDLINE_XEN_DEFAULT="vga=gfx-1024x768x16 crashkernel=202M<4G ucode=scan"
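After editing /etc/default/grub, the change must be propagated into the active GRUB configuration before the reboot in the next step; on a standard SLES 12 GRUB2 setup this is typically done with:

# grub2-mkconfig -o /boot/grub2/grub.cfg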

5. Reboot.

6. Verify microcode is updated:

# grep -m1 microcode /proc/cpuinfo

microcode : 0x200004a

7. Verify new speculative mitigation features are available through `xl dmesg`.

# xl dmesg | grep Speculative -A5

(XEN) Speculative mitigation facilities:

(XEN) Hardware features: IBRS/IBPB STIBP SSBD

(XEN) Compiled-in support: INDIRECT_THUNK

(XEN) Xen settings: BTI-Thunk JMP, SPEC_CTRL: IBRS+ SSBD-, Other: IBPB

(XEN) Support for VMs: PV: MSR_SPEC_CTRL RSB, HVM: MSR_SPEC_CTRL RSB

(XEN) XPTI (64-bit PV only): Dom0 enabled, DomU enabled

SLES12SP1 and older Xen environments:

In SLES12SP1 and older (including SLES11), the Dom0 kernel (kernel-xen) is based on xenlinux. This environment can upgrade microcode from Dom0 at run-time. However, the CPU is not re-sampled after such an update, and therefore guests cannot use new features exposed with an online microcode update. To avoid this problem, microcode updates should be done using the following steps:

Installing a microcode update in SLES12SP1 and older environments:

1. Install the updated microcode package (microcode_ctl).

2. Determine correct microcode file:

# grep -E 'family|model|stepping' -m 3 /proc/cpuinfo

cpu family : 6

model : 62

model name : Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz

stepping : 7

Intel microcode is named "[cpu family]-[model]-[stepping]", using hexadecimal values. In the above output, this would be "06-3e-07".
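As an illustrative helper (not part of the original TID; it simply automates the naming rule above and assumes the /proc/cpuinfo layout shown), the file name can be derived directly:

# printf '%02x-%02x-%02x\n' $(awk '/^cpu family/{f=$NF} /^model[[:space:]]*:/{m=$NF} /^stepping/{s=$NF} END{print f, m, s}' /proc/cpuinfo)
06-3e-07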

AMD microcode is named "microcode_amd_fam[NN]h.bin", where [NN] is the hexadecimal value of the CPU family. For example:

# grep -E 'cpu family|model name' -m 2 /proc/cpuinfo

cpu family : 23

model name : AMD EPYC 7601 32-Core Processor

For the AMD CPU above, the applicable microcode would be /lib/firmware/amd-ucode/microcode_amd_fam17h.bin.

3. Copy the microcode file from /lib/firmware/intel-ucode to /boot as GenuineIntel.bin. (For AMD environments, use /lib/firmware/amd-ucode and AuthenticAMD.bin.)

# cp /lib/firmware/intel-ucode/06-3e-07 /boot/GenuineIntel.bin

NOTE – For EFI boot environments, the microcode should be copied to the EFI boot partition and directory used in booting. This is typically /boot/efi/efi/SuSE.

4. Edit /etc/default/grub, and make the following 2 changes:

– Add the following module line in the Xen boot section, following the initrd module:

module /boot/GenuineIntel.bin

– Add “ucode=2” (where “2” is the “module” line number containing the GenuineIntel.bin string, starting from 0) to Xen hypervisor command line:

"kernel /boot/xen.gz vga=mode-0x317 ucode=2"
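Put together, a legacy-GRUB Xen boot entry with the added line might look roughly like the sketch below (the kernel and initrd names are placeholders; counting module lines from 0, GenuineIntel.bin is module 2, matching ucode=2):

title Xen
    kernel /boot/xen.gz vga=mode-0x317 ucode=2
    module /boot/vmlinuz-xen <dom0 kernel parameters>
    module /boot/initrd-xen
    module /boot/GenuineIntel.bin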

NOTE – For EFI boot environments, add the following line to the Xen EFI boot configuration (/boot/efi/efi/SuSE/xen.cfg) entries.

“ucode=GenuineIntel.bin”
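For orientation only, a minimal xen.cfg section with that line added might look like the following sketch (file names and the existing options are placeholders, not from the TID):

[global]
default=xen

[xen]
options=vga=gfx-1024x768x16
kernel=vmlinuz-xen <dom0 kernel parameters>
ramdisk=initrd-xen
ucode=GenuineIntel.bin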

5. Reboot.

6. Verify new speculative mitigation features are available through `xm dmesg`.

# xm dmesg | grep Speculative -A5

(XEN) Speculative mitigation facilities:

(XEN) Hardware features: IBRS/IBPB STIBP SSBD

(XEN) Xen settings: BTI-Thunk N/A, SPEC_CTRL: IBRS+ SSBD-, Other: IBPB

(XEN) Support for VMs: PV: MSR_SPEC_CTRL RSB, HVM: MSR_SPEC_CTRL RSB

(XEN) XPTI (64-bit PV only): Dom0 enabled, DomU enabled

NOTE: Multiple vendors may provide updated microcode. Ultimately, only the updates which match the running CPU (using hex cpuid comparison) will be applied during the update process.


How to manage SMT (hyper-threading) in XenServer?

This article describes how to explicitly manage simultaneous multithreading (SMT) or hyper-threading on supported hardware.

SMT or hyper-threading is a technology that is designed to improve CPU performance by enabling parallelization of computations. When hyper-threading is enabled, each physical processor is split into multiple logical processors.

If you are running a XenServer host with hyper-threading enabled, disabling hyper-threading will reduce the number of CPUs available on the host. Before you disable hyper-threading, you must carefully evaluate the workload and decide whether disabling is the best option for your XenServer deployment.
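As a quick check before deciding (a sketch, not from the original article), the host's CPU topology can be inspected from dom0; a threads_per_core value greater than 1 indicates that hyper-threading is enabled:

[root@host ~]# xl info | grep -E 'nr_cpus|cores_per_socket|threads_per_core'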

Citrix recommends that you do not run a VM with more virtual CPUs (vCPUs) than the number of physical CPUs available on the XenServer host. For more information, see CTX236977 – Overcommitting pCPUs on individual XenServer VMs.


7023078: Security Vulnerability: "L1 Terminal Fault" (L1TF) – Hypervisor Information (CVE-2018-3620, CVE-2018-3646, XSA-273).

Full mitigation for this issue requires a combination of hardware and software changes. Depending on the guest type, software changes may be required at both the Hypervisor and guest level.

Updated Intel microcode (provided through your hardware / BIOS vendor or by SUSE) introduces a new feature called “flush_l1d”. Hypervisors and bare-metal kernels use this feature to flush the L1 data cache during operations which may be susceptible to data leakage (e.g. when switching between VMs in Hypervisor environments).

Software mitigations exist for the Linux Kernel and for Hypervisors. These mitigations include support for new CPU features, passing these features to guests, and support for enabling/disabling/tuning the mitigations. Recommended mitigations vary depending on the environment.

For the Linux kernel (on both bare metal and virtual machines) L1TF mitigation is controlled through the “l1tf” kernel boot parameter. For complete information on this parameter, see TID 7023077.

KVM

For KVM host environments, mitigation can be achieved through L1D cache flushes, and/or disabling Extended Page Tables (EPT) and Simultaneous MultiThreading (SMT).

The L1D cache flush behavior is controlled through the “kvm-intel.vmentry_l1d_flush” kernel command line option:

kvm-intel.vmentry_l1d_flush=always

The L1D cache is flushed on every VMENTER.

kvm-intel.vmentry_l1d_flush=cond

The L1D cache is flushed on VMENTER only when there can be a leak of host memory between VMEXIT and VMENTER. This could still leak some host data, such as the address space layout.

kvm-intel.vmentry_l1d_flush=never

Disables the L1D cache flush mitigation.

The default setting here is “cond”.

The l1tf “full” setting overrides the settings of this configuration variable.
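To see which mode is currently in effect on a running host (a sketch; the module parameter is exposed through sysfs once kvm_intel is loaded, and the output shown is simply the documented default):

# cat /sys/module/kvm_intel/parameters/vmentry_l1d_flush
cond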


L1TF can be used to bypass Extended Page Tables (EPT). To mitigate this risk, it is possible to disable EPT and use shadow pages instead. This mitigation is available through the “kvm-intel.enable_ept” option:
kvm-intel.enable_ept=0

The Extended Page Tables support is switched off.
As shadow pages are much less performant than EPT, SUSE recommends leaving EPT enabled and using L1D cache flushes and SMT tuning for full mitigation.


To eliminate the risk of untrusted processes or guests exploiting this vulnerability on a sibling hyper-thread, Simultaneous MultiThreading (SMT) can be disabled completely.

SMT can be controlled through kernel boot command line parameters, or on-the-fly through sysfs:

On the kernel boot command line:

nosmt

SMT is disabled, but can be later reenabled in the system.

nosmt=force

SMT is disabled, and can not be reenabled in the system.

If this option is not passed, SMT is enabled. Any SMT options used with the “l1tf” kernel parameter option overrides this “nosmt” option.


SMT can also be controlled through sysfs:

/sys/devices/system/cpu/smt/control

This file allows reading the current control state and disabling or (re)enabling SMT.

Possible states are:

on

SMT is supported and enabled.

off

SMT is supported, but disabled. Only primary SMT threads can be onlined.

forceoff

SMT is supported, but disabled. Further control is not possible.

notsupported

SMT is not supported.

Potential values that can be written into this file:

on

off

forceoff

/sys/devices/system/cpu/smt/active

This file contains the state of SMT: whether it is enabled and active, where active means that multiple threads run on one core.
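For example, SMT can be checked and disabled at runtime using the files described above; the change takes effect immediately and lasts until reboot or until SMT is re-enabled:

# cat /sys/devices/system/cpu/smt/active
1
# echo off > /sys/devices/system/cpu/smt/control
# cat /sys/devices/system/cpu/smt/control
off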

Xen

For Xen hypervisor environments, mitigation is enabled by default and varies based on guest type. Manual adjustment of the "smt=" parameter is recommended, but the remaining parameters are best left at default values. A description of all relevant parameters is provided in the event any changes are necessary.

PV guests achieve mitigation at the Xen Hypervisor level. If a PV guest attempts to write an L1TF-vulnerable PTE, the hypervisor will force shadow mode and prevent the vulnerability. PV guests which fail to switch to shadow mode (e.g. due to a memory shortage at the hypervisor level) are intentionally crashed.

pv-l1tf=[ <bool>, dom0=<bool>, domu=<bool> ]

By default, pv-l1tf is enabled for DomU environments and, for stability and performance reasons, disabled for Dom0.

HVM guests achieve mitigation through a combination of L1D flushes, and disabling SMT.

spec-ctrl=l1d-flush=<bool>

This parameter determines whether or not the Xen hypervisor performs L1D flushes on VMEntry. Regardless of this setting, this feature is virtualized and passed to HVM guests for in-guest mitigation.

smt=<bool>
This parameter can be used to enable/disable SMT from the hypervisor. Xen environments hosting any untrusted HVM guests, or guests not under the full control of the host admin, should either disable SMT (through BIOS or smt=<bool> means), or ensure HVM guests use shadow mode (hap=0) in order to fully mitigate L1TF. It is also possible to reduce the risk of L1TF through the use of CPU pinning, custom CPU pools and/or soft-offlining of some hyper-threads.
These approaches are beyond the scope of this TID, but are documented in the standard Xen documentation.
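As a concrete example (a sketch reusing the GRUB variable shown earlier in this document; the existing options are placeholders), disabling SMT at the hypervisor level on a SUSE Xen host would look like this, followed by regenerating grub.cfg and rebooting:

GRUB_CMDLINE_XEN_DEFAULT="vga=gfx-1024x768x16 smt=0"
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot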

WARNING – The combination of Meltdown mitigation (KPTI) and shadow mode on hardware which supports PCID can result in a severe performance degradation.

NOTE – Efforts are ongoing to implement scheduling improvements that allow hyper-thread siblings to be restricted to threads from a single guest. This will reduce the exposure of L1TF, and the requirement to disable SMT in many environments.


7023077: Security Vulnerability: “L1 Terminal Fault” (L1TF) aka CVE-2018-3615, CVE-2018-3620 & CVE-2018-3646.

Modern Intel CPUs feature “hyper threads”, where multiple threads of execution can happen on the same core, sharing various resources, including the Level 1 (L1) Data Cache.

Researchers have found that during speculative execution, pagetable address lookups do not honor pagetable present and other reserved bits, so that speculative execution could read memory content of other processes or other VMs if this memory content is present in the shared L1 Datacache of the same core.

The issue is called “Level 1 Terminal Fault”, or short “L1TF”.

At this time this issue is known to affect only Intel CPUs.
Intel CPUs that are not affected are:
– Older models, where the CPU family is < 6

– A range of ATOM processors

(Cedarview, Cloverview, Lincroft, Penwell, Pineview, Silvermont, Airmont, Merrifield)

– The Core Duo Yonah variants (2006 – 2008)

– The XEON PHI family

– Processors which have the ARCH_CAP_RDCL_NO bit set in the IA32_ARCH_CAPABILITIES MSR.

If the bit is set, the CPU is also not affected by the Meltdown vulnerability.

(Note: These CPUs should become available end of 2018)
CPUs from ARM and AMD are not affected by this problem.

For other CPU vendors affectedness is currently unknown.

Three variants of the issue are tracked:

– OS level: CVE-2018-3620

– VMM level: CVE-2018-3646

– SGX enclave level: CVE-2018-3615

SUSE’s mitigations cover the OS and VMM levels.

Attackers could use this issue to get access to other memory in physical RAM on the machine.
Untrusted malicious VMs are able to read memory from other VMs, the Host system or SGX enclaves.

Note that this requires the memory being loaded into L1 datacache from another process / VM, which is hard to control for an active attacker.


Re: Intensive use of OWA crashes IIS on the Exchange 2016 servers

Greetings,

So far I am not aware of the SourceOne OWA app causing high resource usage on Exchange servers, but then again most of the sites I come across are ones where the Outlook client is the main client instead of OWA.

If everyone is connecting to OWA then that would add extra load on the servers.

A few things come to mind:

1. Have you checked under Worker Processes (within IIS Manager) whether any specific application pool is consuming more resources? You mentioned the OWA application pool.

2. Confirm that debugging for the SourceOne OWA extensions is not enabled, since that comes with some performance cost.

3. As a worst-case scenario, to rule out the OWA extensions, you can uninstall them and monitor for some time. If the issue does not come back, install them again and see if it returns. This will, however, cause problems with shortcut resolution.

4. There have been some issues reported in the past with Exchange where having more CPUs on the Exchange servers than the recommended count causes problems (Ref: https://blogs.technet.microsoft.com/jcoiffin/2016/11/30/100-cpu-on-exchange-20132016-check-the-number-of-cpus-is-not-too-high/ ). For something similar in Exchange 2013, a .NET fix was created.

I believe I just found that you have already opened a service request with the support team. We will monitor to see how things go there.

Best regards,

Rajan Katwal

(PS: Somehow I am unable to find this discussion under the SourceOne product page; in future, if possible, please post under the SourceOne forums:

SourceOne and EmailXtender

)


Re: Storage policy change for VMs

Hi,

Yes, you can track this process on the “Resyncing Components” view.

The resync uses CPU cycles, so you can expect an increase during this operation, but you can throttle this traffic in order to minimize the impact on the production workloads.

This is an area that was greatly improved on the latest vSAN versions.

Thanks,


VNX M and R, ViPR SRM, Watch4Net VNX SolutionPack: Storage Processor Write Cache report shows no data

Article Number: 479818 Article Version: 4 Article Type: Break Fix



VNX Family Monitoring & Reporting, Watch4net, ViPR SRM

The Storage Processor Write Cache Utilization report shows no data for some arrays.

This happens on arrays running FLARE version 33 and later because the Storage Processor cache architecture changed in FLARE 33, which means several of the cache-related metrics are no longer available as they are for versions 30-32.

In particular, the following metrics used to calculate this report are not available:

  • Flush Ratio (%)
  • High Water Flush On
  • Idle Flush On
  • Low Water Flush Off
  • Write Cache Flushes/s

The product is functioning as designed.


NetScaler CPU Profiling

1. Profiler Scripts

nsproflog.sh – This script is used to start/stop NetScaler Profiler.

2. Profiler directory path

/var/nsproflog – All profiler-related captured data and scripts reside in this directory.

3. Constant Profiling

On NetScaler, at boot time, the profiler is invoked and keeps running. If at any time the CPU associated with any packet engine (PE) exceeds 90% utilization, the profiler captures the data into a set of files named newproflog_cpu_<cpu-id>.out.

4. Help Usage

root@ns# /netscaler/nsproflog.sh -h
nCore Profiling
nsproflog - utility to start/stop NetScaler profiler to capture data and to display the profiled data
usage: nsproflog.sh [-h] [cpu=<cpu-id>] [cpuuse=<cpu_utilization_in_percentage*10> | lctnetio=<time_in_microseconds> | lctidle=<time_in_microseconds> | lctbsd=<time_in_microseconds> | lcttimer=<time_in_microseconds> | lcttimerexec=<time_in_microseconds> | lctoutnetio=<time_in_microseconds> | time=<time_in_seconds>] [loop=<count>] [hitperc=<value_in_percentage>] [display=<capture_file_path>] [kernel=<nsppe_file_path>] [start | stop]
 -h - print this message - exclusive option

Options used for starting the profiler:

  • start – start the capture
  • cpu – cpu-id on which profiler needs to capture data, default: on all cpus
  • cpuuse – threshold value (in cpu_percentage*10); when CPU utilization exceeds this value, the profiler starts capturing data in newproflog_cpu_<cpu-id>.out
  • lct* – helps find Lost CPU Time (in microseconds), i.e. when CPU cycles are spent for a long duration in functions other than packet processing
  • time – time (in seconds) to capture the profiler data before restarting a new capture
  • loop – number of iterations of the profiler captures
  • LCT has the following options:

  • lctidle – Amount of time spent in idle function
  • lctnetio – Amount of time spent in netio
  • lcttimer – Amount of time the HA timer is not called
  • lcttimerexec – Amount of time spent in executing NetScaler timeout functions e.g pe_dotimeout etc
  • lctbsd – Amount of time spent in freebsd
  • lctoutnetio – Amount of time spent since netio is called again


Options used for displaying the profiler data:

  • hitperc – hit percentage threshold for displaying functions with Hitratio (Number of hits for the function in percentage) above the threshold, default: 1%
  • display – display profiled data captured for specific cpu-id from capture file e.g newproflog_cpu_<cpu-id>.out
  • kernelhits – display hits symbols for kernel profile data captured for specific cpu-id from capture file e.g newproflog_cpu_<cpu-id>.out
  • ppehits – display hits symbols for PPE profile data captured for specific cpu-id from capture file e.g newproflog_cpu_<cpu-id>.out
  • aggrhits – display aggregated hits symbol for combined kernel and PPE data captured for specific cpu-id from capture file e.g newproflog_cpu_<cpu-id>.out
  • kernel – NetScaler nsppe file path, default: /netscaler/nsppe


Options used for stopping the profiler:

  • cpu – cpu-id on which profiler needs to be stopped, default: on all cpus
  • ​stop – stop the capture and generate a .tar.gz file for the captured outputs

Examples:

To start the profiler with a threshold above 70% CPU utilization to capture data on all the CPUs :

nsproflog.sh cpuuse=700 start

To start the profiler with capture when lost cpu time exceeds 2 milliseconds inside idle functions:

nsproflog.sh lctidle=2000 start

To stop the profiler and generate the .tar.gz of all captured data:

nsproflog.sh stop

To display captured data for all functions with Hitratio > 1%:

nsproflog.sh display=/var/nsproflog/newproflog.0/newproflog_cpu_1.out

To display captured data for all functions with Hitratio > 0%:

nsproflog.sh hitperc=0 display=/var/nsproflog/newproflog.0/newproflog_cpu_1.out kernel=/netscaler/nsppe

Note: If another instance of the profiler is already running, stop the current profiler before starting a new instance with a different CPU threshold.


5. To start the profiler with no CPU threshold to capture data on all the CPUs

root@ns# /netscaler/nsproflog.sh start &
[1] 65065
root@ns# nCore Profiling
Setting (512 KB) of profile buffer for cpu 3 ... Done.
Setting (512 KB) of profile buffer for cpu 2 ... Done.
Setting (512 KB) of profile buffer for cpu 1 ... Done.
Collecting profile data for cpu 3
Collecting profile data for cpu 2
Capturing profile data for 10 seconds...
Collecting profile data for cpu 1
Capturing profile data for 10 seconds...
Please wait for profiler to capture data
Capturing profile data for 10 seconds...
root@ns#
root@ns#
root@ns# Saved profiler capture data in newproflog.9.tar.gz
Collecting profile data for cpu 3
Collecting profile data for cpu 2
Capturing profile data for 10 seconds...
Collecting profile data for cpu 1
Capturing profile data for 10 seconds...
Please wait for profiler to capture data
Capturing profile data for 10 seconds...
root@ns# cd /var/nsproflog
root@ns# pwd
/var/nsproflog
root@ns# ls -ll
total 9356
-rw-r--r-- 1 root wheel  109423 Sep 24 22:37 newproflog.0.tar.gz
-rw-r--r-- 1 root wheel  156529 Sep 24 22:38 newproflog.1.tar.gz
-rw-r--r-- 1 root wheel   64410 Sep 24 22:38 newproflog.2.tar.gz
-rw-r--r-- 1 root wheel  111448 Sep 24 22:38 newproflog.3.tar.gz
-rw-r--r-- 1 root wheel  157538 Sep 24 22:38 newproflog.4.tar.gz
-rw-r--r-- 1 root wheel   65603 Sep 24 22:38 newproflog.5.tar.gz
-rw-r--r-- 1 root wheel  112944 Sep 24 22:38 newproflog.6.tar.gz
-rw-r--r-- 1 root wheel  158081 Sep 24 22:39 newproflog.7.tar.gz
-rw-r--r-- 1 root wheel   44169 Sep 24 22:39 newproflog.8.tar.gz
-rw-r--r-- 1 root wheel   48806 Sep 25 22:19 newproflog.9.tar.gz
-rw-r--r-- 1 root wheel     339 Sep 16 23:16 newproflog.old.tar.gz
-rw-r--r-- 1 root wheel  208896 Sep 25 22:19 newproflog_cpu_1.out
-rw-r--r-- 1 root wheel  208896 Sep 25 22:19 newproflog_cpu_2.out
-rw-r--r-- 1 root wheel  208896 Sep 25 22:19 newproflog_cpu_3.out
-rw-r--r-- 1 root wheel 6559889 Sep 18 21:43 newproflog_mgmtcpu
-rw-r--r-- 1 root wheel  202630 Sep 18 05:58 newproflog_mgmtcpu.0.gz
-rw-r--r-- 1 root wheel       3 Sep 25 22:19 nsproflog.nextfile
-rw-r--r-- 1 root wheel     309 Sep 25 22:19 nsproflog_args
-rw-r--r-- 1 root wheel       1 Sep 25 22:18 nsproflog_options
-rw-r--r-- 1 root wheel       6 Sep 25 22:18 ppe_cores.txt


6. To stop the profiler on all the CPUs

root@ns# /netscaler/nsproflog.sh stop
nCore Profiling
Stopping all profiler processes
Killed
Killed
Killed
Removing buffer for -s cpu=3
Removing profile buffer on cpu 3 ... Done.
Removing buffer for -s cpu=2
Removing profile buffer on cpu 2 ... Done.
Removing buffer for -s cpu=1
Removing profile buffer on cpu 1 ... Done.
Saved profiler capture data in newproflog.0.tar.gz
Stopping mgmt profiler process
[1]+ Killed: 9 /netscaler/nsproflog.sh start
root@ns#


7. To display the profiled data on cpu#1 with no CPU threshold for all functions whose Hitratio% is greater than the default threshold (hitperc=1)

root@ns# tar -xzvf newproflog.9.tar.gz
newproflog.9/
newproflog.9/newproflog_cpu_1.out
newproflog.9/newproflog_cpu_2.out
newproflog.9/newproflog_cpu_3.out
newproflog.9/nsproflog_args
root@ns# cd newproflog.9
root@ns# ls
newproflog_cpu_1.out newproflog_cpu_2.out newproflog_cpu_3.out nsproflog_args
root@ns#
root@ns# /netscaler/nsproflog.sh display=newproflog_cpu_1.out
nCore Profiling
Displaying the profiler command-line arguments used during start of capture
/netscaler/nsproflog.sh
/netscaler/nsprofmon -s cpu=3 -O -k /var/nsproflog/newproflog_cpu_3.out -T 10 -ye capture
/netscaler/nsprofmon -s cpu=2 -O -k /var/nsproflog/newproflog_cpu_2.out -T 10 -ye capture
/netscaler/nsprofmon -s cpu=1 -O -k /var/nsproflog/newproflog_cpu_1.out -T 10 -ye capture
Displaying the profile capture statistics for proc with HitRatio > 1%
NetScaler NS11.0: Build 13.6.nc, Date: Sep 18 2013, 13:54:47
==============================================================================
Index  HitRatio  Hits   TotalHit%  Length  Symbol name
==============================================================================
1      50.358%   5550   50.358%    1904    packet_engine**
2      15.380%   1695   65.738%      32    pe_idle_readmicrosec**
3       9.037%    996   74.775%     112    nsmcmx_is_pending_messages**
4       8.956%    987   83.731%      80    vc_idle_poll**
5       7.041%    776   90.772%      96    vmpe_intf_loop_rx_any**
6       6.143%    677   96.915%     256    vmpe_intf_e1k_sw_rss_tx_any**
7       1.370%    151   98.285%      64    vmpe_intf_e1k_rx_any**
==============================================================================
8      98.285%  11021
==============================================================================
** - Idle Symbols
Displaying the summary of proc hits.................................
==============================================================================
PID    PROCNAME  PROCHIT  PROCHIT%
==============================================================================
1326   NSPPE-00    11021    100.00
==============================================================================


8. Once all data is collected, revert the profiler settings, i.e. leave it running at the 90% threshold. If this step is skipped, continuous profiling will not happen until the next reboot.

nohup /usr/bin/bash /netscaler/nsproflog.sh cpuuse=900 start &

Verify the following; note that the last line will repeat for each packet CPU.

root@ns# ps -aux | grep -i prof
root 2946 0.0 0.0  1532  984 ?? Ss 12:27PM 0:00.00 /netscaler/nsprofmgmt 90.0
root 2920 0.0 0.1  5132 2464  0 S  12:27PM 0:00.02 /usr/bin/bash /netscaler/nsproflog.sh cpuuse=900 start
root 2957 0.0 0.1 46564 4376  0 R  12:27PM 0:00.01 /netscaler/nsprofmon -s cpu=1 -ys cpuuse=900 -ys profmode=cpuuse -O -k /var/nsproflog/newproflog_cpu_1.out -s logsize=10485760 -ye capture


Understanding Workspace Environment Management (WEM) System Optimization

The WEM System Optimization feature is a group of settings designed to dramatically lower resource usage on a VDA on which the WEM Agent is installed.

These are machine-based settings that will apply to all user sessions.




Managing Servers with different Hardware Configurations

Sets of VMs may have been configured with different hardware configurations. For instance, some machines may have 4 CPU cores and 8 GB RAM, while others have 2 CPU cores and 4 GB RAM. The determination could be made that each server set requires a different set of WEM System Optimization settings. Because machines can only be part of one WEM ConfigSet, administrators must consider whether they need to create multiple ConfigSets to accommodate different optimization profiles.


WEM System Optimization Settings



Fast Logoff

A purely visual option that will end the HDX connection to a remote session, giving the impression that the session has immediately closed. However, the session itself continues to progress through the session logoff phases on the VDA.


CPU Management

CPU Priority:

You can statically define the priority for a process. Every instance of, for example, Notepad that is launched on the VDA will be launched with the configured CPU priority. The choices are:

  • Idle
  • Below Normal
  • Normal
  • Above Normal
  • High
  • Realtime *

* https://stackoverflow.com/questions/1663993/what-is-the-realtime-process-priority-setting-for

CPU Affinity:

You can statically define how many CPU cores a process will use. Every instance of Notepad that is launched on the VDA will use the number of cores defined.

Process Clamping:

Process clamping allows you to prevent a process from using more CPU percentage than the specified value. A process in the Process Clamping list can use CPU up to the configured percentage, but will not go higher. The setting limits the CPU percentage no matter which CPU cores the process uses.

Note: The clamping percentage is global, not per core (that is, 10% on a quad-core CPU is 10%, not 10% of one core).


Generally, Process Clamping is not a recommended solution for keeping the CPU usage of a troublesome process artificially low. It’s a brute force approach and computationally expensive. The better solution is to use a combination of CPU spikes protection and to assign static Limit CPU / Core Usage, CPU priorities, CPU affinities values to such processes.

CPU Management Settings:

CPU Spikes Protection:

CPU Spikes Protection is not the same as Process Clamping. Process Clamping will prevent a process from exceeding a set CPU percentage usage value. Spikes Protection manages the process when it exceeds the CPU Usage Limit (%) value.

CPU Spikes Protection is not designed to reduce overall CPU usage. CPU Spikes Protection is designed to reduce the impact on user experience by processes that consume an excessive percentage of CPU Usage.

If a process exceeds the CPU Usage Limit (%) value, for over a set period of time (defined by the Limit Sample Time (s) value), the process will be relegated to Low Priority for a set period of time, defined by the Idle Priority Time (s) value. The CPU usage Limit (%) value is global across all logical processors.

The total number of logical processors is determined by the number of CPUs, the number of cores in the CPU, and whether HyperThreading is enabled. The easiest method of determining the total number of logical cores in a machine is by using Windows Task Manager (2 logical processors shown in the image):

[Image: Windows Task Manager showing 2 logical processors]

To better understand CPU Spikes Protection, let’s follow a practical scenario:

Users commonly work with a web app that uses Internet Explorer. An administrator has noticed that iexplore.exe processes on the VDAs consume a lot of CPU time and overall responsiveness in user sessions is suffering. There are many other user processes running and percentage CPU usage is running in the 90 percent range.

To improve responsiveness, the administrator sets the CPU Usage Limit value to 50% and an Idle Priority Time of 180 seconds. For any given user session, when a single iexplore.exe process instance reaches 50% CPU usage, its CPU priority is immediately lowered to Low for 180 seconds. During this time iexplore.exe consequently gets less CPU time, due to its low position in the CPU queue, and thereby reduces its impact on overall session responsiveness. Other user processes that haven't also reached 50% have a higher CPU priority and so continue to consume CPU time, and although the overall percentage CPU usage continues to show above 90%, the session responsiveness for that user is greatly improved.

In this scenario, the machine has 4 logical processors. If the processes’ CPU usage is spread equally across all logical processors, each will show 12.5% usage for that process instance.

If there are two iexplore.exe process instances in a session, their respective percentage CPU usage values are not added to trigger Spikes Protection. Spikes Protection settings apply on each individual process instance.​

User-centric CPU Optimization (process tracking on the WEM Agent):

As stated previously, all WEM System Optimization settings are machine-based and settings configured for a particular ConfigSet will apply to all users launching sessions from the VDA.

The WEM Agent records the history of every process on the machine that has triggered Spikes Protection. It records the number of times that the process has triggered Spikes Protection, and it records the user for which the trigger occurred.

So if a process triggers the CPU Spikes Protection in User A’s session, the event is recorded for User A only. If User B starts the same process, then WEM Process Optimization behavior is determined only by process triggers in User B’s session. On each VDA the Spike Protection triggers for each user (by user SID) are stored in the local database on the VDA and refreshing the cache does not interfere with this stored history.

Limit CPU / Core Usage:

When a process has exceeded the CPU Usage Limit value (i.e. Spikes Protection for the process has been triggered), in addition to setting the CPU priority to Low, WEM can also limit the amount of CPU cores that the process uses if a CPU / Core Usage Limit value is set. The limit is in effect for the duration of the Idle Priority Time.

Enable Intelligent CPU Optimization:

When Enable Intelligent CPU Optimization is enabled, all processes that the user launches in their session will start at a CPU Priority of High. This makes sense as the user has purposefully launched the process, so we want the process to be reactive.

If a process triggers Spikes Protection, it will be relegated to Low priority for 180 seconds (if the default setting is used). But if it triggers Spikes Protection a certain number of times, the process will run at the next lower CPU priority the next time it is launched.

So a process that initially launched at High priority will, once it exceeds a certain number of triggers, launch at Above Normal priority the next time. If the process continues to trigger Spikes Protection, it will launch at the next lower priority each time, until eventually it launches at the lowest CPU priority.

The behavior of Enable Intelligent CPU Optimization is overridden if a static CPU Priority value has been set for a process. If Enable Intelligent CPU Optimization is enabled and a process’s CPU Priority value has been set to Below Normal, then the process will launch at Below Normal CPU priority instead of the default High priority.

If Enable Intelligent CPU Optimization is enabled and a process’s CPU Priority value has been statically set to High, then the process will launch at High. If the process triggers Spikes Protection, it will be relegated to Low priority for 180 seconds (if default setting is used), but then return to High priority afterwards.

Note: The Enable CPU Spikes Protection box must be ticked for Enable Intelligent CPU Optimization to work.


Memory Management

Working Set Optimization:

WEM determines how much RAM a running process is currently using and also determines the least amount of RAM the process requires, without losing stability. The difference between the two values is considered by WEM to be excess RAM. The process’s RAM usage is calculated over time, the duration of which is configured using the Idle Sample Time (min) WEM setting. The default value is 120 minutes.

Let’s look at a typical scenario when WEM Memory Management has been enabled:

A user opens Internet Explorer, navigates to YouTube, and plays some videos. Internet Explorer will use as much RAM as it needs. In the background, and over the sampling period, WEM determines the amount of RAM Internet Explorer has used and also determines the least amount of RAM required, without losing stability.

Then the user is finished with Internet Explorer and minimizes it to the Task Bar. When the process's percentage CPU usage drops to the value set by the Idle State Limit (percent) value (the default is 1%), WEM then forces the process to release the excess RAM (as previously calculated). The RAM is released by writing it to the pagefile.

When the user restores Internet Explorer from the Task Bar, it will initially run in its optimized state but can still go on to consume additional RAM as needed.

When considering how this affects multiple processes over multiple user sessions, the result is that all of that RAM freed up is available for other processes and will increase user density by supporting a greater amount of users on the same server.

Idle State Limit (percent):

The value set here is the percentage of CPU usage under which a process is considered to be idle. The default is 1% CPU usage. Remember that when a process is considered to be idle, WEM forces it to shed its excess RAM. So be careful not to set this value too high; otherwise a process being actively used may be mistaken as an idle process, resulting in its memory being released. It is not advised to set this value higher than 5%.


I/O Management

These settings allow you to optimize the I/O priority of specific processes, so that processes which are contending for disk and network I/O access do not cause performance bottlenecks. For example, you can use I/O Management settings to throttle back a disk-bandwidth-hungry application.

The process priority you set here establishes the "base priority" for all of the threads in the process. The actual, or "current," priority of a thread may be higher (but is never lower than the base). In general, Windows gives access to threads of higher priority before threads of lower priority.

I/O Priority Settings:

Enable Process I/O Priority

When selected, this option enables manual setting of process I/O priority. Process I/O priorities you set take effect when the agent receives the new settings and the process is next restarted.

Add Process I/O Priority

Process Name: The process executable name without the extension. For example, for Windows Explorer (explorer.exe) type “explorer”.

I/O Priority: The “base” priority of all threads in the process. The higher the I/O priority of a process, the sooner its threads get I/O access. Choose from High, Normal, Low, Very Low.

Enable Intelligent I/O Optimization:

This adopts exactly the same principles as Enable Intelligent CPU Optimization, but for I/O instead of CPU.

Note: The Enable CPU Spikes Protection box must be ticked for Enable Intelligent I/O Optimization to work.


Exclude specified processes:

By default, WEM CPU Management excludes all of the most common Citrix and Windows core service processes. This is because they make the environment run and they need to make their own decisions about how much CPU time and priority they need. WEM administrators can, however, add processes they want to exclude from Spikes Protection to the list. Typically, antivirus processes would be excluded. In this case, in order to stop antivirus scanning taking over disk I/O in the session, administrators would also set a static I/O Priority of Low for antivirus processes.


Notes:

  1. When configuring, the entered process name must match the process name's entry in Windows Task Manager.
  2. Process names are not case-sensitive.
  3. You don’t enter “.exe” after the process name. So for instance, enter “notepad” rather than “notepad.exe”.
