SD-WAN QoS – FAQ

2. What types of traffic are allocated by default to the different classes?

In the SD-WAN environment, we think of applications as falling into one of the following three classes:

Real-time – VoIP or VoIP-like applications, such as Skype or ICA audio. In general, this refers to voice-only applications that use small UDP packets and are business critical.

Interactive – This is the broadest category and refers to any application that has a high degree of user interaction. Some of these applications, for example video conferencing, are sensitive to latency and require high bandwidth. Other applications, like HTTPS, may need less bandwidth but are critical to the business. Interactive applications are typically transactional in nature.

Bulk – This is any application that does not need a rich user experience but is more about moving data (e.g., FTP or backup/replication).

3. How does the real-time class work versus interactive?

Real-time (RT) classes are given the highest priority and get up to 50% of the overall scheduler time. Each class can be weighted with respect to the other RT classes; for example, we could have two RT classes, one weighted to 70% and the other to 30%.

Interactive (INT) classes take the next priority and can consume the rest of the scheduler time as traffic demands. Individual INT classes can be weighted, and by default four weights (high, medium, low, and very low) are defined.

4. Will bulk traffic suffer if interactive and real-time flows are present?

Yes. Bulk traffic is serviced only after real-time and interactive traffic have been serviced. Typically, a bulk class gets a lower sustained share percentage than an interactive class.

5. How are QoS classes prioritized?

Real-time (RT) and interactive (INT) classes are scheduled exactly as described in question 3 above: RT classes take the highest priority, weighted among themselves and capped at 50% of the overall scheduler time, while INT classes take the next priority and consume the remaining scheduler time as traffic demands.

Bulk (BLK) classes take the lowest priority and can be considered scavenger classes. They can be weighted, but they can be completely starved of bandwidth if INT/RT traffic consumes all of the scheduler time.
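
To make the policy concrete, here is a minimal sketch of the allocation rules described above. It illustrates the policy only; it is not the SD-WAN scheduler itself, and the class names and demand model are invented for the example:

def allocate_shares(rt_demand, int_demand, blk_demand):
    """Each argument maps class name -> demanded fraction of link time (0..1).
    Returns the granted fraction of scheduler time per class."""
    grants = {}

    # Real-time: highest priority, weighted among RT classes, capped at 50%.
    rt_budget = min(sum(rt_demand.values()), 0.5)
    rt_total = sum(rt_demand.values()) or 1.0
    for name, demand in rt_demand.items():
        grants[name] = rt_budget * demand / rt_total

    # Interactive: next priority; consumes the remainder as traffic demands.
    int_budget = min(sum(int_demand.values()), 1.0 - sum(grants.values()))
    int_total = sum(int_demand.values()) or 1.0
    for name, demand in int_demand.items():
        grants[name] = int_budget * demand / int_total

    # Bulk: scavenger priority; may be starved entirely under RT/INT load.
    blk_budget = min(sum(blk_demand.values()), 1.0 - sum(grants.values()))
    blk_total = sum(blk_demand.values()) or 1.0
    for name, demand in blk_demand.items():
        grants[name] = blk_budget * demand / blk_total

    return grants

# Two RT classes weighted 70/30, a busy INT class, and a bulk backup flow:
print(allocate_shares({"rt_voice": 0.7, "rt_ica_audio": 0.3},
                      {"int_https": 0.6},
                      {"blk_backup": 0.4}))
# {'rt_voice': 0.35, 'rt_ica_audio': 0.15, 'int_https': 0.5, 'blk_backup': 0.0}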

6. What is the purpose of “Retransmit Lost Packets” option under WAN General, IP Rules?

If the receiving SD-WAN appliance detects a missing packet, it can request that the packet be resent by the sending SD-WAN appliance.

7. What are the criteria for the QoS calculation?

QoS is always calculated on the Send Side.

The fair-share calculation for the services is performed per WAN link.

8. What is Dual-Ended QoS?

The receive side sends control packets to advertise the available bandwidth before the actual data transfer is initiated.

9. How is share provided during contention?

Please refer to this article: https://support.citrix.com/article/CTX256716

10. What is the difference between the Drop Limit and the Drop Depth?

Drop Limit: If the estimated queue time for a packet exceeds this threshold, the packet is discarded. Not valid for bulk classes.

Drop Depth (Send Buffer): The maximum amount of estimated time that packets smaller than the large packet size will have to wait in the class scheduler. If the queue depth exceeds this threshold, the packet is discarded and counted in the statistics.



11. How is the Drop Limit calculated (in ms)?

Number of bytes queued divided by the bandwidth available for the class.
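
As a quick illustration of that formula (the units are an assumption here, since the FAQ does not state them), note that a rate of 1 kbps is exactly 1 bit per millisecond:

def queue_time_ms(bytes_queued, class_bandwidth_kbps):
    # 1 kbps = 1000 bits/s = 1 bit/ms, so bits divided by kbps yields ms.
    return (bytes_queued * 8) / class_bandwidth_kbps

# 62,500 bytes queued on a class currently granted 1 Mbps (1,000 kbps):
print(queue_time_ms(62_500, 1_000))  # 500.0 ms, compared against the drop limit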

12. What are transmit modes based on?

• Persistent path – Based on latency. If a path's latency exceeds 50 ms, a penalty is applied to that path and a new path is chosen (see the sketch after this list).

• Load-balanced path – Based on packet loss.

• Duplicate paths – Packets are duplicated over the WAN links.
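
A rough sketch of the persistent-path rule described above; the selection logic is illustrative and pick_persistent_path is an invented helper name:

def pick_persistent_path(paths, current, threshold_ms=50):
    """paths: dict of path name -> measured latency in ms.
    Stay on the current path unless its latency exceeds the threshold,
    in which case it is penalized and the lowest-latency path is chosen."""
    if paths[current] <= threshold_ms:
        return current
    return min(paths, key=paths.get)

# The MPLS path spikes above 50 ms, so traffic moves to the internet path:
print(pick_persistent_path({"mpls": 65, "inet": 20}, current="mpls"))  # inet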

13. What is MOS (Mean Opinion Score) under rule groups?

This feature gathers application statistics from the WAN to the LAN side of the virtual path. MOS is a measure of the quality of the experience that an application delivers to end users. It is primarily used for VoIP applications, but in SD-WAN, MOS is also used to assess the quality of non-VoIP applications.

14. What is Application QoS and how to implement it?

By default, the SD-WAN has pre-defined application families based on the type of application seen in incoming traffic, for example Anti-Virus, Microsoft Office, and so on.

It is also possible to create custom application objects.

15. QoS Fairness (RED):

Please refer to this Document:

https://docs.citrix.com/en-us/netscaler-sd-wan/10/quality-of-service/qos-fairness.htm

16. Do we have an option to enable Auto Bandwidth provisioning?

Yes, from SD-WAN version 10.2.x there is an option under Site —> WAN Links —> Provisioning to enable Auto-Bandwidth Provisioning.

17. What is Auto-Bandwidth Provisioning?

When enabled, the shares for all services defined in the Provisioning section are automatically calculated and applied according to the bandwidth required by the remote sites.

18. How do you diagnose whether a QoS issue is caused by the SD-WAN or not?

Based on Multiple factors:

Related:


How To Configure HDX RTOP Realtime Media Engine DSCP QoS

  • The settings are only applied on Linux and OSX endpoints. For Windows endpoints, these settings will be ignored. QoS for Windows endpoints needs to be configured using the Group Policy-based QoS mechanism with policies applied at the endpoint.



    Remember, for non-Windows devices you need to multiply your desired DSCP value by 4 and set this new value via the registry.

    Here is an example:

    [HKEY_CURRENT_USER\Software\Citrix\HDXRTConnector\MediaEngine\Networking]

    "AudioTOS"=dword:000000b8

    "VideoTOS"=dword:000000a4

    "RtcpTOS"=dword:00000080

    Hex(b8) = Dec(184) -> DSCP = 184 / 4 = 46 is the actual DSCP value you want to see in the network traffic

    Hex(a4) = Dec(164) -> DSCP = 164 / 4 = 41 is the actual DSCP value you want to see in the network traffic

    Hex(80) = Dec(128) -> DSCP = 128 / 4 = 32 is the actual DSCP value you want to see in the network traffic


    The reason for multiplying by 4 appears to be the requirement to meet the full ToS specification, which reserves the last two bits of the byte for “ECN” (see the Wikipedia article for more details). 00 just means ECN is not configured, and the multiplication bit-shifts the same DSCP value into the first 6 bits; the ECN bits, the last two, are always 00.

    Multiplying by 4 is simply a two-bit left shift:

    46 in binary is 101110

    But you have to set all 8 bits (in order to comply with the RFC), so you append 00 at the end (or multiply by 4): 184 in binary is 10111000
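
    The same conversion in a few lines of Python, as a sanity check on the registry values above (dscp_to_tos is just an illustrative helper name):

    def dscp_to_tos(dscp):
        # Shift the 6-bit DSCP value into the upper six bits of the ToS byte;
        # the two low ECN bits stay 00. Equivalent to multiplying by 4.
        return dscp << 2

    assert dscp_to_tos(46) == 0xB8  # AudioTOS (DSCP 46, Expedited Forwarding)
    assert dscp_to_tos(41) == 0xA4  # VideoTOS
    assert dscp_to_tos(32) == 0x80  # RtcpTOS (DSCP 32, CS4)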

  • Related:

    The business case for deploying network monitoring systems

    It takes a significant amount of time and money to properly design and implement an enterprise-class network performance monitoring (NPM) foundation. And because of this, you need to understand not only what features are available within an NPM package, but whether those features will actually be beneficial to your infrastructure team.

    You may find that only the basic features are needed today, but you may want the ability to add more complex tools in the future. Others might find it’s best to deploy a fully functional NPM on day one. Most, however, will likely fall somewhere in between. In this article, we examine why these performance-oriented network monitoring systems bear watching and which features are most important.

    Downtime is becoming increasingly unacceptable

    One obvious trend driving NPM is the need to quickly resolve downtime issues that arise. While the ideal solution would be to create a fully redundant network from end to end, in many cases this isn’t possible. This can be due to limitations in the architecture itself, an inability to provide physical redundancy, or budgets that preclude a fully redundant approach. When automated failover isn’t possible, the next best thing is to develop and deploy an advanced network monitoring system platform to identify and alert staff when an outage is occurring — or about to occur. The faster a problem can be identified, the faster it can be fixed.

    In some cases, this simply means implementing tools to monitor network devices and individual links. Alerting based on collected log messages is another common tool. In other cases, monitoring all the way to the application layer is required. The vast majority of network monitoring systems today offer the ability to monitor network-only functions or monitor and alert on both network and application issues that arise. Additionally, deep packet inspection appliances can rapidly find performance issues at critical points on the network.

    Applications are becoming more time-sensitive

    Thanks to a dramatic increase in real-time collaboration applications like voice and video — as well as the growth of distributed application architectures — data traversing networks is more time-sensitive than ever. As a result, data streams for low-latency applications must be identified, marked and treated with a higher priority than other data running across the same network connections. The primary tool to perform these types of tasks is quality of service (QoS). Layer 2 and 3 devices, such as routers and switches, are configured with QoS policies and queuing actions based on those policies.

    In a perfect world, QoS will be properly configured from one end of the network to the other. But oftentimes, QoS is either not configured or poorly configured somewhere along the data path. This one mistake can cause major problems for time-sensitive communications. Identifying these problems manually often requires logging in and verifying each QoS configuration along the path. Many network monitoring systems, on the other hand, have QoS analysis capabilities, using NetFlow or sFlow, to automatically identify ineffective or incorrectly configured QoS policies.

    Network architecture is growing in complexity

    Data center virtualization and network overlays often mask underlying network problems. Suddenly, administrators have to troubleshoot both the underlying physical foundation as well as accompanying virtualized networks in order to find and resolve performance problems. Many IT departments only have tools to monitor one or the other. And if they have the ability to monitor both, they may be completely independent tools.

    Many modern NPMs can monitor both physical and virtualized architectures and determine on which network plane the problem resides. This gives support administrators complete visibility into the network, an increasingly important requirement as more virtualization and overlay techniques are added.

    Event correlation and root cause analysis are ineffective

    Finding and resolving network and application problems is one thing. Finding the root cause of the problem is another. On very large and complex networks, it’s very possible to implement fixes or workarounds that resolve the immediate issue, yet never address the underlying cause. Many times, this leads to drastic and inefficient network changes to fix a problem — when the root cause was actually due to upper-layer problems that went unchecked.

    Many network monitoring systems offer added intelligence to collect and analyze various network and application events. By doing so, reports can be created that correlate — or at least isolate — the origin of the initial problem. When properly configured and tuned, this significantly reduces root cause investigations by helping the administrator focus on the problem and verify the correlated information. And since modern NPMs collect data up to the application level, many root causes that previously went unnoticed can now be identified and properly remediated.

    Seeking single-pane-of-glass monitoring and troubleshooting

    The potential of integrating so many useful network and performance monitoring tools into a single, unified system is highly appealing. Gone are the days of independent SNMP monitors, logging servers, NetFlow collectors and packet sniffers. We now have the ability to unify all of these useful features into a single NPM product. What’s more, by creating a single pane of glass, we also create a single data repository from which reports can be generated and intelligent decisions made using powerful data correlation methods.

    Source: http://searchnetworking.techtarget.com/feature/The-business-case-for-deploying-network-monitoring-systems

    Related:

    Event ID 9009 — TCP/IP Network Performance

    Updated: April 17, 2008

    Applies To: Windows Server 2008

    Network performance encompasses all aspects of data transfer performance, such as download and upload speeds, number of packets dropped versus packets delivered, and the round-trip time of connections.

    These aspects of network performance might be affected by congestion in the network. In the case of wireless networks, signal attenuation, electromagnetic interference, and the mobility of the host also affect network performance.

    Event Details

    Product: Windows Operating System
    ID: 9009
    Source: tcpip
    Version: 6.0
    Symbolic Name: EVENT_TRANSPORT_TRANSFER_DATA
    Message: %2 could not transfer a packet from the network adapter. The packet was dropped.

    Resolve
    Reduce the load on the remote computer

    If the packets are dropped because of network congestion and poor network performance, reduce the load on, or increase the capacity of, the computer.

     

    Verify

    To perform this procedure, you must have membership in Administrators, or you must have been delegated the appropriate authority.

    To measure network performance, run Performance Monitor:

    1. Click Start, click All Programs, click Accessories, right-click Command Prompt, and then click Run as administrator.
    2. Click Continue when prompted by User Account Control, and then provide the administrator password, if requested.
    3. In the Performance Monitor console tree, click Reliability and Performance.
    4. Network, CPU, and memory utilization data are available in the details pane.

    If you have recorded Performance Monitor counters in the past, compare the current load to your average loads over time. If you do not have any baseline readings from past performance monitoring, continue to monitor network, CPU, and memory utilization by looking for large fluctuations in performance that might indicate a heavy traffic load or an attack.
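
    As a quick scripted alternative (my suggestion, not part of the original guidance), you can sample similar counters with the third-party psutil package:

    import time

    import psutil  # pip install psutil

    before = psutil.net_io_counters()
    time.sleep(1)
    after = psutil.net_io_counters()
    print("packets dropped (in) over 1s:", after.dropin - before.dropin)
    print("CPU %:", psutil.cpu_percent(),
          "| memory %:", psutil.virtual_memory().percent)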

    Related Management Information

    TCP/IP Network Performance

    Networking

    Related:

    VMAX & OpenStack Ocata: An Inside Look Pt. 2: Over-Subscription/QoS/Compression

    Welcome back to VMAX & OpenStack Ocata: An Inside Look! Although we are on to part 2 of our multi-part series, this piece can be seen as more of an extension of what we covered in part 1, where we went through the basic setup of your VMAX & OpenStack environment. This time we are going to take your environment setup that bit further and talk about the areas of over-subscription, quality of service (QoS), and compression.

    Again, and as always, if you have any feedback, comments, spot any inconsistencies, want something covered, or just want a question answered, please feel free to contact me directly or leave a comment in the comments section below!

    1. Over-Subscription

    OpenStack Cinder enables you to choose a volume back-end based on virtual capacities for thin provisioning using the over-subscription ratio. To support over-subscription in thin provisioning, a flag max_over_subscription_ratio is introduced into cinder.conf and the existing flag reserved_percentage must be set. These flags are both optional and do not need to be included if over-subscription is not required for the backend.

    The max_over_subscription_ratio flag is a float representation of the over-subscription ratio when thin provisioning is involved. The table below illustrates the relationship between the float value and the over-subscribed provisioned capacity:

    Float representation    Over-subscription multiple (of total physical capacity)
    20.0 (default)          20x
    10.5                    10.5x
    1.0                     No over-subscription
    0.9 or lower            Ignored

    Note: max_over_subscription_ratio can be configured for each back end when multiple storage back ends are enabled. For a driver that supports multiple pools per back end, it can report this ratio for each pool.
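
    As a rough sketch (not the actual Cinder scheduler code), the ratio turns physical capacity into provisionable virtual capacity like this:

    def provisionable_gb(total_physical_gb, max_over_subscription_ratio=20.0):
        # Per the table above, 0.9 or lower is ignored; this sketch treats
        # "ignored" as falling back to no over-subscription (an assumption).
        ratio = max(max_over_subscription_ratio, 1.0)
        return total_physical_gb * ratio

    print(provisionable_gb(10_000, 2.0))  # 20,000 GB virtual on 10 TB physical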



    The existing reserved_percentage flag is used to prevent over-provisioning. This flag represents the percentage of the back-end capacity that is reserved. It is the high-water mark beyond which the remaining physical space cannot be consumed. For example, if there is only 4% of physical space left and the reserved percentage is 5, the free space will equate to zero. This is a safety mechanism to prevent a scenario where a provisioning request fails due to insufficient raw space.

    Note: There is a change on how reserved_percentage is used. It was measured against the free capacity in the past. Now it is measured against the total capacity.
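
    The arithmetic behind that 4%-versus-5% example, as a small illustrative sketch (again, not the driver's code):

    def usable_free_gb(total_gb, used_gb, reserved_percentage):
        # The reserve is measured against TOTAL capacity (current behavior),
        # so reported free space floors at zero once the physical free space
        # dips below the reserve.
        reserved_gb = total_gb * reserved_percentage / 100.0
        return max(0.0, (total_gb - used_gb) - reserved_gb)

    # 4% of physical space left with a 5% reserve -> zero usable free space:
    print(usable_free_gb(total_gb=100.0, used_gb=96.0, reserved_percentage=5))  # 0.0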



    Example VMAX Configuration Group

    The code snippet below demonstrates the settings configured in a VMAX backend configuration group within cinder.conf:

    [CONF_GROUP_ISCSI]
    cinder_emc_config_file = /etc/cinder/cinder_emc_config_VMAX_ISCSI_SILVER.xml
    volume_driver = cinder.volume.drivers.dell_emc.vmax.iscsi.VMAXISCSIDriver
    volume_backend_name = VMAX_ISCSI_SILVER
    max_over_subscription_ratio = 2.0
    reserved_percentage = 10

    Over-Subscription with EMCMaxSubscriptionPercent

    For the second example of over-subscription, we are going to take into account the EMCMaxSubscriptionPercent property on the pool. This value is the maximum percentage by which a pool can be over-subscribed. Setting the EMCMaxSubscriptionPercent property is done via SYMCLI:

    # symconfigure -sid 0123 -cmd "set pool MyThinPool, type=thin, max_subs_percent=150;" commit

    Viewing the pool details can be performed via the command:

    # symcfg -sid 0123 list -thin -detail -pool -gb

    When setting EMCMaxSubscriptionPercent via SYMCLI, it is important to remember that the max_over_subscription_ratio defined in cinder.conf cannot exceed what is set at pool level in the EMCMaxSubscriptionPercent property. For example, if EMCMaxSubscriptionPercent is set to 500 and the user defined max_over_subscription_ratio is set to 6, the latter is ignored and over-subscription is set to 500%.



    EMCMaxSubscriptionPercent     max_over_subscription_ratio    Over-Subscription %
    200                           2.5                            200
    200                           1.5                            150
    0 (no upper limit on pool)    1.5                            150
    0 (no upper limit on pool)    0                              150 (default)
    200 (pool1) / 300 (pool2)     2.5                            200 (pool1) / 250 (pool2)



    Note: If FAST is set and multiple pools are associated with a FAST policy, then the same rules apply. The difference is, the TotalManagedSpace and EMCSubscribedCapacity for each pool associated with the FAST policy are aggregated.



    2. Quality of Service (QoS)

    Quality of Service (QoS) is the measurement of the overall performance of a service, particularly the performance seen by the users of a given network. To quantitatively measure QoS, several related aspects of the network service are often considered, but for QoS in VMAX & OpenStack environments we are going to focus on three:

    • I/O limit per second (IOPS) – The number of read/write operations per second. In the context of QoS, this value specifies the maximum IOPS; valid values range from 100 to 100,000 IOPS (in increments of 100).
    • Throughput per second (MB/s) – The amount of bandwidth in MB per second. As with IOPS, this sets the maximum allowed MB/s; valid values range from 1 MB/s to 100,000 MB/s.
    • Dynamic Distribution – The automatic load balancing of I/O across configured ports. There are two types of Dynamic Distribution, Always and OnFailure:
      • Always – Enables full dynamic distribution mode. When enabled, the configured host I/O limits will be dynamically distributed across the configured ports, allowing the limits on each individual port to adjust to fluctuating demands.
      • OnFailure – Enables port failure capability. When enabled, the fraction of configured host I/O limits available to a configured port will adjust based on the number of ports currently online.

    For more information on setting host IO limits for VMAX please refer to the ‘Unisphere for VMAX Online Guide‘ section called ‘Setting Host I/O Limits’.

    Configuring QoS in OpenStack for VMAX

    In OpenStack, we create QoS settings for volume types so that all volumes created with a given volume type have the respective QoS settings applied. There are two steps involved in creating the QoS settings in OpenStack:

    • Creating the QoS settings
    • Associating the QoS settings with a volume type

    When specifying the QoS settings, they are added in key/value pairs. The (case-sensitive) keys for each of the settings are:

    • maxIOPS
    • maxMBPS
    • DistributionType



    As with anything in OpenStack, there are two ways to do things, and QoS is no different. You have the choice of configuring QoS either via the CLI or using the Horizon web dashboard. Using the CLI is the much quicker of the two, but if you are not comfortable with CLI commands, or even QoS, I would recommend sticking to the web dashboard method. You can find the CLI example below, but if you would like the UI step-by-step guide with screenshots, you can read the DECN hosted document created for this article ‘QoS for VMAX on OpenStack – A step-by-step guide‘.

    Setting QoS Spec

    1. Create QoS specs. It is important to note that the QoS key/value pairs here are optional; you need only include them if you want to set a value for that specific key/value pair. {QoS_spec_name} is the name which you want to assign to this QoS spec:

    Command Structure:

    # cinder qos-create {QoS_spec_name} maxIOPS={value} maxMBPS={value} DistributionType={Always/OnFailure}

    Command Example:

    # cinder qos-create FC_NONE_QOS maxIOPS=4000 maxMBPS=4000 DistributionType=Always

    2. Associate the QoS spec from step 1 with a pre-existing VMAX volume type:

    Command Structure:

    # cinder qos-associate {QoS_spec_id} {volume_type_id}

    Command Example:

    # cinder qos-associate 0b473981-8586-46d5-9028-bf64832ef8a3 7366274f-c3d3-4020-8c1d-c0c533ac8578
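
    If you prefer to script these two steps rather than use the CLI, the same can be done with python-cinderclient. A minimal sketch, with placeholder credentials and the volume type ID from above:

    from cinderclient import client

    # Placeholder credentials; in practice take these from your openrc file.
    cinder = client.Client('2', 'admin', 'password', 'admin',
                           'http://controller:5000/v2.0')

    # Step 1: create the QoS spec with the case-sensitive key/value pairs.
    qos = cinder.qos_specs.create('FC_NONE_QOS', {'maxIOPS': '4000',
                                                  'maxMBPS': '4000',
                                                  'DistributionType': 'Always'})

    # Step 2: associate it with a pre-existing VMAX volume type by ID.
    cinder.qos_specs.associate(qos, '7366274f-c3d3-4020-8c1d-c0c533ac8578')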



    QoS Use-Case Scenarios

    When using QoS to set specs for your volumes, it is important to know how the specs behave when set at the OpenStack level, the Unisphere level, or both. The following use-cases aim to clarify the expected behaviour, leaving you in complete control over your environment when done!

    Use-Case 1 – Default Values

    Settings:

    SG QoS specs in Unisphere (before change):
    • Host I/O Limit (MB/Sec) = No Limit
    • Host I/O Limit (IO/Sec) = No Limit
    • Set Dynamic Distribution = N/A

    QoS specs set in OpenStack:
    • maxIOPS = 4000
    • maxMBPS = 4000
    • DistributionType = Always

    Outcome:

    SG QoS specs in Unisphere (after change):
    • Host I/O Limit (MB/Sec) = 4000
    • Host I/O Limit (IO/Sec) = 4000
    • Set Dynamic Distribution = Always

    Block Storage (Cinder): Volume is created against the volume type, and QoS is enforced with the parameters specified in the OpenStack QoS spec.

    Use-Case 2 – Preset Limits

    Settings:

    SG QoS specs in Unisphere (before change):
    • Host I/O Limit (MB/Sec) = 2000
    • Host I/O Limit (IO/Sec) = 2000
    • Set Dynamic Distribution = Never

    QoS specs set in OpenStack:
    • maxIOPS = 4000
    • maxMBPS = 4000
    • DistributionType = Always

    Outcome:

    SG QoS specs in Unisphere (after change):
    • Host I/O Limit (MB/Sec) = 4000
    • Host I/O Limit (IO/Sec) = 4000
    • Set Dynamic Distribution = Always

    Block Storage (Cinder): Volume is created against the volume type, and QoS is enforced with the parameters specified in the OpenStack QoS spec.

    Use-Case 3 – Preset Limits

    Settings:

    SG QoS specs in Unisphere (before change):
    • Host I/O Limit (MB/Sec) = No limit
    • Host I/O Limit (IO/Sec) = No limit
    • Set Dynamic Distribution = N/A

    QoS specs set in OpenStack:
    • DistributionType = Always

    Outcome:

    SG QoS specs in Unisphere (after change):
    • Host I/O Limit (MB/Sec) = No limit
    • Host I/O Limit (IO/Sec) = No limit
    • Set Dynamic Distribution = N/A

    Block Storage (Cinder): Volume is created against the volume type and there is no volume change.

    3. Compression

    If you are using a VMAX All-Flash (250F, 450F, 850F, 950F) in your environment, you can avail of inline compression in your OpenStack environment. Compression is enabled by default, so if you want it right now you don’t even have to do a thing!

    VMAX All Flash delivers a net 4:1 overall storage efficiency benefit for typical transactional workloads when inline compression is combined with snapshots and other HYPERMAX OS space saving capabilities. VMAX inline compression minimizes footprint while intelligently optimizing system resources to ensure the system is always delivering the right balance of performance and efficiency. VMAX All Flash inline compression is:

    • Granular: VMAX All Flash compression operates at the storage group (application) level so customers can target those workloads that provide the most benefit.
    • Performance optimized: VMAX All Flash is smart enough to make sure very active data is not compressed until it becomes less active. This allows the system to deliver maximum throughput leveraging cache and SSD technology, and ensures that system resources are always available when required.
    • Flexible: VMAX All Flash inline compression works with all data services, including SnapVX & SRDF

    Compression, VMAX & OpenStack

    As mentioned previously, on an All Flash array any storage group is created with the compressed attribute, and compression is enabled by default. Setting compression on a volume type does not mean that all the devices associated with that type will be immediately compressed; it means that compression will be considered for all incoming writes. Setting compression off on a volume type does not mean that all the devices will be uncompressed; it means that writes to compressed tracks will make those tracks uncompressed.

    Controlling compression for VMAX volume types is handled through the extra specs of the volume type itself. Up until now, the only extra spec we have set for a volume type is volume_backend_name; compression requires an additional extra spec to be applied to the volume type: storagetype:disablecompression=[True/False].

    Note: If the extra spec storagetype:disablecompression is set on a VMAX-3 Hybrid array, it is ignored because compression is not a feature on a VMAX-3 Hybrid.

    Using Compression for VMAX

    Compression is enabled by default on all All-Flash arrays, so you do not have to do anything to enable it for storage groups created by OpenStack. However, there are occasions where you may want to disable compression, or retype a volume from an uncompressed to a compressed volume type (don’t worry, retype will be discussed in detail later in this article!). Before each of the use-cases outlined below, complete the following steps (a scripted sketch of these steps follows the list):

    1. Create a new volume type called VMAX_COMPRESSION_DISABLED
    2. Set an extra spec volume_backend_name
    3. Set a new extra spec storagetype:disablecompression=True
    4. Create a new volume with the VMAX_COMPRESSION_DISABLED volume type
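
    As a sketch, the four preparation steps above map onto python-cinderclient calls along these lines (this assumes a cinder client object built as in the QoS example earlier; the backend name and volume size are illustrative):

    # Steps 1-3: create the volume type and disable compression via extra specs.
    vt = cinder.volume_types.create('VMAX_COMPRESSION_DISABLED')
    vt.set_keys({'volume_backend_name': 'VMAX_ISCSI_SILVER',
                 'storagetype:disablecompression': 'True'})

    # Step 4: create a 10 GB volume against the new type.
    vol = cinder.volumes.create(10, name='compression_test',
                                volume_type='VMAX_COMPRESSION_DISABLED')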

    Use-Case 1: Compression disabled – create, attach, detach, and delete volume

    1. Check in Unisphere or SYMCLI to see if the volume exists in storage group OS-<srp>-<servicelevel>-<workload>-CD-SG, and compression is disabled on that storage group
    2. Attach the volume to an instance. Check in Unisphere or symcli to see if the volume exists in storage group OS-<shorthostname>-<srp>-<servicelevel>-<workload>-CD-SG, and compression is disabled on that storage group
    3. Detach volume from instance. Check in Unisphere or symcli to see if the volume exists in storage group OS-<srp>-<servicelevel>-<workload>-CD-SG, and compression is disabled on that storage group.
    4. Delete the volume. If this was the last volume in the OS-<srp>-<servicelevel>-<workload>-CD-SG storage group, it should also be deleted.

    Use-Case 2: Compression disabled – create, delete snapshot and delete volume

    1. Check in Unisphere or SYMCLI to see if the volume exists in storage group OS-<srp>-<servicelevel>-<workload>-CD-SG, and compression is disabled on that storage group
    2. Create a snapshot. The volume should now exist in OS-<srp>-<servicelevel>-<workload>-CD-SG
    3. Delete the snapshot. The volume should be removed from OS-<srp>-<servicelevel>-<workload>-CD-SG
    4. Delete the volume. If this volume is the last volume in OS-<srp>-<servicelevel>-<workload>-CD-SG, it should also be deleted.

    Use-Case 3: Retype from compression disabled to compression enabled

    1. Create a new volume type. For example VMAX_COMPRESSION_ENABLED
    2. Set extra spec volume_backend_name as before
    3. Set the new compression extra spec storagetype:disablecompression = False, or simply do not set this extra spec at all
    4. Retype from volume type VMAX_COMPRESSION_DISABLED to VMAX_COMPRESSION_ENABLED
    5. Check in Unisphere or SYMCLI to see if the volume exists in storage group OS-<srp>-<servicelevel>-<workload>-SG, and compression is enabled on that storage group

    What’s coming up in part 3 of ‘VMAX & OpenStack Ocata: An Inside Look’…

    With the setup out of the way and extra functionality taken into consideration, we can now begin to get into the fun stuff, block storage functionality! Next time we will be starting at the start in terms of functionality, going through all of the basic operations that the VMAX driver supports in OpenStack.

    Related:

    QoS for VMAX on OpenStack – A step-by-step guide

    Note: This document is supplementary to the article ‘VMAX & OpenStack Ocata: An Inside Look Pt. 2: Over-Subscription/QoS/Compression/Retype’

    Quality of Service (QoS) is the measurement of the overall performance of a service, particularly the performance seen by the users of a given network. To quantitatively measure QoS, several related aspects of the network service are often considered, but for QoS in VMAX & OpenStack environments we are going to focus on three:

    • I/O limit per second (IOPs)
    • Throughput per second (MB/s)
    • Dynamic Distribution
      • Always
      • OnFailure

    For more information on setting host IO limits for VMAX please refer to the ‘Unisphere for VMAX Online Guide‘ section called ‘Setting Host I/O Limits’.

    Configuring QoS in OpenStack for VMAX

    In OpenStack, we create QoS settings for volume types so that all volumes created with a given volume type have the respective QoS settings applied. There are two steps involved in creating the QoS settings in OpenStack:

    • Creating the QoS settings
    • Associating the QoS settings with a volume type

    When specifying the QoS settings, they are added in key/value pairs. The (case-sensitive) keys for each of the settings are:

    • maxIOPS
    • maxMBPS
    • DistributionType



    As with anything in OpenStack, there are two ways to do things, and QoS is no different. You have the choice of configuring QoS either via the CLI or using the Horizon web dashboard. Using the CLI is the much quicker of the two, but if you are not comfortable with CLI commands, or even QoS, I would recommend sticking to the web dashboard. For clarity, I will go through each approach step by step; the process never changes, only the values used in the key/value pairings, so one example can be applied to all scenarios.

    Setting QoS via the OpenStack CLI

    1. Create QoS specs. It is important to note that the QoS key/value pairs here are optional; you need only include them if you want to set a value for that specific key/value pair. {QoS_spec_name} is the name which you want to assign to this QoS spec:



    Command Structure:

    # cinder qos-create {QoS_spec_name} maxIOPS={value} maxMBPS={value} DistributionType={Always/OnFailure}

    Command Example:

    # cinder qos-create FC_NONE_QOS maxIOPS=4000 maxMBPS=4000 DistributionType=Always


    2. Associate the QoS spec from step 1 with a pre-existing VMAX volume type:



    Command Structure:

    # cinder qos-associate {QoS_spec_id} {volume_type_id}

    Command Example:

    # cinder qos-associate 0b473981-8586-46d5-9028-bf64832ef8a3 7366274f-c3d3-4020-8c1d-c0c533ac8578



    3. Confirm the association was successful by checking the volume type associated in the previous command

    Command Example:

    # cinder type-show VMAX_FC_NONE




    Setting the QoS via OpenStack Horizon Web UI

    1. Navigate to Admin>Volume>Volume Types and click on ‘Create QoS Spec’


    2. Give your QoS spec a name and set the Consumer property to ‘front-end’ (this will apply the QoS spec to the Compute back-end, being the volume and not the storage group the volume is in)


    3. Once created, you will be able to specify your QoS spec details. To do so, from the ‘Volume Types’ screen, click on ‘Manage Specs’ next to your newly created QoS spec. In the dialog box that opens, click on ‘Create’ to add a new key/value pair to the spec.


    4. Continue this process of creating new key/value pairs until you have added all of the required settings to your spec. Click ‘Close’ when you are done adding key/value pairings. You will get visual confirmation from the UI that your settings have been applied.


    5. With all of your QoS specs defined, the last step is to associate the spec with a volume type. To do so, from the same ‘Volume Types’ screen, click on the drop-down box beside your volume type and select ‘Manage QoS Spec Association’. Select your QoS spec in the next dialog box and click ‘Associate’.


    6. You will get visual confirmation of the successful association of the QoS spec with your VMAX volume type from the UI


    Confirming the QoS Spec in Unisphere

    With the QoS specification associated with your VMAX volume type in OpenStack, the specs will be added to the storage group associated with the volume type on your VMAX. You can confirm the successful association of QoS specs in Unisphere by looking at the details of the associated storage group.

    1. From the Unisphere dashboard, navigate to Array>Storage>Storage Group Dashboard and select the storage group associated with your volume type from the ‘Search’ menu

    2. From the details page for your storage group, there is a section near the bottom which details host IO limits, MB/s limits, and dynamic distribution. The QoS specs defined in OpenStack should be mirrored in Unisphere if the association operation was successful.


    QoS Use-Case Scenarios

    When using QoS to set specs for your volumes, it is important to know how the specs behave when set at the OpenStack level, the Unisphere level, or both. I have detailed these scenarios in the article mentioned at the start of this document, ‘VMAX & OpenStack Ocata: An Inside Look Pt. 2: Over-Subscription/QoS/Compression/Retype’.

    Related:

    What is Real Time Network Analytics and Why Customers care about it

    In today’s world of hyper-speed business decisions, where you need to be agile enough to stay ahead of the competition, match or exceed new market demands, and manage demanding customer expectations, your customer-facing application behavior will succeed or fail based primarily on its performance.

    This puts a tremendous responsibility on Network Engineers (aka Network Operators) in Enterprises and HyperScale Data Centers, as most unexplained application performance shortfalls result from an underlying network infrastructure that does not provide proper performance and scale on demand, or was not adequately designed for optimum application performance. Network Operators have to ensure that their networks are responsive, “always on,” and capable of meeting the ever-growing demand from the applications they run. Providing operators with deeper instrumentation and telemetry data about the network helps them diagnose network issues, plan and fine-tune the network for improved performance, and make optimal use of network resources.

    One of the main causes of these unexplained application performance issues is latency caused by underlying network congestion. Among the causes of network congestion is an elusive type called the “microburst.” As the name suggests, “microbursts” are sub-second periods of time when major bursts of network usage occur at line rate and can temporarily overflow the switch buffers, causing packet loss or backpressure.

    Traditionally, “congestion” has been associated with switch ports being utilized at close to line rate. In a congestion scenario, packets can be dropped by the switch or flows may backpressure due to lack of buffer space. However, more recent analysis has uncovered that these “microbursts” occur more frequently than we may have guessed, and there has been no good way to detect them, leaving network engineers looking for the proverbial “needle in a haystack” when hunting the causes of unexpected application performance issues in their network.

    Typically, these “microbursts” do not last long enough to be detected by traditional switch counters such as SNMP or port statistics. This is because traditional tools used to monitor network traffic patterns, such as RMON and SNMP, have been based on a polling model where data is typically collected at one second or longer intervals. What about the events that will occur within these polling intervals? With the evolution to 100GbE attachment in the data center, within even a one-second interval a 100GbE interface could go from idle to forwarding over 280 million packets and back again. In a traditional SNMP/RMON polling model this 280 million packet burst can become invisible.
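
    As a back-of-the-envelope check on that figure (my arithmetic, assuming minimum-size 64-byte Ethernet frames plus the standard 20 bytes of preamble and inter-frame gap on the wire):

    LINE_RATE_BPS = 100e9
    WIRE_BYTES_PER_MIN_FRAME = 64 + 20  # min frame + preamble + inter-frame gap

    pps = LINE_RATE_BPS / (WIRE_BYTES_PER_MIN_FRAME * 8)
    print(f"{pps:,.0f} packets/s per direction")    # ~148,809,524
    print(f"{2 * pps:,.0f} packets/s full duplex")  # ~297 million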

    Let’s look at the potential business impact of these microbursts in a High-Frequency Trading (HFT) environment:

    • In a NetworkingWorld article, Charles Thompson – NI manager of system engineering, stated “When trading floors open at 9:30 am Eastern time, their networks are flooded with a ridiculous number of trades that have been queued up since the night before. To analyze performance issues, network managers often have to break out a one-second period into smaller, microscopic intervals. So, they’ll chop up the one-second interval into 100-millisecond intervals, 10-millisecond intervals, or 5-microsecond intervals for investigations. When you get to a sub-second resolution, it’s referred to as a microburst. It’s a small period of time when a major burst of usage occurred…….But, we’ve had many customers requiring 100-microsecond increments, who will take advantage of this drill down capability.”
    • An InformationWeek article stated, “A 1-millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm.”

    Networks are critical to the business as they deliver applications and services to the rest of the organization. Networks must have high performance, low latency, reliability, and security. Network/data center downtime is expensive and impacts business outcomes. Proactively detecting these elusive microbursts allows network operators to run their networks at the most optimum performance level.


    Learn more about the HPE FlexFabric 5950 100G TOR (Top of the Rack) Switch. This switch provides the capability of detecting these elusive microbursts by embedding Broadview™ Instrumentation analytics in the switch.

    Related:

    How to provide credentials for calling Key Protect service within Bluemix application?

    To call the [Key Protect service][1] API you need to provide the Authorization, Bluemix-space, and Bluemix-org headers. The Authorization header contains a Bluemix access token. Such a token can be obtained by calling the cf oauth-token command (see [How to get OAuth token from CloudFoundry][2]).

    What I do not understand is:

    1. what is the default validity of such a token in Bluemix?
    2. if I need to call the Key Protect service from a Bluemix (e.g. Liberty) application, I need to store the Authorization credentials somewhere in order to call the service. What is the best / suggested way to do that? Environment variable? User-provided service?

    [1]: https://console.ng.bluemix.net/docs/services/keymgmt/index.html
    [2]: http://stackoverflow.com/questions/27985469/how-to-get-oauth-token-from-cloudfoundry
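
    For reference, the call shape described in the question looks roughly like this in Python. The endpoint URL is a placeholder (take the real one from the Key Protect docs at [1]), and the token is whatever cf oauth-token returns:

    import requests

    KEY_PROTECT_ENDPOINT = "https://<key-protect-host>/api/v2/keys"  # placeholder, see [1]

    def list_keys(oauth_token, space_guid, org_guid):
        # Headers as described in the question: Authorization plus the
        # Bluemix space and org identifiers.
        return requests.get(KEY_PROTECT_ENDPOINT, headers={
            "Authorization": oauth_token,  # output of `cf oauth-token`
            "Bluemix-space": space_guid,
            "Bluemix-org": org_guid,
        })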
