VMAX & Openstack Ocata: An Inside Look Pt. 9: Live Migration

Welcome back! In this series we have taken an in-depth look at the various features supported by VMAX in OpenStack Ocata, covering everything from installation and configuration to snapshots and volume replication. Today we are going to look at live migration in OpenStack when a VMAX is the storage back end. Whilst no additional configuration is required from a VMAX perspective, and live migration is primarily a Nova compute operation, it is an interesting topic and worth a closer look. I will not go into detail about configuring your environment for live migration, as it differs from environment to environment, but I do recommend you take a look at the official OpenStack documentation on configuring live migrations.

What is Live Migration?

Live migration is the process whereby a virtual machine instance is moved to a different OpenStack compute host whilst the instance continues running, causing no disruption. A typical use case for migrating an instance is planned maintenance on a compute server, but it may also be resource load distribution when many virtual machines are running on a single compute server.


Imagine a scenario using the illustration above. Hosts A, B & C are all running virtual machines, controlled by a cloud administrator (client). Over the course of a working day, host B starts to see resource contention, and as a consequence the VMs running on it begin to slow down as RAM and CPU allocations are squeezed. To alleviate the resource contention, the cloud admin decides to move instances to hosts A & C and spread the load more evenly across the environment. Instead of shutting down the instances on host B, the cloud admin can migrate the contested instances to hosts A & C using live migration, so there is no impact to the end user and no-one is aware that the VM host location has changed. If anything, end users will notice their services are running better than they did a few minutes before.

There are a few different methods of migrating your instances from one compute host to another in OpenStack, dependent on the requirements of the migration and the storage type used for the instance to be migrated. The different types of migration are:

  • Cold migration (non-live migration or simply migration) – In cold migrations the running instance is shut down and moved to another compute server and restarted when the transfer is complete. As the instance is shut down in this scenario, there is a disruption to any services it provided.
  • Live migration – During live migrations the instance keeps running throughout the entire process, which is especially useful if it is not possible to shut down the instance or disrupt its running services. Live migrations can be classified further by how the storage back end of the running instance is handled:
    • Shared storage-based live migration – The instance has ephemeral disks that are located on storage shared between the source and destination hosts.
    • Block live migration – The instance has ephemeral disks that are not shared between the source and destination compute hosts.
    • Volume-backed live migration – The instance uses a volume rather than ephemeral disks for storage. The volume is located on storage shared between the source and destination hosts. This is the migration type used by OpenStack when VMAX is used as the storage back end of the instance.

Shared-storage and volume-backed live migrations are distinctly more advantageous than block migrations as they do not require disks to be copied between hosts. Block live migrations take considerably more time to complete and put additional load on the network.
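Since volume-backed migration is the type used with VMAX, it can be worth confirming up front that your instance is in fact volume-backed. As a rough sketch (the `nova show` output below is a hypothetical two-row sample, and the volume ID is made up; in a real environment you would pipe the live command output instead of this here-doc):

```shell
# Hypothetical sample rows from 'nova show <instance_id>' output.
# In a real environment, replace the here-doc with the live command.
sample_output=$(cat <<'EOF'
| os-extended-volumes:volumes_attached | [{"id": "a1b2c3d4-0000-0000-0000-000000000000"}] |
| OS-EXT-SRV-ATTR:host                 | compute-vm20                                     |
EOF
)

# A volume-backed instance reports at least one entry in the
# 'os-extended-volumes:volumes_attached' field.
if echo "$sample_output" | grep 'volumes_attached' | grep -q '"id"'; then
    echo "instance is volume-backed"
else
    echo "instance uses ephemeral disks"
fi
```

If the field is an empty list, the instance is booted from ephemeral storage and would fall into one of the other two migration categories instead.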

How does Live Migration work?

When a live migration is requested to move an instance to another compute node, a number of steps are carried out. Gaining a better insight into these allows a better understanding of the process itself and helps determine whether a live migration is going to be successful or not. The most important of these checks determine:

  • If the target host has enough resources available to support the instance when migration is complete
  • If the source and target host use shared storage to support moving the volume-backed instance between the hosts seamlessly

The steps carried out during each and every OpenStack live migration process are as follows:

  1. Pre-migration – Check memory, CPU, and storage availability on target host
  2. Reservation – Reserve the required instance resources on the target host and mount required disks
  3. Pre-copy – Copy instance memory from the source to target host
  4. Stop and copy – Pause instance on source host and copy dirty memory and CPU state
  5. Commitment – Instance confirmed running on the target host
  6. Clean-up – Unmount disks to remove old connections to source host from storage, delete instance on source host
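From the operator's point of view, the steps above boil down to a polling loop: trigger the migration, then watch the instance status until it settles. The sketch below uses a mock `fake_nova_status` function purely for illustration; a real script would call the nova CLI instead, but the MIGRATING → ACTIVE transition it simulates mirrors what Nova reports during a live migration:

```shell
# Mock of the 'status' field from 'nova show'. A real script would
# query the nova CLI; this stand-in simulates the MIGRATING -> ACTIVE
# transition reported during a live migration.
fake_nova_status() {
    if [ "$1" -lt 3 ]; then
        echo "MIGRATING"
    else
        echo "ACTIVE"
    fi
}

attempts=0
status="MIGRATING"
# Poll until the instance leaves the MIGRATING state (bounded so the
# sketch always terminates).
while [ "$status" = "MIGRATING" ] && [ "$attempts" -lt 10 ]; do
    attempts=$((attempts + 1))
    status=$(fake_nova_status "$attempts")
    # In a real environment: sleep a few seconds between polls.
done
echo "final status: $status"
```

If the migration fails, the status typically returns to ACTIVE on the original host rather than reporting an error, which is why checking the host field (covered below) matters as much as checking the status.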

Performing Live Migrations

Migrating an instance from one compute host to another using OpenStack is very straightforward. Once you know the name of your target host, that is pretty much it in terms of what you need to do in advance of the operation! The command nova hypervisor-list will show you the Nova compute hosts in your environment, and nova show will tell you which host an instance currently resides on; it is up to you to make sure that the target host is accessible from the source host and the back-end storage device.

[Screenshot: nova show output listing the instance's current host]

Note: Some output from the nova show command has been removed as it is not relevant to the live migration process; all we are interested in here is the host on which the instance resides. In this case, the instance resides on a host ending in vm20.

Once you have selected your target host, issue the command to migrate your chosen instance to it.

Command Format:

$ nova live-migration <instance_id> <target_host_id>

Command Example:

$ nova live-migration 659012c5-bfe0-4264-86a3-18cecdf2bf5b ********vm10

Once the command has been issued from the CLI, you will not get any notification back that the operation has been successful. The easiest way to check whether the live migration succeeded is to check the status of the migrated instance: if the process was successful, the instance should list the target host of the migration as its current host.

Command Format:

$ nova show <instance_id>

Command Example:

$ nova show 659012c5-bfe0-4264-86a3-18cecdf2bf5b

[Screenshot: nova show output after live migration, showing the new host]

From the example above, the ‘cirros_vm’ instance was live migrated from vm20 to vm10 seamlessly, without any interruption to connectivity or services.
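Rather than scanning the full table by eye, the host field can be pulled straight out of the `nova show` output. The two sample rows below are hypothetical; in practice you would pipe the live command output instead of the here-doc:

```shell
# Hypothetical rows from 'nova show <instance_id>' output. In a real
# environment, replace the here-doc with the live command.
sample=$(cat <<'EOF'
| OS-EXT-SRV-ATTR:host | compute-vm10 |
| status               | ACTIVE       |
EOF
)

# Extract the compute host the instance currently resides on.
current_host=$(echo "$sample" | awk -F'|' '/OS-EXT-SRV-ATTR:host/ {gsub(/ /, "", $3); print $3}')
echo "instance is running on: $current_host"
```

If the printed host matches your migration target, the move completed; if it still shows the source host, the migration silently failed and the logs are the next stop.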

Live Migration Gotchas

When performing live migrations in your environment there are a few important things which need to be taken into consideration.

There is very little (or at times non-existent) feedback on the live migration process given to the user, so if something goes wrong it is not always instantly obvious. The instance will simply not move to the target host and will keep running as if a live migration had never been started in the first place. A good place to look for potential issues with live migration is your Nova & Cinder logs, with debug mode enabled in their respective .conf files.

Depending on your environment setup, infrastructure capabilities, and a number of other environment variables at any given time, your live migration process may occasionally time out. If this is the case, increase the value of ‘rpc_response_timeout’ in all Cinder and Nova configuration files (cinder.conf & nova.conf) across all nodes until you find a value best suited to your environment. When you change this value (or any other value in any OpenStack configuration file), make sure you restart the respective services for the changes to take effect.
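The setting lives in the [DEFAULT] section of both files; the value of 120 below is purely illustrative, not a recommendation:

```ini
# /etc/nova/nova.conf and /etc/cinder/cinder.conf (on every node)
[DEFAULT]
# Seconds to wait for an RPC response before timing out.
# 120 is only an example value; tune it to your environment.
rpc_response_timeout = 120
```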

Your environment may have HAProxy configured to assist with load balancing and high-availability nodes. If so, you will need to ensure that the timeout settings in HAProxy do not interfere with operations in OpenStack. If you see errors in the logs which appear to be gateway timeouts, this usually points to HAProxy as the culprit. To resolve this, increase the value of ‘timeout server’ in ‘/etc/haproxy/haproxy.cfg’; by default it is set to 60s, and you could set it to the same value as ‘rpc_response_timeout’ from the previous paragraph. Once you have changed this value, make sure to restart the HAProxy service, for example using the command ‘crm resource restart p_haproxy’ (confirmed on Ubuntu only).
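As an illustrative fragment only (your haproxy.cfg layout and section names will differ), the relevant directive looks like this:

```
# /etc/haproxy/haproxy.cfg -- illustrative fragment only
defaults
    # Default is 60s; raising it avoids premature gateway timeouts
    # during long live-migration RPC calls. Consider matching it to
    # your rpc_response_timeout value.
    timeout server 120s
```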

It might seem obvious at this stage, but as I mentioned earlier, you need to make sure that your target host has sufficient resources available to support the instance being migrated to it. If there are no resources available, or insufficient resources, the process will be cancelled during the pre-migration checks. To confirm that your target host has enough available resources, use the command ‘nova hypervisor-show <hostname>’ to see what is available on a given host.
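As a rough sketch, the free capacity can be derived from the used/total rows of that output. The figures below are made up for illustration; in a real environment you would parse the live ‘nova hypervisor-show’ output instead of the here-doc:

```shell
# Hypothetical rows from 'nova hypervisor-show <hostname>' output.
sample=$(cat <<'EOF'
| vcpus          | 32    |
| vcpus_used     | 20    |
| memory_mb      | 65536 |
| memory_mb_used | 40960 |
EOF
)

# Pull a single numeric field out of the table by row name.
get_field() {
    echo "$sample" | awk -F'|' -v f="$1" '$2 ~ f {gsub(/ /, "", $3); print $3; exit}'
}

# Free capacity = total minus used, for both vCPUs and RAM.
free_vcpus=$(( $(get_field vcpus) - $(get_field vcpus_used) ))
free_mb=$(( $(get_field memory_mb) - $(get_field memory_mb_used) ))
echo "free vCPUs: $free_vcpus, free RAM: ${free_mb} MB"
```

If the instance's flavor requires more vCPUs or RAM than the free figures, the pre-migration check will reject the target host.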

[Screenshot: nova hypervisor-show output listing host resource usage]

Next time on ‘An Inside Look’….

Next time we return it won’t be to look at Cinder and block storage using VMAX, but instead at the newly supported Manila! Manila is the file share service project for OpenStack, providing the management of file shares (for example, NFS and CIFS) as a core service. Come back again to go over the ins and outs of Manila and VMAX, including an overview, configuration, and usage. See you then!
