In the previous article, we looked at the OneFS SnapshotIQ architecture and how snapshots are created and deleted. Now, we’ll explore the options available to recover data from a snapshot.
There are four main methods for restoring snapshot data, namely:
- Copying specific files and directories directly from the snapshot
- Cloning file(s) from the snapshot
- Replicating a snapshot using SyncIQ
- Reverting the entire snapshot via the SnapRevert job
The appropriate option to use will depend on criteria such as the quantity of data you’re looking to recover, whether the snapshot is to be restored in place, what data services are licensed, etc. As such, a decision tree for these choices might look like:
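As a rough illustration, those criteria can be folded into a small Python function. This is a sketch under stated assumptions, not a reproduction of any official decision chart. The parameter names, the `Method` enum and the order in which the checks are applied are all hypothetical, and they draw only on the trade-offs described for each option in this article.

```python
from enum import Enum


class Method(Enum):
    COPY = "copy files from the snapshot"
    CLONE = "clone files from the snapshot (cp -c)"
    SYNCIQ = "replicate from the snapshot with a local SyncIQ policy"
    SNAPREVERT = "revert the whole snapshot with the SnapRevert job"


def choose_restore_method(
    whole_snapshot_in_place: bool,
    has_snaprevert_domain: bool,
    large_amount_of_data: bool,
    synciq_licensed: bool,
    large_files: bool,
) -> Method:
    """Hypothetical decision helper; the ordering of the checks is an assumption."""
    # SnapRevert overwrites HEAD, so only choose it when that is the intent
    # and a SnapRevert domain already exists for the snapshot's root.
    if whole_snapshot_in_place and has_snaprevert_domain:
        return Method.SNAPREVERT
    # Bulk recovery is most efficient via SyncIQ, but only if it is licensed.
    if large_amount_of_data and synciq_licensed:
        return Method.SYNCIQ
    # Clones share blocks with the snapshot until either side is modified.
    if large_files:
        return Method.CLONE
    return Method.COPY


print(choose_restore_method(False, False, False, False, True).name)  # CLONE
```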
Here’s a bit more detail on each of the four options above:
1. Copying a file from a snapshot duplicates that file, which roughly doubles the amount of storage space it consumes. Even if the original file is deleted from HEAD, the snapshot continues to retain its own version of the file alongside the restored copy.
2. Cloning a file (cp -c) from a snapshot also duplicates that file. Unlike a copy, however, a clone does not consume any additional space on the cluster, unless either the original file or the clone is modified. This makes it an efficient option for large files.
3. If SyncIQ is licensed on the cluster, it can be used in its local target mode to replicate data out of a snapshot. This is a very efficient way to recover large amounts of data from within snapshots. First, create a SyncIQ policy with the ‘source root directory’ set to the snapshot’s path. Then run a SyncIQ replication job from the command line, using syntax of the form:
# isi sync job start --policy-name=<name-of-synciq-policy> --source-snapshot=<source-snapshot-name>
It’s worth noting that the SyncIQ ‘--source-snapshot’ option is only available from the command line interface.
4. Finally, arguably the most efficient of these approaches is the SnapRevert job, which automates the restoration of an entire snapshot to its top level directory. However, this will overwrite the existing HEAD dataset.
SnapRevert allows for quickly reverting to a previous, known-good recovery point, for example in the event of a virus or malware outbreak. The SnapRevert job can be run from the Job Engine WebUI, and requires specifying the ID of the desired snapshot.
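The space difference between copying (option 1) and cloning (option 2) can be illustrated with a toy copy-on-write model. It is hypothetical and does not reflect how OneFS actually tracks blocks: it assumes a file is just a list of block references, that a copy allocates new blocks, and that a clone shares its source's blocks until one of them is written.

```python
class BlockStore:
    """Counts allocated physical blocks; files hold references to block objects."""

    def __init__(self) -> None:
        self.allocated = 0

    def new_block(self) -> object:
        self.allocated += 1
        return object()


def copy_file(store: BlockStore, blocks: list) -> list:
    # A copy allocates a fresh physical block for every logical block.
    return [store.new_block() for _ in blocks]


def clone_file(blocks: list) -> list:
    # A clone copies only the references, so no new blocks are allocated.
    return list(blocks)


def write_block(store: BlockStore, blocks: list, index: int) -> None:
    # Copy-on-write: modifying a shared block allocates a private replacement.
    blocks[index] = store.new_block()


store = BlockStore()
snapshot_file = [store.new_block() for _ in range(100)]  # 100 blocks held by the snapshot
copied = copy_file(store, snapshot_file)
print(store.allocated)  # 200: the copy doubles the footprint

cloned = clone_file(snapshot_file)
print(store.allocated)  # 200: the clone shares every block
write_block(store, cloned, 0)
print(store.allocated)  # 201: only the modified block is new
```

In this model the clone only starts to cost space in proportion to the blocks that are actually modified, which is why cloning suits large files.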
Under the hood, SnapRevert has two main components:
- The file system domain that the objects are put into.
- The job that reverts everything back to what’s in a snapshot.
SnapRevert is built on the concept of a domain. In OneFS, a domain defines a set of behaviors for a collection of files under a specified directory tree. More specifically, SnapRevert is a ‘restricted writer’ domain. This means it possesses an extra bit of filesystem metadata and associated locking code, which prevents the domain’s files from being written to while a last known good snapshot is being restored.
Where possible, the preferred practice is to create the domain before there is data. This avoids having to wait for the DomainMark job to walk the entire tree, setting that attribute on every file and directory within it.
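Why an empty directory is so much cheaper to mark can be seen from a rough model of the DomainMark walk. The sketch below is hypothetical: it represents the tree as nested dicts, stores domain IDs in a per-object set, and simply counts how many objects the walk has to touch. It is not the real job's implementation.

```python
def domain_mark(tree: dict, domain_id: int) -> int:
    """Hypothetical DomainMark: tag every object under `tree`, return objects visited.

    A directory is a dict with a "domains" set and a "children" dict;
    a file is a dict with only a "domains" set.
    """
    visited = 0
    stack = [tree]
    while stack:
        node = stack.pop()
        node["domains"].add(domain_id)  # set the domain attribute on this object
        visited += 1
        stack.extend(node.get("children", {}).values())
    return visited


def make_dir(num_files: int) -> dict:
    # Build a directory holding `num_files` plain files.
    return {
        "domains": set(),
        "children": {f"file{i}": {"domains": set()} for i in range(num_files)},
    }


print(domain_mark(make_dir(0), domain_id=7))        # 1: just the empty root
print(domain_mark(make_dir(100_000), domain_id=7))  # 100001: root plus every file
```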
Existing configured SnapRevert domains can be viewed with the following command:
# isi_classic domain list -l
The SnapRevert job itself actually uses a local SyncIQ policy to copy data out of the snapshot, discarding any changes to the original directory. When the SnapRevert job completes, the original data is left in the directory tree. In other words, after the job completes, the file system (HEAD) is exactly as it was at the point in time that the snapshot was taken. The LINs (logical inode numbers) for the files and directories don’t change, because what’s there is not a copy.
The SnapRevert job can be manually run from the OneFS WebUI by navigating to Cluster Management > Job Operations > Job Types > SnapRevert and clicking the ‘Start Job’ button.
Before a snapshot is reverted, SnapshotIQ creates a point-in-time copy of the data that is being replaced. This enables the snapshot revert to be undone later, if necessary.
Individual files, rather than entire snapshots, can also be restored in place using the isi_file_revert command-line utility. This can drastically simplify virtual machine management and recovery.
Before creating snapshots, it’s worth noting that reverting a snapshot requires that a SnapRevert domain exist for the directory that is being reverted. If you intend to revert snapshots for a directory, it is recommended that you create SnapRevert domains for those directories while the directories are empty. Creating a domain for an empty (or sparsely populated) directory takes considerably less time.
How do domains work?
Files may belong to multiple domains. Each file stores, in its inode’s extended attributes table, a set of domain IDs indicating which domains it belongs to. Files inherit this set of domain IDs from their parent directories when they are created or moved. The domain IDs refer to the domain settings themselves, which are stored in a separate system B-tree. These B-tree entries describe the type of the domain (flags), along with various other attributes.
As mentioned, a Restricted-Write domain prevents writes to any files except by threads that are granted permission to do so. A SnapRevert domain that does not currently enforce Restricted-Write shows up as “(Writable)” in the CLI domain listing.
Occasionally, a domain will be marked as “(Incomplete)”. This means that the domain will not enforce its specified behavior. A domain is incomplete if any of the files it contains are not marked as being a member of that domain. Since each file contains a list of domains of which it is a member, the list must be kept up to date for each file. The domain is incomplete until each file’s domain list is correct.
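The membership rules above can be sketched as a small in-memory model. Everything here is an illustrative assumption (the class names, the `restricted_write` flag, the `writer_allowed` argument). OneFS keeps domain IDs in extended attributes and domain settings in a system B-tree, neither of which is modeled, and inheritance on move is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory carrying the set of domain IDs it belongs to."""
    name: str
    domains: set = field(default_factory=set)
    children: dict = field(default_factory=dict)

    def create(self, name: str) -> "Node":
        # New objects inherit the parent's domain IDs (moves are not modeled).
        child = Node(name, set(self.domains))
        self.children[name] = child
        return child


@dataclass
class Domain:
    domain_id: int
    root: Node
    restricted_write: bool = False  # False corresponds to "(Writable)"


def walk(node: Node):
    # Depth-first traversal of the node and everything beneath it.
    yield node
    for child in node.children.values():
        yield from walk(child)


def is_complete(domain: Domain) -> bool:
    # Incomplete until every object under the root lists this domain's ID.
    return all(domain.domain_id in n.domains for n in walk(domain.root))


def can_write(node: Node, domains: list, writer_allowed: bool = False) -> bool:
    # Only a complete, restricted-write domain blocks writes, and only for
    # threads that have not been granted permission.
    for d in domains:
        blocks = d.domain_id in node.domains and d.restricted_write and is_complete(d)
        if blocks and not writer_allowed:
            return False
    return True


root = Node("big-dir")
legacy = root.create("old.txt")  # created before the domain existed
dom = Domain(1, root, restricted_write=True)
root.domains.add(1)              # only the root is marked so far
print(is_complete(dom))          # False: old.txt has not been marked yet
legacy.domains.add(1)            # the marking a DomainMark walk would perform
new = root.create("new.txt")     # inherits domain 1 from its parent
print(is_complete(dom), can_write(new, [dom]))  # True False
```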
In addition to SnapRevert, OneFS also currently uses domains for SyncIQ replication and SnapLock immutable archiving.
Creating a SnapRevert domain
A SnapRevert domain needs to be created on a directory before it can be reverted to a particular point-in-time snapshot. As mentioned before, the recommendation is to create SnapRevert domains for a directory while the directory is empty.
The root path of the SnapRevert domain must be the same as the root path of the snapshot. For example, a domain with a root path of /ifs/data/big-dir cannot be used to revert a snapshot with a root path of /ifs/data/big-dir/archive.
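A pre-flight check for this rule is easy to script. The helper below is hypothetical and simply encodes the exact-match requirement stated above; it normalizes both paths first so that a trailing slash or a "." component does not cause a false mismatch.

```python
import posixpath


def domain_covers_snapshot(domain_root: str, snapshot_root: str) -> bool:
    """Hypothetical check: True only if the domain root equals the snapshot root.

    A parent or child path does not qualify, per the rule above.
    """
    return posixpath.normpath(domain_root) == posixpath.normpath(snapshot_root)


print(domain_covers_snapshot("/ifs/data/big-dir", "/ifs/data/big-dir/archive"))  # False
print(domain_covers_snapshot("/ifs/data/big-dir/", "/ifs/data/big-dir"))         # True
```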
For example, for snapshot DailyBackup_03-26-2018_12:00, which is rooted at /ifs/data/big-dir:
1. First, set the SnapRevert domain by running the DomainMark job (which marks all the files):
# isi job jobs start domainmark --root /ifs/data/big-dir --dm-type SnapRevert
2. Verify that the domain has been created:
# isi_classic domain list -l
Reverting a snapshot
To restore a directory to the state it was in at the point in time when a snapshot was taken, you need to:
- Create a SnapRevert domain for the directory.
- Create a snapshot of the directory.
To do this:
1. First, identify the ID of the snapshot you want to revert by running the isi snapshot snapshots view command and picking your PIT (point in time):
# isi snapshot snapshots view DailyBackup_03-26-2018_12:00
Has Locks: No
Shadow Bytes: 0b
% Reserve: 0.00%
% Filesystem: 0.00%
2. Revert to a snapshot by running the isi job jobs start command. The following command reverts to snapshot ID 82, named DailyBackup_03-26-2018_12:00:
# isi job jobs start snaprevert --snapid 82
This can also be done from the WebUI, by navigating to Cluster Management > Job Operations > Job Types > SnapRevert and clicking the ‘Start Job’ button.
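For scripting, one option is a thin wrapper that assembles the commands used in this article and, by default, only prints them. The wrapper and its function names are hypothetical; the arguments are exactly those shown above. Note that `isi job jobs start` queues a Job Engine job rather than waiting for it to finish, so this sketch does not wait for DomainMark. In practice you would create the domain well in advance and confirm it is no longer listed as “(Incomplete)” before starting SnapRevert.

```python
import shlex
import subprocess


def revert_commands(domain_root: str, snap_id: int) -> list:
    """Build the SnapRevert workflow from this article as argv lists (hypothetical helper)."""
    return [
        # 1. Create the SnapRevert domain (DomainMark walks and marks the tree).
        ["isi", "job", "jobs", "start", "domainmark",
         "--root", domain_root, "--dm-type", "SnapRevert"],
        # 2. Confirm the domain exists and is not marked (Incomplete).
        ["isi_classic", "domain", "list", "-l"],
        # 3. Revert to the chosen snapshot ID.
        ["isi", "job", "jobs", "start", "snaprevert", "--snapid", str(snap_id)],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # Assumes the script runs on a cluster node where these CLIs exist.
            subprocess.run(argv, check=True)


run(revert_commands("/ifs/data/big-dir", 82))
```

Run as-is, this prints the three commands without executing anything; passing `dry_run=False` would execute them in order and stop at the first non-zero exit status.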
Deleting a SnapRevert domain
SnapRevert domains can be deleted using the job engine CLI:
1. Run the following command to delete the SnapRevert domain, in this example for /ifs/data/big-dir:
# isi job jobs start domainmark --root /ifs/data/big-dir --dm-type SnapRevert --delete
2. Verify that the domain has been deleted:
# isi_classic domain list -l
User Driven File Recovery
With the appropriate access credentials and permissions, NFS and SMB users can view and recover data from OneFS snapshots. The snapshots are accessed via the .snapshot directory, as described previously in this paper. The following screenshot from a Windows client shows the list of snapshots available on a OneFS SMB share:
In the example below, a user accidentally deletes a file ‘/ifs/data/foo/bar.txt’ at 9.10am and notices it’s gone a couple of minutes later. By accessing the 9am snapshot, the user is able to recover the deleted file themselves at 9.14am, by copying it directly from the snapshot directory ‘/ifs/data/foo/.snapshot/0900_snap/bar.txt’ back to its original location at ‘/ifs/data/foo/bar.txt’.
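The user-side recovery in this example amounts to a path rewrite plus a copy. The sketch below assumes POSIX-style paths (for instance, on an NFS client) and the per-directory .snapshot layout shown above; the function names are hypothetical.

```python
import posixpath
import shutil


def snapshot_path(path: str, snapshot_name: str) -> str:
    # The snapshot version sits under the parent directory's .snapshot entry:
    # /ifs/data/foo/bar.txt -> /ifs/data/foo/.snapshot/<snapshot_name>/bar.txt
    parent, name = posixpath.split(path)
    return posixpath.join(parent, ".snapshot", snapshot_name, name)


def recover(path: str, snapshot_name: str) -> str:
    """Copy a file's snapshot version back to its original location."""
    source = snapshot_path(path, snapshot_name)
    # copy2 copies the data and attempts to preserve timestamps and permission bits.
    shutil.copy2(source, path)
    return source


print(snapshot_path("/ifs/data/foo/bar.txt", "0900_snap"))
# /ifs/data/foo/.snapshot/0900_snap/bar.txt
```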
SnapshotIQ integration with Windows Volume Snapshot Manager provides Windows users with a simple way of restoring data on an Isilon SMB share from the “Previous Versions” tab on their desktop.