7023339: Windows Replications Fail Due to VSS Errors

This document (7023339) is provided subject to the disclaimer at the end of this document.

Environment

PlateSpin Forge 11.x and up

PlateSpin Migrate 12.x and up

PlateSpin Protect 11.x and up

Situation

A replication of a Windows source workload fails with an error message related to VSS or BlockBasedVolumeWrapper.

Resolution

Ensure each drive on the source workload being replicated has at least 10% – 15% free space of the total volume size (Ex: if C: is 100 GB in size, there should be at least 10 GB – 15 GB of free space).
The service Volume Shadow Copy in Services on the source workload should have its startup type set to Manual. The Volume Shadow Copy service should not be disabled.
No other application on the source workload that uses VSS should be running at the time of the replication.
If there is high disk utilization at the time of the replication, the snapshot creation can fail. In that case, schedule the replication to run during a time when the disk is not being used heavily.
In Windows Explorer on the source workload, right-click each drive and choose Configure Shadow Copies.
Check the settings of each drive. Make sure each drive has the No limit option selected and that the drive selected to store the snapshots has sufficient free space (10%-15% of the total volume size).
Remove all existing shadow copies of each drive on the source. This can be done through the diskshadow utility.
Open a command prompt by right-clicking its icon and choosing Run as administrator, then run this command:
diskshadow
The diskshadow command-line utility will open. Run this command to remove all existing VSS snapshots:
delete shadows all
See this Microsoft document for more information about diskshadow.
Test the creation of shadow copies using “vssadmin create shadow” for each drive; these test snapshots can be removed using diskshadow after they have been taken (see the command sketch after this list).
See this Microsoft link for more information about vssadmin.
Test the creation and removal of VSS snapshots using PlateSpin.Athens.SnapshotExecution.exe. See TID 7017929.
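As a minimal sketch of the checks above, the following commands can be run from an elevated command prompt on the source workload. C: is used here as an example drive letter; repeat the create/delete test for each replicated volume. Note that vssadmin create shadow is available on Windows Server editions.

rem Set the Volume Shadow Copy service startup type to Manual (do not disable it)
sc config VSS start= demand

rem Remove all existing shadow copies (equivalent to "delete shadows all" inside diskshadow)
vssadmin delete shadows /all /quiet

rem Test that a new shadow copy can be created for the C: volume
vssadmin create shadow /for=C:

rem Clean up the test shadow copy afterwards
vssadmin delete shadows /for=C: /all /quiet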

Cause

PlateSpin uses the native Windows service Volume Shadow Copy (VSS) to create snapshots of the volumes being replicated. If the snapshots cannot be taken, the job will not complete successfully.

Additional Information

Consult with the system administrator of the source workload before making any changes this TID recommends.

There may be errors related to the creation of VSS snapshots on the source workload’s Event Viewer application and system logs. When submitting a service request, please export those logs as .evtx files and upload them to the service request along with the diagnostics of the failed replication job.
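If it helps, the Application and System logs can be exported as .evtx files from an elevated command prompt using wevtutil; a minimal sketch (the C:\Temp output folder is just an example and must already exist):

rem Export the Application and System event logs in .evtx format
wevtutil epl Application C:\Temp\Application.evtx
wevtutil epl System C:\Temp\System.evtx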

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented “AS IS” WITHOUT WARRANTY OF ANY KIND.

Related:

Re: NAS Proxy functionality

Additionally, over SMB, snapshots can frequently be seen by using either:

a) the previous versions tab (from a windows box)

or

b) by changing your path from \proxyrepodoc to \proxyrepodoc.ckpt. Different NAS systems call this different things: VNX/Celerra was always .ckpt, Isilon and NetApp use .snapshot, and so forth; however, in some cases it’s hidden over SMB. Of course, confirming that the snapshots are there in the first place is where I would start.

~Chris

Related:

Re: Hyper-V Backups and VSS Errors

Hello forums! This is my first post here, I wanted to check here before I open either a MS case or Networker case.

I am trying to backup Hyper-V VMs using Networker, and I followed the integration guide to a T. My version is 9.2.1, and HyperV is running on Server 2016. When I kick off a backup, I can see the checkpoint being created in HyperV manager, however the job subsequently fails and I get a VSS error in the Event log of the HyperV host machine. Link to the Networker server logs:

https://beallsinc-my.sharepoint.com/:u:/p/dsleichter/EUBhPkLmtM9Ftv3YGufqsTIB607XzZjQSrMwrWGD57d-xA?e=xBoJIi

Volume Shadow Copy Service error: Error calling a routine on a Shadow Copy Provider {89300202-3cec-4981-9171-19f59559e0f2}. Routine details PostCommitSnapshots({6ee1a43b-a821-49ef-aefa-a58e7228d44e}, 1) [hr = 0xffffffff].

Operation:

Executing Asynchronous Operation

Context:

Current State: DoSnapshotSet



Prior to the first backup run, vssadmin list writers shows the Hyper-V writer having no error / ready. After the backup runs and fails, the Hyper-V writer is in a Failed state, showing a timed out error:

Writer name: ‘Microsoft Hyper-V VSS Writer’

Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}

Writer Instance Id: {9df1e599-3ba8-4a3f-bfc6-2c6a03fc609c}

State: [9] Failed

Last error: Timed out

Related:

VNXe2: Snapshots are not available under Windows Previous Versions tab

Article Number: 482364 Article Version: 4 Article Type: Break Fix



VNXe1600, VNXe2 Series, VNXe3200

Snapshots are available under folder directory and under snapshots tab of the shared folder; however, snapshots are not available from the Windows Previous versions tab.

In order for snapshots to be available under the previous versions tab, the radio button “hidden” must be selected under the protection schedule at the time the snapshot is created.

User-added image

Selecting the “hidden” radio button after a snapshot has been created will not make that snapshot available in the Windows Previous Versions tab.

Related:

Recovering Exchange Data with NMM 9

This article will cover restore options for Exchange in NMM 9. The demo in the article is with NMM 9.2.1.3 and an Exchange 2013 DAG. NMM 9 offers the choice of using ItemPoint to perform mailbox/item-level recovery. ItemPoint is the preferred method of GLR in NMM 9. However, traditional GLR (Granular Level Restore) using the NMM GUI is available for Exchange 2013 and Exchange 2010. For Exchange 2016, ItemPoint is the only method of GLR.

If you choose to use ItemPoint for mailbox/mail item recovery, then traditional GLR from the NMM GUI is not available. So you can use either ItemPoint or traditional GLR restore, but not both, on a given host. This choice is made during the NMM 9 install. Additionally, if there is a requirement to restore backups done with NMM 8, then during the install of NMM 9, select the option ‘Restore of NMM 8.2.x and earlier backups (VSS workflow)’.

Below is a screen shot of the NMM 9 install process that shows the selection to restore backups from a previous version and the selection of ‘Exchange Granular Recovery’ (ItemPoint) as the preferred method of GLR.

pic1.png

Note: when you select ‘Restore of NMM 8.2.x and earlier backups (VSS workflow)’, the install process installs the NMM 8.2.4 binaries that will be used for the restore. The binaries are stored in the default folder:

C:\Program Files\EMC NetWorker\nsr\rpvnmm

RPVNMM stands for ‘Restore Previous Version of NMM’

There is also a selection added in the ‘EMC NetWorker’ program group, under ‘NetWorker Tools’, for the restore GUI that will read older backups done with NMM 8.2.x. The program name is ‘Restore previous NMM release backups’.

pic2.png

In addition, the install also contains the binary ‘nsrsnap_vss_ssrecover’ that allows for flat file restore of the NMM 8 backups, if required.

On an Exchange host where ItemPoint was chosen for GLR, there are two restore choices available: ‘Database Recovery’ and ‘Granular Level Recover’. Note there is no ‘RDB data recover’. If you are used to seeing this option in previous versions of NMM, note that ItemPoint restore is not handled through RDB, hence the ‘RDB data recover’ choice is not available.

1. Database Recovery

pic3.png

Database Recovery provides the following options:

a. Overwrite the Database: use this if there is a need to recover the database to itself (overwriting it). Select the desired database and then click ‘Recover’. You will see the message window below advising you to set the database property ‘This database can be overwritten by a Restore’.

pic4.png

This can be done either using the PowerShell command as mentioned in the message window or using ECP / EAC GUI as below:

pic5.png

****Note: this is a destructive action. Make sure this is the correct choice before you proceed. NMM does not let you set the ‘This database can be overwritten by a restore’ property from the NMM GUI, to ensure there are no accidental overwrites.
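As a hedged example, the property can also be set from the Exchange Management Shell with Set-MailboxDatabase (DB01 is the example database name used later in this article; confirm the exact command shown in the NMM message window):

# Allow the database to be overwritten by a restore (only for intentional, destructive restores)
Set-MailboxDatabase -Identity "DB01" -AllowFileRestore $true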

Once the property ‘This database can be overwritten by a restore’ is set, the NMM GUI allows you to proceed with the restore. Click ‘Recover…’ and then ‘Recover options…’. Under the ‘Exchange’ tab there are certain choices to make that determine how the recover is done within Exchange.



pic6.png

Below is a brief explanation of these choices:

‘Include Existing logs (Roll-forward Recovery)’. This option is useful if the database and logs were on separate volumes and the volume containing the logs is still available, or if both the DB and logs are on the same volume and only the ‘edb’ file is corrupt while the logs are good. In either case you can restore the backup and then perform a roll-forward recovery using the logs on disk. This will bring the database to the most recent state with minimal or no data loss.

‘Include only logs from this restore (Point-in-time recovery)’. Select this option when a point-in-time restore is required, i.e. the database will be recovered to the time of the last backup.

‘Put database online after restore’. By default the restore process will replay the logs and put the database online after the restore. If this is not required, click this option again to deselect it and select ‘Do not replay the transaction logs’. If ‘Do not replay the transaction logs’ is selected, the logs are restored but will need to be manually replayed using ‘eseutil’ (see the sketch after these options).

‘Deleted Database Target’. This is used if a flat file restore of the database is required. This option bypasses the VSS method of restore and simply restores the ‘edb’ and ‘logs’ as files to the target directory. Further processing is required to mount the database.
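If ‘Do not replay the transaction logs’ was selected, the restored logs can later be replayed manually with eseutil, as mentioned above. A rough sketch, assuming an E00 log prefix and example folder paths (substitute the actual log prefix and the restored database/log locations):

# Check the restored database header; it should show a Dirty Shutdown state before replay
eseutil /mh "D:\DB01\DB01.edb"

# Soft recovery: replay the E00* transaction logs into the restored database
eseutil /r E00 /l "D:\DB01\Logs" /d "D:\DB01"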

For Database Restore of a database that’s replicated in a DAG configuration, you first have to suspend the replication, otherwise the following error is seen in the Monitor tab:

The client name used NMMHOST2. The Exchange Server version used is Exchange 2013.

145369:nsrnmmrc: Initialization success — Exchange shell successfully initialized. Required for Exchange.

MailboxStore [DB01] is in replicated state, please suspend the replication on all DAG nodes and perform restore after that.

Also note that this restore can only be performed on the node that holds the active copy of the database.
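As a hedged sketch, the replication can be suspended from the Exchange Management Shell before the restore and resumed afterwards (DB01 and NMMHOST1 are example names; run the suspend against the passive copy on each DAG node):

# Suspend the passive database copy before the restore
Suspend-MailboxDatabaseCopy -Identity "DB01\NMMHOST1" -SuspendComment "NMM database restore" -Confirm:$false

# Resume the copy once the restore has completed and the database is mounted
Resume-MailboxDatabaseCopy -Identity "DB01\NMMHOST1"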

2. GLR restore with ItemPoint:

Before trying GLR restore with ItemPoint, some preparatory steps have to be taken to meet all pre-requisites:

  1. Install 32 bit Outlook 2010/2013 on the server from where the restore will be performed. Outlook 2016 is currently not supported.

Microsoft does not recommend (or support) Outlook installation on an Exchange server. Prior to NMM 9.2, ItemPoint could only be used on the Exchange server, which meant Outlook had to be installed on the Exchange server. However, NMM 9.2 and later allows ItemPoint GLR from a non-Exchange server (proxy).

2. Ensure the user performing restore has send-as, receive-as permissions on the mailbox server to allow for browsing and restoring the mailboxes. Also make this user a member of the ‘Exchange Organization Management’ security group.

ItemPoint also needs additional permission assignments for the user performing the restore, to allow browsing and recovering mail items from any mailbox:

Get-mailboxdatabase -identity db01 | add-adpermission -user nmm_svc -accessrights genericall



Get-mailbox | add-mailboxpermission -user nmm_svc -AccessRights FullAccess -InheritanceType All



For further information on this topic refer to ItemPoint documentation.
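The send-as/receive-as rights and the group membership mentioned in step 2 can also be granted from the Exchange Management Shell; a minimal sketch assuming the same nmm_svc account:

# Grant Receive-As and Send-As on every mailbox server to the restore account
Get-MailboxServer | Add-ADPermission -User nmm_svc -ExtendedRights "Receive-As","Send-As"

# Add the restore account to the Organization Management role group
Add-RoleGroupMember -Identity "Organization Management" -Member nmm_svc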

To perform ItemPoint GLR from a non-Exchange server ensure the following pre-requisites are met:

  1. The non-Exchange server should have the same OS as the production Exchange server. You may be able to get away with using a different OS on this host, but using the same OS eliminates any complications. Check the ItemPoint documentation for support of different OS versions.
  2. Install and configure 32-bit Outlook 2010 or 32-bit Outlook 2013. Note that Outlook and MAPI/CDO [Messaging API and Collaboration Data Objects] software cannot co-exist on a host. If the MAPI/CDO software is installed, uninstall it before installing Outlook.
  3. Install the NetWorker client 9.2.x and NMM 9.2.x. For this article I’m using NMM 9.2.1.3.
  4. Create a client resource for this host on the NetWorker server.

pic7.png

5. Grant remote access permissions to the user performing the restore from this host. There are two ways to do this: either add this user to the ‘remote access’ attribute of the DAG client resource, or add this user to a user group that has the remote access privilege.

Method 1: Adding the user performing the restore to the ‘Remote Access’ attribute of the client resource.

pic8.png

Method 2: Add the user to a user group that has ‘remote access’ privilege.

pic9.png

Once the above pre-requisites have been met, you are ready to perform GLR using ItemPoint.

To start the recovery, from the NMM restore host, launch the NMM GUI as ‘administrator’. Then select the client from the drop-down and select ‘Exchange Recover Session’ => ‘Granular Level Recover’.

pic11.png

Note this restore is being performed from a host that does not have Exchange server installed and as such NMM provides only the GLR option for restore. Since Exchange is not installed, Database Recovery or Restore to RDB cannot be performed.

From the browse window, select the desired client, browse time and database for restore and right click:



pic12.png



From the right-click menu, choose either ‘Mount backup’ or ‘Mount backup and run ItemPoint’. If you select ‘Mount backup’, NMM will mount the backup and you then have to manually launch ItemPoint to restore from the mount points. ‘Mount backup and run ItemPoint’ is preferable, as the NMM GUI mounts the backup, starts ItemPoint, and auto-fills the paths for the edb and log files:

pic13.png

Click ‘Finish’ and ItemPoint starts processing the log files and edb file:

pic14.png





Once the log and edb files are processed, the ItemPoint GUI shows the content of the source database that was mounted. The next step is to open a target. This can be an Exchange server or a PST file. Here we open a target Exchange server, as we want to restore a mailbox to the target server.



pic15.png



When you click ‘Open Target Exchange server’, you get the following choices. In ‘Select Target’ either choose to connect to a single mailbox or ‘All Mailboxes’. You would choose ‘All Mailboxes’ if you want to restore data to different mailboxes. Click ‘Next’







pic16.png







ItemPoint shows the number of mailboxes in the Target. Review and click ‘Close’

pic17.png

pic18.png

This output shows the copy of the mailbox ‘nmm_svc’ to the mailbox ‘blee’.

pic19.png

If you need to restore a mailbox or mailbox items to a PST file, right click the mailbox / mail items and select ‘Export…’

pic20.png

(Traditional GLR is available only for Exchange Server 2010 and Exchange Server 2013. With Exchange Server 2016, ItemPoint is the only choice for GLR.)

If ItemPoint will not be used for GLR, then you can fall back on traditional GLR with the NMM GUI. The recovery process here is very similar to GLR in NMM 8, except that NMM uses the ‘block based backup’ mechanism to mount the backup. In NMM 8, NwFS (NetWorker virtual file system) was used to perform the virtual mount of the backup.

Following are the pre-requisites for successful GLR using NMM GUI.

  1. Install “Messaging API and Collaboration Data Objects 1.2.1” on the Exchange server where the restore will be performed. Version 6.5.8353 is recommended.
  2. The service account used with NMM should have ‘send-as, receive-as’ rights on the Exchange server.
  3. The service account used with NMM should have a mailbox located on a database that’s mounted on the same version of Exchange server. If the environment has both Exchange Server 2010 and Exchange Server 2013, then the service account mailbox should be on an Exchange 2013 server if the GLR is being performed for Exchange Server 2013. If the GLR is being performed for Exchange Server 2010, then the service account mailbox should be on an Exchange Server 2010 server.
  4. The service account used with NMM should be a member of the Exchange Organization Management security group.
  5. If restore to PST is required, then the service account needs the additional ‘Mailbox Import Export’ role.
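For prerequisite 5, the role can be assigned from the Exchange Management Shell; a minimal sketch assuming the nmm_svc service account used earlier in this article:

# Assign the Mailbox Import Export role required for restores to PST
New-ManagementRoleAssignment -Role "Mailbox Import Export" -User nmm_svc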

Once the above pre-requisites are met, you are ready to perform GLR restore with NMM.

  1. Start the NMM GUI as ‘administrator’.
  2. Select the correct client from the ‘client’ drop down. Change ‘Recover Browse Time’ if required.
  3. Click ‘Recover’ -> Exchange 2013 Recover Session -> Granular Level Recover

pic21.png

4. Select the desired database. Then click ‘Recover…’

pic22.png

5. Click ‘Recover Options…’ if you want to modify the recovery behavior.

    a. Under the General tab, set the diagnostic level. This would normally be used by support to troubleshoot a restore failure.

pic23.png

b. Under the ‘Exchange’ tab the options below are available. The defaults are fine for a GLR restore.

pic24.png

6. Click ‘Start Recover’.

pic25.png

7. Once the restore completes, the following message window is seen:



pic26.png

Review the monitor window for logging of the restore:

pic27.png

8. Click on the ‘recover’ tab to browse the GLR database and expand to the desired mailbox.

pic28.png

9. Select the mailbox or mailbox items

pic29.png

10. If recovering the mailbox/mailbox items back to the original mailbox, choose ‘Recover..’. If recovering to another mailbox, choose ‘Advanced Recover..’.

Here I’m recovering to the original mailbox, so I choose ‘Recover..’. The ‘monitor’ window shows the messages for the restore.

pic31.png

When the recover completes, you can log in to the mailbox to confirm the recovery was successful. The recovered items are placed under a folder ‘Recovered Items …’ as seen below.

pic32.png

This completes GLR using NMM GUI.

Recovery from backups done with previous version

If there is a requirement to restore from backups done with NMM 8.2.x, and the choice ‘Restore of NMM 8.2.x and earlier backups (VSS workflow)’ was made during install, then NMM will install the software required to perform this restore.

Note this software is installed in “C:\Program Files\EMC NetWorker\nsr\rpvnmm”.

The NMM 9 GUI cannot be used to recover backups performed with NMM 8. To launch the NMM 8 GUI, go to the program group ‘EMC NetWorker’ -> ‘NetWorker Tools’ -> ‘Restore previous NMM release backups’. Start this GUI as ‘administrator’.

pic33.png

Check recovering Exchange backups in NMM 8 for details on recovery with NMM 8.

Related:

Local Replication improvements in the New Unisphere for PowerMax 9.0




Since Unisphere 9.0 has been totally overhauled, it’s worth taking a few minutes to explore the new interface for snapshot management with SnapVX, our local replication solution for VMAX All Flash and PowerMax systems. Full details of SnapVX and how it all works are in the excellent technical note from my colleague Mike Bresnahan, linked here: https://www.emc.com/collateral/technical-documentation/h13697-emc-vmax3-local-replication.pdf.

Once you’ve provisioned storage on your VMAX system for your applications, if the data is important (and if it’s living on VMAX or PowerMax, it usually is), we want to make sure we have protection in place for recovery from a variety of scenarios, as well as copies of data available for development or test. SnapVX snapshots provide this capability, and Unisphere makes it really easy to do.

From the Storage Group list view in Unisphere, search for your storage group, click on it to select it, and click Protect.

1.png

From the Protection wizard select Point in Time Using SnapVX and click NEXT.

2.png

Give your snapshot a name; in this case I’m calling mine HourlySnap. You can be more descriptive if you like, up to 64 characters (avoid special characters). Note that if there are already snapshots, you can create a new version using the existing snapshot name. You can also set the snapshot to auto-expire using the Time To Live value, which can be either hours or days. I’ve set my retention to 6 hours, so I will have this version of my snapshot for at least that amount of time, meaning I have a 6-hour Recovery Time Objective (RTO).

4.png

When I click Next, I get a summary screen that enables me to set a schedule. Creating the recurring snapshot from within this wizard, at the same time as the snapshot itself, provides an improved and more efficient workflow over previous versions of Unisphere.

5.png

I can select a date and time for the first run of the job.

6.png

The summary has now been updated and I can see my schedule. I also have the option to make changes before adding the job to the job list to be executed.

Once the job is added to the job list, it should appear in the jobs alert at the top of the screen.

7.png

Managing existing snapshots is done from the Data Protection Dashboard, as shown below.

8.png

I’ve selected the CriticalDatabase storage group. You can note the number of snapshots, and the creation time of the last one. To make this snapshot available to a host, click on the Link button and the Link Snapshot wizard is launched.

9.png

The Link Snapshot wizard allows the user to select the snapshot, and the version of the snapshot to link. Note the latest version is always (0).

When the Wizard runs, if you choose to create a new storage group for the linked target, the wizard automatically creates the required devices and completes the link. This is evident from the details on the output below.

10.png

To finish the process and make the snapshot visible on the host, the new storage group needs to be part of a masking view. From the Storage Groups list, simply select your storage group, click Provision, select the host, and click Next > Next > Run Now. It couldn’t be simpler.

11a.png

While we have made it extremely simple to take snapshots, it’s worth noting that you should monitor your array for snapshot usage. We are space efficient in how we share tracks between snapshots and production, but space is not an infinite resource, so it’s good to keep an eye on things. The capacity dashboard makes this easy to do, and Storage Group Demand reports allow you to see which applications are consuming the greatest amount of space and whether that space is from snapshots.

12.png

It’s also highly recommended that alerting be enabled so that you are proactively alerted on capacity or resource issues and action can be taken. The settings are accessed from the little cog at the top right, where you can enable the threshold alerts as shown below.

13.png

Hopefully this post has given you a good feel for the updated interfaces. Be sure to have a read of our technote https://www.emc.com/collateral/technical-documentation/h13697-emc-vmax3-local-replication.pdf and the release notes for PowerMax OS 5978 for full details on microcode enhancements for working with SnapVX.

More Posts to follow.

Related:

Re: Inconsistent Shadow Copy with System Writer

Hey guys,

I’m having a problem with one Windows 2012 R2 client running on VMware, backing up with Networker 8.2.3. I’m hoping you guys will be able to straighten me out; I’m about ready to format the server and start over. Here are the facts and what I’ve done so far.

1) Windows server backup is successful doing a full bare metal backup, therefore I don’t believe VSS is the culprit.

2) Networker reports error:

5388:save: Failure status of writer System Writer – VSS_WS_FAILED_AT_PREPARE_SNAPSHOT

VSS OTHER: ERROR: VSS failed to process snapshot: The shadow-copy set only contains only a subset of the

volumes needed to correctly backup the selected components of the writer. (VSS error 0x800423f0)

90108:save: Unable to save the SYSTEM STATE save sets: cannot create the snapshot.

86024:save: Error occured while saving disaster recovery save sets.

3) After resetting all VSS Writers, rebooting, and performing backup, SYSTEM WRITER reports error:

Writer name: ‘System Writer’

Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}

Writer Instance Id: {b0094e31-feec-4dec-b372-1d517c14c44b}

State: [8] Failed

Last error: Inconsistent shadow copy

4) No other writers are in error after a Networker backup

5) VSSTRACE reports error during Networker backup

[14:38:33.513 P:03F0 T:044C WRTWRTIC(2600) WRITER] Writer System Writer exposing state<8> failure<0x800423f0>

6) Windows 2012 R2 client is fully patched

7) I’ve tried assigning a drive letter to the system partition on this machine, that did not fix the problem.

8) I tried VSSADMIN DELETE SHADOWS /ALL. This did not fix the issue.

9) I removed the Networker client, deleted all associated directories, and re-installed. This did not fix the issue.

Does anyone have any ideas on what else I can try to resolve this issue before I re-install Windows? I’ve been through about 200 tech articles and EMC user technical questions in this community.

Any help would be appreciated.

Thanks

Joel

Related:

Snapshots – Part 2

In the previous article, we looked at the OneFS SnapshotIQ architecture and how snaps are created and deleted. Now, we’ll explore the options available to recover data from a snapshot:



There are four main methods for restoring snapshot data, namely:



  • Copying specific files and directories directly from the snapshot
  • Cloning file(s) from the snapshot
  • Replicating a snapshot using SyncIQ
  • Reverting the entire snapshot via the SnapRevert job



The appropriate option to use will depend on criteria such as the quantity of data you’re looking to recover, whether the snapshot is to be restored in place, what data services are licensed, etc. As such, a decision tree for these choices might look like:

snapshot_restore_1.png

Here’s a bit more detail on each of the four options above:



1. Copying a file from a snapshot duplicates that file, which roughly doubles the amount of storage space it consumes. Even if the original file is deleted from HEAD, the copy of that file will remain in the snapshot.



2. Cloning a file (cp -c) from a snapshot also duplicates that file. Unlike a copy, however, a clone does not consume any additional space on the cluster – unless either the original file or clone is modified. This makes it an efficient option for large files.



3. If SyncIQ is licensed on the cluster, it can be used in its local target mode to replicate out of a snapshot. This is a very efficient way to recover large amounts of data from within snapshots. This involves first creating a SyncIQ policy with the ‘source root directory’ set to the snapshot’s path. Secondly, run a SyncIQ replication job from the command line, with the syntax of the form:



# isi sync job start --policy-name=<name-of-synciq-policy> --source-snapshot=<source-snapshot-name>



It’s worth noting that the SyncIQ ‘--source-snapshot’ option is only available from the command line interface.
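For example, following the command form shown above, a hypothetical run for the DailyBackup_03-26-2018_12:00 snapshot used later in this article might look like the following (restore-from-snap is an assumed policy name):

# isi sync job start --policy-name=restore-from-snap --source-snapshot=DailyBackup_03-26-2018_12:00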



4. Finally, arguably the most efficient of these approaches is the SnapRevert job, which automates the restoration of an entire snapshot to its top level directory. However, this will overwrite the existing HEAD dataset.



SnapRevert allows for quickly reverting to a previous, known-good recovery point – for example in the event of virus outbreak. The SnapRevert job can be run from the Job Engine WebUI, and requires adding the desired snapshot ID.






Under the hood, SnapRevert has two main components:



  • The file system domain that the objects are put into.
  • The job that reverts everything back to what’s in a snapshot.



SnapRevert is built on the concept of a domain. In OneFS, a domain defines a set of behaviors for a collection of files under a specified directory tree. More specifically, SnapRevert is a ‘restricted writer’ domain. This means it possesses an extra bit of filesystem metadata and associated locking code which prevents the domain’s files from being written to while restoring a last known good snapshot.



Where possible, the preferred practice is to create the domain before there is data. This avoids having to wait for the DomainMark job to walk the entire tree, setting that attribute on every file and directory within it.



Existing configured SnapRevert domains can be viewed with the following command:



# isi_classic domain list -l



The SnapRevert job itself actually uses a local SyncIQ policy to copy data out of the snapshot, discarding any changes to the original directory. When the SnapRevert job completes, the original data is left in the directory tree. In other words, after the job completes, the file system (HEAD) is exactly as it was at the point in time that the snapshot was taken. The LINs for the files/directories don’t change, because what’s there is not a copy.



The SnapRevert job can be manually run from the OneFS WebUI by navigating to Cluster Management > Job Operations > Job Types > SnapRevert and clicking the ‘Start Job’ button:



snaprevert_4.png



Before a snapshot is reverted, SnapshotIQ creates a point-in-time copy of the data that is being replaced. This enables the snapshot revert to be undone later, if necessary.



Additionally, individual files, rather than entire snapshots, can also be restored in place using the isi_file_revert command line utility. This can help drastically simplify virtual machine management and recovery.



Before creating snapshots, it’s worth noting that reverting a snapshot requires that a SnapRevert domain exist for the directory that is being reverted. If you intend to revert snapshots for a directory, it is recommended that you create SnapRevert domains for those directories while the directories are empty. Creating a domain for an empty (or sparsely populated) directory takes considerably less time.



How do domains work?



Files may belong to multiple domains. Each file stores a set of domain IDs, in its inode’s extended attributes table, indicating which domains it belongs to. Files inherit this set of domain IDs from their parent directories when they are created or moved. The domain IDs refer to the domain settings themselves, which are stored in a separate system B-tree. These B-tree entries describe the type of the domain (flags) and various other attributes.

As mentioned, a Restricted-Write domain prevents writes to any files except by threads that are granted permission to do so. A SnapRevert domain that does not currently enforce Restricted-Write shows up as “(Writable)” in the CLI domain listing.



Occasionally, a domain will be marked as “(Incomplete)”. This means that the domain will not enforce its specified behavior. A domain is incomplete if any of the files it contains are not marked as being a member of that domain. Since each file contains a list of domains of which it is a member, the list must be kept up to date for each file. The domain is incomplete until each file’s domain list is correct.

In addition to SnapRevert, OneFS also currently uses domains for SyncIQ replication and SnapLock immutable archiving.



Creating a SnapRevert domain

A SnapRevert domain needs to be created on a directory before it can be reverted to a particular point in time snapshot. As mentioned before, the recommendation is to create SnapRevert domains for a directory while the directory is empty.

The root path of the SnapRevert domain must be the same as the root path of the snapshot. For example, a domain with a root path of /ifs/data/big-dir cannot be used to revert a snapshot with a root path of /ifs/data/big-dir/archive.

For example, for the snapshot DailyBackup_03-26-2018_12:00, which is rooted at /ifs/data/big-dir:

1. First, set the SnapRevert domain by running the DomainMark job (which marks all the files):

# isi job jobs start domainmark --root /ifs/data/big-dir --dm-type SnapRevert

2. Verify that the domain has been created:

# isi_classic domain list -l

Reverting a snapshot

In order to restore a directory back to the state it was in at the point in time when a snapshot was taken, you need to:

  • Create a SnapRevert domain for the directory.
  • Create a snapshot of a directory.

To do this:

1. First, identify the ID of the snapshot you want to revert by running the isi snapshot snapshots view command and picking your PIT (point in time).

For example:

# isi snapshot snapshots view DailyBackup_03-26-2018_12:00

ID: 82

Name: DailyBackup_03-26-2018_12:00

Path: /ifs/data/big-dir

Has Locks: No

Schedule: daily

Alias: –

Created: 2018-03-26T12:00:05

Expires: 2018-04-26T12:00:00

Size: 0b

Shadow Bytes: 0b

% Reserve: 0.00%

% Filesystem: 0.00%

State: active

2. Revert to a snapshot by running the isi job jobs start command. The following command reverts to snapshot ID 82, named DailyBackup_03-26-2018_12:00:

# isi job jobs start snaprevert --snapid 82

This can also be done from the WebUI, by navigating to Cluster Management > Job Operations > Job Types > SnapRevert and clicking the ‘Start Job’ button.

snaprevert_3.png

Deleting a SnapRevert domain

SnapRevert domains can be deleted using the job engine CLI:

1. Run the following command to delete the SnapRevert domain – in this example for /ifs/data/big-dir:

# isi job jobs start domainmark --root /ifs/data/big-dir --dm-type SnapRevert --delete

2. Verify that the domain has been deleted:

# isi_classic domain list -l

User Driven File Recovery

With the appropriate access credentials and permissions, NFS and SMB users can view and recover data from OneFS snapshots. The snapshots are accessed via the .snapshot directory, as described previously in this paper. The following screenshot from a Windows client shows the list of snapshots available on a OneFS SMB share:

snaps_8.png

In the example below, a user accidentally deletes the file ‘/ifs/data/foo/bar.txt’ at 9.10am and notices it’s gone a couple of minutes later. By accessing the 9am snapshot, the user is able to recover the deleted file themselves at 9.14am, by copying it directly from the snapshot directory ‘/ifs/data/foo/.snapshot/0900_snap/bar.txt’ back to its original location at ‘/ifs/data/foo/bar.txt’.
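A minimal sketch of that copy-back, using the paths and the 0900_snap snapshot name from the example above (cp -c could be used instead to clone rather than copy, as described in option 2 earlier):

# Copy the deleted file back out of the 9am snapshot to its original location
cp /ifs/data/foo/.snapshot/0900_snap/bar.txt /ifs/data/foo/bar.txt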

snap_restore3.jpg

SnapshotIQ integration with Windows Volume Snapshot Manager provides Windows users with a simple way of restoring data on an Isilon SMB share from the “Previous Versions” tab on their desktop.

Related:

Re: Question about Windows processes involved in Avamar backup

Working with a customer on backups for a large Windows file server, and am trying to get an idea of performance, so I have the Windows Resource Monitor open and I’m looking at disk operations.

While I see the expected entries for avtar under both “Processes with Disk Activity” and “Disk Activity”, I am also seeing what appear to be “associated” operations of some kind for “System” with a PID of “4” – and under the “Disk Activity” section, the System entries seem to be working on the same “\Device\HarddiskVolumeShadowCopy…” entries that the avtar processes are also working on.



I understand that the Windows backup uses VSS these days and that Avamar would likely “hit” that shadow copy – I also understand that Windows will likely be working on that shadow copy as well to a degree. What I’m curious about is that at times, the System processes seem to be a lot more busy with the shadow copy than avtar is – and while this system is being used (as a file server), I wouldn’t expect it to be “hit” as much by users as I would expect it to be “hit” by Avamar during a backup.



(and I also understand that System PID 4 is somewhat of a “catch all” process that takes care of a lot of things – but I still can’t figure out why at some points it would be showing 4x or more the number of I/Os that avtar would)



Can anyone provide some additional perspective on why there are so many System processes “hitting” the shadow copy as well as the avtar ones?



All comments/feedback appreciated – thanks.


Related: