7002659: How to progress stuck obituaries


3 – Report sync status needs to be error-free

Below is an example of a Report Synchronization status (on Linux, use ndsrepair -E).

Collecting replica synchronization status

Start: Wednesday, October 15, 2008 13:35:11 Local Time

Retrieve replica status

Partition: .[Root].

Replica on server: .doublevision.servers.novell

Replica: .doublevision.servers.novell 10-15-2008 13:34:21

Replica on server: .sled-vh1.servers.novell

Replica: .sled-vh1.servers.novell 10-15-2008 13:34:22

Replica on server: .linx-vh1.servers.novell

Replica: .linx-vh1.servers.novell 10-15-2008 13:34:21

All servers synchronized up to time: 10-15-2008 13:34:21

Finish: Wednesday, October 15, 2008 13:35:11 Local Time

Total errors: 0

For the partition that is being checked, the total number of errors must be 0.

If errors are listed, the synchronization process cannot finish, and therefore no obituary processing can take place.

Obituary processing can only start once the synchronization process has finished without errors.

(In a dstrace, an "All Processed = Yes" message is visible for the partition if synchronization for that partition is successful.)
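A quick way to confirm the error count on Linux (a sketch, assuming the default log location):

ndsrepair -E

grep "Total errors" /var/opt/novell/eDirectory/log/ndsrepair.log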


4 – All servers in the replica ring must show as synchronized within one hour of the current time

The start time can be seen at the beginning of the log. Compare the start time to the time indicated for each replica.

If a server is not listed with errors but shows a synchronization time more than an hour (or even days) behind, it may need a restart of eDirectory. (On Linux, as root, type rcndsd restart. On NetWare, unload ds and load ds. On Windows, stop ds.dlm and start it again. On Solaris, type /etc/init.d/ndsd stop and /etc/init.d/ndsd start.)
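On Linux, for example, the restart plus a follow-up check might look like this (give replication a little time before re-checking):

rcndsd restart

ndsrepair -E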


5 – All servers in the Tree must be reachable, up and running

Any Server in the Tree could potentially need to be contacted for the obituary process.

The reason for this is that when a client logs into a server and requests information for a particular object the server does not have a replica for, the server will look up (treewalk) the information on a server that does have the replica, and create an external reference object in its own database.

The external reference is basically an empty object that points to the server holding the real object; the next time the information is requested, the external reference already holds a pointer to the server that needs to be contacted, and no treewalking is needed.

The external reference object will also cause a backlink attribute to be created on the object itself on the replica to keep track of servers that know about the object.

When the object is moved or deleted the backlink attribute is used to make sure servers that do not have a replica will also know what to do with the external reference object. This is done by the obituary process.

6 – Gather the external reference log using dsrepair/ndsrepair

NetWare: Load dsrepair -a -> Advanced options menu -> Check external references

The dsrepair.log will be located in sys:system

Windows: From ndscons load the dsrepair.dlm with "-a" in the startup parameter line -> Repair -> Check External References

The dsrepair.log will be located in C:\Novell\NDS\DIBFiles

Linux: As root, type: ndsrepair -C -Ad -A

The default location for ndsrepair.log will be in /var/opt/novell/eDirectory/log/

(If the default is not used, the n4u.server.vardir variable will show the location; type: ndsconfig get)
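On Linux, the whole gather step can be done in one short sequence (a sketch, assuming default paths):

ndsrepair -C -Ad -A

ndsconfig get | grep n4u.server.vardir

less /var/opt/novell/eDirectory/log/ndsrepair.log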


7 – Checking what partition(s) should be looked at

Below is a piece of an external reference log.

The line that starts with "Found obituary" indicates which object holds the obituary; in this case it is CN=upuser.OU=test.O=novell.T=NOVELLWS.

Looking at the path should reveal which partition this object belongs to.

For example, if ou=test is a partition, CN=upuser belongs to that partition.

(1) Found obituary for: EID: 0000a798, DN: CN=upuser.OU=test.O=novell.T=NOVELLWS

Value CTS : 10-21-2008 11:56:38 R = 0001 E = 0001

Value MTS = 10-21-2008 11:56:38 R = 0001 E = 0001, Type = 0001 DEAD,

Flags = 0000
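To list every obituary entry recorded in the log at once, a simple filter can help (a sketch, assuming the default Linux log location):

grep "Found obituary" /var/opt/novell/eDirectory/log/ndsrepair.log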

8 – Checking backlink obituaries for problems

Below is an example of an external reference log.

A backlink obituary can be identified by the following: "Type = 0006 BACKLINK".

If backlink obituaries are the cause of the obituaries not progressing, it is likely you will see something similar to our example below.

The Flags are the steps through which the process needs to go (0000, 0001, 0002 and 0004).

Check the backlink obituaries that belong to the same object (tip: look for the same EID number) and find one that is a step behind (a lower flags value) compared to the other backlink obituaries; it is possible that that one cannot be contacted or is not correctly backlinked.

Find the server that belongs to that backlink obituary from the log. It will be listed just below.

In this example we see that there is one backlink obituary that is still at flags = 0000 while the other backlink obituary is already at flags = 0001.

We can see in the example that the backlink obituary that is not going forward points to server CN=doublevision.OU=servers.O=novell.T=NOVELLWS.
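To pull all obituary entries for a single object together, filter the log on its EID (a sketch using the EID from this example and the default Linux log path):

grep -A6 "EID: 0000a798" /var/opt/novell/eDirectory/log/ndsrepair.log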

Possible causes are:

1) The server is physically no longer in use (fix: remove its NCP Server object; this will clean up the backlinks that point to that server)

2) The server is experiencing a problem and may need a restart of eDirectory or even of the server itself

3) The backlink is no longer valid on the server that has the external reference object (e.g. it may be pointing to the wrong server)

In this case a “-xk3” repair would be required on the server that holds the external reference object in order for it to verify and correct any wrong backlinks it may have.

NetWare: Load dsrepair -XK3 -> Advanced options menu -> Repair Local DS database -> F10 to start the repair

When done, on the console type: set dstrace=*b to start the backlink process (give it some time to finish)

Linux: As root, type: ndsrepair -R -Ad -XK3

When done, type: ndstrace

A screen appears in which you can type "set dstrace=*b"

To exit, type "exit" (give the backlink process some time to finish)

Windows: From ndscons load the dsrepair.dlm with "-xk3" in the startup parameter line -> Repair -> Local Database Repair… click Repair.

When done, from ndscons highlight ds.dlm and click Configure -> Triggers -> backlinker

(give it some time to finish)

Example:

Repair utility for Novell eDirectory 8.8 – 8.8 SP2 v20213.08

DS Version 20216.62 Tree name: NOVELLWS

Server name: .linx-vh1.servers.novell

Size of /var/opt/novell/eDirectory/log/ndsrepair.log = 34420 bytes.

Preparing Log File “/var/opt/novell/eDirectory/log/ndsrepair.log”

Please Wait…

External Reference Check

Start: Tuesday, October 21, 2008 11:58:38 Local Time

(1) Found obituary for: EID: 0000a798, DN: CN=upuser.OU=test.O=novell.T=NOVELLWS

Value CTS : 10-21-2008 11:56:38 R = 0001 E = 0001

Value MTS = 10-21-2008 11:56:38 R = 0001 E = 0001, Type = 0001 DEAD,

Flags = 0000


(2) Found obituary for: EID: 0000a798, DN: CN=upuser.OU=test.O=novell.T=NOVELLWS

Value CTS : 10-21-2008 11:56:38 R = 0001 E = 0002

Value MTS = 10-21-2008 11:57:57 R = 0001 E = 0003, Type = 0006 BACKLINK,

Flags = 0001

NOTIFIED

Backlink: Type = 00000001 DEAD, RemoteID = ffffffff,

ServerID = 00008043, CN=sled-vh1.OU=servers.O=novell.T=NOVELLWS

(3) Found obituary for: EID: 0000a798, DN: CN=upuser.OU=test.O=novell.T=NOVELLWS

Value CTS : 10-21-2008 11:56:38 R = 0001 E = 0003

Value MTS = 10-21-2008 11:57:57 R = 0001 E = 0004, Type = 0006 BACKLINK,

Flags = 0000

Backlink: Type = 00000001 DEAD, RemoteID = ffffffff,

ServerID = 0000807d, CN=doublevision.OU=servers.O=novell.T=NOVELLWS

(4) Found obituary for: EID: 0000a798, DN: CN=upuser.OU=test.O=novell.T=NOVELLWS

Value CTS : 10-21-2008 11:56:38 R = 0001 E = 0004

Value MTS = 10-21-2008 11:57:57 R = 0002 E = 0001, Type = 000c USED_BY,

Flags = 0002

OK_TO_PURGE

Used by: Resource type = 00000000, Event type = 00000003, Resource ID = 00008026, T=NOVELLWS

Checked 0 external references

Found: 4 total obituaries in this DIB,

2 Unprocessed obits, 0 Purgeable obits,

1 OK_To_Purge obits, 1 Notified obits

Total errors: 0

NDSRepair process completed.

9 – Inhibit_move obituaries and how to get them progressed

First the explanation:

When any object is moved from one location to another in the database (for example from ou=accounting.o=novell to ou=users.o=novell), the "old" location of the object will get a MOVED obituary and the "new" location will receive an INHIBIT_MOVE obituary.

The obituary process will take place just as it would when deleting an object; however, if the new location is in a different partition, it may need to contact another server to negotiate the process.

(The server holding the master replica for a partition needs to do this, and if two partitions are involved there may be two different servers needed to progress the obits.)

In this process we sometimes see that the MOVED obituary is processed just fine along with its backlink obituaries, but the INHIBIT_MOVE obituary is not progressed and remains at flags = 0000.

We call this an "orphaned INHIBIT_MOVE" obituary.

Before we think about any fix, we need to verify that this is truly the case: check all servers that hold master replicas for any MOVED obituary, to make sure we are not breaking the system when we try to fix this.

Once we have verified and are satisfied that no MOVED obituary exists anywhere for the object that has the INHIBIT_MOVE obituary we can proceed with the fix.
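A sketch of that verification on Linux: run the external reference check on each server holding a master replica, then filter the log for the relevant obituary type names as they appear in the output:

ndsrepair -C -Ad -A

grep -E "MOVED|INHIBIT_MOVE" /var/opt/novell/eDirectory/log/ndsrepair.log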

The fix is: TID 3908200

P.S. If the object holds not only an INHIBIT_MOVE but also a DEAD obituary, you will need to contact Novell Technical Support.


10 – Master server is clean but obituaries are still seen on servers that have a read/write

If this document is followed and no more obituaries are seen when checking the server holding the master replica, it is still possible that one or more of the servers holding a read/write replica show obituaries for the partition being worked on.

To get these progressed, you will need to timestamp these obituaries so that they are sent to the master of the partition for processing.

Preferably, do this on the server holding the read/write replica that shows the most obituaries.

You can do this by running a -OT repair:

NetWare: Load dsrepair -OT -> Advanced options menu -> Repair Local DS database -> F10 to start the repair

Linux: As root, type: ndsrepair -R -Ad -OT

Windows: From ndscons load the dsrepair.dlm with "-OT" in the startup parameter line -> Repair -> Local Database Repair… click Repair.

Repeat step 10 until all replicas are clean and no longer show the obituaries.
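A sketch of that verification loop on Linux (default log path assumed); a count of 0 means the replica is clean:

ndsrepair -C -Ad -A

grep -c "Found obituary" /var/opt/novell/eDirectory/log/ndsrepair.log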

Related:

OneFS: How to recover individual file(s) from Snapshots using the OneFS’ Sync IQ

Article Number: 514023 Article Version: 3 Article Type: How To



Isilon OneFS, Isilon SyncIQ

This KB article explains how to restore/recover data from Snapshots via SyncIQ.

A Snapshot is a copy of the files/folders within the location selected. The contents of each snapshot reflect the state of the file system at the time the snapshot was created. It is easy to navigate through each snapshot as if it were still active. Your directories/folders and files will appear as they were at the time that the snapshot was created. You can easily recover your own files, before snapshot expiration, simply by copying an earlier version from the snapshot to the original directory or to an alternate location.

Note: It is good practice to copy files to a temporary directory rather than overwriting current files/folders. This gives users the option to keep either the Snapshot copy, current copy, or both
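For a handful of files or folders, a direct copy out of the .snapshot directory may be all that is needed; the SyncIQ method below is more useful for larger recoveries. A sketch using the names from the example that follows, with a temporary target directory:

# mkdir -p /ifs/recovery_tmp

# cp -a /ifs/original_folder/.snapshot/snapshot/folder2 /ifs/recovery_tmp/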

Example:

Assume we have a folder /ifs/original_folder, and this folder contains the subfolders:

folder1 folder2 folder3 folder4

Subfolders folder1, folder2 and folder3 were deleted from HEAD, but they still exist in the snapshot with ID 59, as shown below:

# ls /ifs/original_folder

folder4

# isi snapshot snapshots view snapshot

ID: 59

Name: snapshot

Path: /ifs/original_folder

Has Locks: No

Schedule: -

Alias Target ID: -

Alias Target Name: -

Created: 2017-11-15T11:04:58

Expires: -

Size: 10.0k

Shadow Bytes: 0

% Reserve: 0.00%

% Filesystem: 0.00%

State: active

Riptide-1# ls /ifs/original_folder/.snapshot/snapshot

folder1 folder2 folder3 folder4

We will recover subfolder "folder2" only, from snapshot 59 to the path /ifs/recoverd_folder, as below:

1- Create a policy with source /ifs/original_folder that includes subfolder "folder2" only:

# isi sync policies create --name=recover --source-root-path=/ifs/original_folder --source-include-directories=/ifs/original_folder/folder2 --target-host=localhost --target-path=/ifs/recoverd_folder --action=sync

2- Start a sync job for policy recover, but from snapshot ID 59:

In OneFS 7.x

# isi_classic sync pol start recover --use_snap 59

In OneFS 8.x

# isi sync jobs start --policy-name=recover --source-snapshot=59

3- Confirm that the sync job finished

# isi sync reports list

Policy Name Job ID Start Time End Time Action State

—————————————————————————–

recover 1 2017-11-15T11:38:02 2017-11-15T11:38:07 run finished

—————————————————————————–

Total: 1

4- Check the recovered folder; it will contain subfolder "folder2" only.

# ls /ifs/recoverd_folder

folder2
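5- Optionally, once the recovered data is verified, delete the one-off policy (a sketch; the exact syntax may vary slightly between OneFS versions):

# isi sync policies delete recover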

Related:

Citrix ADC High Availability: How do I?

NS12.0 add cluster node, node sync stuck in “IN PROGRESS” state

Synchronization is stuck because the newly added node cannot establish a TCP connection to the CCO.

In ns.log, you may see the following messages for this problem.

Enable the debug log on the NetScaler through System > Auditing > Syslog settings to get more detailed logs about the synchronization process.

Reference log pattern:

cat /var/log/ns.log | grep cluster

Jul 11 15:13:00 <local0.info> ns [1510]: cld_fullsync_req_handler(): Received config sync START message from Clusterd (sync_id=2)

Jul 11 15:13:00 <local0.info> ns [22327]: cfsync_pullcfg_n_apply(): Connection failed [Could not connect to any of master nodes provided in sync start message from clusterd]

Jul 11 15:13:00 <local0.info> ns [22327]: cld_fullsync_req_handler(): Sync FAILURE sent to clusterd for sync_id=2

Jul 11 15:13:00 <local0.info> 172.16.1.73 07/11/2018:07:13:00 GMT 2-PPE-2 : default CLUSTERD Message 1394 1531293180485406 : “REC: status FAILURE from client CFSYNCD for ID 2 “

cfsync_pullcfg_n_apply(): Connection failed [Could not connect to any of master nodes provided in sync start message from clusterd]

After checking, the customer found that the problem was caused by the jumbo frame setting on the backplane switch.
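In a case like this, it can help to verify whether the cluster backplane interface MTU matches what the backplane switch actually passes. A hedged sketch using the NetScaler CLI (the interface ID is an example; per Citrix documentation the backplane switch should support an MTU of at least 1578 bytes):

show interface 0/1

set interface 0/1 -mtu 1578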

Related:

Re: NDMP backup of Unity file system is timed out after 24 hours of run

Sorry for the delay. Karl, your tip was helpful and I was able to avoid the 24-hour timeout.

Now I am facing another issue: after approximately 5 days, the backup session is still active but nothing is actually being copied to DD. So I am trying to understand at which end it breaks: the NDMP accelerator, DD, or Unity.

Logs on NDMP accelerator:

2018-08-08 07:01:17 avndmp Error <11894>: [snapup-/Jira-Filessystem] Send message failed call for 400.

2018-08-08 07:01:17 avndmp Error <11893>: [snapup-/Jira-Filessystem] ndmp socket write, trying to write 4000 bytes, send returned error 32: code 32: Broken pipe

Session log on Avamar side:

2018-08-08 06:28:25 avtar Warning <6568>:Internal Warning: Very large number of entries on TODO queue (139508)

2018-08-08 06:43:14 avtar Info <41437>: Executing a flush (fsync) to keep write stream alive for file “avamar-1521563896/container.1.cdsf” with file handle 1812451331

2018-08-08 06:43:25 avtar Info <8688>: Status 2018-08-08 06:43:25, 11,456,912 files, 1,829,445 directories, 50,861 GB (11,456,912 files, 1.441 GB, 49.00% new) 3227MB 0% CPU (1 open files)

Related:

7023297: Low Disk Performance with high IO stalls system

What happens is that a filesystem with barriers enabled issues flush requests to all intermediate layers, only for them to be discarded by the SCSI disks; this slows down the system, leading to the observed performance issue.

To alleviate this issue, it is recommended to mount the relevant filesystems with nobarrier as a mount option.
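For example, an already-mounted filesystem can be remounted with barriers disabled (the mount point here is a placeholder, and the warning below applies):

mount -o remount,nobarrier /data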

Extreme care should be taken to determine whether the device backing the filesystem really has a volatile write cache or not.

If it does, setting nobarrier can result in data loss. Never set nobarrier on a filesystem on a device with its cache enabled!

To identify whether the device has a cache or not, one can check dmesg for "cache":

dmesg | grep cache

The result might look like:

[ 3.685928] sd 0:2:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

[ 5.140281] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

As can be seen, the device identified as sda reports the write cache as disabled, so a filesystem associated with this device can use nobarrier.

The opposite in this example is the device identified as sdb: this device reports the write cache as enabled, so no filesystem associated with this device should have barriers removed, to prevent data loss.

This can also be checked on the running system with the tool sdparm. On the same system as in the example above, the output of sdparm reads:

belphegore:~ # sdparm --get=WCE=1 /dev/sda

/dev/sda: DELL PERC H730 Mini 4.27

WCE 0

which means the write cache is disabled for sda, so nobarrier is possible.

belphegore:~ # sdparm --get=WCE=1 /dev/sdb

/dev/sdb: IFT DS 1000 Series 555Q

WCE 1

which means the write cache is enabled for sdb, so barriers are necessary.
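To check all SCSI disks at once, a small shell loop is convenient (a sketch):

for d in /dev/sd?; do sdparm --get=WCE=1 "$d"; done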

Related:

Re: unable to promote secondary mirror

I have a mirrored setup. The primary system lost 3 of its drives and failed, so I turned it off so I could promote the secondary mirror.

When I look in Navisphere at the secondary mirror it says SYNC active and Synchronizing 0%

When I try to promote the secondary mirror – it gives the error “The Promote function is not allowed when the mirror’s state is not synchronized or not consistent”

Well, no kidding! The primary system failed, so how can it be synchronized? I've had the actual disaster this system is designed to let you fail over from, but it won't let me promote the secondary image.

I have tried disabling write cache and rebooting the storage processors, and I read about "Force Promoting", but I can't get to the dialog that offers a force promotion or a local promotion.

Help!

Related:

Endurant Cache

Received a couple of recent questions from the field around Isilon’s endurant cache, and figured it would make an interesting topic for a broader audience.



The Endurant Cache, or EC, is OneFS' caching mechanism for synchronous writes – that is, writes that require a stable write acknowledgement to be returned to an NFS client.



The EC operates in conjunction with the OneFS write cache (coalescer) to ingest, protect and aggregate small, synchronous NFS writes. The incoming write blocks are staged to NVRAM, ensuring the integrity of the write, even during the unlikely event of a node’s power loss. Furthermore, EC also creates multiple mirrored copies of the data, further guaranteeing protection from single node and, if desired, multiple node failures.



EC improves the latency associated with synchronous writes by reducing the time to acknowledgement back to the client. This process removes the Read-Modify-Write (R-M-W) operations from the acknowledgement latency path, while also leveraging the coalescer to optimize writes to disk. EC is also tightly coupled with OneFS’ multi-threaded I/O (Multi-writer) process, to support concurrent writes from multiple client writer threads to the same file. And the design of EC ensures that the cached writes do not impact snapshot performance.



The endurant cache uses write logging to combine and protect small writes at random offsets into 8K linear writes. To achieve this, the writes go to mirrored files, called Logstores. The response to a stable write request can be sent once the data is committed to the Logstore. Logstores can be written to by several threads from the same node, and are highly optimized to enable low-latency concurrent writes.

Note that if a write uses the EC, the coalescer must also be used. If the coalescer is disabled on a file, but EC is enabled, the coalescer will still be active with all data backed by the EC.

So what exactly does an endurant cache write sequence look like?



Say an NFS client wishes to write a file to an Isilon cluster over NFS with the O_SYNC flag set, requiring a confirmed or synchronous write acknowledgement. Here is the sequence of events that occur to facilitate a stable write.

1) A client, connected to node 3, begins the write process sending protocol level blocks. 4K is the optimal block size for the endurant cache.


2) The NFS client's writes are temporarily stored in the write coalescer portion of node 3's RAM. The write coalescer aggregates uncommitted blocks so that OneFS can, ideally, write out full protection groups where possible, reducing latency over protocols that allow "unstable" writes. Writing to RAM has far less latency than writing directly to disk.

3) Once in the write coalescer, the endurant cache log-writer process writes mirrored copies of the data blocks in parallel to the EC Log Files.



The protection level of the mirrored EC log files is the same as that of the data being written by the NFS client.




4) When the data copies are received into the EC Log Files, a stable write exists and a write acknowledgement (ACK) is returned to the NFS client confirming the stable write has occurred. The client assumes the write is completed and can close the write session.


5) The write coalescer then processes the file just like a non-EC write at this point. The write coalescer fills and is routinely flushed, as required, as an asynchronous write via the block allocation manager (BAM) and the BAM safe write (BSW) path processes.

6) The file is split into 128K data stripe units (DSUs), parity protection (FEC) is calculated and FEC stripe units (FSUs) are created.


7) The layout and write plan is then determined, and the stripe units are written to their corresponding nodes’ L2 Cache and NVRAM. The EC logfiles are cleared from NVRAM at this point. OneFS uses a Fast Invalid Path process to de-allocate the EC Log Files from NVRAM.




8) Stripe Units are then flushed to physical disk.

9) Once written to physical disk, the data stripe unit (DSU) and FEC stripe unit (FSU) copies created during the write are cleared from NVRAM but remain in L2 cache until flushed to make room for more recently accessed data.


As far as protection goes, the number of logfile mirrors created by EC is always one more than the on-disk protection level of the file. For example, a file protected at 2x on disk gets three mirrored copies of its EC logfile.



The EC mirrors are only used if the initiator node is lost. In the unlikely event that this occurs, the participant nodes replay their EC journals and complete the writes.

If the write is an EC candidate, the data remains in the coalescer, an EC write is constructed, and the appropriate coalescer region is marked as EC. The EC write is a write into a logstore (hidden mirrored file) and the data is placed into the journal.

Assuming the journal is sufficiently empty, the write is held there (cached) and only flushed to disk when the journal is full, thereby saving additional disk activity.

An optimal workload for EC involves small-block synchronous, sequential writes – something like an audit or redo log, for example. In that case, the coalescer will accumulate a full protection group’s worth of data and be able to perform an efficient FEC write.

The happy medium is a small-block sync (vmdk) type load where the I/O rate is low, and the client is latency-sensitive. In this case, the latency will be reduced and, if the I/O rate is low enough, it won’t create serious pressure.

The undesirable scenario is when the cluster is already spindle-bound and the workload is such that it generates a lot of journal pressure. In this case, EC is just going to aggravate things.

So how exactly do you configure the endurant cache?



Although it is on by default, setting the efs.bam.ec.mode sysctl to the value '1' will enable the Endurant Cache:

# isi_for_array -s isi_sysctl_cluster efs.bam.ec.mode=1

EC can also be enabled & disabled per directory:

# isi set -c [on|off|endurant_all|coal_only] <directory_name>

To enable the coalescer but switch off EC, run:

# isi set -c coal_only <directory_name>

And to disable the endurant cache completely:

# isi_for_array -s isi_sysctl_cluster efs.bam.ec.mode=0

A return value of zero on each node from the following command will verify that EC is disabled across the cluster:

# isi_for_array -s sysctl efs.bam.ec.stats.write_blocks

efs.bam.ec.stats.write_blocks: 0

If the output of this command is incrementing, EC is delivering stable writes.

As mentioned previously, EC applies to stable writes. Namely:



  • Writes with O_SYNC and/or O_DIRECT flags set
  • Files on synchronous NFS mounts



When it comes to analyzing any performance issues involving EC workloads, consider the following:



  • What changed with the workload?
  • If upgrading OneFS, did the prior version also have EC enabled?
  • Has the workload moved to new cluster hardware?
  • Does the performance issue occur during periods of high CPU utilization?
  • Which part of the workload is creating a deluge of stable writes?
  • Was there a large change in spindle or node count?
  • Has the OneFS protection level changed?
  • Is the SSD strategy the same?



Disabling EC is typically done cluster-wide, and this can adversely impact certain workflow elements. If the EC load is localized to a subset of the files being written, an alternative way to reduce the EC heat might be to disable the coalescer buffers for some particular target directories, which would be a more targeted adjustment. This can be configured via the isi set -c off command.



One of the more likely causes of performance degradation is from applications aggressively flushing over-writes and, as a result, generating a flurry of ‘commit’ operations. This can generate heavy read/modify/write (r-m-w) cycles, inflating the average disk queue depth, and resulting in significantly slower random reads. The isi statistics protocol CLI command output will indicate whether the ‘commit’ rate is high.



It's worth noting that synchronous writes do not require using the NFS 'sync' mount option. Any programmer who is concerned with write persistence can simply specify an O_FSYNC or O_DIRECT flag on the open() operation to force synchronous write semantics for that file handle. With Linux, writes using O_DIRECT will be separately accounted for in the Linux 'mountstats' output. Although it's almost exclusively associated with NFS, the EC code is actually protocol-agnostic. If writes are synchronous (write-through) and are either misaligned or smaller than 8K, they have the potential to trigger EC, regardless of the protocol.
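For a quick test from a Linux NFS client, dd with oflag=sync will generate exactly this kind of synchronous write stream (the mount path is a placeholder):

dd if=/dev/zero of=/mnt/nfs/testfile bs=4k count=1000 oflag=sync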

The endurant cache can provide a significant latency benefit for small (e.g. 4K) random synchronous writes – albeit at the cost of some additional work for the system.



However, it’s worth bearing the following caveats in mind:



  • EC is not intended for more general purpose I/O.
  • There is a finite amount of EC available. As load increases, EC can potentially ‘fall behind’ and end up being a bottleneck.
  • Endurant Cache does not improve read performance, since it’s strictly part of the write process.
  • EC will not increase performance of asynchronous writes – only synchronous writes.

Related: