7023376: Performance issues and odd lock errors when virtual OES server is under load.

Temporary change to I/O scheduler

For temporary testing, it is possible to change the IO scheduler for one or more individual disks without rebooting the server. To do so, execute the following as root:

echo SCHEDNAME > /sys/block/DEV/queue/scheduler

replacing:

SCHEDNAME with the scheduler name (noop or deadline) and

DEV with the device the scheduler should be used on (e.g., sdc).

If you want to temporarily change the scheduler for multiple disks, you will need to run the “echo” command (above) against each device, replacing SCHEDNAME and DEV each time.

Note: when the server is restarted, the default scheduler will return for each of these devices.
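The per-device steps above can be sketched as a small loop. This is only an illustration, assuming the deadline scheduler and example device names sdb, sdc, and sdd — substitute your own devices (listed under /sys/block/):

```shell
#!/bin/sh
# Temporarily switch the I/O scheduler on several disks (run as root).
# Device names below are examples; adjust them for your system.
for dev in sdb sdc sdd; do
    echo deadline > "/sys/block/$dev/queue/scheduler"
done

# Verify: the currently active scheduler is shown in square brackets,
# e.g. "noop [deadline] cfq".
cat /sys/block/sdb/queue/scheduler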

More permanent change across all devices

To implement a non-cfq scheduler across all disks that persists after the host is restarted, you will need to:

– Go into YaST

– Select “System” -> “Boot Loader” and press <enter>

– Go to “Kernel Parameters” tab and go into the “Optional Kernel Command Line Parameter” section.

– Add “elevator=SCHEDNAME” (where SCHEDNAME is the scheduler desired).

– Save/OK and exit out of YaST.

– Restart the server to activate the new scheduler.

This turns off SLES’s disk I/O optimization, making the virtual machine’s I/O FIFO (first in, first out) to the underlying storage system. With this setting, all I/O optimization is performed by the storage system.
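After the restart, the change can be confirmed from the command line. A quick check (sdc is again just an example device):

```shell
# Confirm the elevator= parameter was passed to the kernel at boot...
grep -o 'elevator=[a-z]*' /proc/cmdline

# ...and that each device picked it up; the active scheduler is the
# one shown in square brackets.
cat /sys/block/sdc/queue/scheduler
```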

Related:

VNX: Unable to unmount/delete filesystem or checkpoint errors: Device or resource busy/path is unavailable(frozen) or invalid[1]

Article Number: 480032 Article Version: 3 Article Type: Break Fix



VNX1 Series,VNX2 Series,Unisphere for VNX,VNX Operating Environment

Trying to unmount/delete file systems fscommon_03 and fscommon_05 and getting the error:

Delete file system fscommon_03. vdm01 : Device or resource busy.

A race condition caused by range locks held on a CIFS file even though the file was closed. This caused a variety of symptoms, including: failure to access the file from a CIFS client; an unmount of the file system hanging; a freeze of the file system hanging.

Review of the /nas/log/cmd_log.err file and server_log shows the following symptoms:

[nasadmin@CS0 ~]$ tail /nas/log/cmd_log.err

2016-03-23 15:20:40.228 vdm01:517:7580:E: server_umount vdm01 -perm fscommon_03: Device or resource busy

2016-03-23 15:21:09.155 vdm01:517:9117:E: server_umount vdm01 -perm fscommon_05: Device or resource busy

2016-03-23 15:47:32.895 vdm01:517:29927:E: server_umount vdm01 -perm fscommon_03: Device or resource busy

2016-03-23 15:48:05.181 vdm01:517:30473:E: server_umount vdm01 -perm fscommon_05: Device or resource busy

2016-03-24 10:16:10.064 vdm01:517:12748:E: server_umount vdm01 -perm fscommon_05: Device or resource busy

2016-03-24 10:17:20.837 vdm01:517:15520:E: server_umount vdm01 -perm fscommon_03: Device or resource busy

2016-03-24 10:25:41.265 vdm01:517:29433:E: server_umount vdm01 -perm fscommon_03: Device or resource busy

[nasadmin@CS0 ~]$ server_log server_2 |grep -i frozen |tail -10

2016-03-24 12:07:59: CFS: 6: /root_vdm_1/fscommon_03: path is unavailable(frozen) or invalid.

2016-03-24 12:07:59: CFS: 6: /root_vdm_1/fscommon_05: path is unavailable(frozen) or invalid.

2016-03-24 12:09:07: CFS: 6: /root_vdm_1/fscommon_03: path is unavailable(frozen) or invalid.

2016-03-24 12:09:07: CFS: 6: /root_vdm_1/fscommon_05: path is unavailable(frozen) or invalid.

2016-03-24 12:09:07: CFS: 6: /root_vdm_1/fscommon_03: path is unavailable(frozen) or invalid.

2016-03-24 12:09:07: CFS: 6: /root_vdm_1/fscommon_05: path is unavailable(frozen) or invalid.

2016-03-24 12:09:08: CFS: 6: /root_vdm_1/fscommon_03: path is unavailable(frozen) or invalid.

2016-03-24 12:09:08: CFS: 6: /root_vdm_1/fscommon_05: path is unavailable(frozen) or invalid.

2016-03-24 12:09:08: CFS: 6: /root_vdm_1/fscommon_03: path is unavailable(frozen) or invalid.

2016-03-24 12:09:08: CFS: 6: /root_vdm_1/fscommon_05: path is unavailable(frozen) or invalid.

Attempting to delete or unmount a file system or checkpoint

Workaround:

Failover or reboot the affected Data Mover where the file system or checkpoint is currently mounted. Once completed, the unmount/deletion of the file system or checkpoint can be completed without further issues.
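As a rough sketch of the workaround from the Control Station — assuming server_2 is the affected Data Mover, which is an example only; verify the Data Mover name and consult the VNX CLI documentation before running anything:

```shell
# Reboot the affected Data Mover from the Control Station (as nasadmin).
server_cpu server_2 -reboot now

# Alternatively, fail over to the standby and restore afterwards:
# server_standby server_2 -activate mover
# server_standby server_2 -restore mover

# Once the Data Mover is back up, retry the unmount:
server_umount vdm01 -perm fscommon_03
```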

Perm Fix:

VNX1 Series: 7.1.74.505, 7.1.76.415 and higher code levels

VNX2 Series: 8.1.3.72 and higher

Related:

Policy Run Error

I need a solution

Hi All,

I am facing a “Job stopped, stuck in running state for more than 20 hrs” error when running policy scans on Unix servers installed with the Symantec CCS agent. Can you please provide any solution to this?

I have already referred to the article – https://support.symantec.com/en_US/article.TECH111738.html – and made the relevant changes, but I am still facing the same error.

Would really appreciate some insight on this issue.

Thank you


Related:

Completing Setup after “The Computer Restarted unexpectedly or encountered an unexpected error”

When Windows Setup hits an error and restarts (for example if it panics, or if you reset it because it appears to have hung), you just get the dreaded “The Computer Restarted unexpectedly or encountered an unexpected error” dialog box, which you cannot recover from. Until now:

http://answers.microsoft.com/en-us/windows/forum/windows_7-system/error-message-the-computer-restarted-unexpectedly/b770f14d-e345-e011-90b6-1cc1de79d2e2

Run RegEdit in the machine (after Shift-F10 to get the CMD window).

HKLM\SYSTEM\Setup\Status\ChildCompletion

Check for setup.exe on the right, and if the value is 1, change it to 3. Then close RegEdit and click the button to reboot again. This may allow Setup to complete and get you a more functional Windows than you can get from Shift-F10.
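The same edit can be made directly from the Shift-F10 command prompt without opening the RegEdit UI, using reg.exe. A sketch of the equivalent commands — double-check that the value name and current data match what RegEdit shows on your machine before changing anything:

```shell
rem Inspect the current value first.
reg query HKLM\SYSTEM\Setup\Status\ChildCompletion /v setup.exe

rem If the value is 1, set it to 3 so Setup can resume after the reboot.
reg add HKLM\SYSTEM\Setup\Status\ChildCompletion /v setup.exe /t REG_DWORD /d 3 /f
```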

Related:

Datacap 9.1.1 Rulerunner intermittent issue, sudden automatic stop.

Hi, we have been having an intermittent Datacap 9.1.1 Rulerunner issue for the past couple of weeks. This was a stable system and no changes have been made.

Below are the symptoms of the issue:

1. The Rulerunner service stops on its own within a few hours, or sometimes after its regular restart interval.
Note – “Stop on termination” is disabled.
2. Manually attempting to start the Rulerunner service results in a “StopPending” status; after repeated tries or a reboot it starts fine, but then follows the behavior in step 1 again.

3. It is not creating new batches, nor picking up pending batches.

4. In Event Viewer, a DCOProcessor.exe unhandled-exception error is shown.

5. In the **RR log**: ExecuteCode: Internal local exception: Command to thread #[3]: [Received a Stop command] [in CTMMThread::take_a_nap].

**NOTE:** We have also opened PMR ID:46463,999,744, but there has been no progress so far.

Related:

Reboot to Production failing

I need a solution

We have a remote vendor connected through VPN that does all of our new PC imaging. For a while now they have been having an issue, and I can’t seem to figure out what is going on.

The device boots to PXE and images without issue, with all drivers loading fine. At this point it should run the Reboot to Production task and then keep going through the list of tasks, like adding itself to filters, updating agents, and a rename-the-PC task. Yet it seems the device is rebooting, but Altiris does not recognize that the task has run, so it just sits there saying it’s waiting for the agent; after 60 minutes the task fails and the entire image job fails, meaning I have to remote in and manually run all of the tasks.

I have pulled every log off one of these devices that failed and cannot find anything that points to why it’s not seeing the Reboot to Production job. I even swapped that task out and created a script that simply runs:

wpeutil reboot
exit 0

I tried this because I found an old post about others having this same issue with 7.5 H3, but the thread was locked with no resolution ever given. We are currently on 8.1 H1. No other internal sites see this issue that I know of, but most of them do one or two devices at a time, where the vendor may be doing 5 to 10 at a time; sometimes all of the devices will suffer this issue, other times only 3 of them fail.

Does anyone have ideas of where else I should be looking for clues? While typing this, I did think a workaround might be to set that reboot task to a 15-minute time limit so that even if it fails, the remaining image tasks keep running — but I would still like to nail down the reason this is not working.


Related:

Calendaring agent failed to open registry with error code thread id.

Details
Product: Exchange
Event ID: 8210
Source: EXCDO
Version: 6.5.0000.0
Message: Calendaring agent failed to open registry with error code thread id.
   
Explanation
Microsoft Exchange System Attendant service issued a stop request for a virtual server. The free/busy module tried to stop its threads, but the polling thread did not stop in two minutes (current timeout value). This is only a warning and should not affect free/busy processing.
   
User Action
Check if the system is busy, and verify that it is not low on resources or memory.

Related:

Hanging application %1, version %2, hang module %3, version %4, hang address 0x%5.

Details
Product: Windows Operating System
Event ID: 1002
Source: Application Hang
Version: 5.2
Symbolic Name: ER_HANG_LOG
Message: Hanging application %1, version %2, hang module %3, version %4, hang address 0x%5.
   
Explanation

The indicated program stopped responding. The message contains details on which program and module stopped responding. A matching event with EventID 1001 might also appear in the event log. This matching event displays information about the specific error that occurred.

   
User Action

No user action is required.

Related: