Avamar Client for Windows: Avamar backup fails with “avtar Error : Out of memory for cache file” on Windows clients

Article Number: 524280 Article Version: 3 Article Type: Break Fix



Avamar Plug-in for Oracle, Avamar Client for Windows, Avamar Client for Windows 7.2.101-31



This scenario presents the same issue described in KB 495969; however, that solution does not apply here because of an environment issue on the Windows client.

  • KB 495969 – Avamar backup fails with “Not Enough Space” and “Out of Memory for cache file”

The issue can affect any plug-in; in this case the error presented in the following ways:

  • For FS backups:
avtar Info <8650>: Opening hash cache file 'C:\Program Files\avs\var\p_cache.dat'
avtar Error <18866>: Out of memory for cache file 'C:\Program Files\avs\var\p_cache.dat' size 805306912
avtar FATAL <5351>: MAIN: Unhandled internal exception Unix exception Not enough space
  • For VSS backups:
avtar Info <8650>: Opening hash cache file 'C:\Program Files\avs\var\p_cache.dat'
avtar Error <18866>: Out of memory for cache file 'C:\Program Files\avs\var\p_cache.dat' size 1610613280
avtar FATAL <5351>: MAIN: Unhandled internal exception Unix exception Not enough space
  • For Oracle backup:
avtar Info <8650>: Opening hash cache file 'C:\Program Files\avs\var\clientlogs\oracle-prefix-1_cache.dat'
avtar Error <18866>: Out of memory for cache file 'C:\Program Files\avs\var\clientlogs\oracle-prefix-1_cache.dat' size 100663840
avtar FATAL <5351>: MAIN: Unhandled internal exception Unix exception Not enough space

or this variant:

avtar Info <8650>: Opening hash cache file 'C:\Program Files\avs\var\clientlogs\oracle-prefix-1_cache.dat'
avtar Error <18864>: Out of restricted memory for cache file 'C:\Program Files\avs\var\clientlogs\oracle-prefix-1_cache.dat' size 100663840
avtar FATAL <5351>: MAIN: Unhandled internal exception Unix exception Not enough space
avoracle Error <7934>: Snapup of <oracle-db> aborted due to rman terminated abnormally - check the logs
  • With the RMAN log reporting this:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup plus archivelog command at 06/14/2018 22:17:40
RMAN-03009: failure of backup command on c0 channel at 06/14/2018 22:17:15
ORA-04030: out of process memory when trying to allocate 1049112 bytes (KSFQ heap,KSFQ Buffers)
Recovery Manager complete.

Initially, it was thought that the cache file could not grow in size because of an incorrect “hashcachemax” value.

The client had plenty of free RAM (48 GB total), so we increased the flag’s value from -16 (3 GB maximum file size) to -8 (6 GB maximum file size).
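The relationship between the negative flag values and the resulting cache cap, as implied by the figures above (48 GB / 16 = 3 GB, 48 GB / 8 = 6 GB), can be sketched as follows. The helper name is illustrative, not part of the Avamar client:

```python
def hash_cache_cap_bytes(total_ram_bytes: int, hashcachemax: int) -> int:
    """Illustrative sketch: a negative hashcachemax caps the hash cache
    at total RAM divided by |value|; a positive value is treated here as
    an absolute cap in MB (as used later in this article)."""
    if hashcachemax < 0:
        return total_ram_bytes // abs(hashcachemax)
    return hashcachemax * 1024 ** 2

GB = 1024 ** 3
ram = 48 * GB
print(hash_cache_cap_bytes(ram, -16) // GB)  # 3 (GB cap, as in this case)
print(hash_cache_cap_bytes(ram, -8) // GB)   # 6 (GB cap after the change)
```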

However, the issue persisted, and disk space was not a constraint either; many gigabytes were free.

Further investigation with a test binary from the engineering team showed that the Windows operating system was not releasing enough contiguous unused memory to allocate and load the entire hash cache file into memory for the backup operation.

A test binary that allocated the memory in smaller pieces was also tried, to see whether the OS would then allow the full p_cache.dat file to be loaded into memory, but that did not help either; the operating system still would not load the file into memory.

The root cause is hidden somewhere in the OS; in this case, however, Microsoft was not engaged for further investigation on their side.

Instead, we worked around the issue by configuring a smaller cache file; see the details in the resolution section below.

To work around this issue, we set the hash cache file to a smaller size so that the OS would have no trouble allocating it in memory.

In this case the OS also had problems allocating smaller sizes, such as 200+ MB, so we resized p_cache.dat to just 100 MB with the following flag:

--hashcachemax=100

With this flag, the hash cache file never grows beyond 100 MB; once full, it overwrites its oldest entries.
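As a sketch of how this might be persisted: the Avamar Windows client can read additional avtar flags from a flag file (commonly avtar.cmd in the client's var directory, e.g. C:\Program Files\avs\var\avtar.cmd, one flag per line). This is an assumption about the flag-file location for this install; verify it against your environment before applying:

```
--hashcachemax=100
```

Alternatively, the flag can be added as a plug-in option on the dataset in the Avamar Administrator, depending on how the client is managed.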

After adding that flag, the cache file must be recycled by renaming or deleting p_cache.dat (renaming is the preferred option).

The first backup after this change will take longer than usual, as expected, while the cache file is rebuilt; after that, the issue should be resolved.
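The recycle step above can be sketched as follows. The helper name and timestamp suffix are illustrative, and the commented path is the default client cache location seen in the logs; adjust for your install:

```python
import os
import time


def recycle_cache(cache_path: str):
    """Rename the hash cache file so avtar rebuilds it on the next backup.

    Renaming (the preferred option) keeps the old file available for
    rollback, unlike deleting it outright.
    """
    if not os.path.exists(cache_path):
        return None  # nothing to recycle
    renamed = cache_path + ".old-" + time.strftime("%Y%m%d%H%M%S")
    os.rename(cache_path, renamed)
    return renamed


# Example (default Avamar Windows client location):
# recycle_cache(r"C:\Program Files\avs\var\p_cache.dat")
```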

  • The demand-paging cache is not recommended in this scenario because the backups are directed to GSAN storage, so the monolithic paging cache was used.
  • Demand-paging was designed to benefit backups sent to Data Domain storage.
