Path-based File Pool Policies

Got the following question from the field recently:



I have a cluster with a primary X410 pool and an archive NL410 pool. There is a nightly job that moves inactive files from primary to archive. However, can I set it up so that when I copy files to a folder they go directly to the NL pool, without waiting for the nightly job to run?



The answer to the above is yes, with a couple of caveats.



Since the file pool policy applies to the directory, any new files written to it will automatically inherit the settings from the parent directory. Typically, there is not much variance between the directory and the new file. So, assuming the settings are correct, the file is written straight to the desired pool or tier, with the appropriate protection, etc. This applies to access protocols like NFS and SMB, as well as to copy commands like ‘cp’ issued directly from the OneFS command line interface (CLI). However, if the file settings differ from the parent directory, the SmartPools job will correct them and restripe the file. This will happen when the job next runs, rather than at the time of file creation.



However, simply moving a file into the directory (via a UNIX CLI command such as ‘mv’) will not trigger a restripe: the file retains its existing layout until a SmartPools, SetProtectPlus, MultiScan, or AutoBalance job runs to completion. Since these jobs can each perform a re-layout of data, this is when the files will be re-assigned to the desired NL pool. The file movement can be verified by running the following command from the OneFS CLI:



# isi get -dD <dir>
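For example, rather than waiting for the nightly window, the SmartPools job can be kicked off manually and the file re-checked once the job completes. The path /ifs/path1/file1 below is purely illustrative, and the exact job syntax can vary slightly between OneFS releases:

# isi job jobs start SmartPools

# isi get -dD /ifs/path1/file1

The detailed output includes the disk pool(s) holding the file’s blocks, which should reference the archive pool once the restripe is done.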



So the key is whether you’re doing a copy (that is, a new write) or not. As long as you’re doing writes and the parent directory of the destination has the appropriate file pool policy applied, you should get the behavior you want.



One thing to note: If the actual operation that is desired is really a move rather than a copy, it may be faster to change the file pool policy and then run a recursive “isi filepool apply --recurse” on the affected files.
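As a quick sketch, assuming the updated policy now targets the archive tier and the affected files live under the illustrative /ifs/path1 directory:

# isi filepool apply --recurse /ifs/path1

This walks the tree and applies the matching file pool policy’s storage target, protection and I/O settings to each file in place, rather than copying and deleting the data.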



There’s negligible difference between using an NFS or SMB client versus performing the copy on-cluster via the OneFS CLI. As mentioned above, using isi filepool apply will be slightly quicker than a straight copy and delete, since the restripe is parallelized within the file system itself rather than pushing the data through a copy above the filesystem layer.



Let’s take a quick file pools refresher…



File pools are the SmartPools logic layer, where user-configurable policies govern where data is placed, how it is protected and accessed, and how it moves among the Node Pools and Tiers. This is conceptually similar to storage ILM (information lifecycle management), but does not involve file stubbing or other file system modifications. File pools allow data to be automatically moved from one type of storage to another within a single cluster to meet performance, space, cost or other requirements, while retaining its data protection settings.



For the scenario above, a file pool policy may be crafted which dictates that anything written to the path /ifs/path1 is automatically moved directly to the Archive tier. For example:

[Image: path-based_placement_1.png – example path-based file pool policy]
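A roughly equivalent policy can also be created from the CLI. The policy name ‘Archive_Path1’ and the tier name ‘Archive’ below are illustrative assumptions, and the exact filter flags can vary between OneFS releases:

# isi filepool policies create Archive_Path1 --begin-filter --path=/ifs/path1 --end-filter --data-storage-target=Archive

# isi filepool policies list

The second command simply confirms that the new policy is present, and shows its order relative to any other policies.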

To simplify management, there are defaults in place for Node Pool and File Pool settings which handle basic data placement, movement, protection and performance. All of these can also be configured via the simple and intuitive UI, delivering deep granularity of control. Also provided are customizable template policies which are optimized for archiving, extra protection, performance and VMware files.



When a SmartPools job runs, the data may be moved, or undergo a protection or layout change, etc. There are no stubs: the file system itself is doing the work, so no transparency or data access risks apply.



Data movement is parallelized with the resources of multiple nodes being leveraged for speedy job completion. While a job is in progress all data is completely available to users and applications.



The performance of different nodes can also be augmented with the addition of system cache or Solid State Drives (SSDs). Within a File Pool, SSD ‘Strategies’ can be configured to place a copy of that pool’s metadata, or even some of its data, on SSDs in that pool.
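For instance, an existing policy’s SSD strategy can be adjusted from the CLI. The policy name is the illustrative one from above, and the set of strategy values (metadata, metadata-write, data, avoid) may vary by OneFS release:

# isi filepool policies modify Archive_Path1 --data-ssd-strategy=metadata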



Overall system performance impact can be configured to suit the peaks and lulls of an environment’s workload, by changing the time or frequency of the SmartPools job and the amount of resources allocated to it. For extremely high-utilization environments, a sample file pool policy can be used to match SmartPools run times to non-peak computing hours. While the resources required to execute SmartPools jobs are low and the defaults work for the vast majority of environments, that extra control can be beneficial when system resources are heavily utilized.
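As an illustration, both the SmartPools job’s schedule and its impact policy can be tuned from the CLI. The schedule string and impact level below are examples only, and flag names may differ slightly between OneFS versions:

# isi job types modify SmartPools --schedule "every day at 22:00"

# isi job types modify SmartPools --policy LOW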

SmartPools file pool policies can be used to broadly control the three principal attributes of a file:

1. Where a file resides.

    • Tier
    • Node Pool

2. The file performance profile (I/O optimization setting).

    • Sequential
    • Concurrent
    • Random
    • SmartCache write caching

3. The protection level of a file.

    • Parity protected (+1n to +4n, +2d:1n, etc)
    • Mirrored (2x – 8x)



[Image: path-based_placement_2.png]



A file pool policy is built on a file attribute that the policy can match on. The attributes a file pool policy can use are any of: File Name, Path, File Type, File Size, Modified Time, Create Time, Metadata Change Time, Access Time or User Attributes.

Once the file attribute is set to select the appropriate files, the action to be taken on those files can be added – for example, if the attribute is File Size, additional settings are available to dictate thresholds (all files bigger than… smaller than…). Next, actions are applied: move to Node Pool x, set protection level y, and lay out for access setting z.

File Attribute – Description

File Name – Specifies file criteria based on the file name

Path – Specifies file criteria based on where the file is stored

File Type – Specifies file criteria based on the file-system object type

File Size – Specifies file criteria based on the file size

Modified Time – Specifies file criteria based on when the file was last modified

Create Time – Specifies file criteria based on when the file was created

Metadata Change Time – Specifies file criteria based on when the file metadata was last modified

Access Time – Specifies file criteria based on when the file was last accessed

User Attributes – Specifies file criteria based on custom attributes – see below

‘And’ and ‘Or’ operators allow for the combination of criteria within a single policy for flexible, granular data manipulation.
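As a sketch, a policy combining two criteria with an ‘and’ operator might look like the following. The policy name, the size threshold, and the exact placement of the --operator flag are assumptions that should be checked against the CLI reference for your OneFS release:

# isi filepool policies create Archive_Large --begin-filter --path=/ifs/path1 --and --size=1GB --operator=gt --end-filter --data-storage-target=Archive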



As we saw earlier, for file pool policies that dictate placement of data based on its path, data typically lands on the correct node pool or tier without a SmartPools job running. File pool policies that dictate placement of data based on attributes other than path see their data written to the disk pool with the highest available capacity, and then moved, if necessary to match a file pool policy, when the next SmartPools job runs. This ensures that write performance is not sacrificed for initial data placement.



Any data not covered by a file pool policy is moved to a tier that can be selected as a default for exactly this purpose. If no default has been selected, SmartPools will default to the node pool with the most available capacity.
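Assuming an ‘Archive’ tier exists, this catch-all target can be set on the default file pool policy; the exact command form may differ between releases:

# isi filepool default-policy modify --data-storage-target=Archive

# isi filepool default-policy view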
