In the latest of this series of job engine articles, we’ll take a closer look at OneFS exclusion sets. These job execution classes define which jobs can run simultaneously within OneFS. A job is not required to be part of any exclusion set, and jobs may also belong to multiple exclusion sets.
The Job Engine allows up to three jobs to run simultaneously. If a fourth job with a higher priority is started, the lowest-priority job currently executing is paused. For example:
# isi job start fsanalyze
Started job [583]
# isi job status
The job engine is running.
Running and queued jobs:
ID Type State Impact Pri Phase Running Time
———————————————————
578 SmartPools Running Low 6 1/2 11s
581 Collect Running Low 4 1/3 16s
583 FSAnalyze Running Low 5 1/10 1s
———————————————————
Total: 3
In this case, there are three jobs running: SmartPools with a priority of 6, Collect with a priority of 4, and FSAnalyze with a priority of 5.
Next, a deduplication job is started, with a priority value of 4:
# isi job start dedupe
Started job [584]
Looking at the cluster’s job status shows that the SmartPools job has been put into a waiting state (paused) because it has the lowest priority of the four jobs. A value of ‘1’ indicates the highest priority job level that OneFS supports, with ‘10’ being the lowest.
# isi job status
The job engine is running.
Running and queued jobs:
ID Type State Impact Pri Phase Running Time
———————————————————
578 SmartPools Waiting Low 6 1/2 11s
581 Collect Running Low 4 1/3 1m 4s
583 FSAnalyze Running Low 5 9/10 43s
584 Dedupe Running Low 4 1/1 –
———————————————————
Total: 4
Once the FSAnalyze job has completed, the SmartPools job automatically resumes:
# isi job status
The job engine is running.
Running and queued jobs:
ID Type State Impact Pri Phase Running Time
———————————————————
578 SmartPools Running Low 6 1/2 23s
581 Collect Running Low 4 1/3 5m 9s
584 Dedupe Running Low 4 1/1 1m 2s
———————————————————
Total: 3
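The priority behavior in this walkthrough can be modelled in a few lines of Python. This is only a sketch: the `Job` class and `schedule` function are hypothetical names, the model considers priority alone (it ignores exclusion sets and cluster health), and breaking ties by job ID is an assumption rather than documented OneFS behavior.

```python
from dataclasses import dataclass

MAX_CONCURRENT_JOBS = 3  # concurrency limit stated in the article


@dataclass(frozen=True)
class Job:
    job_id: int
    name: str
    priority: int  # 1 is the highest priority, 10 the lowest


def schedule(jobs):
    """Split jobs into (running, waiting) using priority alone.

    Ties are broken by job ID, which is an assumption made for this
    sketch and not something the article specifies.
    """
    ranked = sorted(jobs, key=lambda j: (j.priority, j.job_id))
    return ranked[:MAX_CONCURRENT_JOBS], ranked[MAX_CONCURRENT_JOBS:]


# The four jobs from the example above.
jobs = [
    Job(578, "SmartPools", 6),
    Job(581, "Collect", 4),
    Job(583, "FSAnalyze", 5),
    Job(584, "Dedupe", 4),
]

running, waiting = schedule(jobs)
print("running:", [j.name for j in running])
print("waiting:", [j.name for j in waiting])

# Once FSAnalyze completes, SmartPools moves back into a running slot.
remaining = [j for j in jobs if j.name != "FSAnalyze"]
running_after, waiting_after = schedule(remaining)
print("running after FSAnalyze completes:", [j.name for j in running_after])
```

With the priorities shown in the job status output, the model leaves SmartPools waiting while Dedupe runs, and returns it to the running set once FSAnalyze is gone.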
Let’s look at this in a bit more detail. The Job Engine’s concurrent job execution is governed by the following criteria:
- Job priority
- Exclusion sets: jobs which cannot run together (for example, FlexProtect and AutoBalance)
- Cluster health: most jobs cannot run when the cluster is in a degraded state.
There are two exclusion sets that jobs can be part of:
- Marking Exclusion Set
- Restriping Exclusion Set

Here are the basic concurrent job combinations that OneFS supports:
- 1 Restripe Job + 1 Mark Job + 1 Other Job
- 1 Restripe Job + 2 Other Jobs
- 1 Mark Job + 2 Other Jobs
- 1 Mark and Restripe Job + 2 Other Jobs
- 3 Other Jobs
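The combinations listed above reduce to three limits: at most three jobs in total, at most one job from the marking exclusion set, and at most one from the restriping exclusion set. The sketch below checks a proposed group of jobs against those limits. The `Job` class and `can_run_together` function are hypothetical names for illustration, and the check works per job, so it does not model the per-phase restriping behavior of OneFS 8.0 and later.

```python
from dataclasses import dataclass

MAX_CONCURRENT_JOBS = 3  # concurrency limit stated in the article


@dataclass(frozen=True)
class Job:
    name: str
    marking: bool = False     # member of the marking exclusion set
    restriping: bool = False  # member of the restriping exclusion set


def can_run_together(jobs):
    """Return True if the jobs fit within the supported combinations."""
    if len(jobs) > MAX_CONCURRENT_JOBS:
        return False
    marking_count = sum(1 for j in jobs if j.marking)
    restriping_count = sum(1 for j in jobs if j.restriping)
    return marking_count <= 1 and restriping_count <= 1


collect = Job("Collect", marking=True)
smartpools = Job("SmartPools", restriping=True)
flexprotect = Job("FlexProtect", restriping=True)
autobalance = Job("AutoBalance", restriping=True)
multiscan = Job("MultiScan", marking=True, restriping=True)
fsanalyze = Job("FSAnalyze")
dedupe = Job("Dedupe")

print(can_run_together([smartpools, collect, fsanalyze]))  # 1 restripe + 1 mark + 1 other
print(can_run_together([multiscan, fsanalyze, dedupe]))    # 1 mark-and-restripe + 2 others
print(can_run_together([flexprotect, autobalance]))        # two restripe jobs
print(can_run_together([multiscan, collect]))              # two marking jobs
```

The first two calls return `True` and the last two return `False`, because a job that belongs to both sets, such as MultiScan, consumes both the marking and the restriping slot.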
OneFS marks blocks that are actually in use by the file system. IntegrityScan, for example, traverses the live file system, marking every block of every LIN in the cluster to proactively detect and resolve any issues with the structure of data in a cluster. The jobs that comprise the marking exclusion set are:
| Job Name | Job Description | Access Method |
|---|---|---|
| Collect | Reclaims disk space that could not be freed because a node or drive was unavailable due to a failure condition. | Drive + LIN |
| IntegrityScan | Performs online verification and correction of any file system inconsistencies. | LIN |
| MultiScan | Runs Collect and AutoBalance jobs concurrently. | LIN |
OneFS protects data by writing file blocks across multiple drives on different nodes in a process known as ‘restriping’. The Job Engine defines a restripe exclusion set for the jobs that involve file system management, protection, and on-disk layout. It contains the following jobs:
| Job Name | Job Description | Access Method |
|---|---|---|
| AutoBalance | Balances free space in the cluster. | Drive + LIN |
| AutoBalanceLin | Balances free space in the cluster. | LIN |
| FlexProtect | Rebuilds and re-protects the file system to recover from a failure scenario. | Drive + LIN |
| FlexProtectLin | Re-protects the file system. | LIN |
| MediaScan | Scans drives for media-level errors. | Drive + LIN |
| MultiScan | Runs Collect and AutoBalance jobs concurrently. | LIN |
| SetProtectPlus | Applies the default file policy. This job is disabled if SmartPools is activated on the cluster. | LIN |
| ShadowStoreProtect | Protects shadow stores which are referenced by a LIN with higher requested protection. | LIN |
| SmartPools | Moves data between the tiers of nodes within the same cluster. Also executes the CloudPools functionality if licensed and configured. | LIN |
| Upgrade | Manages OneFS upgrades. | LIN |
Note that in OneFS 8.0 and beyond, the restriping exclusion set is per-phase instead of per job. This helps to more efficiently parallelize restripe jobs when they don’t need to lock down resources.
Restriping jobs only block each other when the current phase performs restriping. This is most evident with MultiScan, whose final phase only sweeps rather than restripes. Similarly, MediaScan, which rarely restripes, is usually able to run to completion without contending with other restriping jobs.
For example, below the two restripe jobs, MediaScan and AutoBalanceLin, are both running their respective first job phases. ShadowStoreProtect, also a restriping job, is in a ‘waiting’ state, blocked by AutoBalanceLin:
Running and queued jobs:
ID Type State Impact Pri Phase Running Time
———————————————————————-
26850 AutoBalanceLin Running Low 4 1/3 20d 18h 19m
26910 ShadowStoreProtect Waiting Low 6 1/1 –
28133 MediaScan Running Low 8 1/8 1d 15h 37m
———————————————————————-
MediaScan restripes in phases 3 and 5 of the job, and only if there are disk errors (ECCs) which require data reprotection. If MediaScan reaches phase 3 with ECCs, it will pause until AutoBalanceLin is no longer running. If MediaScan’s priority were in the range 1-3, it would cause AutoBalanceLin to pause instead.
If two jobs reach their restriping phases simultaneously and have different priorities, the higher priority job (that is, the one with the priority value closer to ‘1’) will continue to run, and the other will pause. If the two jobs have the same priority, the one already in its restriping phase will continue to run, and the one newly entering its restriping phase will pause.
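The contention rules described above can be written as a small decision function. This is a sketch under stated assumptions: `RestripeCandidate` and `resolve_restripe_conflict` are hypothetical names, and the final fallback branch covers a case the article does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestripeCandidate:
    name: str
    priority: int             # 1 is the highest priority, 10 the lowest
    already_restriping: bool  # True if the job is already in a restriping phase


def resolve_restripe_conflict(a, b):
    """Return (continues, pauses) for two jobs contending to restripe."""
    if a.priority != b.priority:
        # Different priorities: the value closer to 1 wins.
        winner = a if a.priority < b.priority else b
    elif a.already_restriping != b.already_restriping:
        # Same priority: the job already in its restriping phase keeps running.
        winner = a if a.already_restriping else b
    else:
        # Not covered by the article; keeping the first job is an assumption.
        winner = a
    loser = b if winner is a else a
    return winner, loser


autobalance_lin = RestripeCandidate("AutoBalanceLin", 4, already_restriping=True)
mediascan = RestripeCandidate("MediaScan", 8, already_restriping=False)

continues, pauses = resolve_restripe_conflict(autobalance_lin, mediascan)
print(continues.name, "continues;", pauses.name, "pauses")

# If MediaScan's priority were raised into the 1-3 range, the outcome flips.
urgent_mediascan = RestripeCandidate("MediaScan", 2, already_restriping=False)
continues2, pauses2 = resolve_restripe_conflict(autobalance_lin, urgent_mediascan)
print(continues2.name, "continues;", pauses2.name, "pauses")
```

Using the priorities from the job listing above, AutoBalanceLin (priority 4) keeps running and MediaScan (priority 8) pauses. With MediaScan at priority 2, AutoBalanceLin pauses instead.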
Jobs may also belong to both exclusion sets. An example of this is MultiScan, since it includes both AutoBalance and Collect.
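Exclusion set membership can be expressed as a simple lookup. In the sketch below, the two sets are transcribed from the job tables earlier in this article, while the `exclusion_sets` helper is a hypothetical name used only for illustration. Any job not listed in either table is treated as belonging to no exclusion set.

```python
# Membership transcribed from the two tables in this article.
MARKING = {"Collect", "IntegrityScan", "MultiScan"}
RESTRIPING = {
    "AutoBalance", "AutoBalanceLin", "FlexProtect", "FlexProtectLin",
    "MediaScan", "MultiScan", "SetProtectPlus", "ShadowStoreProtect",
    "SmartPools", "Upgrade",
}


def exclusion_sets(job_name):
    """Return the names of the exclusion sets a job type belongs to."""
    sets = set()
    if job_name in MARKING:
        sets.add("marking")
    if job_name in RESTRIPING:
        sets.add("restriping")
    return sets


print(sorted(exclusion_sets("MultiScan")))   # ['marking', 'restriping']
print(sorted(exclusion_sets("SmartPools")))  # ['restriping']
print(sorted(MARKING & RESTRIPING))          # ['MultiScan']
```

Intersecting the two sets confirms that MultiScan is the only job listed in both tables.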
However, the bulk of OneFS’ jobs do not belong to an exclusion set, as illustrated in the following graphic. These are typically the feature support jobs, as described above, and they can coexist and contend with any of the other jobs.
Exclusion sets do not change the scope of the individual jobs themselves, so any runtime improvements via parallel job execution are the result of job management and impact control. The Job Engine monitors node CPU load and drive I/O activity per worker thread every twenty seconds to ensure that maintenance jobs do not cause cluster performance problems.
If a job affects overall system performance, Job Engine reduces the activity of maintenance jobs and yields resources to clients. Impact policies limit the system resources that a job can consume and when a job can run. You can associate jobs with impact policies, ensuring that certain vital jobs always have access to system resources.
Looking at the previous example again, where the SmartPools job is paused when FSAnalyze is running:
# isi job list
ID Type State Impact Pri Phase Running Time
———————————————————
578 SmartPools Waiting Low 6 1/2 15s
581 Collect Running Low 4 1/3 37m
584 Dedupe Running Low 4 1/1 33m
586 FSAnalyze Running Low 5 1/10 9s
———————————————————
Total: 4
If this is undesirable, FSAnalyze can be manually paused, to allow SmartPools to run unimpeded:
# isi job pause FSAnalyze
# isi job list
ID Type State Impact Pri Phase Running Time
————————————————————-
578 SmartPools Waiting Low 6 1/2 15s
581 Collect Running Low 4 1/4 38m
584 Dedupe Running Low 4 1/1 34m
586 FSAnalyze User Paused Low 5 1/10 20s
————————————————————-
Total: 4
Alternatively, the priority of the SmartPools job can also be elevated to value ‘4’ (or the priority of FSAnalyze lowered to value ‘7’) to permanently prioritize it over the FSAnalyze job.
For example:
# isi job types modify SmartPools --priority 4
Are you sure you want to modify the job type SmartPools? (yes/[no]): yes
# isi job types view SmartPools
ID: SmartPools
Description: Enforce SmartPools file policies. This job requires a SmartPools license.
Enabled: Yes
Policy: LOW
Schedule: every day at 22:00
Priority: 4
Or, via the webUI:

Navigate to Job Operations > Job Type and configure the desired priority value by editing the appropriate job type’s details, as above: