Isilon OneFS: SSD pool is nearing 100% or significantly more full than the associated HDD pool

Article Number: 521246 Article Version: 3 Article Type: Break Fix



Isilon, Isilon OneFS

Performance degradation due to SSDs filling in a workflow that leverages SSD strategies to improve performance, such as storing metadata mirrors or "hot" data on SSD.

SSD strategies other than L3 cache can fill SSDs faster than HDDs, and SSDs nearing 100% full can cause significant performance degradation. Note that the "full SSD" issue that is the subject of this KB does not apply if all nodes with SSDs use them for L3 cache.

Required conditions for experiencing a performance issue:

– "isi status" must display SSD storage as "Used / Size" for at least some nodes.

– The SSD fill percentage is higher than that of the associated HDD pool, for example HDDs are 70% full while SSDs are 80% full. (A quick check follows this list.)
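A quick way to check both conditions is to compare HDD and SSD usage on each node. A minimal sketch (column layout varies slightly by OneFS version and node type):

    isi status -q    # compare HDD and SSD "Used / Size" per node; nodes using SSDs as L3 cache,
                     # or showing "No Storage SSDs", are not subject to this issue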

If the SSD fill percentage is about the same as HDD, and a node pool or the entire cluster is nearing capacity limits, refer to:

Isilon Customer Troubleshooting Guide: Troubleshoot a Full Pool or Cluster

https://support.emc.com/docu70014_Isilon_Customer_Troubleshooting_Guide:_Troubleshoot_a_Full_Pool_or_Cluster.pdf?language=en_US

Contributing factors for filling SSDs faster than hard disk drives (HDD)

  • Snapshots can be a significant contributing factor when:
    • Directories are snapped frequently (many times per hour), especially folders with a high rate of change from ingest, delete, or rename. These factors are indicated when the SnapshotDelete job report shows large numbers of "LINs deleted", such as half a million per job.
    • Any "active" directory (a folder with multiple file changes per minute) has a SyncIQ policy scheduled as "when-source-modified".
    • A large number of snapshots has accumulated, such as more than 40 saved on any given path, or five thousand or more at any one time.
    • Snapshot cleanup is affected by the snapshot governance issue. This affects only certain versions of OneFS; refer to the KB article "Large Snapshot Governance Lists can result in Allocation of Large Numbers of IFM Extension Blocks and Fill SSDs" at https://support.emc.com/kb/520985.
    • TreeDelete is routinely used in a workflow on a directory protected with local or SyncIQ snapshots, and a large number of LINs (for example, over half a million) are deleted each time TreeDelete runs, which means the deleted LINs are added to the snapshot change list. When the OneFS version is affected by the snapshot governance list issue, snapshot governance metadata can retain LINs and grow faster than the LIN metadata removed by file deletion.
  • The default file pool policy or custom file pool policies can affect a significant percentage of files when:
    • "Data SSD Strategy: metadata", meaning write one copy of file metadata to SSD.
    • "Data SSD Strategy: metadata-write", meaning write all metadata mirrors to SSD.
    • "Data SSD Strategy: data", meaning write all file data to SSD (rare and ill-advised unless all or most storage is SSD).
    • "Snapshot SSD Strategy: metadata", meaning write one copy of snapshot metadata to SSD.
    • "Snapshot SSD Strategy: metadata-write", meaning write all snapshot metadata mirrors to SSD.
    • "Snapshot SSD Strategy: data", meaning write snapshot file data to SSD (rare and ill-advised unless all or most storage is SSD).
    • Custom file pool rules are applied to leverage SSD for specific "hot" folders or data with a metadata-write or data target. This is not common, but when found it often involves a folder that was deliberately customized, after careful consideration, to improve application performance.
  • Other potential contributing factors to SSDs filling faster than HDD:

    • Small-file profile: a significant percentage of files are smaller than 128 KB, meaning more metadata (LINs, also known as file IDs) per volume of storage.
    • Custom sysctls have been introduced to improve performance by storing additional mirrors on SSD, such as system B-trees, system delta mirrors, and quota accounting blocks. This is rare, and for clusters with over 2% SSD these structures altogether seldom total more than a percent or two of SSD capacity.
    • The cluster's storage volume includes less than 2% SSD, for example when a significant portion of nodes are designed for nearline (NL) archive with little or no SSD.

    Both required conditions must be present, plus any one or a combination of the contributing factors.

    Commands and resources to discover and identify the main contributors to the issue of SSDs filling faster than HDD:

    Syntax shown is for OneFS 8.0.x and is typically the same or similar for OneFS 7.2.x and 8.1.x. Check the CLI Administration Guide for your OneFS version.

    Command and where/why it is used:

    isi status -q
        Identifies whether nodes show SSD storage as "Used / Size" (potential for this issue), use SSDs as L3 cache, or have "No Storage SSDs".

    isi sync policies list -v | grep -vi target | egrep 'Name:|Path:|Schedule:' | paste - - - | tr -s " "
        Shows schedule frequency, including any policies scheduled as "when-source-modified".

    isi snapshot snapshots list
        Displays accumulated snapshots, their SnapIDs, and totals.

    isi job status
        Lists recently completed SnapshotDelete and TreeDelete jobs, including job IDs.

    isi job reports view <ID>
        Reviews LINs deleted and total LINs from a sample job ID.

    isi filepool default-policy view
        Shows the default SSD strategies for data metadata and snapshot metadata.

    isi filepool policies list -v
        Shows custom policies with SSD strategy details.

    isi storagepool list
        Provides the storage pool names on the cluster, used to view and modify file pool policies.

    isi statistics protocol, client, heat, and system
        Provide statistics on the current workflow, for example to estimate the protocol read:write ratio and to determine "top talkers" such as the hottest (in operations per second) files and folders. Use "man isi-statistics" for a synopsis of the information available.

    cat /etc/mcp/override/sysctl.conf
        Displays persistent custom sysctl tuning. Any entry containing "ssd" (for example efs.bam.layout.ssd) can mean SSD behavior has been adjusted. Contact the Dell EMC account team or Support for sysctl tuning guidance, and refer to KB 462759, "OneFS: Configuring sysctls and making sysctl changes persist through node and cluster reboots and upgrades", at https://support.emc.com/kb/462759.

    • KB 520985 lists the OneFS versions susceptible to large snapshot governance lists; see https://support.emc.com/kb/520985
    • InsightIQ FSAnalyze (FSA) data can show the percentage of "small" files, that is, files under 128 KB.
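    For convenience, the discovery commands above can be collected in a single pass. The following is a minimal sketch assuming OneFS 8.0.x syntax and a hypothetical output file (/ifs/data/Isilon_Support/ssd_discovery.txt); adjust the path and syntax for your environment:

    OUT=/ifs/data/Isilon_Support/ssd_discovery.txt      # hypothetical collection file
    isi status -q                     >> "$OUT"          # SSD "Used / Size" versus L3 cache per node
    isi snapshot snapshots list       >> "$OUT"          # accumulated snapshots and totals
    isi job status                    >> "$OUT"          # recent SnapshotDelete and TreeDelete jobs
    isi filepool default-policy view  >> "$OUT"          # default data and snapshot SSD strategies
    isi filepool policies list -v     >> "$OUT"          # custom policies with SSD strategy details
    isi storagepool list              >> "$OUT"          # storage pool names on the cluster
    cat /etc/mcp/override/sysctl.conf >> "$OUT"          # persistent custom sysctl tuning, if any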

    To build the most effective solution, identify as many of the key contributing factors that cause SSDs to fill as possible, using the discovery commands and resources above.

    Objective:

    The goal is to resolve the issue using simple tactics with the least negative performance impact. Risk is averted when the SSD fill percentage is within a few percent of HDD fill, assuming the HDD pool is not also imprudently full.

    Action:

    The sample action plans below are ranked with the most frequently useful scenarios at the top. Depending on the contributing causes identified, mix and match tactics to create an effective proactive action plan. Note that each time a file pool policy is changed, a SmartPools (or SetProtectPlus) job must be run for the changes to take effect, which can take days.
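    For example, after modifying a file pool policy the job can be started and then tracked until it completes. A sketch (if SmartPools is not licensed, start SetProtectPlus instead; verify the exact job type names on your version, for example with isi job types list):

    isi job start smartpools     # kicks off the SmartPools job so the policy change takes effect
    isi job status               # monitor progress; large clusters can take days to complete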

    Scenario A: Many snaps contributing

    Many snapshots, frequent snapshot/SyncIQ schedules, SnapshotDelete or TreeDelete jobs often reporting more than 500k LINs deleted, the default file pool policy for data is metadata (not metadata-write or data), and the OneFS version has the snapshot governance issue.

    Recommended: Move snapshot metadata to HDD, leaving non-snapshot cluster metadata on SSD.

    Procedure:

    A1) Set the file pool default snapshot SSD strategy to "avoid":

    isi filepool default-policy modify --snapshot-ssd-strategy avoid

    A2) Run SmartPools (or SetProtectPlus if SmartPools is not licensed):

    isi job start smartpools

    Result: Moves snapshot metadata to HDD. While this tactic can reduce read/namespace performance on snapshots, it retains cluster metadata on SSD.
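    To confirm the change, the default policy can be reviewed before and after the job runs; a sketch using commands from the discovery list above:

    isi filepool default-policy view | grep -i ssd    # "Snapshot SSD Strategy" should now show "avoid"
    isi job status                                    # verify the SmartPools (or SetProtectPlus) job completed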


    Scenario B: Metadata mirrors are stored on SSD, and snapshot (meta)data is unlikely to be a significant factor

    Relatively few and/or infrequent snapshots, the default file pool policy data SSD strategy is metadata-write, and custom file pool policies do not redirect a significant amount of data to SSD.

    Recommended: Modify the file pool default policy (and any substantially large custom policies) to move metadata mirrors from SSD to HDD. Then, with the "spare" SSD capacity (SSD percent used below HDD percent used), create or modify custom file pool policies to restore metadata mirrors to SSD only on the "hottest" write/change folders most likely to benefit, until the SSD fill percentage matches HDD. This tactic requires a SmartPools license and benefits most from using isi statistics heat to determine top-talker folders (see the sketch below).
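    A minimal sketch for identifying top-talker folders with isi statistics heat (subcommand and option names differ between OneFS 7.2.x and 8.x; check man isi-statistics for your version):

    isi statistics heat list | head -30    # OneFS 8.x form; look for the write-heavy paths with the highest
                                           # operation rates, then target those with the step B3 policy below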

    Procedure:

    B1) Set the file pool default policy data SSD strategy to metadata:

    isi filepool default-policy modify --data-ssd-strategy metadata

    B2) Run the SmartPools job:

    isi job start smartpools

    B3) Once SSD is below the HDD fill percentage after removing mirrors, use discrete file pool policies to leverage the now-available SSD for metadata-write on folders with a heavy write percentage. The following example creates a file pool policy that applies the metadata-write SSD strategy to files in a "hot" directory /ifs/data/SQL/finance, with a local storage target named Performance_2.

    isi filepool policies create Save_SQL_Fin_Data --begin-filter --path=/ifs/data/SQL/finance --end-filter --data-access-pattern random --data-storage-target Performance_2 --data-ssd-strategy=metadata-write

    B4) Run the SmartPools job:

    isi job start smartpools

    B5) Repeat steps B3 and B4 until the SSD fill is within a few percent of the HDD fill.

    Result: Moves metadata mirrors from SSD to HDD, leaving only one copy of metadata on SSD, which can free up to 80% of the SSD capacity in situations where SSD metadata is the primary contributor.

    The first step will likely reduce write performance while retaining the benefits for read/namespace-read operations, because one copy of metadata remains on SSD.

    Subsequent steps leverage the spare capacity created on SSD to return to the metadata-write strategy on the hottest write/change folders, restoring much of the write performance lost after the default-policy change while greatly reducing the risk of SSDs reaching 100%.
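    A sketch for checking progress between iterations of steps B3 and B4, using commands from the discovery list (Save_SQL_Fin_Data is the example policy from step B3):

    isi status -q                   # compare SSD "Used / Size" against HDD fill after each SmartPools run
    isi filepool policies list -v   # confirm custom policies such as Save_SQL_Fin_Data and their SSD strategies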


    Scenario C: Many snaps AND metadata mirrors use SSD

    A combination of A and B, that is, many snapshots, and the file pool default policy data SSD strategy is metadata-write.

    Recommended: Use procedure A first to remove snapshot metadata from SSDs if snapshots are likely to be a significant factor. Check capacity status to see if the problem is resolved, then use procedure B if needed. Cluster metadata for namespace reads typically benefits client performance more than snapshot metadata. (A combined sketch follows.)
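    A combined sketch of that sequencing, using the commands from Scenarios A and B:

    # Step 1 (Scenario A): move snapshot metadata off SSD, then apply the change
    isi filepool default-policy modify --snapshot-ssd-strategy avoid
    isi job start smartpools
    # Step 2: after the job completes, recheck SSD versus HDD fill (isi status -q);
    # if SSD is still significantly fuller than HDD, continue with Scenario B
    isi filepool default-policy modify --data-ssd-strategy metadata
    isi job start smartpools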


    Scenario D: Issue is unresolved using the above resolutions, and the read workflow predominates over writes

    The amount of SSD may be insufficient to hold all LINs if, for example, the workflow is predominately small files and the cluster mix includes NL-class nodes with little or no SSD. If none of the above can bring SSD to a safe level, at or below HDD fill, consider converting the SSDs to L3 cache to avoid overfilling and to leverage the available SSD to extend the life of L2 cache.

    Recommended: Convert SSDs to L3 cache and start a SmartPools job.

    D1) Convert SSDs from a metadata strategy to be used entirely for L3 cache:

    isi storagepool nodepools modify <storagepool name> --l3 true -f

    D1a) Optional: If namespace_read operations predominate, the cluster has a large percentage of small files, and/or nodes have a small amount of SSD (such as NL-class), adjust L3 to store only metadata and not data.

    Optional: Add the following line to /etc/mcp/override/sysctl.conf: efs.l3:efs.l3.meta_only=1
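    A minimal sketch for adding that line (back up the file first; if the override file does not yet exist, skip the copy step; see KB 462759 for making sysctl changes persist and applying them safely):

    cp /etc/mcp/override/sysctl.conf /etc/mcp/override/sysctl.conf.bak    # back up the current override file
    echo 'efs.l3:efs.l3.meta_only=1' >> /etc/mcp/override/sysctl.conf     # L3 cache stores metadata only, not data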

    D2) Run the SmartPools job:

    isi job start smartpools

    Result: Moves all data and metadata from SSD to HDD, and converts the SSDs to an extension of L2 cache, repopulating them with the data and metadata most recently evicted from L2 cache.

    Using SSD for L3 cache can, depending on the workflow, help performance. The SSD L3 strategy tends to work better when client traffic has a 70:30 or higher read:write ratio, and better yet when the same data is being read repeatedly by multiple clients. For example, if the L2 hit rate is around 80% but cached entries have a short time to live, meaning many of the L2 cache misses are due to rapid cache eviction, then using SSD for L3 essentially extends the life of L2-cached data and metadata. The optional configuration above of using L3 only for metadata extends the cache life of metadata, at the downside of not also saving L2-expired data in the L3 cache, making performance similar to an SSD metadata strategy.
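    After the conversion and the SmartPools run, L3 effectiveness can be observed as the cache warms. A sketch (isi_cache_stats is a node-local tool and its output format varies by OneFS version, so treat this as an assumption to verify):

    isi_cache_stats -v    # per-node L1/L2/L3 cache statistics; look for L3 hit rates growing over time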
