Isilon: If the SmartConnect Service IP (SSIP) is assigned to an aggregate interface, the IP address may go missing under certain conditions or move to another node if one of the laggports is shut down.

Article Number: 519890 Article Version: 13 Article Type: Break Fix



Isilon,Isilon OneFS 8.0.0.6,Isilon OneFS 8.0.1.2,Isilon OneFS 8.1.0.2

SmartConnect SSIP availability or network connectivity on a node can be disrupted when a link aggregation interface is configured in LACP mode and one of the member ports of the lagg interface stops participating in the LACP aggregation.

The issue occurs when a node is configured with either of the following link aggregation interfaces:

10gige-agg-1

ext-agg-1

and one of its member ports is not participating in the lagg interface (marked with >> below):

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>

ether 00:07:43:09:3c:77

inet6 fe80::207:43ff:fe09:3c77%lagg0 prefixlen 64 scopeid 0x8 zone 1

inet 10.25.58.xx netmask 0xffffff00 broadcast 10.25.58.xxx zone 1

nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

media: Ethernet autoselect

status: active

laggproto lacp lagghash l2,l3,l4

laggport: cxgb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

>> laggport: cxgb1 flags=0<>

This causes OneFS to internally set the link aggregation interface to ‘No Carrier’ status because of a bug in the network manager software (Flexnet):

# isi network interface list

LNN  Name          Status      Owners                       IP Addresses
------------------------------------------------------------------------
1    10gige-1      No Carrier  -                            -
1    10gige-2      Up          -                            -
1    10gige-agg-1  No Carrier  groupnet0.subnet10g.pool10g  10.25.58.46
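
As a rough illustration, a saved copy of the listing above can be screened for aggregate interfaces in ‘No Carrier’ state with a short Python sketch. It assumes that data rows start with a numeric LNN followed by the interface name and the status, as in the sample; `no_carrier_aggregates` is a hypothetical helper name, not part of OneFS.

```python
def no_carrier_aggregates(listing):
    """Return (lnn, name) for aggregate interfaces reported as 'No Carrier'."""
    hits = []
    for line in listing.splitlines():
        tokens = line.split()
        # Data rows start with a numeric LNN; header and rule lines are skipped.
        if len(tokens) < 4 or not tokens[0].isdigit():
            continue
        lnn, name = int(tokens[0]), tokens[1]
        # Assumption: aggregate interfaces carry "-agg-" in their name,
        # as in 10gige-agg-1 and ext-agg-1.
        if "-agg-" in name and tokens[2:4] == ["No", "Carrier"]:
            hits.append((lnn, name))
    return hits


sample = """\
LNN  Name          Status      Owners                       IP Addresses
1    10gige-1      No Carrier  -                            -
1    10gige-2      Up          -                            -
1    10gige-agg-1  No Carrier  groupnet0.subnet10g.pool10g  10.25.58.46
"""
print(no_carrier_aggregates(sample))
```

An aggregate interface that shows ‘No Carrier’ while one of its member interfaces is still ‘Up’ matches the symptom described in this article.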

Possible failures causing the issue:

  1. Failed switch port
  2. Incorrect LACP configuration at switch port
  3. Bad cable/SFP, or other physical issue
  4. A switch connected to one of the ports failed or was rebooted
  5. BXE driver bug that reports a port as not full duplex (KB 511208)

Failures 1 to 4 are external to the cluster, and the issue should clear as soon as they are fixed. Failure 5 can be persistent because it is induced by a known OneFS BXE bug (KB 511208).

  A. If the node has the lowest node ID in the pool and the SmartConnect SSIP is configured there, then:
    a. If failure 1, 2, or 3 occurs, the SSIP moves to the next lowest node ID that is clear of any failure.
    b. If failure 4 is present, the SSIP is not available on any node, and data unavailability (DU) is expected until the workaround is implemented, the patch is installed, or the switch is fixed or becomes available again after its reboot.
    c. If failure 5 is present:
      i. If only one port has failed, the SSIP moves to the next available lowest node ID not affected by the issue.
      ii. [DU] If all nodes in the cluster are BXE nodes and all are affected by the bug, the SSIP is not available; expect DU until the workaround or patch is applied.
  B. If the link aggregation interface in LACP mode is configured in a subnet pool whose gateway is the node's default route, then:
    a. If the issue occurs while the node is running and the default route is already set, the default route stays configured and available, and connectivity for already connected clients should continue to work.
    b. [DU] If the node is rebooted while any of the persistent failures is present, the default route is not available after the node comes back up, causing DU until the external issue is fixed, the workaround is applied, or the patch is installed.

If any of these failures is present during an upgrade to 8.0.0.6 or 8.1.0.2, DU is expected after the rolling reboot because of the cases described in cause A->c->ii or cause B->b. Before upgrading, check that the cluster is clear of all the failures described above.
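
The SSIP cases under cause A can be summarized as a small decision function. This is only a restatement of the text above in Python, not OneFS logic; the function name and the return strings are made up for illustration.

```python
def ssip_outcome(failure, all_nodes_affected=False):
    """Return 'moves' or 'unavailable' for the node that holds the SSIP.

    failure is the number (1-5) from the list of possible failures;
    all_nodes_affected only matters for failure 5 (BXE driver bug).
    """
    if failure in (1, 2, 3):
        return "moves"        # case A->a: SSIP moves to the next lowest node ID
    if failure == 4:
        return "unavailable"  # case A->b: DU expected
    if failure == 5:
        # case A->c->i (one port failed) versus A->c->ii (all BXE nodes affected)
        return "unavailable" if all_nodes_affected else "moves"
    raise ValueError("failure must be between 1 and 5")


for number in range(1, 6):
    print(number, ssip_outcome(number))
```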



Workaround


Workaround to immediately restore the link aggregation interface when only one member port is persistently down (failed switch, failed cable/SFP, BXE bug, or other persistent issue):

Step 1:

Identify failed member port on link aggregation interface:

# ifconfig

lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

options=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>

ether 00:0e:1e:58:20:70

inet6 fe80::20e:1eff:fe58:2070%lagg1 prefixlen 64 scopeid 0x8 zone 1

inet 172.16.240.xxx netmask 0xffff0000 broadcast 172.16.255.xxx zone 1

nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

media: Ethernet autoselect

status: active

laggproto lacp lagghash l2,l3,l4

>> laggport: bxe1 flags=0<>

laggport: bxe0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
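
A minimal sketch, in Python, of how the failed member port could be picked out of saved ifconfig output. It assumes that laggport lines look like the ones shown above and treats any member without the ACTIVE flag as failed; the function name `inactive_laggports` is hypothetical and not part of OneFS.

```python
import re

# Matches lines such as "laggport: bxe1 flags=0<>"; a leading ">> " marker is tolerated.
LAGGPORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=([0-9a-fA-F]+)<([^>]*)>")


def inactive_laggports(ifconfig_text):
    """Return the names of laggports whose flag list lacks ACTIVE."""
    inactive = []
    for line in ifconfig_text.splitlines():
        match = LAGGPORT_RE.search(line)
        if match is None:
            continue
        name, _hex_flags, flag_names = match.groups()
        if "ACTIVE" not in flag_names.split(","):
            inactive.append(name)
    return inactive


sample = """\
laggproto lacp lagghash l2,l3,l4
>> laggport: bxe1 flags=0<>
laggport: bxe0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
"""
print(inactive_laggports(sample))
```

For the sample above the sketch reports bxe1, the port that Step 2 removes from the aggregation.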


Step 2:

Manually remove the member port with the command:

ifconfig lagg1 -laggport bxe1

The network should recover within 10-20 seconds of running the command.

This change is lost after a reboot.

After the external failure on the port has been identified and fixed, and the port is available again, add the port back into the link aggregation configuration with the command:

ifconfig lagg1 laggport bxe1

A permanent fix will be included in the following OneFS maintenance releases once they become available:

  • OneFS 8.0.0.7
  • OneFS 8.1.0.4

Roll-Up patch is now available for:

8.0.0.6 (bug 226984) – patch-226984

8.1.0.2 (bug 226323) – patch-226323

NOTE: This issue affects the following OneFS versions ONLY:

  • OneFS 8.0.0.6
  • OneFS 8.0.1.2
  • OneFS 8.1.0.2
  • OneFS 8.1.1.1
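
The version and patch information in this article can be captured in a small lookup, sketched here in Python. The function name and the status strings are illustrative only; the version and patch identifiers are the ones listed above.

```python
# Versions this article lists as affected, and the roll-up patches it names.
AFFECTED_VERSIONS = {"8.0.0.6", "8.0.1.2", "8.1.0.2", "8.1.1.1"}
ROLLUP_PATCHES = {"8.0.0.6": "patch-226984", "8.1.0.2": "patch-226323"}


def lagg_issue_status(onefs_version):
    """Describe whether a OneFS version is listed as affected in this article."""
    version = onefs_version.strip()
    if version not in AFFECTED_VERSIONS:
        return "not affected"
    patch = ROLLUP_PATCHES.get(version)
    if patch:
        return "affected, roll-up patch " + patch
    return "affected, no roll-up patch listed in this article"


print(lagg_issue_status("8.0.0.6"))
```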


Isilon Gen6: Addressing Generation 6 Battery Backup Unit (BBU) Test Failures

Article Number: 518165 Article Version: 7 Article Type: Break Fix



Isilon Gen6,Isilon H400,Isilon H500,Isilon H600,Isilon A100,Isilon A2000,Isilon F800

Gen6 nodes may report spurious Battery Backup Unit (BBU) failures similar to the following:

Battery Test Failure: Replace the battery backup unit in chassis <serial number> slot <number> as soon as possible.

Issues were identified with both the OneFS battery test code, and the battery charge controller (bcc) firmware, that can cause these spurious errors to be reported.

The underlying causes for most spurious battery test failures have been resolved in OneFS 8.1.0.4 and newer, and Node Firmware Package 10.1.6 and newer (DEbcc/EPbcc v 00.71); to resolve this issue, please upgrade to these software versions, in that order, as soon as possible. In order to perform these upgrades and resolve this issue, the following steps are required:

Step 1: Check the BBU logs for a “Persistent fault” message. This indicates a test failure state that cannot be cleared in the field. Run the following command on the affected node:

# isi_hwmon -b | grep "Battery 1 Status"


If the battery reports a Persistent Fault condition, gather and upload logs using the isi_gather_info command, then contact EMC Isilon Technical Support and reference this KB.

Step 2: Clear the erroneous battery test result by running the following commands:

# isi services isi_hwmon disable

# mv /var/log/nvram.xml /var/log/nvram.xml.old

Step 3: Clear the battery test alert and unset the node read-only state so the upgrade can proceed:

Check ‘isi event events list’ to get the event ID for the HW_INFINITY_BATTERY_BACKUP_FAULT event. Then run the following commands:

# isi event modify <eventid> --resolved true

# /usr/bin/isi_hwtools/isi_read_only --unset=system-status-not-good

Step 4: Upgrade OneFS to 8.1.0.4 or later

Instructions for upgrading OneFS can be found in the OneFS Release Notes on the support.emc.com web site.

Step 5: Update node firmware using Node Firmware Package 10.1.6 or later

Instructions for upgrading node firmware can be found in the Node Firmware Package Release Notes on the support.emc.com web site.

Once the system is upgraded, no further spurious battery replacement alerts should occur.
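
The required order (OneFS first, then node firmware) and the minimum versions can be expressed as a numeric version comparison. The following Python sketch is illustrative only: the function names are hypothetical, and the version strings are assumed to be purely numeric and dot-separated.

```python
def parse_version(text):
    """Turn '8.1.0.4' into (8, 1, 0, 4) so that versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def next_bbu_upgrade_step(onefs_version, firmware_package_version):
    """Return the next upgrade step, following the order given in this article."""
    if parse_version(onefs_version) < (8, 1, 0, 4):
        return "upgrade OneFS to 8.1.0.4 or later"
    if parse_version(firmware_package_version) < (10, 1, 6):
        return "update node firmware to package 10.1.6 or later"
    return "both minimum versions are met"


print(next_bbu_upgrade_step("8.1.0.2", "10.1.5"))
```

Comparing tuples of integers rather than strings avoids ranking a release such as 8.1.0.10 below 8.1.0.4.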

If a OneFS upgrade to 8.1.0.4 or newer is not an option at this time, or if the system generates further battery failure alerts after upgrading, please contact EMC Isilon Technical Support for assistance, and reference this KB.


Isilon OneFS: Node compatibility class create fails when not all drives are HEALTHY

Article Number: 504582 Article Version: 3 Article Type: Break Fix



Isilon OneFS 8.1,Isilon OneFS 8.0,Isilon OneFS

Creating a node compatibility class fails if not all drives are HEALTHY and causes the process isi_smartpools_d to fail to start. That results in the event:

Process isi_smartpools_d of service isi_smartpools_d has failed to restart after multiple attempts

The output of ‘isi status -p’ contains the following:

Diskpool status temporarily unavailable.

The following error is logged in /var/log/messages:

2017-09-08T11:30:59-06:00 <1.4> for-isi-b-1 isi_smartpools_d[5415]: Exception: : Traceback (most recent call last): File "/usr/bin/isi_smartpools_d", line 287, in <module> main() File "/usr/bin/isi_smartpools_d", line 80, in main run_as_daemon() File "/usr/bin/isi_smartpools_d", line 89, in run_as_daemon run_uncaught() File "/usr/bin/isi_smartpools_d", line 118, in run_uncaught conform_diskpool_db_to_drive_purpose() File "/usr/bin/isi_smartpools_d", line 163, in conform_diskpool_db_to_drive_purpose needs_write = dp_cfg.conform_provisioning_to_node_types(fp_cfg) File "/usr/local/lib/python2.6/site-packages/isi/smartpools/diskpools.py", line 1200, in conform_provisioning_to_node_types File "/usr/local/lib/python2.6/site-packages/isi/smartpools/diskpools.py", line 1335, in conform_diskpools_to_storage_units File "/usr/local/lib/python2.6/site-packages/isi/smartpools/diskpools.py", line 1094, in drive_to_storage_unit AssertionError 
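
To check a saved copy of /var/log/messages for this signature, a filter along the following lines could be used. This is a hedged Python sketch: it assumes each syslog entry occupies a single line, as in the excerpt above, and the function name is hypothetical.

```python
def smartpools_assertion_entries(log_text):
    """Return log lines where isi_smartpools_d failed in drive_to_storage_unit."""
    markers = ("isi_smartpools_d[", "drive_to_storage_unit", "AssertionError")
    return [line for line in log_text.splitlines()
            if all(marker in line for marker in markers)]


# The first entry is abbreviated from the excerpt above ("..." marks the cut);
# the second entry is made up to show a non-matching line.
sample = (
    "2017-09-08T11:30:59-06:00 <1.4> for-isi-b-1 isi_smartpools_d[5415]: "
    "Exception: : Traceback (most recent call last): ... "
    "line 1094, in drive_to_storage_unit AssertionError\n"
    "2017-09-08T11:31:02-06:00 <3.6> for-isi-b-1 example[1]: unrelated entry\n"
)
print(len(smartpools_assertion_entries(sample)))
```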

A missing drive causes the disk pool database update to fail because OneFS is unable to allocate that bay to a disk pool.

  • Replace any drives in bays in REPLACE status and make sure all bays in the cluster are HEALTHY. Once all drives are HEALTHY the node compatibility class can be created successfully.
  • After creating the node compatibility class make sure the ‘Diskpool status temporarily unavailable’ message is no longer in the output of:

# isi status -p

  • Verify storagepool health and compatible nodes are now in the correct pools by running:

# isi storagepool health -v


Isilon OneFS: Can not enable ESRS when legacy ESRS setup is enabled?

Article Number: 504579 Article Version: 4 Article Type: Break Fix



Isilon OneFS 8.1.0,Isilon OneFS 8.1

ESRS shows as not enabled in the WebUI but shows as enabled in the CLI.

The WebUI shows an error stating that the version of ESRS is not supported.

The WebUI shows this error because the customer has the “legacy” configuration set up rather than the new configuration:

dsgsc1-1# isi remotesupport connectemc view

Enabled: Yes

Primary Esrs Gateway: 10.64.xxx.xxx

Secondary Esrs Gateway: –

Use SMTP Failover: No

Email Customer On Failure: No

Gateway Access Pools: subnet0:pool0

dsgsc1-1# isi esrs view

Enabled: No

Primary ESRS Gateway: 10.64.xxx.xxx

Secondary ESRS Gateway: –

Alert on Disconnect: Yes

Gateway Access Pools: subnet0.pool0, subnet1.pool0, subnet1.SyncIQ-pool

Gateway Connectivity Check Period: 3600

License Usage Intelligence Reporting Period: 86400

Gateway Connectivity Status: Disconnected
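
The mismatch shown in the two outputs above (legacy ConnectEMC enabled, new ESRS disabled) can be detected from saved command output. The Python sketch below assumes the "Key: Value" layout shown above; `parse_view` and `legacy_only` are hypothetical helper names.

```python
def parse_view(text):
    """Parse 'Key: Value' lines into a dict with lower-cased keys."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def legacy_only(connectemc_view, esrs_view):
    """True when the legacy configuration is enabled but the new one is not."""
    legacy = parse_view(connectemc_view).get("enabled") == "Yes"
    current = parse_view(esrs_view).get("enabled") == "Yes"
    return legacy and not current


connectemc_sample = "Enabled: Yes\nPrimary Esrs Gateway: 10.64.xxx.xxx\nGateway Access Pools: subnet0:pool0"
esrs_sample = "Enabled: No\nPrimary ESRS Gateway: 10.64.xxx.xxx\nGateway Connectivity Status: Disconnected"
print(legacy_only(connectemc_sample, esrs_sample))
```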

The customer first needs to set up and license the new configuration, then disable the “legacy” configuration and check whether the error goes away.

See KBs https://support.emc.com/kb/511053 and https://support.emc.com/kb/511087 for further troubleshooting information.

Install of new Isilon Gen6

Set up the new configuration:

isi esrs modify --enabled 1

An EMC username and password are required to enable ESRS.

Disable the legacy configuration:

isi remotesupport connectemc modify --enabled=false


Isilon: Error Invalid User name and Password, trying to Enable ESRS on Gen6 OneFS v8.1.0.0 (User Correctable)

Article Number: 504577 Article Version: 3 Article Type: Break Fix



Isilon OneFS,Isilon OneFS 8.1

ESRS shows as not enabled in the WebUI but shows as enabled in the CLI.

The WebUI shows an error stating that the version of ESRS is not supported.

The WebUI shows this error because the customer has the “legacy” configuration set up rather than the new configuration:

dsgsc1-1# isi remotesupport connectemc view

Enabled: Yes

Primary Esrs Gateway: 10.64.xxx.xxx

Secondary Esrs Gateway: –

Use SMTP Failover: No

Email Customer On Failure: No

Gateway Access Pools: subnet0:pool0

dsgsc1-1# isi esrs view

Enabled: No

Primary ESRS Gateway: 10.64.xxx.xxx

Secondary ESRS Gateway: –

Alert on Disconnect: Yes

Gateway Access Pools: subnet0.pool0, subnet1.pool0, subnet1.SyncIQ-pool

Gateway Connectivity Check Period: 3600

License Usage Intelligence Reporting Period: 86400

Gateway Connectivity Status: Disconnected

The customer first needs to set up the new configuration, then disable the “legacy” configuration and check whether the error is gone.

Install of new Isilon Gen6

isi esrs modify --enabled 1

An EMC username and password are required to enable ESRS.


Isilon: SyncIQ’s resync-prep job can fail with error “Authentication with target failed” if the policy’s Force Interface flag is set to yes

Article Number: 503031 Article Version: 3 Article Type: Break Fix



Isilon,Isilon SyncIQ,Isilon OneFS

SyncIQ’s resync-prep job can fail with the error “Authentication with target failed” even when no password is configured for the sync policy.

Errors: Authentication with target failed, (policy name: test target: 10.118.160.38) SyncIQ policy failed. Authentication with target failed

This happens when the policy’s Force Interface flag is set to Yes:

ID: 51d400a9b44decc9f6c562bf2dea05b4

Name: test

Path: /ifs/data/test_other

Action: sync

Enabled: Yes

Target: 10.118.160.38

Source Subnet: subnet0

Source Pool: pool0


Workers Per Node: 3

Report Max Age: 1Y

Report Max Count: 2000

Force Interface: Yes
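
Whether a policy has the flag set can be read from saved policy output such as the listing above. A minimal Python sketch, assuming the "Key: Value" layout shown; the function name is hypothetical:

```python
def force_interface_enabled(policy_view):
    """Return True if the policy output contains 'Force Interface: Yes'."""
    for line in policy_view.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "force interface":
            return value.strip().lower() == "yes"
    return False  # field not present in the output


sample = "Name: test\nTarget: 10.118.160.38\nSource Pool: pool0\nForce Interface: Yes"
print(force_interface_enabled(sample))
```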

The following messages appear when the migrate logs are set to trace level during resync-prep:

Messages on Primary Cluster:

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81122]: primary[test:1264326133]: Sending AUTH_MSG to sworker

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81120]: primary[test:1264326133]: Generated secret of ed45e4c673a2d0bed579a568add169f0

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81122]: primary[test:1264326133]: Received SWORKER_STF_STAT_MSG from sworker

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81122]: primary[test:1264326133]: sworker_stat_callback

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81122]: primary[test:1264326133]: Sending SWORKER_STF_STAT_MSG to coord

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81121]: primary[test:1264326133]: Generated secret of ed45e4c673a2d0bed579a568add169f0

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81121]: primary[test:1264326133]: Sending AUTH_MSG to sworker

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81122]: primary[test:1264326133]: Received NOOP_MSG from sworker

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81122]: primary[test:1264326133]: Received ERROR_MSG from sworker

orca7215-1: 2010-01-24T09:42:13Z <3.7> orca7215-1(id2) isi_migrate[81122]: primary[test:1264326133]: Sending ERROR_MSG to coord

orca7215-1: 2010-01-24T09:42:13Z <3.6> orca7215-1(id2) isi_migrate[81122]: primary[test:1264326133]: Disconnect from sworker


Messages on Secondary Cluster:

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61917]: secondary[test:1264326133]: /ifs/.ifsvar/modules/tsm/passwd missing or bad, no authentication

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61917]: secondary[test:1264326133]: Sending AUTH_MSG to pworker

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61918]: secondary[test:1264326133]: Generated secret of 64752e5bd6f05f3c883009e006ac5cf1

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61918]: secondary[test:1264326133]: Received AUTH_MSG from pworker

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61918]: secondary[test:1264326133]: auth_callback

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61918]: secondary[test:1264326133]: Sending ERROR_MSG to pworker

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61918]: secondary[test:1264326133]: 7 bytes left to be sent for NOOP_MSG

2017-02-03T03:55:56Z <3.2> Halfpipe8011-1 isi_migrate[61918]: secondary[test:1264326133]: Primary authentication failed

2017-02-03T03:55:56Z <3.2> Halfpipe8011-1 isi_migrate[61918]: Exiting on ilog(IL_FATAL)

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61915]: secondary[test:1264326133]: siq_target_states_set: Policy test updating job state from SIQ_JS_RUNNING to SIQ_JS_FAILED

2017-02-03T03:55:56Z <3.6> Halfpipe8011-1 isi_migrate[90611]: bandwidth: Read socket error: Connection reset by peer

2017-02-03T03:55:56Z <3.6> Halfpipe8011-1 isi_migrate[61915]: secondary[test:1264326133]: Coord disconnects from target_monitor for test

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61915]: secondary[test:1264326133]: siq_target_states_set: Policy test updating job state from SIQ_JS_FAILED to SIQ_JS_FAILED

2017-02-03T03:55:56Z <3.7> Halfpipe8011-1 isi_migrate[61915]: secondary[test:1264326133]: siq_target_states_set: Policy test changing cancel state from CANCEL_WAITING_FOR_REQUEST to CANCEL_DONE

2017-02-03T03:57:52Z <3.6> Halfpipe8011-1 isi_migrate[61963]: secondary: Disconnect from pworker (5)

A resync-prep attempts to connect to the local cluster over the internal IB network. This connection fails if the Force Interface flag is set to Yes.

Disable the Force Interface flag and retry resync-prep:

# isi sync policies modify --policy=<policyName> --force-interface=off


Isilon: NFS export creation fails in WebUI in OneFS 8.0.0.4

Article Number: 496256 Article Version: 5 Article Type: Break Fix



Isilon,Isilon OneFS,Isilon OneFS 8.0

In OneFS 8.0.0.4, the following error can be encountered when using the WebUI to create new NFS exports:

NFS Export not created.

The export did not create due to the following errors:
map failure : Field: map_failure has error: Incorrect type. Found: string, schema accepts: object.
map non_root : Field: map_non_root has error: Incorrect type. Found: string, schema accepts: object.
map root : Field: map_root has error: Incorrect type. Found: string, schema accepts: object.
security flavors : Field: security_flavors has error: Incorrect type. Found: string, schema accepts: array.
Input validation failed.

Editing existing exports through the WebUI will still function normally.

A product defect exists in OneFS 8.0.0.4 which prevents NFS export creation via WebUI.

Workaround:

When creating the export via the WebUI, select “Use Custom” for the following options in the exports creation screen:

  • Root User Mapping
  • Non-Root User Mapping
  • Failed User Mapping
  • Security Flavors

You do not need to change anything for each of those options, but they should not be set to “Use Default.”

Alternatively, the command line interface (CLI) can be used to create NFS exports in OneFS 8.0.0.4.

Permanent solution:

Apply Patch-191603 which can be downloaded from:

https://support.emc.com/downloads/15209_Isilon-OneFS

The fix will be included in future OneFS releases as well.


OneFS: Best practices for NFS client settings

Article Number: 457328 Article Version: 8 Article Type: Break Fix



Isilon,Isilon OneFS

This article describes the best practices and recommendations for client-side settings and mount options when using the NFS protocol to connect to an Isilon cluster and applies to all currently supported versions of OneFS.

Supported Protocol Versions

At this time Isilon OneFS supports NFS versions 3 and 4. NFS version 2 has not been supported since the move to the 7.2.X code family.

NFSv3

NFS version 3 is the most widely used version of the NFS protocol today, and is generally considered to have the widest client and filer adoption. Here are key components of this version:

  • Stateless – A client does not technically need to establish a new session if it has the correct information to ask for files, etc. This allows for simple failover between OneFS nodes via dynamic IP pools.
  • User and Group info is presented numerically – Client and Server communicate user information by numeric identifiers, allowing the same user to possibly appear under different names on the client and the server.
  • File Locking is out of band – Version 3 of NFS uses a helper protocol called NLM to perform locks. This requires the client to respond to RPC messages from the server to confirm locks have been granted, etc.
  • Can run over TCP or UDP – This version of the protocol can run over UDP instead of TCP, leaving handling of loss and retransmission to the software instead of the operating system. We always recommend using TCP.

NFSv4

NFS version 4 is the newest major revision of the NFS protocol, and is increasing in adoption. At this time NFSv4 is generally less performant than v3 against the same workflow due to the greater amount of identity mapping and session tracking work required to reply. Here are some of the key differences between v3 and v4:

  • Stateful – NFSv4 uses sessions in order to handle communication, as such both client and server need to track session state to continue communicating.
    • Prior to OneFS 8.X this meant that NFSv4 clients required static IP pools on the Isilon or could encounter issues.
  • User and Group info is presented as strings – Both the client and server need to resolve the names of the numeric information stored. The server needs to lookup names to present, while the client needs to remap those to numbers on its end.
  • File Locking is in band – Version 4 no longer uses a separate protocol for file locking, instead making it a type of call that is usually compounded with OPENs, CREATEs, or WRITEs.
  • Compound Calls – Version 4 can bundle a series of calls in a single packet, allowing the server to process all of them and reply at the end. This is used to reduce the number of calls involved in common operations.
  • Only supports TCP – Version 4 of NFS has left loss and retransmission up to the underlying operating system.

NFSv4.1 and Beyond

At this time OneFS does not support NFS version 4.1. If you need specific features of version 4.1, speak with your account team to see if that is something that we can provide via OneFS’s unique feature set as an NFS filer.

OneFS Version specific concerns

For customers that have been using Isilon OneFS since versions 7.1 or before, changes made in the 7.2.0 version of OneFS, and remaining in place until OneFS 8.1.1, might impact how clients using encoding that differs from the cluster’s are able to view and interact with directory listings. For more details review ETA 483840.

This is not an issue if you began using OneFS on version 7.2 or beyond.

Mount options

While we do not have hard requirements for mount options, we do make some recommendations on how clients connect. We have not provided specific mount strings, as the syntax used to define these options varies depending on the operating system in use. Refer to your distribution maintainer’s documentation for specific mount syntax.

Defining Retries and Timeouts

While the Isilon cluster generally replies to client communication very quickly, when a node has lost power or network connectivity it might take a few seconds for its IP addresses to move to a functional node. It is therefore important to have correctly defined timeout and retry values. Isilon generally recommends a timeout of 60 seconds to account for a worst-case failover scenario, with two retries before reporting a failure.

Soft vs Hard Mounts

Hard mounts cause the client to retry its operations indefinitely on timeout or error. This ensures that the client does not disconnect the mount in circumstances where the Isilon cluster moves IP addresses from one node to another. A soft mount will instead error out and expire the mount requiring a remount to restore access after the IP address moves.

Allowing interrupt

By default, most clients do not allow you to interrupt an input/output (I/O) wait, meaning you cannot use ctrl+c, etc., to end the waiting process if the cluster is hanging. Including the interrupt mount option allows those signals to pass normally instead.

Local versus Remote Locking

When mounting an NFS export, you can specify whether a client performs its locks locally or uses the lock coordinator on the cluster. Most clients default to remote locking, and this is generally the best option when multiple clients access the same directory. However, there can be performance benefits to local locking when a client does not need to share access to the directory it is working with. In addition, some databases and software will request that you use local locking, as they have their own coordinator.
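
As one concrete illustration of the recommendations above, the following Python sketch assembles an option string using Linux nfs(5) option names. Linux is an assumption here, since the article deliberately gives no OS-specific mount strings. On Linux, timeo is expressed in tenths of a second, so a 60 second timeout becomes timeo=600, and nolock (local locking) applies to NFSv3 mounts only. The function name and defaults are illustrative; check your client's documentation before using any of these options.

```python
def linux_nfs_options(version=3, timeout_seconds=60, retries=2,
                      hard=True, local_locks=False):
    """Build a comma-separated option string for a Linux NFS mount."""
    options = [
        "vers=%d" % version,
        "proto=tcp",                          # TCP is always recommended
        "hard" if hard else "soft",
        "timeo=%d" % (timeout_seconds * 10),  # Linux counts timeo in deciseconds
        "retrans=%d" % retries,
    ]
    if local_locks and version == 3:
        options.append("nolock")              # keep locks local to this client
    return ",".join(options)


print(linux_nfs_options())
```

The interrupt option is left out of the sketch because recent Linux kernels accept intr but ignore it; other operating systems may still honor it.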


Isilon OneFS 8.0: Cannot install firmware package: /var/patch/backup: is not a directory

Article Number: 500573 Article Version: 9 Article Type: Break Fix



Isilon OneFS 8.0,Isilon OneFS 8.1

During firmware package installation, the installation does not complete, and output such as:

# isi upgrade patches list

Patch Name Description Status

———————————————————————————————————————————-

IsiFw_Package_v9.3.5 Package Name : IsiFw Package v9.3.5 2017-04-04 To… Installing

shows Status as Installing indefinitely.

Look at the last few lines of /var/log/isi_pkg on each node, and you should see the following error:

# isi_for_array -s 'tail -50 /var/log/isi_pkg'

………

2017-05-28T13:31:56-05:00 <3.6> node-1 isi_pkg[16306]: Starting task for request ‘INSTALL’, task ‘INSTALL_INIT’, hash ‘2693c810637d37a02ee9cb57a9d01d3d’

2017-05-28T13:31:59-05:00 <3.6> node-1 isi_pkg[16306]: Running requirements file first for IsiFw_Package_v9.3.5..

2017-05-28T13:31:59-05:00 <3.3> node-1 isi_pkg[16306]: /var/patch/backup: is not a directory.

2017-05-28T13:31:59-05:00 <3.3> node-1 isi_pkg[16306]: install_init: Task encountered unknown failures


This issue could happen for any of the firmware packages such as IsiFw_Package_v9.3.5.tar or IsiFw_Package_v10.0.1.tar

The partition mounted under /var requires at least 40% free disk space to allow installation of a large patch such as a firmware package. If there is not enough free disk space under /var, the node fails the installation with the error:

/var/patch/backup: is not a directory

Reduce disk usage under /var. It is usually the /var/log directory that contains large files that might need to be truncated. To determine that:

1. Determine which node does not have at least 40% free space under /var (Capacity should be 60% or less):

# isi_for_array -s 'df -h /var'

2. Connect with ssh, as the root user, to the node that does not have enough free space under /var and list the largest files in /var/log:

# cd /var/log

# du ./* | sort -n -r | head -n 10

3. Make a backup directory within /ifs/data/Isilon_Support/ to store a backup of the files that will be truncated:

# mkdir /ifs/data/Isilon_Support/Node<node number>_VarLog_Backup

4. Copy the files that you will be truncating to the new backup directory:

# cp <filename> /ifs/data/Isilon_Support/Node<node number>_VarLog_Backup

5. The output of step 2 gives you the 10 largest files under /var/log. You can truncate the files using the command:

# truncate -s 0 <filename>

6. Check the partition again to make sure at least 40% of the disk space under /var is free (Capacity should be 60% or less):

# isi_for_array -s 'df -h /var'

7. Once you have freed up enough capacity, retry the failed upgrade with the command:

# isi upgrade retry-last-action --nodes=<lnn of failed node>

8. Allow 30 minutes for the upgrade process to complete, then check that the package shows Installed for Status:

# isi upgrade patches list

Patch Name Description Status

——————————————————————————————————————

IsiFw_Package_v10.0.1 Package Name : IsiFw Package v10.0.1 2017-05-02 T… Installed

If you do not know whether the files can be safely truncated, if there are no large files under /var/log but the /var partition is still too full, or if retry-last-action does not complete the patch installation, please engage Technical Support for assistance and reference this KB.
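
The capacity check in steps 1 and 6 can be automated against saved output of the df command. The Python sketch below assumes that isi_for_array prefixes each output line with the node name and a colon, and that the Capacity column is the percentage directly before the mount point; both the sample values and the function name are made up for illustration.

```python
import re

# Assumed line shape: "<node>: <device> <size> <used> <avail> <NN>% /var"
DF_LINE_RE = re.compile(r"^(\S+):\s.*?(\d+)%\s+/var\s*$")


def nodes_over_capacity(df_output, max_capacity=60):
    """Return (node, capacity) pairs where /var is fuller than max_capacity percent."""
    over = []
    for line in df_output.splitlines():
        match = DF_LINE_RE.match(line)
        if match and int(match.group(2)) > max_capacity:
            over.append((match.group(1), int(match.group(2))))
    return over


# Hypothetical output; real device names and sizes will differ.
sample = """\
node-1: Filesystem         Size    Used   Avail Capacity  Mounted on
node-1: /dev/mirror/var0   1.9G    1.3G    471M    74%    /var
node-2: Filesystem         Size    Used   Avail Capacity  Mounted on
node-2: /dev/mirror/var0   1.9G    600M    1.2G    33%    /var
"""
print(nodes_over_capacity(sample))
```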


OneFS 8.x: ‘isi smb openfiles/sessions’ is not zone aware

Article Number: 497099 Article Version: 7 Article Type: Break Fix



Isilon OneFS,Isilon OneFS 8.0,Isilon OneFS 8.1

Administrators are unable to list open files and sessions for non-System Access Zones, and subsequently unable to delete/close the open files and/or sessions through use of the following commands:

# isi smb openfiles list
# isi smb openfiles close
# isi smb sessions list
# isi smb sessions delete

The commands are also ineffective when preceded by isi_run -z <zid>, which may have worked in previous 7.x releases.

Beginning in OneFS 8.0.0, the ‘isi smb sessions’ and ‘isi smb openfiles’ commands were changed to use PAPI (Platform Application Programming Interface) and are restricted to the System zone because, unlike many other commands that are “zone aware”, they lack the ‘--zone’ option.

Workaround for specific zones:

  • Using MMC by connecting to the relevant Access Zone. Refer to the OneFS Web Administration Guide (page 197) on configuring and using MMC for SMB share management.
  • Using isi_classic (non-PAPI) in combination with isi_run:
NOTE: <zid> is the zone ID from ‘isi zone zones view <zone>’ or ‘isi zone zones list -v’
  1. isi_run -z<zid> isi_classic smb file [list|close]
    # isi_run -z4 isi_classic smb file list
    File [1]
    ID :35
    Pathname :C:\ifs\data\test
    Username :user
    Number of locks :0
    Permissions :0x1
    # isi_run -z4 isi_classic smb file close --file-id=35
    File with ID [35] closed successfully
  2. isi_run -z<zid> isi_classic smb session list
    # isi_run -z4 isi_classic smb session list
    Session [1]
    Computer :192.168.0.108
    Username :user
    Client type :DOS LM 2.0
    Number of opens :2
    Active time :800
    Idle time :14
    Guest login :no
    Encryption :no

If you need to search for a specific file, you can append ` | grep -B1 "<file>"` to the ‘list’ command to obtain the file ID:

# isi_run -z4 isi_classic smb file list | egrep -B1 test
ID :16
Pathname :C:\ifs\data\test
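
When many files are open, the same lookup can be done on saved output of the file list command. The Python sketch below assumes one field per line (which the use of -B1 above suggests), an ID line directly preceding each Pathname line, and Windows-style paths written with backslashes; the second record in the sample is made up, and the function name is hypothetical.

```python
def file_ids_for_path(listing, needle):
    """Return the IDs of open files whose Pathname contains needle."""
    ids = []
    current_id = None
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. the "File [1]" record header
        key = key.strip().lower()
        if key == "id":
            current_id = int(value.strip())
        elif key == "pathname" and current_id is not None:
            if needle in value:
                ids.append(current_id)
            current_id = None
    return ids


sample = r"""File [1]
ID :35
Pathname :C:\ifs\data\test
Username :user
File [2]
ID :36
Pathname :C:\ifs\data\other
Username :user
"""
print(file_ids_for_path(sample, "test"))
```

The returned IDs are the values to pass to the close command via --file-id.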

To remove a session from a non-System zone:

isi_run -z <zid> /usr/likewise/bin/lwnet session //<ipaddr> delete

Resolution and fix:

  • Beginning in the following releases, the ‘isi smb openfiles/sessions list’ commands list and display open files and sessions from ALL zones by default (although a ‘--zone’ parameter still does not exist):
    • 8.0.0.5
    • 8.0.1.2
    • 8.1.0.1
  • A more permanent fix to include a ‘--zone’ parameter is being investigated, but will not be released in any MR; there is currently no targeted release version of OneFS.

Note: To run the above commands on all nodes, wrap them in isi_for_array:

# isi_for_array 'isi_run -z <zid> [command]'

Isilon Engineering is aware of the issue and is working toward a fix. The KB will be updated with release information when a fix is available.
