Isilon: If the SmartConnect Service IP (SSIP) is assigned to an aggregate interface, the IP address may go missing under certain conditions, or move to another node, if one of the laggports is shut down.

Article Number: 519890 Article Version: 13 Article Type: Break Fix

Isilon, Isilon OneFS

The SmartConnect SSIP, or network connectivity in general, can be disrupted on a node that has a link aggregation interface configured in LACP mode when one of the member ports of the lagg interface stops participating in the LACP aggregation.

The issue occurs when a node is configured with any of the link aggregation interfaces:



and one of its member ports is not participating in the lagg interface:


ether 00:07:43:09:3c:77
inet6 fe80::207:43ff:fe09:3c77%lagg0 prefixlen 64 scopeid 0x8 zone 1
inet 10.25.58.xx netmask 0xffffff00 broadcast zone 1
media: Ethernet autoselect
status: active
laggproto lacp lagghash l2,l3,l4
laggport: cxgb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>> laggport: cxgb1 flags=0<>

This causes OneFS to internally set the link aggregation interface to ‘No Carrier’ status, due to a bug in the network manager software (Flexnet):

# isi network interface list
LNN  Name          Status      Owners                       IP Addresses
1    10gige-1      No Carrier  -                            -
1    10gige-2      Up          -                            -
1    10gige-agg-1  No Carrier  groupnet0.subnet10g.pool10g
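To spot this state quickly, the `isi network interface list` output can be filtered for aggregate interfaces that report No Carrier. The following is a minimal sketch, assuming the column layout shown above (LNN, Name, Status, ...) and assuming that aggregate interface names contain `-agg-`, as `10gige-agg-1` does; the function name `find_nocarrier_aggs` is hypothetical and is not part of OneFS.

```shell
#!/bin/sh
# find_nocarrier_aggs (hypothetical helper): reads `isi network interface list`
# output on stdin and prints "<LNN> <Name>" for every aggregate interface whose
# Status column reads "No Carrier".
# Assumption: aggregate interface names contain "-agg-" (e.g. 10gige-agg-1).
find_nocarrier_aggs() {
    # Default awk field splitting turns "No Carrier" into fields $3 and $4.
    awk '$2 ~ /-agg-/ && $3 == "No" && $4 == "Carrier" { print $1, $2 }'
}
```

Usage on a node: `isi network interface list | find_nocarrier_aggs`. Header, separator, and summary lines do not match the pattern, so only affected aggregate interfaces are printed.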

Possible causes of the issue:

  1. Failed switch port
  2. Incorrect LACP configuration on the switch port
  3. Bad cable/SFP, or another physical issue
  4. The switch connected to a port failed or was rebooted
  5. A BXE driver bug that reports a port as not being in full duplex (KB 511208)

Failures 1 to 4 are external to the cluster, and the issue should clear as soon as they are fixed. Failure 5 can be a persistent failure caused by a known OneFS BXE driver bug (KB 511208).

  A. If the node has the lowest node ID in the pool and the SmartConnect SSIP is configured on it, then:
    a. If failure 1, 2, or 3 occurs, the SSIP moves to the node with the next-lowest node ID that is not affected by any failure.
    b. If failure 4 is present, the SSIP is not available on any node, and data unavailability (DU) is expected until the workaround is implemented, the patch is installed, or the switch is fixed or becomes available again after a reboot.
    c. If failure 5 is present:
      i. If only one port has failed, the SSIP moves to the next available node with the lowest node ID that is not affected by the issue.
      ii. [DU] If all nodes in the cluster are BXE nodes and all of them are affected by the bug, the SSIP is not available; expect DU until the workaround or patch is applied.
  B. If the link aggregation interface in LACP mode is configured in a subnet pool whose defined gateway is the default route on the node, then:
    a. If the issue occurs while the node is running and the default route is already set, the default route remains configured and available, and connectivity for already connected clients should continue to work.
    b. [DU] If the node is rebooted while any of the persistent failures is present, the default route is not available after the node comes back up, causing DU until the external issue is fixed, the workaround is applied, or the patch is installed.
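The SSIP outcomes for the node with the lowest node ID (case A above) can be summarized as a small lookup. This is a hypothetical illustration of the article's decision tree, not a OneFS tool: `ssip_impact` and its two arguments are invented for this example, and the output strings simply paraphrase the cases listed above.

```shell
#!/bin/sh
# ssip_impact (hypothetical helper): encodes case A above, where the node with
# the lowest node ID in the pool hosts the SSIP.
#   $1 = failure number from the list of possible causes (1-5)
#   $2 = "yes" if every node in the cluster is a BXE node affected by
#        failure 5, anything else otherwise (only used for failure 5)
ssip_impact() {
    case "$1" in
        1|2|3)
            echo "SSIP moves to next-lowest unaffected node" ;;
        4)
            echo "DU: SSIP unavailable on all nodes" ;;
        5)
            if [ "$2" = "yes" ]; then
                echo "DU: SSIP unavailable on all nodes"
            else
                echo "SSIP moves to next-lowest unaffected node"
            fi
            ;;
        *)
            echo "unknown failure number: $1" >&2
            return 1 ;;
    esac
}
```

For example, `ssip_impact 5 yes` reports the DU case A->c->ii, while `ssip_impact 2` reports that the SSIP moves to another node.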

If any of the failures is present during an upgrade to … or …, a DU is expected after the rolling reboot, as described in cases A->c->ii and B->b. Before the upgrade, verify that the cluster is clear of all of the described failures.
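A scripted check on each node before the upgrade may help catch a latent failure before the rolling reboot turns it into a DU. The sketch below is hypothetical (`lagg_precheck` is not a OneFS command). It assumes FreeBSD-style `ifconfig` output, where interface header lines such as `lagg1: flags=...` start in column 0 and member lines read `laggport: <port> flags=...`, and it treats any laggport with `flags=0<>` as not participating, matching the samples in this article.

```shell
#!/bin/sh
# lagg_precheck (hypothetical helper): reads `ifconfig` output on stdin,
# prints "<lagg> <port>" for every laggport reported with empty flags
# (flags=0<>, i.e. not participating in LACP), and returns 1 if any were
# found, 0 otherwise.
lagg_precheck() {
    awk '
        # Interface header lines start in column 0, e.g. "lagg1: flags=8843<...>".
        /^[^ \t]/ { iface = $1; sub(/:$/, "", iface) }
        # Member lines look like "laggport: bxe1 flags=0<>".
        $1 == "laggport:" && $3 == "flags=0<>" { print iface, $2; bad = 1 }
        # bad is unset (0) when every member port participates.
        END { exit bad }
    '
}
```

Because each node has its own interfaces, the check has to be run on every node, for example `ifconfig | lagg_precheck || echo "lagg member down: resolve before upgrading"`.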


Workaround to immediately restore the link aggregation interface when only one member port is persistently down (failed switch, failed cable/SFP, BXE bug, or another persistent issue):

Step 1:

Identify the failed member port on the link aggregation interface:

# ifconfig
lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 00:0e:1e:58:20:70
inet6 fe80::20e:1eff:fe58:2070%lagg1 prefixlen 64 scopeid 0x8 zone 1
inet netmask 0xffff0000 broadcast zone 1
media: Ethernet autoselect
status: active
laggproto lacp lagghash l2,l3,l4
>> laggport: bxe1 flags=0<>


Step 2:

Manually remove the failed member port from the link aggregation interface with the following command:

ifconfig lagg1 -laggport bxe1

Network connectivity should recover within 10 to 20 seconds after the command is executed.

This change will be lost after a reboot.
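When several lagg interfaces are configured, the Step 2 command can be derived from the `ifconfig` output instead of typed by hand. The following is a hypothetical dry-run sketch (`suggest_laggport_removal` is not a OneFS tool) under the same assumptions about `ifconfig` output as the samples above. It only prints commands and executes nothing. As a safety assumption, it suggests removal only when the lagg still has at least one member in `DISTRIBUTING` state, because the workaround applies when a single member port is down, and removing the last member would not restore connectivity.

```shell
#!/bin/sh
# suggest_laggport_removal (hypothetical dry-run helper): reads `ifconfig`
# output on stdin and prints "ifconfig <lagg> -laggport <port>" for each
# laggport reported with flags=0<>, but only for laggs that still have at
# least one member in DISTRIBUTING state. Nothing is executed.
suggest_laggport_removal() {
    awk '
        /^[^ \t]/ { iface = $1; sub(/:$/, "", iface) }
        $1 == "laggport:" {
            if ($3 == "flags=0<>") down[iface] = down[iface] " " $2
            else if ($3 ~ /DISTRIBUTING/) up[iface]++
        }
        END {
            for (i in down) {
                if (up[i] > 0) {
                    n = split(down[i], ports, " ")
                    for (k = 1; k <= n; k++)
                        printf "ifconfig %s -laggport %s\n", i, ports[k]
                } else {
                    printf "%s: no participating member ports, workaround does not apply\n", i > "/dev/stderr"
                }
            }
        }
    '
}
```

Review the printed commands before running them: `ifconfig | suggest_laggport_removal`. As noted above, any removal made this way does not survive a reboot.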

After the external failure on the port has been identified and fixed, and the port is available again, add the port back into the link aggregation configuration with the following command:

ifconfig lagg1 laggport bxe1
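After the port is added back, it is worth confirming that it has rejoined the LACP aggregation, that is, that its flags again show `ACTIVE`, `COLLECTING`, and `DISTRIBUTING` as the healthy port `cxgb0` does in the first sample. The helper below is a hypothetical sketch (`laggport_healthy` is not a OneFS command) that reads `ifconfig` output on stdin and checks a single named member port.

```shell
#!/bin/sh
# laggport_healthy (hypothetical helper): reads `ifconfig` output on stdin and
# returns 0 if the laggport named in $1 reports ACTIVE, COLLECTING and
# DISTRIBUTING in its flags, 1 otherwise (including when the port is absent).
laggport_healthy() {
    awk -v port="$1" '
        $1 == "laggport:" && $2 == port {
            if ($3 ~ /ACTIVE/ && $3 ~ /COLLECTING/ && $3 ~ /DISTRIBUTING/) ok = 1
        }
        END { exit (ok ? 0 : 1) }
    '
}
```

For example: `ifconfig lagg1 | laggport_healthy bxe1 && echo "bxe1 is participating again"`.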

A permanent fix will be included in the following OneFS maintenance releases when they become available:

  • OneFS
  • OneFS

Roll-up patches are now available:

  • Bug 226984: patch-226984
  • Bug 226323: patch-226323

NOTE: This issue affects the following OneFS versions ONLY:

  • OneFS
  • OneFS
  • OneFS
  • OneFS

