Load Balanced Virtual Server Marked DOWN on Secondary Node of HA Pair

The SNIP is a “floating” IP address shared between the nodes, so it can only be active on whichever node is currently primary. Because the SNIP is not active on the secondary node, monitor probes cannot be sent from the secondary node. As a result, the backend services and load balanced vServers are marked DOWN there. On the primary node, the same virtual servers are marked UP.
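To make the cause concrete, here is a minimal toy model in Python. It assumes a two-node pair in which a monitor probe can only be sourced from an active SNIP and the backend itself is healthy. The `Node` class and `failover` function are hypothetical illustrations, not part of any Citrix ADC API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # "primary" or "secondary" (hypothetical field for illustration)

    def owns_snip(self) -> bool:
        # The floating SNIP is active only on whichever node is currently primary.
        return self.role == "primary"

    def monitor_state(self) -> str:
        # Assumption: probes are sourced from the SNIP and the backend is healthy.
        # Without an active SNIP no probe can be sent, so the service shows DOWN.
        return "UP" if self.owns_snip() else "DOWN"


def failover(a: Node, b: Node) -> None:
    # On failover the roles swap, and the SNIP follows the primary role.
    a.role, b.role = b.role, a.role


ns1 = Node("ns1", "primary")
ns2 = Node("ns2", "secondary")
print(ns1.monitor_state(), ns2.monitor_state())  # UP DOWN
failover(ns1, ns2)
print(ns1.monitor_state(), ns2.monitor_state())  # DOWN UP
```

In this model the DOWN state on the secondary reflects where the probe would be sourced from, not the health of the backend.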

Related:

Can Symantec Endpoint Protection 14.2 be installed and work offline?


Hi Everyone,

I would like to install Symantec Endpoint Protection on the computers in our local network, but none of them may be connected to the internet, now or later. Is it possible to install SEPM and deploy clients to the other computers this way without any problems? Can SEP work offline? Do we need to uninstall LiveUpdate, or do we need to work with .jdb files and other nodes? Are there any other tips I should know?
 

Thanks in advance for any answers.


Related:

Is Citrix Cloud Gateway (a.k.a. Citrix Gateway Service) impacted by the vulnerability (CVE-2019-19781)?


– Gateway Service nodes that serve ICA and Web/SaaS traffic are not impacted by this vulnerability, whereas Gateway Service nodes that serve VPN and MFA are impacted and have been patched.

Related:

How To Troubleshoot And Fix The Situation When The ADM HA Is Not Working

One possible error condition in the deployment is when the GUI page System -> Deployment reports the following symptoms:

Heartbeats are not received from the secondary

Data synchronization has failed on secondary


Apart from the information displayed in the GUI on the primary node, you may also observe one of the following:

– the secondary node is not running

– the secondary node is running, but the mas_hb_monit process is not running

Related:

Isilon Gen6: Addressing Generation 6 Battery Backup Unit (BBU) Test Failures

Article Number: 518165 Article Version: 7 Article Type: Break Fix



Isilon Gen6, Isilon H400, Isilon H500, Isilon H600, Isilon A100, Isilon A2000, Isilon F800

Gen6 nodes may report spurious Battery Backup Unit (BBU) failures similar to the following:

Battery Test Failure: Replace the battery backup unit in chassis <serial number> slot <number> as soon as possible.

Issues that can cause these spurious errors were identified in both the OneFS battery test code and the battery charge controller (bcc) firmware.

The underlying causes of most spurious battery test failures have been resolved in OneFS 8.1.0.4 and newer and in Node Firmware Package 10.1.6 and newer (DEbcc/EPbcc v00.71). To resolve this issue, upgrade to these versions, in that order, as soon as possible. To perform these upgrades and resolve the issue, complete the following steps:

Step 1: Check the BBU logs for a “Persistent fault” message. This indicates a test failure state that cannot be cleared in the field. Run the following command on the affected node:

# isi_hwmon -b | grep "Battery 1 Status"


If the battery reports a Persistent Fault condition, gather and upload logs using the isi_gather_info command, then contact EMC Isilon Technical Support and reference this KB.

Step 2: Clear the erroneous battery test result by running the following commands:

# isi services isi_hwmon disable

# mv /var/log/nvram.xml /var/log/nvram.xml.old

Step 3: Clear the battery test alert and unset the node read-only state so the upgrade can proceed:

– Check ‘isi event events list’ to get the event ID for the HW_INFINITY_BATTERY_BACKUP_FAULT event. Then run the following commands:

# isi event modify <eventid> --resolved true

# /usr/bin/isi_hwtools/isi_read_only --unset=system-status-not-good

Step 4: Upgrade OneFS to 8.1.0.4 or later

Instructions for upgrading OneFS can be found in the OneFS Release Notes on the support.emc.com web site.

Step 5: Update node firmware using Node Firmware Package 10.1.6 or later

Instructions for upgrading node firmware can be found in the Node Firmware Package Release Notes on the support.emc.com web site.

Once the system is upgraded, no further spurious battery replacement alerts should occur.

If a OneFS upgrade to 8.1.0.4 or newer is not an option at this time, or if the system generates further battery failure alerts after upgrading, contact EMC Isilon Technical Support for assistance and reference this KB.
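The version requirements and upgrade order above can be expressed as a small check. This is a minimal sketch: `remaining_upgrades` and the other names are hypothetical, it does not query the cluster (you supply the versions yourself), it assumes purely numeric dotted version strings, and it does not replace the Persistent Fault check in Step 1.

```python
def parse_version(text: str) -> tuple:
    # "8.1.0.4" -> (8, 1, 0, 4); tuples compare element by element.
    return tuple(int(part) for part in text.split("."))


MIN_ONEFS = parse_version("8.1.0.4")
MIN_NODE_FIRMWARE = parse_version("10.1.6")


def remaining_upgrades(onefs: str, node_firmware: str) -> list:
    """Return the upgrades still needed, in the order the article requires."""
    steps = []
    # OneFS first (Step 4), then node firmware (Step 5).
    if parse_version(onefs) < MIN_ONEFS:
        steps.append("Upgrade OneFS to 8.1.0.4 or later")
    if parse_version(node_firmware) < MIN_NODE_FIRMWARE:
        steps.append("Update to Node Firmware Package 10.1.6 or later")
    return steps


print(remaining_upgrades("8.0.0.7", "10.1.2"))
```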

Related:

VxRail: PTAgent upgrade failure, ESXi error “Can not delete non-empty group: dellptagent”

Article Number: 516314 Article Version: 6 Article Type: Break Fix



VxRail 460 and 470 Nodes, VxRail E Series Nodes, VxRail P Series Nodes, VxRail S Series Nodes, VxRail V Series Nodes, VxRail Software 4.0, VxRail Software 4.5

The VxRail upgrade process fails when upgrading PTAgent from version 1.4 or earlier to version 1.6 or later.

Error message

[LiveInstallationError]

Error in running ['/etc/init.d/DellPTAgent', 'start', 'upgrade']:

Return code: 1

Output: ERROR: ld.so: object '/lib/libMallocArenaFix.so' from LD_PRELOAD cannot be preloaded: ignored.

ERROR: ld.so: object '/lib/libMallocArenaFix.so' from LD_PRELOAD cannot be preloaded: ignored.

ERROR: ld.so: object '/lib/libMallocArenaFix.so' from LD_PRELOAD cannot be preloaded: ignored.

Errors:

Can not delete non-empty group: dellptagent

It is not safe to continue. Please reboot the host immediately to discard the unfinished update.

Please refer to the log file for more details.

Dell ptAgent upgrade failed on target: <hostname> failed due to Bad script return code:1

PTAgent cannot be removed without ESXi requiring a reboot. Earlier versions of PTAgent (lower than 1.6) had a problem handling process signals, so ESXi is unable to stop the process no matter which signal is sent or which method is used to kill it. Rebooting ESXi is required to kill the defunct process so the upgrade can proceed.

PTAgent 1.6 and above fix this issue, but once the issue is encountered, an upgrade from 1.4 to 1.6 cannot be completed without manual intervention.


Impacted VxRail versions (Dell platform only):

  • 4.0.x: VxRail 4.0.310 and below
  • 4.5.x: VxRail 4.5.101 and below

This issue is fixed in recent VxRail releases, but upgrades from earlier VxRail releases are significantly impacted. Customers are strongly advised to contact Dell EMC Technical Support to upgrade to PTAgent 1.7-4, which is included in the following VxRail releases:

  • VxRail 4.0.500 for customers who stay on vSphere 6.0
  • VxRail 4.5.211 or above for customers who choose vSphere 6.5
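The impacted ranges and target releases listed above can be encoded as a quick lookup. This is a sketch only: `is_impacted` and the tables are hypothetical helpers built from this article's lists, they assume the Dell platform (the only one listed as impacted), and releases outside the 4.0.x and 4.5.x trains are simply reported as not covered.

```python
def parse_version(text: str) -> tuple:
    # "4.5.101" -> (4, 5, 101)
    return tuple(int(part) for part in text.split("."))


# Highest impacted release per train, taken from the list above (Dell platform only).
LAST_IMPACTED = {(4, 0): (4, 0, 310), (4, 5): (4, 5, 101)}

# Releases that ship PTAgent 1.7-4, keyed by vSphere version.
TARGET_RELEASE = {"6.0": "VxRail 4.0.500", "6.5": "VxRail 4.5.211 or above"}


def is_impacted(vxrail_version: str) -> bool:
    v = parse_version(vxrail_version)
    last = LAST_IMPACTED.get(v[:2])
    # Versions outside the 4.0.x / 4.5.x trains are not covered by this article.
    return last is not None and v <= last


for release in ("4.0.310", "4.0.500", "4.5.101", "4.5.211"):
    print(release, is_impacted(release))
```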

Manual workaround if experiencing the PTAgent upgrade failure

  • Enter maintenance mode and reboot the host mentioned in the error message.
  • Wait until the host is available and shows the proper state in vCenter, then click the Retry button in VxRail Manager to retry the upgrade.

Related:

ECS: One node will not power on in an ECS Gen1 or Gen2 system.

Article Number: 504631 Article Version: 3 Article Type: Break Fix



ECS Appliance, ECS Appliance Hardware, Elastic Cloud Storage

This KB article addresses the case where only one node will not power on in an ECS Gen1 or Gen2 system.

One node will not power on in an ECS Gen1 or Gen2 system.

Bad blade server or bad chassis.


For ECS Gen1 and Gen2 systems, there are redundant Power Supply Units (PSUs) which supply power to a chassis and up to 4 blade servers in the chassis.

Based on this, if 1 node out of 4 will not power on, the issue can’t be the PSUs because the other nodes in the same chassis are powered on.

The issue has to be the blade server or the chassis itself.

For example, if node 4 will not power on, swap the blade server in the node 3 position with the blade server in the node 4 position.

If the issue stays with the slot where node 4 resides, the chassis is at fault. If the issue follows the blade server, the blade server is at fault.
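The decision logic of the swap test can be written down as a tiny function. This is only a sketch of the reasoning above: `diagnose_swap` and its parameters are hypothetical, and the physical swap itself is still subject to the install-time restriction noted in this article.

```python
def diagnose_swap(failed_slot: int, partner_slot: int, failed_slot_after_swap: int) -> str:
    """Interpret a blade swap between two slots of the same chassis.

    failed_slot: slot whose node would not power on (e.g. 4)
    partner_slot: slot whose blade it was swapped with (e.g. 3)
    failed_slot_after_swap: slot that fails to power on after the swap
    """
    # The PSUs are already ruled out: other nodes in the chassis power on.
    if failed_slot_after_swap == failed_slot:
        return "chassis"  # the fault stayed with the slot
    if failed_slot_after_swap == partner_slot:
        return "blade server"  # the fault followed the blade
    raise ValueError("unexpected result: the swap test is inconclusive")


print(diagnose_swap(4, 3, 4))  # chassis
print(diagnose_swap(4, 3, 3))  # blade server
```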

Note: This sort of troubleshooting can only be done at install time, before the OS and ECS software are loaded on the system.

Related:

Nutanix AFS (Nutanix Files) might not function properly with the ELM

This information is very preliminary and has not been rigorously tested.

AFS appears to use DFS namespace redirection to point you to the individual nodes in the AFS cluster where your data is actually held. The ELM does not support DFS redirection, so when STATUS_PATH_NOT_COVERED comes back from the initial node we reach, we fail the attempt instead of moving to the requested server. If you happen to connect to the node that holds your data, there is no redirection and no error.
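The difference between a client that follows DFS redirection and one that does not can be simulated in a few lines. This is an in-process sketch, not SMB or DFS code and not ELM or Nutanix internals: the node names, share name and helper functions are hypothetical, and the referral lookup a real client performs is reduced to a single `get_referral` call. `follow_referrals=False` models the behavior described above for the ELM.

```python
STATUS_SUCCESS = "STATUS_SUCCESS"
STATUS_PATH_NOT_COVERED = "STATUS_PATH_NOT_COVERED"

# Hypothetical cluster layout: which node actually holds each share's data.
DATA_OWNER = {"profiles": "node-b"}


def open_path(node: str, share: str) -> str:
    # A node that does not hold the data answers with STATUS_PATH_NOT_COVERED.
    return STATUS_SUCCESS if DATA_OWNER[share] == node else STATUS_PATH_NOT_COVERED


def get_referral(share: str) -> str:
    # Stands in for the DFS referral lookup that names the node holding the data.
    return DATA_OWNER[share]


def connect(node: str, share: str, follow_referrals: bool) -> str:
    status = open_path(node, share)
    if status == STATUS_PATH_NOT_COVERED and follow_referrals:
        # A DFS-aware client retries against the node named in the referral.
        status = open_path(get_referral(share), share)
    return status


print(connect("node-a", "profiles", follow_referrals=False))  # STATUS_PATH_NOT_COVERED
print(connect("node-a", "profiles", follow_referrals=True))   # STATUS_SUCCESS
print(connect("node-b", "profiles", follow_referrals=False))  # STATUS_SUCCESS
```

The third call corresponds to the lucky case above: reaching the owning node directly needs no redirection, which is also why pointing the ELM at a specific node can work.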

Unfortunately, there does not appear to be a workaround except to point the ELM to a specific node in the AFS cluster instead of the main cluster address. This node probably has to be the AFS “leader” node.

Related:

Storage Node Network connectivity to Datadomain best practices

I am looking for some advice on best practices for connecting NetWorker storage nodes in an environment where clients have backup IPs in several different VLANs. Basically, our storage nodes contact the NDMP clients over their backup network at layer 2 on different VLANs and need to send the backup data to Data Domain on a separate VLAN.

To illustrate, here is how we are currently backing up:

NDMPClient1-Backup-vlan1 ----> Storage Node-Backup-Vlan1 (Vlan5) ----> DataDomain over Vlan5

NDMPClient2-Backup-vlan2 ----> Storage Node-Backup-Vlan2 (Vlan5) ----> DataDomain over Vlan5

NDMPClient3-Backup-vlan3 ----> Storage Node-Backup-Vlan3 (Vlan5) ----> DataDomain over Vlan5

NDMPClient4-Backup-vlan4 ----> Storage Node-Backup-Vlan4 (Vlan5) ----> DataDomain over Vlan5

So for every NDMP client backup VLAN, we defined an interface on the storage nodes in the same VLAN.

For storage node to Data Domain connectivity, we have a separate layer-2 backup VLAN.

Since this is a three-way NDMP backup, traffic flows from the clients to the storage nodes on one network and from the storage nodes to Data Domain over a different path.

Is this a good model, or is there another model we could adopt for better backup/restore performance?

Thanks in advance

Related:

How Microsoft Service Witness Protocol Works in OneFS

The Service Witness Protocol (SWP) is a remote procedure call (RPC)-based protocol. In a highly available cluster environment, SWP is used to monitor the states of resources such as servers and NICs, and to proactively notify registered clients when the state of a monitored resource changes.

This blog describes how SWP is implemented in OneFS.

In OneFS, SWP is used to notify SMB clients when a node is down or rebooted, or when NICs are unavailable. The Witness server in OneFS therefore needs to monitor the states of nodes and NICs, as well as the assignment of IP addresses to the interfaces of each pool. This information is provided by SmartConnect/FlexNet and the OneFS Group Management Protocol (GMP).

The OneFS GMP is used to create and maintain a group of synchronized nodes. GMP distributes a variety of state information about nodes and drives, from identifiers to usage statistics, so the Witness service can obtain node states from GMP notifications.

For the IP addresses in each pool, SmartConnect/FlexNet provides the following to support SWP in OneFS:

  1. Locate the FlexNet IP pool given a pool member's IP address. From a given IP address, the Witness server can identify the IP pool it belongs to and get information about the other pool members.
  2. Get the SmartConnect zone name and alias names from the FlexNet IP pool obtained in the previous step.
  3. Witness can subscribe to the following changes to the FlexNet IP pool:
    • Witness will be notified when an IP address is added to an active pool member or removed from a pool member.
    • Witness will be notified when a NIC goes from DOWN to UP or from UP to DOWN, so that Witness knows whether an interface is available.
    • Witness will be notified when an IP address is moved from one interface to another.
    • Witness will be notified when an IP address is about to be removed from the pool or moved from one interface to another, whether initiated by an admin or by a rebalance process.
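The lookup-then-subscribe pattern described in this list can be sketched as a small observer model. Everything here is hypothetical and for illustration only: `IpPool`, `PoolEvent`, `find_pool`, the zone name and the IP addresses are not a OneFS or FlexNet API, and only two of the event kinds are actually raised to keep the sketch short.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class PoolEvent(Enum):
    IP_ADDED = auto()
    IP_REMOVED = auto()
    NIC_UP = auto()
    NIC_DOWN = auto()
    IP_MOVED = auto()
    IP_CHANGE_PENDING = auto()  # admin- or rebalance-initiated removal or move


Listener = Callable[[str, PoolEvent, str], None]


class IpPool:
    """Toy stand-in for a FlexNet IP pool (hypothetical, not a OneFS API)."""

    def __init__(self, name: str, zone: str, members: Dict[str, str]):
        self.name = name
        self.zone = zone  # SmartConnect zone name
        self.members = dict(members)  # IP address -> interface
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _publish(self, event: PoolEvent, subject: str) -> None:
        for listener in self._listeners:
            listener(self.name, event, subject)

    # Only two event sources are wired up to keep the sketch short.
    def set_nic_state(self, iface: str, up: bool) -> None:
        self._publish(PoolEvent.NIC_UP if up else PoolEvent.NIC_DOWN, iface)

    def move_ip(self, ip: str, new_iface: str) -> None:
        self.members[ip] = new_iface
        self._publish(PoolEvent.IP_MOVED, ip)


def find_pool(pools: List[IpPool], ip: str) -> Optional[IpPool]:
    """Step 1: locate the pool that a given member IP address belongs to."""
    for pool in pools:
        if ip in pool.members:
            return pool
    return None


pool0 = IpPool("pool0", "smb.example.com",
               {"10.0.0.1": "node1:ext-1", "10.0.0.2": "node2:ext-1"})
received = []
pool = find_pool([pool0], "10.0.0.1")
print(pool.zone)  # Step 2: SmartConnect zone name of the pool
pool.subscribe(lambda name, event, subject: received.append((event, subject)))  # Step 3
pool.move_ip("10.0.0.1", "node3:ext-1")
pool.set_nic_state("node1:ext-1", up=False)
print(received)
```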

The figure below shows the Witness selection process and what happens after a failover occurs.

[Figure: SWP Witness selection and failover process]

  1. An SMB CA-capable client connects to an SMB CA share on the OneFS cluster through the SmartConnect FQDN, reaching Node 1.
  2. The client finds that CA is enabled and starts the Witness registration process by sending a GetInterfaceList request to Node 1.
  3. Node 1 returns a list of available Witness interface IP addresses to which the client can connect.
  4. The client selects any interface IP address from the list (in this example, Node 2 is selected as the Witness server). The client then sends a RegisterEx request to Node 2, but this request fails because OneFS does not support this operation. RegisterEx is a newer operation introduced in SWP version 2; OneFS supports only SWP version 1.
  5. The client sends a Register request to Node 2 to register for resource state change notifications for NetName and IPAddress (in this example, NetName is the SmartConnect FQDN and IPAddress is the IP address of Node 1).
  6. The Witness server (Node 2) processes the request and returns a context handle that identifies the client on the server.
  7. The client sends an AsyncNotify request to Node 2 to receive asynchronous notifications of state changes of cluster nodes and node interfaces.
  8. Assume Node 1 goes down unexpectedly. The Witness server (Node 2) becomes aware that Node 1 has failed and sends an AsyncNotify response notifying the client that the server state is down.
  9. The SMB CA feature forces the client to reconnect to the OneFS cluster using the SmartConnect FQDN. In this example, SMB CA successfully fails over to Node 3.
  10. The client sends its context handle in an UnRegister request to unregister for notifications from the Witness server, Node 2.
  11. The Witness server processes the request by removing the entry and no longer notifies the client about resource state changes.
  12. Steps 12-17: The client starts the registration process again, similar to steps 2-7.
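The registration and notification sequence in steps 4 to 11 can be traced with an in-process toy model. This is a sketch under stated assumptions, not the SWP wire protocol and not OneFS code: `WitnessServer`, `register_with_witness`, `NotSupported`, the FQDN and the IP addresses are all hypothetical, and an outstanding AsyncNotify is modeled as a set of pending context handles.

```python
import itertools


class NotSupported(Exception):
    """Raised for operations outside SWP version 1 in this toy model."""


class WitnessServer:
    """Toy SWP v1 Witness server (hypothetical model, not OneFS code)."""

    def __init__(self, node: str):
        self.node = node
        self._next_handle = itertools.count(1)
        self.registrations = {}  # context handle -> (net_name, ip_address)
        self.pending = set()     # handles with an outstanding AsyncNotify

    def register_ex(self, net_name: str, ip_address: str) -> int:
        # RegisterEx was introduced in SWP version 2, which OneFS does not support.
        raise NotSupported("RegisterEx")

    def register(self, net_name: str, ip_address: str) -> int:
        handle = next(self._next_handle)
        self.registrations[handle] = (net_name, ip_address)
        return handle  # context handle identifying the client

    def async_notify(self, handle: int) -> None:
        # The request stays outstanding until a monitored resource changes state.
        self.pending.add(handle)

    def resource_down(self, ip_address: str) -> list:
        # Complete every outstanding AsyncNotify registered for the failed IP.
        fired = sorted(h for h in self.pending
                       if self.registrations[h][1] == ip_address)
        self.pending.difference_update(fired)
        return fired

    def unregister(self, handle: int) -> None:
        self.registrations.pop(handle, None)
        self.pending.discard(handle)


def register_with_witness(server: WitnessServer, net_name: str, ip_address: str) -> int:
    try:
        return server.register_ex(net_name, ip_address)  # step 4: fails on SWP v1
    except NotSupported:
        return server.register(net_name, ip_address)      # step 5: fall back to Register


witness = WitnessServer("node2")  # chosen from the GetInterfaceList reply (steps 2-4)
handle = register_with_witness(witness, "smb.example.com", "10.0.0.1")
witness.async_notify(handle)                  # step 7
notified = witness.resource_down("10.0.0.1")  # step 8: Node 1 goes down
witness.unregister(handle)                    # steps 10-11, after reconnecting (step 9)
print(handle, notified, witness.registrations)
```

The try/except in `register_with_witness` mirrors the client behavior in steps 4 and 5: attempt the version 2 operation first, then fall back to the version 1 Register call.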

Related: