How Microsoft Service Witness Protocol Works in OneFS

The Service Witness Protocol (SWP) is a remote procedure call (RPC)-based protocol. In a highly available cluster environment, SWP monitors the state of resources such as servers and NICs, and proactively notifies registered clients when the state of a monitored resource changes.

This blog describes how SWP is implemented in OneFS.

In OneFS, SWP is used to notify SMB clients when a node goes down or reboots, or when NICs become unavailable. The Witness server in OneFS therefore needs to monitor the state of nodes and NICs, and the assignment of IP addresses to the interfaces of each pool. This information is provided by SmartConnect/FlexNet and the OneFS Group Management Protocol (GMP).

The OneFS GMP is used to create and maintain a group of synchronized nodes. GMP distributes a variety of state information about nodes and drives, from identifiers to usage statistics, so the Witness service can learn the state of each node from GMP notifications.

As for the IP addresses in each pool, SmartConnect/FlexNet provides the following to support SWP in OneFS:

  1. Locate the FlexNet IP pool for a given pool member’s IP address. From a single IP address, the Witness server can determine which IP pool the address belongs to and obtain information about the other pool members.
  2. Get the SmartConnect zone name and alias names for the FlexNet IP pool obtained in the previous step.
  3. Subscribe to changes in the FlexNet IP pool. The Witness is notified when:
    • An IP address is added to an active pool member or removed from a pool member.
    • A NIC goes from DOWN to UP or from UP to DOWN, so the Witness knows whether an interface is available.
    • An IP address is moved from one interface to another.
    • An IP address is about to be removed from the pool or moved from one interface to another, initiated by an admin or a rebalance process.
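
The kinds of pool changes listed above can be illustrated by comparing two snapshots of a pool. This is only a sketch of the idea, not OneFS code: `PoolEvent`, `diff_pool`, the event names, and the sample addresses and interface names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolEvent:
    kind: str         # "ip_added", "ip_removed", "ip_moved", "nic_up", "nic_down"
    subject: str      # the IP address or interface the event is about
    detail: str = ""  # interface name, or "old->new" for a move


def diff_pool(old_ips, new_ips, old_nics, new_nics):
    """Compare two pool snapshots and list the changes between them.

    old_ips / new_ips map IP address -> interface name.
    old_nics / new_nics map interface name -> True (UP) or False (DOWN).
    """
    events = []
    for ip in sorted(set(old_ips) | set(new_ips)):
        before, after = old_ips.get(ip), new_ips.get(ip)
        if before is None:
            events.append(PoolEvent("ip_added", ip, after))
        elif after is None:
            events.append(PoolEvent("ip_removed", ip, before))
        elif before != after:
            events.append(PoolEvent("ip_moved", ip, f"{before}->{after}"))
    for nic in sorted(set(old_nics) & set(new_nics)):
        if old_nics[nic] != new_nics[nic]:
            kind = "nic_up" if new_nics[nic] else "nic_down"
            events.append(PoolEvent(kind, nic))
    return events


# Hypothetical snapshots taken before and after a change.
old_ips = {"10.0.0.1": "node1:ext-1", "10.0.0.2": "node2:ext-1"}
new_ips = {"10.0.0.1": "node3:ext-1", "10.0.0.3": "node2:ext-1"}
events = diff_pool(old_ips, new_ips,
                   {"node1:ext-1": True}, {"node1:ext-1": False})
for event in events:
    print(event)
```

A real subscriber would receive such events as they happen rather than computing them from snapshots; the diff form is used here only because it is easy to run and inspect.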

The figure below shows how a Witness is selected and what happens after a failover occurs.

Drawing1.jpg

  1. A client that supports SMB CA connects to an SMB CA share on the OneFS cluster through the SmartConnect FQDN and is connected to Node 1.
  2. The client finds that CA is enabled and starts the Witness registration process by sending a GetInterfaceList request to Node 1.
  3. Node 1 returns a list of available Witness interface IP addresses to which the client can connect.
  4. The client selects any one interface IP address from the list (in this example, Node 2 is selected as the Witness server). The client then sends a RegisterEx request to Node 2, but this request fails because OneFS does not support this operation. RegisterEx is a new operation introduced in SWP version 2, and OneFS supports only SWP version 1.
  5. The client sends a Register request to Node 2 to register for resource state change notifications for a NetName and IPAddress (in this example, the NetName is the SmartConnect FQDN and the IPAddress is the IP of Node 1).
  6. The Witness server (Node 2) processes the request and returns a context handle that identifies the client on the server.
  7. The client sends an AsyncNotify request to Node 2 to receive asynchronous notifications of state changes in the cluster nodes and node interfaces.
  8. Assume Node 1 goes down unexpectedly. The Witness server (Node 2) becomes aware that Node 1 has failed and sends an AsyncNotify response to notify the client that the server state is down.
  9. The SMB CA feature forces the client to reconnect to the OneFS cluster using the SmartConnect FQDN. In this example, the SMB CA connection successfully fails over to Node 3.
  10. The client sends the context handle in an UnRegister request to unregister for notifications from the Witness server, Node 2.
  11. The Witness server processes the request by removing the entry and no longer notifies the client about resource state changes.
  12. Steps 12-17: the client starts the registration process again, similar to steps 2-7.
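
The registration sequence above can be sketched as a small in-memory simulation. It makes no real SWP RPC calls: `FakeWitness`, `register_with_witness`, the log strings, and the sample FQDN and IP address are hypothetical and exist only to show the order of operations, including the fallback from RegisterEx to Register.

```python
class FakeWitness:
    """In-memory stand-in for a Witness server that offers SWP version 1 only."""

    def __init__(self, name):
        self.name = name
        self.registrations = {}   # context handle -> (net_name, ip_address)
        self._next_handle = 1

    def register_ex(self, net_name, ip_address):
        # RegisterEx belongs to SWP version 2, which this server does not offer.
        raise NotImplementedError("RegisterEx is not supported")

    def register(self, net_name, ip_address):
        handle = self._next_handle
        self._next_handle += 1
        self.registrations[handle] = (net_name, ip_address)
        return handle

    def unregister(self, handle):
        self.registrations.pop(handle, None)


def register_with_witness(witness, net_name, ip_address, log):
    """Try RegisterEx first, then fall back to Register (steps 4 to 7)."""
    try:
        handle = witness.register_ex(net_name, ip_address)
        log.append("RegisterEx ok")
    except NotImplementedError:
        log.append("RegisterEx failed")
        handle = witness.register(net_name, ip_address)
        log.append("Register ok")
    log.append("AsyncNotify pending")   # the client now waits for a state change
    return handle


log = []
node2 = FakeWitness("node2")
handle = register_with_witness(node2, "cluster.example.com", "10.0.0.1", log)

# Steps 8 to 11: Node 1 fails, the client is notified, reconnects, and unregisters.
log.append("AsyncNotify response: 10.0.0.1 unavailable")
node2.unregister(handle)
log.append("UnRegister ok")
print(log)
```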

Related:

NetScaler Double Hop Communication Flow with StoreFront

Logon Process

First NetScaler Gateway packet flow (the second NetScaler does not come into the picture until the apps are enumerated)

1. The user starts a browser and connects (via hostname) to the external IP address behind the FQDN of the first-hop NetScaler Gateway (NSG). The NSG authenticates the user and sends the request to the StoreFront.

2. The StoreFront in the second DMZ receives the request.

3. The StoreFront validates the user based on the user’s credentials.

4. The StoreFront in the second DMZ sends the credentials to a server on the internal network hosting the XML service.

5. The XML service authenticates the user and retrieves the list of published applications the user has access to. This list is sent back to the StoreFront.

6. The StoreFront generates a page with the “published apps” and sends the page through the NetScaler in the first DMZ back to the user.

Starting Process

1. The user clicks an application and the request is forwarded to the StoreFront (SF).

2. The SF again contacts the XML service to determine which XenApp server will handle the request. The XML service returns that server’s IP address.

3. The SF then contacts the Secure Ticketing Authority (STA) to exchange the IP address for a session ticket. The STA saves the IP address and sends a session ticket to the SF. (The XML service and the STA do not have to be on the same server.)

4. The SF generates an ICA file containing the STA session ticket and the FQDN of the NSG in the first DMZ. This ICA file is sent back to the user through the NSG in the first DMZ. In this example, the application clicked was Mozilla Firefox, and the FQDN is that of the first hop.

User-added image
5. The plugin on the user’s machine reads the ICA file and initiates an ICA connection, with the session ticket, to the first-hop NS in the first DMZ.

6. The NS in the first DMZ sends the session ticket through the NS in the second DMZ to the STA for validation. In the trace below, the first NS sends the ticket to 10.104.23.83, which is the IP of the second-hop NS, and the request carries a Host header of 10.104.23.149, which is the STA server. Based on this Host header, the second hop knows to forward the request to that STA server (the second hop has no STA configuration of its own).

User-added image
7. The STA validates the ticket and sends the IP address of the XenApp server to the NS in the first DMZ.

Packet 36455 is the decrypted packet sent by the first NS as received on the second NS. In the next packet, 36456, the second NS makes a request to the original STA server, 10.104.23.149.

User-added image

In the next packet on the second hop, a response arrives from the STA server stating that the XenApp server is 10.104.23.149 on port 1494. This response is forwarded to the first-hop NS in packet 36461. (In this lab the XenApp server and the STA are the same machine, which is why the same IP, 10.104.23.149, appears for both.)

User-added image
8. The NS in the first DMZ establishes an ICA connection toward the IP address of the XenApp server. This connection is proxied through the second-hop NS; the first-hop NS does not try to connect to the XenApp server directly. The first-hop NS proxies all ICA connections to the second hop, 10.104.23.83, and the second-hop NS forwards the ICA traffic to the actual XenApp servers.

In the trace below, taken on the second hop, the traffic arriving at this hop (10.104.23.83) is shown in green, and the connection this NS makes to the actual XenApp server (10.104.23.149) is shown in pink.

User-added image
9. The XenApp server sends an acknowledgement back to the second-hop NS (acting as a proxy), which passes it to the first-hop NS. The SSL/TLS handshake between the CAG in the first DMZ and the XenApp client is then completed. The ICA session is established and traffic begins to flow.
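
The ticket exchange in steps 3 to 7 of the starting process can be pictured with a toy model. This is a sketch under assumptions, not the real STA protocol or ticket format: `ToySTA`, its methods, the dictionary standing in for the ICA file, and `gateway.example.com` are hypothetical, and single-use redemption is a property of this sketch only. The point it shows is that the client only ever sees an opaque ticket and the gateway FQDN, never the XenApp server address.

```python
import secrets


class ToySTA:
    """Toy ticketing authority: maps opaque tickets to server addresses."""

    def __init__(self):
        self._tickets = {}

    def issue(self, host, port):
        # The ticket carries no address information; it is only a lookup key.
        ticket = secrets.token_hex(16)
        self._tickets[ticket] = (host, port)
        return ticket

    def redeem(self, ticket):
        # In this sketch a ticket is single use: unknown or used tickets give None.
        return self._tickets.pop(ticket, None)


sta = ToySTA()

# StoreFront side: exchange the XenApp address for a ticket and build the file
# that is handed to the client.
ticket = sta.issue("10.104.23.149", 1494)
ica_file = {"ticket": ticket, "gateway": "gateway.example.com"}

# Gateway side: the client presents the ticket, and the gateway asks the STA
# which server it stands for.
address = sta.redeem(ica_file["ticket"])
print(address)
```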

Related:

Provisioning fails with BadRequest… Invalid input for dns_nameservers. Reason: ‘1.2.3.4’ is not a valid nameserver. ‘

We see the following error message in the Environment view of UrbanCode Deploy Blueprint Designer and in the Heat logs:
BadRequest: resources.network_setup.app_private_network_subnet: Invalid input for dns_nameservers. Reason: ‘1.2.3.4’ is not a valid nameserver. ‘
1.2.3.4’ is not a valid IP address. Neutron server return request_ids. [‘req-512c7a6a-894d-47b0-879a-41d1841ade31’]

The HOT file contains:
dns_nameserver:
  type: comma_delimited_list
  label: DNS Name Server
  description: The IP address of a DNS nameserver in list format
  default: 1.2.3.4,5.6.7.8
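
The post does not state the cause of the error. One thing worth ruling out is stray whitespace or a line break inside one of the list entries, since a strict IP address parser rejects those even when the digits are fine. The sketch below uses Python's standard `ipaddress` module for such a check; `check_nameservers` is a hypothetical helper, and whether Neutron validates in the same way is an assumption.

```python
import ipaddress


def check_nameservers(raw):
    """Split a comma-delimited string and return the entries that are not valid IPs."""
    bad = []
    for entry in raw.split(","):
        try:
            # Deliberately no strip(): surrounding whitespace counts as an error.
            ipaddress.ip_address(entry)
        except ValueError:
            bad.append(entry)
    return bad


print(check_nameservers("1.2.3.4,5.6.7.8"))     # clean list
print(check_nameservers("1.2.3.4,\n5.6.7.8"))   # second entry starts with a line break
```

Running the same check on the value actually passed to the stack would show whether any entry carries hidden characters.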

Related:

The server does not receive the result of the task execution

Colleagues, how can I resolve a situation where tasks run on the client but the server returns a timeout error? It looks as though there is no feedback to the server: the client does not report that the job succeeded.

<event date="Jul 03 04:00:28 +00:00" severity="4" hostName="S58SRS" source="Altiris.NS.StandardItems.Collection.NSDataSrcBasedResourceCollection.FullUpdatePostDataSrcProcessing" module="AeXSVC.exe" process="AeXSvc" pid="1436" thread="317" tickCount="511697257"><![CDATA[Updating collection membership for collection 'e50b60e7-a0a7-4e49-b338-b83896d6bb32' with 0 members.]]></event>
<event date="Jul 03 04:00:34 +00:00" severity="4" hostName="S58SRS" source="Altiris.TaskManagement.ClientTask.*" module="AtrsHost.exe" process="AtrsHost" pid="1636" thread="27" tickCount="511703840"><![CDATA[TaskExecutionEngine.ProcessPendingRequestList(): Queueing 1 items.]]></event>
<event date="Jul 03 04:00:34 +00:00" severity="4" hostName="S58SRS" source="Altiris.TaskManagement.ClientTask.*" module="AtrsHost.exe" process="AtrsHost" pid="1636" thread="460" tickCount="511703840"><![CDATA[Resuming task instance Microsoft Outlook 2010 KB2965295 (7/3/2017 12:56:46 AM)(80fc0fdc-3668-42cc-b4dd-21030cda708b)]]></event>
<event date="Jul 03 04:00:34 +00:00" severity="4" hostName="S58SRS" source="Altiris.TaskManagement.ClientTask.BaseClientTask.CheckIsTaskComplete" module="AtrsHost.exe" process="AtrsHost" pid="1636" thread="460" tickCount="511703855"><![CDATA[BaseClientTask.CheckIsTaskComplete(): Task "Microsoft Outlook 2010 KB2965295" (7/3/2017 12:56:46 AM) - 0 / 1 child instances done.  Timeout at 7/3/2017 12:11:46 PM.  Will complete at 7/3/2017 8:26:46 AM if 95% of child instances are complete.]]></event>
<event date="Jul 03 04:00:34 +00:00" severity="4" hostName="S58SRS" source="Altiris.TaskManagement.ServerTasks.ServerTaskExecutionInstance.Sleep" module="AtrsHost.exe" process="AtrsHost" pid="1636" thread="804" tickCount="511703855"><![CDATA[Task instance Quick Delivery: SERVER01 (7/3/2017 12:56:46 AM) is being put to sleep until 7/3/2017 7:01:34 AM. Instance GUID: 80fc0fdc-3668-42cc-b4dd-21030cda708b]]></event>
<event date="Jul 03 04:00:38 +00:00" severity="4" hostName="S58SRS" source="Altiris.NS.StandardItems.Collection.PolicyChangeCollectionUpdateSchedule.OnSchedule_Impl" module="AeXSVC.exe" process="AeXSvc" pid="1436" thread="296" tickCount="511707802"><![CDATA[Policy Update Schedule was last run on 7/3/2017 6:55:37 AM]]></event>
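
When there are many such events, it can help to pull out only the task-related ones. The following is a rough sketch that parses one `<event>` line at a time with Python's standard XML parser; `parse_event` and `task_events` are hypothetical helpers, and the quote normalisation is there because log lines pasted into a forum often pick up typographic quotes that are not valid XML attribute delimiters.

```python
import xml.etree.ElementTree as ET

# Typographic quotes that may have replaced plain ASCII quotes in pasted logs.
_QUOTE_FIXES = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u2033": '"',
    "\u2018": "'", "\u2019": "'",
})


def parse_event(line):
    """Return (date, source, message) for a single <event> line."""
    event = ET.fromstring(line.translate(_QUOTE_FIXES))
    return event.get("date"), event.get("source"), (event.text or "").strip()


def task_events(lines):
    """Keep only events whose source is in the Altiris.TaskManagement namespace."""
    parsed = (parse_event(line) for line in lines if line.strip())
    return [e for e in parsed if (e[1] or "").startswith("Altiris.TaskManagement.")]


# Shortened sample in the same shape as the log lines above.
sample = (
    "<event date=\u201dJul 03 04:00:34 +00:00\u2033 "
    "source=\u201dAltiris.TaskManagement.ClientTask.*\u2033>"
    "<![CDATA[TaskExecutionEngine.ProcessPendingRequestList(): Queueing 1 items.]]>"
    "</event>"
)
print(parse_event(sample))
```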

Related:

Unable to backup any servers at one site

If this is happening at only one site, the issue is probably specific to that site (e.g. firewall).

The first thing I would check is whether or not these clients can connect to the Data Domain. There is a utility called ddpconnchk that you can run on the client system to verify connectivity to the DD. KB334991 describes how to use ddpconnchk. Note that while the article talks about the “media server”, you should run this on the system where the Avamar client is installed since the clients connect directly to the DD using DDBoost.

If all the connectivity checks succeed, the issue may be certificate related. For example, clients on the other side of a NAT may be rejecting the connection because the NAT IP addresses are not in the DD certificate’s list of Subject Alternative Names. If that is the case, I’d recommend working with support.
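
As a quick first look before running ddpconnchk, a plain TCP reachability test from the client can show whether a firewall is dropping connections. This is only a sketch and not a substitute for ddpconnchk: `can_connect` and `report` are hypothetical helpers, `dd.example.com` is a placeholder, and the port list (111, 2049, 2052) is an assumption about a typical DD Boost setup that should be checked against your own configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder values: replace with the Data Domain hostname and the ports
# your environment actually uses.
DD_HOST = "dd.example.com"
DD_PORTS = [111, 2049, 2052]


def report(host=DD_HOST, ports=DD_PORTS):
    """Map each port to True (reachable) or False (refused, filtered, or timed out)."""
    return {port: can_connect(host, port) for port in ports}
```

Calling `report()` on a client at the affected site and on a client at a working site, then comparing the two results, is one way to narrow the problem down to the network path.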

Related:

Process to turn a Standard Client into a Dark Network Client

Hi there,
I am trying to find a way to turn a SEP14 Standard Client into a SEP14 Dark Network Client. So far, BCS Support has said that uninstall/reinstall is the only way. I don’t think this is very feasible in server environments.
If there really is no other way to do this, it should be considered for the next SEP14 versions.
Input / feedback is appreciated.
Cheers,
Michael

Related:

IP-based AUTD failed to initialize because the processing of notifications could not be setup. Error code [0x]. Verify that no other applications are currently bound to UDP port [], or try specifying a different port number.

Details
Product: Exchange
Event ID: 3015
Source: Server ActiveSync
Version: 6.5.7638.0
Component: Microsoft Exchange ActiveSync
Message: IP-based AUTD failed to initialize because the processing of notifications could not be setup. Error code [0x<number>]. Verify that no other applications are currently bound to UDP port [<number>], or try specifying a different port number.
   
Explanation

This event indicates that more than one application is attempting to use the User Datagram Protocol (UDP) listen port.

   
User Action

To resolve this error, do one or more of the following:

Important  This article contains information about editing the registry. Before you edit the registry, make sure you understand how to restore the registry if a problem occurs. For information about how to restore the registry, view the “Restore the Registry” Help topic in Regedit.exe or Regedt32.exe.

  • Verify that no other applications are currently bound to the UDP listen port on the computer. If any other applications are using the UDP listen port, specify a different UDP port. For information about how you can determine the applications using the UDP listen port specified in the event description, see Microsoft Knowledge Base article 323352 “How To Determine Which Program Uses or Blocks Specific Transmission Control Protocol Ports in Windows Server 2003” (http://go.microsoft.com/fwlink/?linkid=3052&kbid=323352).
  • Edit the registry and specify a different UDP port number to resolve this error:
    1. On the computer running Exchange Server, start Regedit.exe.
    2. Open the following registry key:

      HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MasSync\Parameters

      The registry value name for the UDP listen port is UdpListenPort.

    3. Do one of the following:

      • To return to the default configuration of port 2883, delete the parameter UdpListenPort.
      • To specify a different value for the UDP listen port, right-click UdpListenPort, and then click Modify. In Edit DWORD Value, in Value data, type a value between 1 and 65535.
    4. Close Registry Editor.
    5. Restart Internet Information Services (IIS).

Before you edit the registry, and for information about how to edit the registry, see Microsoft Knowledge Base article 256986, “Description of the Microsoft Windows Registry” (http://go.microsoft.com/fwlink/?linkid=3052&kbid=256986).
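
One way to test whether a UDP port is already taken is to try binding to it. The sketch below is an illustration rather than part of the documented procedure: `udp_port_is_free` is a hypothetical helper, and 2883 is the default listen port mentioned above.

```python
import socket


def udp_port_is_free(port, host="0.0.0.0"):
    """Return True if a UDP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # 2883 is the default UDP listen port described above.
    print(udp_port_is_free(2883))
```

For the result to be meaningful the check has to run while the service that normally owns the port is stopped; otherwise the port is reported as busy simply because that service is holding it.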

Related:

Some client computers have not contacted the server in the last %1 days. %2 have been detected so far.

Details
Product: .NET Framework
Event ID: 13031
Source: Windows Server Update Services
Version: 2.0.50727
Symbolic Name: HealthClientsSilentYellow
Message: Some client computers have not contacted the server in the last %1 days. %2 have been detected so far.
   
Explanation
Clients should check in with the server on a regular basis for updates.
   
User Action
Clients Not Reporting

Client computers are not reporting status to the WSUS server.

Possible resolutions include:

  • Review the application event log and resolve any issues related to the IIS, SQL, and WSUS server.
  • Check connectivity from the client computer to the WSUS server and debug any issue found.
    1. Open a command window.
    2. Verify the client computer has a valid IP address: type ipconfig /all
    3. Verify the client computer can reach the WSUS server: type ping <server name or IP address>
    4. Verify the client computer can reach the WSUS HTTP server: type http://<servername>/selfupdate/iuident.cab. This returns the option to download the cab file.
    5. Verify that the Automatic Update (AU) client is running: type net start wuauserv
    6. Verify the AU client is configured properly: type Reg query HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate
    7. If the reg query returns an error, the AU Group Policy has not been sent to this client computer or the client computer has not been configured for a non-domain environment. This has to be corrected before the next step. See http://go.microsoft.com/fwlink/?LinkID=41777.
    8. Verify WUServer and WUStatusServer are pointing to the WSUS server and port number (for example, http://<wsusservername or IP address>:<port number>)
  • Reset the Automatic Updates client by stopping the Automatic Updates client service and forcing a reset.
    1. Open a command window.
    2. Type Reg query HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate
    3. Verify WUServer and WUStatusServer are pointing to the WSUS server and port number (for example, http://<wsusservername or IP address>:<port number>)
    4. Type gpupdate /force (if client machine is configured via domain policy).
    5. Type wuauclt.exe /resetauthorization /detectnow
    6. Wait 10 minutes for a detection cycle to finish before verification.
    7. Open the file <windir>\SoftwareDistribution\ReportingEvents.log in a text editor.
    8. Check the latest entry in the log file for “Success Software Synchronization Agent has finished detecting items.”
  • Verify client computer and server status.

    • Check the server.
      1. Open a command window.
      2. Type cd <WSUSInstallDir>\Tools
      3. Type wsusutil checkhealth
      4. Type eventvwr
      5. Review the Application log for the most recent events from source Windows Server Update Services and event id 10010.
    • Go to the client computer and do the following:
      1. Open the <windir>\SoftwareDistribution\ReportingEvents.log file in a text editor.
      2. Check the latest entry in the log file for “Success Software Synchronization. The agent has finished detecting items.”
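
The WUServer and WUStatusServer check in the steps above can be scripted by parsing the text that `reg query` prints. This is a minimal sketch: `parse_reg_query` and `wsus_settings_match` are hypothetical helpers, the sample output is illustrative with a made-up server name and port, and the parser assumes value names contain no spaces.

```python
def parse_reg_query(output):
    """Parse `reg query` text output into a {value name: data} dictionary."""
    values = {}
    for line in output.splitlines():
        parts = line.split(None, 2)   # value name, type, data
        if len(parts) == 3 and parts[1].startswith("REG_"):
            values[parts[0]] = parts[2].strip()
    return values


def wsus_settings_match(output, expected_url):
    """Check that WUServer and WUStatusServer both equal expected_url."""
    values = parse_reg_query(output)
    return all(values.get(name) == expected_url
               for name in ("WUServer", "WUStatusServer"))


# Illustrative output only; the server name and port are made up.
sample = """
HKEY_LOCAL_MACHINE\\SOFTWARE\\Policies\\Microsoft\\Windows\\WindowsUpdate
    WUServer    REG_SZ    http://wsus.example.com:8530
    WUStatusServer    REG_SZ    http://wsus.example.com:8530
"""
print(wsus_settings_match(sample, "http://wsus.example.com:8530"))
```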

    Related: