Frequently Asked Questions During Diagnosis and Troubleshooting of NetScaler MAS Issues
The following section lists some of the frequently asked questions during diagnosis and troubleshooting of NetScaler MAS issues:
Q: How do I check the NetScaler MAS version?
A: On a live device, check the version.conf file under /mps:
bash-2.05b# cat version.conf
In a support file, open /var/nslog/dmesg.boot and grep for flash. The output shows the build version.
bash-2.05b# grep flash dmesg.boot
/mas-11.1-52.15 -> /flash/mas-11.1-52.15
A: MAS makes NITRO calls to the instances and retrieves all the details.
To confirm how many vservers each instance reported to MAS, check mps_inventory.log:
Wednesday, 22 Mar 17 19:08:02.287 +0530 [Debug] [Emon[#60]] HTTP Request Protocol: https, ContentType: , Method: GET, URL: https://10.107.100.131/nitro/v1/config/lbvserver?attrs=name,ipv46,port,servicetype,effectivestate,curstate,health,tickssincelaststatechange,comment,lbmethod,persistencetype,totalservices,activeservices&pagesize=10000&pageno=1&format=json
Wednesday, 22 Mar 17 19:08:02.371 +0530 [Debug] [Emon[#60]] EMON_LB_VIP: 10.107.100.131, db_objects: 9, new_objects: 8
In the above output, db_objects and new_objects differ because a vserver was deleted from the NetScaler after the previous poll. For output related to performance dashboards and graphs, look at /var/mps/mps_perf.log.
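As a rough sketch of how that comparison could be automated, the function below scans inventory log lines for EMON_LB_VIP entries and prints only those where the two counts differ. The function name emon_vip_diff is hypothetical, and the pattern assumes the log format matches the sample line above exactly.

```shell
# Print each EMON_LB_VIP poll result whose stored count (db_objects)
# differs from the freshly polled count (new_objects).
# Reads mps_inventory.log lines on stdin. The function name is illustrative.
emon_vip_diff() {
    sed -n 's/.*EMON_LB_VIP: \([0-9.]*\), db_objects: \([0-9]*\), new_objects: \([0-9]*\).*/\1 \2 \3/p' |
    awk '$2 != $3 { printf "%s db=%s new=%s\n", $1, $2, $3 }'
}

# Example (on MAS):
#   emon_vip_diff < /var/mps/log/mps_inventory.log
```

An instance that never appears in the output simply had matching counts on every poll found in the log.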
Q: How do I capture tcpdumps on NetScaler MAS?
A: Use the following command to capture tcpdumps on MAS:
tcpdump -i 1 <options>
The -i switch specifies the interface, and 1 represents the first interface on the MAS appliance. For further reference, see https://www.freebsd.org/cgi/man.cgi?tcpdump(1). For example:
tcpdump -i 1 udp and src 10.107.100.131
tcpdump -i 1 tcp
tcpdump -i 1 src NSIP or dst NSIP
A: NetScaler MAS periodically makes NITRO calls to every instance and resource added to it. You can see these calls in mps_inventory.log and mps_config.log.
A: Complete the following steps to upgrade different instances through NetScaler MAS:
- Go to Infrastructure > Configuration Jobs > Maintenance tasks.
- Now select the device that you want to upgrade.
A: Use the following command to run the techsupport script:
A: Complete the following steps to capture debug logs:
- Whenever you capture a support file to troubleshoot an issue, enable ‘Collect Debug logs’ on the tech-support page.
- When you enable this option, additional debug-level afdecoder logs are captured, which helps troubleshoot the issue further.
- The time that you specify is how long these additional debug logs are captured after you click OK.
- For example, if you are troubleshooting an HDX Insight issue, specify a suitable time and click OK to start debugging.
- Replicate the issue within that interval so that debug logs pertaining to the issue are captured. The support file is generated automatically after the timer expires.
This is available only from 12.0 onwards.
- Go to System > Diagnostics > Troubleshooting.
- You can then select the instance you want to diagnose and get the Diagnostics chart.
- Remember, only the vservers and instances where AppFlow is enabled will work with this.
- It will show you different AppFlow related parameters enabled on the device and different vserver states.
Q: What are the different processes on NetScaler MAS and how to troubleshoot different issues using mps_*.log in /var/mps/log?
A: The following are the different processes on NetScaler MAS:
- Control SubSystem (mps_control.log) : Initializes, monitors, and stops the other subsystems and the database. It is responsible for restarting any subsystem that crashes.
- Service SubSystem (mps_service.log) : It has an inbuilt HTTP(s) Request/Response handler. It listens on port 80 and 443. Any request from UI/API will hit the Service SubSystem. Based on the type of request, it might process the request itself or pass it on to the other appropriate SubSystem. Response always goes back via Service SubSystem.
- Inventory SubSystem (mps_inventory.log) : It does inventory from NetScaler/SD-WAN instances and updates instances’ information in the database. It retrieves build/system information from NetScaler Instances. It runs complete inventory every 30 minutes by default. This subsystem also retrieves statistics from instances to show CPU/Memory usage etc.
- Config SubSystem (mps_config.log) : It processes any configuration request that is received from the Service SubSystem. A configuration request can be adding an instance or any other operation on a NetScaler instance or on MAS itself. It also handles admin user management, device profiles, external authentication server configuration, and so on.
- Event SubSystem (mps_event.log) : It raises internal events in case of any SubSystem failure or configuration changes. This subsystem also registers itself with NetScaler/SD-WAN instances and SDX appliances to receive syslogs and provide event based reporting. All traps and syslogs events come to this process.
- Perf SubSystem (mps_perf.log) : This subsystem is responsible for performance reporting of NetScaler/SD-WAN instances. This retrieves instance stats every 5 mins and aggregates them on minutely, hourly, daily and weekly basis. There are pre-defined reports.
- afdecoder SubSystem (mps_afdecoder.log) : This subsystem is responsible for receiving AppFlow traffic from NetScaler/SD-WAN instances and processing that data.
- afanalytics SubSystem (mps_afanalytics.log) : This subsystem is responsible for analytics reporting of NetScaler/SD-WAN instances. This subsystem aggregates the data on minutely, hourly, daily and weekly basis. There are pre-defined reports.
Q: Where is the NetScaler MAS configuration stored?
A: All the configuration is stored in CSV files under the /var/mps/mpsdb directory. You can open each file and check the configuration.
A: Here are the different values that the s, stat and state output specifiers (header “STAT” or “S”) will display to describe the state of a process:
- D uninterruptible sleep (usually IO)
- R running or runnable (on run queue)
- S interruptible sleep (waiting for an event to complete)
- T stopped, either by a job control signal or because it is being traced.
- W paging (not valid since the 2.6.xx kernel)
- X dead (should never be seen)
- Z defunct (“zombie”) process, terminated but not reaped by its parent.
For BSD formats and when the stat keyword is used, additional characters may be displayed:
- < high-priority (not nice to other users)
- N low-priority (nice to other users)
- L has pages locked into memory (for real-time and custom IO)
- s is a session leader
- l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
- + is in the foreground process group.
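As an illustration of how these state codes can be put to use, the sketch below filters ps output down to processes in a given state, for example zombies (Z) or processes stuck in uninterruptible sleep (D). The function name procs_in_state is hypothetical, and the input is assumed to have the columns PID, STAT, COMMAND with one header line.

```shell
# List processes whose STAT column starts with the given state letter.
# Expects "ps ax -o pid,stat,command" style input on stdin
# (header line first, then PID, STAT, COMMAND columns).
procs_in_state() {
    awk -v want="$1" 'NR > 1 && substr($2, 1, 1) == want { print }'
}

# Example:
#   ps ax -o pid,stat,command | procs_in_state Z
```

Only the first character of STAT is compared, so modifiers such as s, l, or + do not affect the match.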
Common NetScaler MAS Issues and Troubleshooting Steps
The following section lists some of the most common NetScaler MAS issues and the steps to troubleshoot these issues:
- Check the licenses for MAS and the virtual servers it supports. You can see that under Infrastructure > Licenses > System Licenses.
- Remember that the license limit shown on that page counts all the vservers on the instances, including Gateway VIPs and GSLB VIPs.
- Click Modify Licensed Virtual Servers, then click Add Virtual Servers and check whether the missing virtual server is listed there.
- To free a license, return to the System Licenses page, select the virtual servers you do not need, and click Mark Unlicensed. You can then license the virtual server that you need in its place.
- Alternatively, search /var/log/ns.log for errors like the following:
Mar 21 18:22:34 <local0.err> NetScaler MAS-1 mas_event: 10.107.143.118 03/21/2017:12:52:34 GMT : EVENT VIPLICENSELIMITWARNING : 127.0.0.1:VIPLicenses – System discovered more Virtual Servers than license limit, dropping 1 vips
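A minimal sketch of that search, assuming the event text matches the sample line above: the function counts the VIPLICENSELIMITWARNING events in the log and reports how many vips the most recent event dropped. The function name vip_license_drops is hypothetical.

```shell
# Summarize VIPLICENSELIMITWARNING events from ns.log lines on stdin.
# Prints the number of events and the "dropping N vips" value of the
# most recent one. Prints zeros when no such event is present.
vip_license_drops() {
    grep 'VIPLICENSELIMITWARNING' |
    sed -n 's/.*dropping \([0-9][0-9]*\) vips.*/\1/p' |
    awk '{ n++; last = $1 } END { printf "events=%d last_dropped=%d\n", n, last }'
}

# Example (on MAS):
#   vip_license_drops < /var/log/ns.log
```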
- You can take a snapshot of the error that you see on GUI when adding an instance.
- The following is the flow:
- In the collector file you can confirm this by looking at /var/ns.log file.
- To understand where the addition failed, examine the /var/mps/mps_inventory.log and /var/mps/mps_config.log files.
- You can also capture a tcpdump on MAS by using the following command, where NSIP is your NetScaler IP address:
tcpdump -i 1 src NSIP or dst NSIP
MAS will only know about the vserver being down when it polls that instance.
Every 30 minutes, NetScaler MAS polls entities by using NITRO calls. An entity is a policy, virtual server, service, or action attached to a NetScaler instance. The polling interval is configurable, but you cannot set it to less than 10 minutes. To configure it, navigate to Networks > Network Functions > Settings > Configure Polling Interval for Entities. You can also poll the entity configuration on demand by navigating to Networks > Network Functions > Load Balancing > Entities Configuration > Poll Now.
- However, you will see the status change in the Application dashboard of MAS, because it monitors the applications in real time.
- You will also receive traps for this event and you can see that under Infrastructure > Events > Event Messages.
- You can see the output of the MAS poll in mps_config.log and mps_inventory.log files.
The troubleshooting procedure is similar to how you troubleshoot Insight.
- Verify if AppFlow is configured on NetScaler vservers. Check if MAS IP is in AppFlow collector list.
- Make sure that appropriate AppFlow policies and actions are in place.
- Examine mps_afdecoder.log and mps_afanalytics.log under /var/mps/log for any errors.
- Verify if traffic is reaching MAS from NetScaler on UDP port 4739. Use tcpdump on MAS to determine this.
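One way to run that last check is sketched below. The helper only builds a capture filter for AppFlow records from a single NetScaler, which is then passed to tcpdump. The function name appflow_filter is hypothetical, and the interface number 1 follows the tcpdump examples used elsewhere in this article.

```shell
# Build a tcpdump filter matching AppFlow records sent by one NetScaler.
# $1 is the NSIP of the instance; UDP port 4739 is the port named above.
appflow_filter() {
    printf 'udp port 4739 and src host %s\n' "$1"
}

# Example (run on MAS; stops after 20 matching packets):
#   tcpdump -n -i 1 -c 20 $(appflow_filter 10.107.100.131)
```

If the capture stays empty while traffic is flowing through the vserver, the problem most likely sits before MAS: the collector configuration, the policy binding, or a device filtering the port.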
Verify that those events are captured in the NetScaler MAS event logs. They do not appear in CLI logs because they are recorded directly in the database. If the events are captured, verify that the connection to the mail server succeeds. You might see errors similar to the following in the mps_event.log file:
Monday, 23 Jan 17 13:42:03.781 +0000 [Error] [Main] Exception : Host not found: mail.citrite.net occurred while sending mail for rule: email_rule_test
A successful email send looks like this:
Thursday, 30 Mar 17 12:45:54.728 +0530 [Debug] [Main] Establising session
Thursday, 30 Mar 17 12:45:55.003 +0530 [Debug] [Main] trying login
Thursday, 30 Mar 17 12:45:55.553 +0530 [Debug] [Main] sending message
Thursday, 30 Mar 17 12:45:57.411 +0530 [Debug] [Main] closing session
Thursday, 30 Mar 17 12:45:57.686 +0530 [Debug] [Main] Updating the database
Here, you can see NetScaler MAS setting up a session with the email server.
- NetScaler uses SNMP to send traps to MAS. Confirm that the NetScaler is configured with the right SNMP settings (these are configured automatically when the NetScaler is added to MAS).
- Check whether traps are reaching MAS by running tcpdump on ports 161 and 162.
- Examine mps_event.log under /var/mps/log for the following logs:
Friday, 31 Mar 17 11:41:06.948 +0530 [Debug] [Main] SNMP traps recieved = 6410, rate/sec: 0, Dropped packets after recieving = 0
Friday, 31 Mar 17 11:41:06.949 +0530 [Debug] [Main] SNMP traps processed = 6048, rate/sec: 0, Dropped Packets after processing = 0
These entries are logged every 30 seconds. Confirm that the SNMP traps received counter is increasing and that the dropped packets counters are not. Ideally, the difference between the received and processed counters should stay the same.
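To watch that difference over time, something like the following sketch could be used. It pairs each received entry with the processed entry that follows it and prints the gap between them. The function name snmp_trap_backlog is hypothetical, and the patterns copy the log text exactly as it appears in the sample lines above, including its spelling of "recieved".

```shell
# Print the received and processed SNMP trap counters and their difference
# for every pair of counter entries in mps_event.log (read from stdin).
# The match strings follow the log text verbatim, spelling included.
snmp_trap_backlog() {
    awk '
        /SNMP traps recieved =/ {
            line = $0
            sub(/.*SNMP traps recieved = /, "", line)
            received = line + 0      # leading digits only
        }
        /SNMP traps processed =/ {
            line = $0
            sub(/.*SNMP traps processed = /, "", line)
            processed = line + 0
            printf "received=%d processed=%d backlog=%d\n", received, processed, received - processed
        }
    '
}

# Example (on MAS):
#   snmp_trap_backlog < /var/mps/log/mps_event.log | tail -n 10
```

A backlog value that keeps growing from one entry to the next suggests MAS is not keeping up with the incoming traps.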
The reasons for traps being dropped are:
- Traps coming from sources that are not added on NetScaler MAS.
- NetScaler MAS is not able to process all of the packets sent to it.
- Some parsing related errors (This is captured in mps_event.log)
- NetScaler MAS is not able to understand that trap.
- Verify if logs are sent to NetScaler MAS by running tcpdump on port 514.
- Examine mps_event.log under /var/mps/log for the following logs:
Friday, 31 Mar 17 13:04:35.578 +0530 [Debug] [Main] *********SyslogPacketHandler::packetprocessor: Syslog Packets enqueued so far: 2222000 *********
Friday, 31 Mar 17 13:08:35.060 +0530 [Debug] [Main] *********SyslogDecoder::dataConsumer: Size of queue before insert: 5*********
*********SyslogDecoder::dataConsumer: Syslogs inserted to DB: 2222038 *********
- The SyslogPacketHandler entry is logged each time the count of enqueued syslog packets increases by 1000. The SyslogDecoder entry follows approximately 4 minutes later.
- If it takes more than 4 minutes, the syslog processor probably did not parse that data. One possible cause is that the logs come from a device that is not configured as a managed device on NetScaler MAS.
- If syslog packets are being received but the Syslogs inserted to DB counter is not increasing, NetScaler MAS is not able to parse this data.
- Verify if there are any errors that are related to syslog.
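The two syslog counters can be pulled out of the log for a quick side-by-side comparison. The sketch below reports the latest value of each. The function name syslog_counters is hypothetical, and the patterns assume the entries look exactly like the sample lines above.

```shell
# Report the most recent "enqueued" and "inserted to DB" syslog counters
# found in mps_event.log lines read from stdin.
syslog_counters() {
    awk '
        /Syslog Packets enqueued so far:/ {
            line = $0
            sub(/.*Syslog Packets enqueued so far: /, "", line)
            enqueued = line + 0      # leading digits only
        }
        /Syslogs inserted to DB:/ {
            line = $0
            sub(/.*Syslogs inserted to DB: /, "", line)
            inserted = line + 0
        }
        END { printf "enqueued=%d inserted=%d\n", enqueued, inserted }
    '
}

# Example (on MAS):
#   syslog_counters < /var/mps/log/mps_event.log
```

Running it twice a few minutes apart shows whether both counters are advancing together.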
- Before 11.1.52.x, only users who are part of the OWNER group can see all the data on NetScaler MAS. NSROOT is a part of this OWNER group and externally authenticated users could not be part of this group.
- On builds later than 11.1.52.x, check whether the user is part of a group that has an ADMIN role.
- Also check if the group which a user is part of has appropriate privileges to see the data.
- Group and authentication info can be seen using:
- Verify if communication between NetScaler and MAS is open
- NetScaler is able to reach MAS on UDP Port 4739
- If LogStream is being used, then NetScaler should be able to reach NetScaler MAS on TCP Port 5557
- Verify that the correct AppFlow policy is bound to the vserver.
- Verify that traffic is going through the monitored vserver.
- Often, the AppFlow policy is bound to a vserver that carries no data traffic.
- Verify policy hits. If the same policy is bound in multiple places, the hits could correspond to a different feature.
- Verify the current disk space, RAM and CPU utilization.
- Refer to https://docs.citrix.com/en-us/netscaler-mas/11-1/analytics-how-to-articles/how-to-enable-analytics-on-instances.html for configuration help in enabling Insight on NetScaler MAS.
- Verify the version of XenApp/XenDesktop
- HDX Insight data will not be seen, if server version is XenApp/XenDesktop 7.8.
- Upgrade to later versions.
- Verify the client type
- If the user launches ICA sessions through mobile Receivers, those connections are not parsed, so no data appears on MAS.
- Verify the Receiver version
- You might be using a supported operating system such as Windows, Mac, or Linux and still not see HDX Insight data. This can happen if you are using an unsupported version of Receiver.
- For the supported Receiver versions, see https://docs.citrix.com/en-us/netscaler-mas/11-1/Before-You-Begin.html
- Look for the following messages in /var/log/ns.log of NetScaler:
Mar 28 22:55:54 <local0.notice> 10.217.31.217 03/28/2017:22:55:54 GMT 0-PPE-1 : default ICA Message 99261 0 : “Session setup data send: Session GUID [0aa19f2a567d41c1aad01a87a7ec02d5], Client IP/Port [220.127.116.11/60106], Server IP/Port [10.160.56.47/2598], MSI Client Cookie [Non-MSI],Session setup time [03/28/2017:22:54:54 GMT], Client Type [0x0001], User [manohare], Client [18.104.22.168], Server [SJCPXA65HOF103], Ctx Flags [0x180022c52d], Track Flags [0x90d0db7c], Skip Code ”
If you see the above log message on the NetScaler with all the details accurate, then the NetScaler is working properly.
- Ensure Skip Code is 0.
- Search for the “ica_session_setup” message in mps_afdecoder.log on MAS, in /var/mps/log. It should have the same Session GUID as logged on the NetScaler. In the above example, it is “0aa19f2a567d41c1aad01a87a7ec02d5”.
- Verify whether you have a globally bound AppFlow policy of type ICA_REQ_OVERRIDE that points to a different collector IP. If you do, it overrides all other ICA policy bindings and also prevents ICA reporting to multiple collectors. Use this command from the CLI: show run | grep -i "appflow global"
- A detailed troubleshooting guide for HDX Insight is available at https://support.citrix.com/article/CTX215130.
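The Session GUID comparison between the NetScaler and MAS can be sketched as follows. The function extracts the GUIDs from "Session setup data send" messages so that each one can then be searched for in mps_afdecoder.log. The function name ica_session_guids is hypothetical, and the pattern assumes the message format shown in the sample ns.log line above.

```shell
# Extract the unique Session GUIDs from ICA session setup messages.
# Reads NetScaler ns.log lines on stdin and prints one GUID per line.
ica_session_guids() {
    sed -n 's/.*Session GUID \[\([0-9a-fA-F]*\)].*/\1/p' | sort -u
}

# Example:
#   ica_session_guids < /var/log/ns.log                  (on the NetScaler)
#   grep -c 0aa19f2a567d41c1aad01a87a7ec02d5 /var/mps/log/mps_afdecoder.log   (on MAS)
```

A GUID that appears on the NetScaler but returns a count of 0 on MAS points to the session data not reaching or not being decoded by MAS.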
- ICA RTT can be thought of as a measurement of the screen lag that a user experiences while interacting with an application hosted in a session on a XenApp or XenDesktop server. ICA RTT is different from Network RTT, which is the detected network latency between the ICA client device and the XenApp Server, while the ICA RTT includes an element of user interaction.
- Make sure that End User Experience Monitoring service is running on your client desktop/server desktop/VDA.
- For ICA round trip time calculations, in a Citrix Policy, enable the following settings:
- ICA > End User Monitoring > ICA Round Trip Calculation.
- ICA > End User Monitoring > ICA Round Trip Calculation Interval > set this to 5.
- ICA > End User Monitoring > ICA Round Trip Calculation for Idle Connection.
- There could be some parsing issue for that session and it can be validated by checking for following message: “System detected ICA Sessions do not support logging”.
- This issue is seen for unsupported XenApp/XenDesktop versions
- This issue is seen for unsupported version of Receiver or Receiver type
- While uncompressing ICA data, parser might have encountered an issue
- Issue while parsing ICA Handshake
- Unable to identify channel header
- Reports will not be available for those sessions
- There will be no problems with respect to connectivity
- There will be a core file generated under /var/mps/mps_images
- The file will be named as mas-11.1-x.xerror.dump
- Open this file and you will see the reason for the upgrade failure.
- The most common reason is low disk space. In that case, free up disk space and upgrade again.
- Download Geo database files from http://dev.maxmind.com/geoip/legacy/geolite/
- Upload it to System > Advanced Settings > Geo Database files.
- If you want to add Private IP blocks you can do that by going to Infrastructure > Data Centers > Private IP Block.
- Also make sure to select Enable Geo Data collection for Web and HDX Insight when enabling Insight on the NetScaler through MAS.
The following are the possible causes:
- NetScaler instances are not reachable from MAS server.
- Yellow (Out of Service): NITRO working, ping not working
- Red (Down): neither NITRO nor ping working
- NetScaler profile password is out of sync with the instance password
- HTTP may be disabled on the NetScaler while the profile is configured to use it
- Examine /var/mps/log/mps_inventory.log for any NITRO or ping failures for such NetScaler instances
- Verify network reachability from NetScaler MAS to NetScaler Instances. Use tcpdump on MAS.
- Verify NetScaler profile password and communication type
- First, check that the configuration matches the Citrix documentation topic Analytics: Gateway Insight.
- Points to Note:
- Gateway Insight supports both normal gateway and Unified Gateway.
- The NetScaler MAS release and build must be same or later than that of the NetScaler Gateway appliance.
- One hour of Gateway Insight reports can be viewed for NetScaler instances with Enterprise license. A Platinum license is needed to view Gateway Insight reports beyond one hour.
- Gateway Insight is different from HDX Insight. Successful user logons, latency, and application-level details for virtual ICA applications and desktops are visible only on the HDX Insight Users dashboard.
- In a double-hop mode, visibility into failures on the NetScaler Gateway appliance in the second DMZ is not available.
- Remote Desktop Protocol (RDP) desktop access issues are not reported.
- If you still see nothing on that page, check /var/mps/mps_afdecoder.log to confirm that the NetScaler is sending AppFlow or LogStream data to MAS.
- Verify whether you have a globally bound AppFlow policy of type REQ_OVERRIDE that points to a different collector IP. If you do, it overrides all other REQUEST type AppFlow policy bindings and also prevents Gateway Insight reporting to multiple collectors. Use this command from the CLI: show run | grep -i "appflow global"
1) How to troubleshoot HA issues on MAS:
Check the following logs for errors:
- /var/mps/log/deployment_type.py.log: shows the deployment type and whether the node is the first node in the HA pair or an additional node.
- /var/mps/log/mas_node_status_reporter.py.log: shows registration details on the joining node (node 2).
- /var/mps/log/mps_config.log: shows the registration details on node 1.
To check why the upgrade on the other MAS node failed, check /var/mps/log/server_nodes_upgrade.py.log.
The pgxl account is used for communication between the pair, so checking the pgxl logs (in /var/mps/log/) may help with HA-related issues.
2) How to recover nsroot password for MAS:
If the nsroot password is lost or forgotten, it can be reset using the following steps:
- Access MAS through the console
- Interrupt the boot sequence at the point where it says to press Ctrl-C at the bootloader (shortly after the message “Loading /boot/defaults/loader.conf”)
- At the prompt issue command boot -s
- After boot mount the flash using mount /dev/ad0s1a /flash
- Once mounted create new file using touch /flash/mpsconfig/.recover
- Log in with the default nsroot password