I think it is apropos to begin by mentioning that this document is not meant to be exhaustive. I hope this can act as a handy cheatsheet for when that pesky Zscaler ticket lands in your pile – that is my intention here. This is, essentially, my learnings from troubleshooting Zscaler issues for over two years, documented.

This cheatsheet references the study material for Zscaler Certified Cloud Administrator - Private Access (ZCCA-PA) extensively and therefore follows the same structure the study materials do, more or less. Also, statutory warning, I suppose: I experimented with ChatGPT for this cheatsheet, using it for tweaks, rewrites, etc.

Problem Localization

When dealing with access issues on Zscaler, or any tech issues for that matter, begin by trying to pinpoint the location of the issue and identify its scope. Does the problem impact a single user or multiple users? Is it exclusive to road warriors (‘remote users’ in Zscaler lingo) or fixed-location users? Gathering detailed data from affected users through remote access or direct observation can help identify the scope of the issue. Once the affected user or machine is identified, it is time to focus on locating the precise problem. This could be a local issue, an uplink problem to the ZPA cloud, or an infrastructure component failure/misconfiguration, perhaps something with the SAML IdP or the Zscaler service. Utilizing available tools on the ZPA admin portal and network troubleshooting tools such as ping and traceroute can help narrow down the failure domain. During communication with the end user, asking specific questions about symptoms, connectivity, and authentication can help determine the nature of the issue. Nothing beats observing the problem firsthand, so arranging to access the affected device either physically or through remote collaboration tools like Teams, Zoom, etc. can aid in running tests and installing software like the Zscaler Analyzer tool.

Suggested Troubleshooting Flow

Well, the first thing you would want to check is whether the end user is even connected to ZPA. That is, verify that the Service Status on the Zscaler Client Connector on their machine is ON and, additionally, depending on how you have configured Zscaler for your organization, the Authentication Status is Authenticated. It sounds rather obvious, but trust me, this has been the resolution to quite a few tickets during my time as an engineer. You would want to cover all your bases before you get into the really technical bits, so ensure the hostname/URL specified by the user for the application is correct, the port range mentioned is correct, and the application is indeed up and running as expected.

If the end user is not able to authenticate through the Zscaler Client Connector, your next course of action would depend on the error they are getting. For example, if the end user receives a ‘Sign in failed’ error (or a similar message) during the authentication process, it typically indicates an issue with the username or password. It is possible that they are not entering the correct password, or that the username is not recognized by the IdP. This error is usually caused by a simple mistake on the part of the end user, but it can also be related to provisioning or synchronization issues with the user’s account. This is where you might want to run some checks on their account: Is the user account still valid? Is the user able to reach the IdP configured for your environment? Is the IdP functioning correctly and capable of effectively processing user authentications?

When a user encounters a DNS error or a ‘404’ page during the authentication process, it usually means that the IdP login page is inaccessible. This could be caused by a faulty IdP configuration in the ZPA admin portal, but it is more likely due to an accessibility problem from the user’s network. For example, a DNS configuration issue may be preventing the resolution of the hostname or a firewall may be blocking access to it. In some cases, the end user may be able to reach the destination service but encounters a certificate error during the authentication process. This could be caused by a faulty certificate on the IdP, which would affect all users attempting to authenticate. However, it is also possible that the error is due to an intermediate system trying to perform SSL inspection on the Zscaler Client Connector traffic.
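
The authentication symptoms described above can be condensed into a small lookup table for quick triage. This is only a sketch: the symptom keys and the wording of each check are my own labels for illustration, not Zscaler error codes, and the fallback entry is an assumption about what you would do with an unrecognized error.

```python
# Hypothetical triage table distilled from the paragraphs above.
# Keys are illustrative labels, not official Zscaler error codes.
AUTH_TRIAGE = {
    "sign_in_failed": [
        "Confirm the user is entering the correct username and password",
        "Confirm the account is still valid and provisioned/synced in the IdP",
        "Confirm the IdP is reachable and processing authentications",
    ],
    "dns_error_or_404": [
        "Check DNS resolution of the IdP hostname from the user's network",
        "Check whether a firewall is blocking access to the IdP login page",
        "Review the IdP configuration in the ZPA admin portal",
    ],
    "certificate_error": [
        "Check the certificate on the IdP (a bad one affects all users)",
        "Check for an intermediate system performing SSL inspection",
    ],
}


def next_checks(symptom: str) -> list[str]:
    """Return the suggested checks for a symptom label."""
    fallback = ["Collect the exact error text and a screenshot from the user"]
    return AUTH_TRIAGE.get(symptom, fallback)


if __name__ == "__main__":
    for step in next_checks("dns_error_or_404"):
        print("-", step)
```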

Ultimately, there are three main root causes for user authentication issues on Zscaler:

  1. Device/Network issues: Problems with the end user device or the network it is connected to can hinder successful authentication. Some possible causes to consider: an antivirus client or the firewall on the user’s machine may interfere with the authentication traffic, the end user may have entered a bad username or incorrect password, etc.
  2. IdP issues: It is possible that the IdP simply does not recognize the end user. This could be due to synchronization or update issues with the directory. You can attempt to resolve this issue by performing a manual sync of the directory, if applicable, and checking if the user is now correctly populated in the IdP. Additionally, the IdP must be configured to accept access requests from specific ‘Service Providers’ (or ‘Relying Parties’ in the Microsoft world) so verify that Zscaler Private Access is correctly set up in the IdP as a valid Application.
  3. ZPA portal issues: Expanding on the previous point, there could be issues with the IdP configuration on the ZPA admin portal, which might need a review. To troubleshoot potential authentication issues, you can utilize the ‘Import’ function on the IdP Configuration screen to verify that you can authenticate successfully with a valid user. Another potential problem to consider is an invalid certificate on the IdP, which may have expired. Validate the certificate during your test, and refresh it if necessary.
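
For the expired-certificate case, here is a minimal sketch that works out how many days remain on a certificate. It assumes you already have the expiry date as a string in the format that Python's `ssl.getpeercert()` reports under `notAfter` (the same format `openssl x509 -enddate` prints). The function name and the sample date are made up for illustration.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` must look like 'Jan  5 09:34:43 2030 GMT'.
    A negative result means the certificate has already expired.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_epoch, tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


if __name__ == "__main__":
    # Sample date only; substitute the value read from your IdP certificate.
    remaining = days_until_expiry("Jan  5 09:34:43 2030 GMT",
                                  datetime.now(timezone.utc))
    print(f"{remaining:.1f} days remaining")
```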

If the end user is able to authenticate successfully, or if authentication is not required but they still experience difficulties accessing a website or connecting to the Internet in general, it is worth investigating if any Zscaler policies are causing the issue. That is, verify that the user actually does have permissions to access the URL/IP they are trying to access. Alternatively, some policies may have been implemented to legitimately block access to sites that are deemed unsafe or inappropriate, as per your organization’s security policies. However, it is also possible that a policy has generated a ‘false positive’, in which case the policy would need to be reviewed and reconfigured appropriately.

In most cases, by reviewing the logs on the ZPA admin portal, you can gain valuable insight into what may be causing the user’s inability to access a particular application. This information can help you pinpoint the issue and determine your next course of action in troubleshooting. Click on Diagnostics on the portal and add filters for Username, which is generally the end user’s email address, and Application: Domain, which is the URL/IP they are unable to access.

If the user is being blocked by policy – that is, you see a SE: Application policy blocked access or similar status code in the logs – your next course of action should be determining whether this is a legitimate block. If it isn’t, correct the configuration by updating the Criteria section of the access policy.

If, on the other hand, the user is supposed to be able to access the application as per the policies and is unable to do so, it is time to shift your troubleshooting efforts to address potential misconfigurations. For example, a mismatch between the SAML attributes known to the ZPA system and those returned by the IdP could be one of the reasons for our troubles here. It is also possible that the IdP is misconfigured in terms of which attribute to map to the ‘claim’ returned to Zscaler during user authentication. To address this, carefully review the SAML attribute configuration and create or reconfigure attributes as needed. Additionally, check the Access Policy Rules and ensure that both the configuration and logic are correct, keeping in mind that rules are read from the top down with a first-match algorithm. Finally, confirm that the attribute values returned for the user during authentication are accurate, so that policy rule logic can be properly applied.
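
To see why rule order and attribute names matter, here is a toy model of top-down, first-match evaluation. It is not how ZPA is implemented. The `Rule` class, the attribute names (`department`, `employeeType`) and the segment name are all hypothetical, and a result of `None` simply stands for "no rule matched".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    action: str           # "ALLOW" or "BLOCK" (illustrative values)
    app_segment: str      # segment the rule applies to
    required_attrs: dict  # SAML attribute name -> required value


def first_match(rules, app_segment, user_attrs) -> Optional[Rule]:
    """Walk the rules top-down and return the first rule whose criteria match."""
    for rule in rules:
        if rule.app_segment != app_segment:
            continue
        if all(user_attrs.get(k) == v for k, v in rule.required_attrs.items()):
            return rule
    return None  # no rule matched


rules = [
    Rule("block-contractors", "BLOCK", "hr-app", {"employeeType": "contractor"}),
    Rule("allow-hr", "ALLOW", "hr-app", {"department": "HR"}),
]

# An HR contractor hits the block rule first, even though allow-hr also matches.
hit = first_match(rules, "hr-app", {"department": "HR", "employeeType": "contractor"})
print(hit.name)

# If the IdP sends 'dept' instead of 'department', no rule matches at all.
print(first_match(rules, "hr-app", {"dept": "HR"}))
```

The second call is the attribute-mismatch situation described above: the policy looks right on paper, but the claim name coming back from the IdP never satisfies it.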

One possible issue that can arise is that users may be requesting an application by hostname that is not matched to a valid fully qualified domain name (FQDN). In this case, it is important to review the DNS Search Domains configuration on the Administration > Application Management > Application Segments page in the ZPA admin portal to ensure that it is both correct and complete. This will help ensure that ZPA can correctly match the requested hostname to the correct application and allow users to access it successfully.
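
The effect of a missing search domain can be shown with a small sketch. It assumes a simplified matching model in which short hostnames are expanded with each search domain and then compared against the domains defined on an application segment, either exactly or through a leading `*.` wildcard. The function names and example domains are hypothetical, and real ZPA matching may differ in detail.

```python
from typing import Optional


def candidate_fqdns(hostname: str, search_domains: list[str]) -> list[str]:
    """Expand a short hostname with each search domain.

    A name that already contains a dot is treated as fully qualified
    (a simplification for this sketch).
    """
    host = hostname.rstrip(".").lower()
    if "." in host:
        return [host]
    return [f"{host}.{d.strip('.').lower()}" for d in search_domains]


def matching_fqdn(hostname: str, search_domains: list[str],
                  segment_domains: list[str]) -> Optional[str]:
    """Return the first candidate FQDN covered by a segment domain, else None."""
    for fqdn in candidate_fqdns(hostname, search_domains):
        for dom in (d.lower() for d in segment_domains):
            if fqdn == dom or (dom.startswith("*.") and fqdn.endswith(dom[1:])):
                return fqdn
    return None


# With the search domain configured, the short name resolves to a covered FQDN.
print(matching_fqdn("intranet", ["corp.example.com"], ["*.corp.example.com"]))
# Without it, there is nothing to match against.
print(matching_fqdn("intranet", [], ["*.corp.example.com"]))
```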

Zscaler also recommends checking for what they call “Infrastructure Issues”, which largely have to do with missing group memberships. Ensure the following:

  • The relevant Server is a member of the correct Server Group
  • The App Connector adjacent to the application is in the correct App Connector Group
  • The Application is in the correct Application Segment
  • The applicable Server Group is selected in the Application Segment
  • The relevant App Connector Group is selected in the Server Group
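
The five checks above form a chain, and a gap anywhere in it breaks access. As a sketch, suppose you have written the relevant configuration down as nested dictionaries. The layout and every name below are invented for illustration and are not an export format of the ZPA admin portal. The chain can then be walked like this:

```python
def find_chain_gaps(cfg: dict, app: str, server: str, connector: str) -> list[str]:
    """Return a list of missing links between app, server and connector."""
    gaps = []

    segments = [n for n, s in cfg["segments"].items() if app in s["apps"]]
    if not segments:
        gaps.append(f"Application {app} is not in any Application Segment")

    server_groups = [n for n, g in cfg["server_groups"].items()
                     if server in g["servers"]]
    if not server_groups:
        gaps.append(f"Server {server} is not in any Server Group")

    connector_groups = [n for n, g in cfg["connector_groups"].items()
                        if connector in g["connectors"]]
    if not connector_groups:
        gaps.append(f"App Connector {connector} is not in any App Connector Group")

    # Is the server's group selected in the application's segment?
    if segments and server_groups and not any(
            sg in cfg["segments"][seg]["server_groups"]
            for seg in segments for sg in server_groups):
        gaps.append("Server Group is not selected in the Application Segment")

    # Is the connector's group selected in the server's group?
    if server_groups and connector_groups and not any(
            cg in cfg["server_groups"][sg]["connector_groups"]
            for sg in server_groups for cg in connector_groups):
        gaps.append("App Connector Group is not selected in the Server Group")

    return gaps


cfg = {
    "segments": {"hr-seg": {"apps": ["hr.corp.example.com"],
                            "server_groups": ["hr-servers"]}},
    "server_groups": {"hr-servers": {"servers": ["hr01"],
                                     "connector_groups": []}},
    "connector_groups": {"dc1-connectors": {"connectors": ["conn-a"]}},
}
print(find_chain_gaps(cfg, "hr.corp.example.com", "hr01", "conn-a"))
```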

The health and availability of the App Connectors adjacent to target private applications is critical for ZPA connectivity. To ensure proper functionality, several potential issues should be investigated, beginning with verifying that the Connector is indeed enabled in the ZPA admin portal.

Check that the App Connector host has a correctly configured IP address for its subnet, either through a properly configured static IP configuration or DHCP. Additionally, ensure that the DNS environment is functional, enabling successful resolution of both internal and external hosts. Verify that the App Connector has outbound access on port 443 and that no SSL inspection is being performed on outbound connections to the ZPA cloud infrastructure.
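
A quick way to test outbound reachability from the App Connector host is a plain TCP connect. The sketch below only confirms that a TCP session to the given host and port can be opened; it does not detect SSL inspection, and the hostname to test against is something you would take from Zscaler's published requirements (none is assumed here).

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, refusal, timeout and unreachable network.
        return False
```

Calling `can_connect("<hostname from your ZPA requirements>")` from the connector host gives a yes/no answer that is easy to paste into a ticket.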

You might also want to verify that the App Connector VM is provisioned according to deployment guidelines as inadequate allocation of CPU or RAM may impact App Connector performance. Additionally, verify that the certificate deployed to the App Connector is correct and still valid.

Note that App Connector Provisioning Keys have a limited number of uses, so if you are trying to bring up a new App Connector and it won’t accept the key, check the Maximum # of App Connectors and the Provisioning Key Utilization Count on the Administration > App Connector Provisioning Keys > App Connector Provisioning Keys page of the ZPA admin portal.

Tools of the Trade

One of your first checks when an end user calls in with a problem should be the Zscaler Trust site, to check for known outages or issues. Also, ask the user to navigate to the Zscaler Proxy site from the device on which they are experiencing access issues and click Connection Quality to run a quick check. To ensure your ZPA settings are properly configured, refer to the ZPA configuration requirements site. This resource can help you identify any misconfigurations and ensure your ZPA deployment is set up correctly.

You can download the Zscaler Analyzer from the Proxy Test page. This tool is helpful in analyzing the path between your location and the Zscaler Public Service Edge that you are connected to, as well as measuring the time it takes for your browser to load a web page. By saving the results to a file, you can easily include them in a support ticket for further analysis, if needed.

The troubleshooting tools available on the More tab of the Zscaler Client Connector can be used to configure different log modes to control the type of information stored in the logs, or start a packet capture, or clear logs altogether. The Restart Service and Repair App are your best pals, as they were mine. When nothing helps, these two surely will.

Make use of the ipconfig (Windows) / ifconfig (Linux) command to review network adapter configurations. By using the options /all (Windows) / -a (Linux), you can obtain full details on the configuration. Ensure that the device has a valid IP address, a valid gateway, and a valid DNS server configuration. You can also use the ping command to confirm connectivity and verify that the gateway is working. It is recommended to also ping by FQDN to confirm DNS resolution, and if necessary, ping by IP address for the public Google DNS service (that is, ping 8.8.8.8). Check the round trip time for the pings to identify the end-to-end latency on the connection.
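
The ping sequence above (gateway, then FQDN, then a public IP) is easy to script so you run it the same way every time. This is a rough sketch: the helper names are my own, the gateway and FQDN are placeholders you supply, and the return code of `ping` is only a coarse signal, so read the actual output for round trip times.

```python
import platform
import subprocess


def ping_command(target: str, count: int = 4, system: str = "") -> list[str]:
    """Build the ping command line; Windows uses -n for count, others use -c."""
    system = system or platform.system()
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), target]


def ping_plan(gateway: str, fqdn: str) -> list[tuple[str, str]]:
    """Targets in the order suggested above, with what each one tells you."""
    return [
        ("default gateway (local connectivity)", gateway),
        ("FQDN (also exercises DNS resolution)", fqdn),
        ("public IP (bypasses DNS)", "8.8.8.8"),
    ]


def run_plan(gateway: str, fqdn: str) -> None:
    """Run each ping and print its raw output for review."""
    for label, target in ping_plan(gateway, fqdn):
        result = subprocess.run(ping_command(target),
                                capture_output=True, text=True)
        print(f"== {label}: {target} (exit code {result.returncode})")
        print(result.stdout)
```

For example, `run_plan("192.168.1.1", "www.example.com")` would run all three checks; both arguments here are placeholders.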

For identifying where in the route a problem may be occurring, tracert (Windows) / traceroute (Linux) can be used to show the full path of traffic to the destination address and the round-trip times for each hop. Trace the route to local destinations, such as the default gateway, to confirm local connectivity and to Internet destinations to confirm end-to-end connectivity.

To obtain domain name or IP address mapping or any specific DNS record, the nslookup command can be used to query the Domain Name System (DNS). You can use this utility to forward or reverse resolve FQDNs to IP addresses or public IP addresses to the matching FQDN.
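
If you would rather script the forward and reverse lookups than run nslookup by hand, Python's standard `socket` module can do both through the operating system's resolver. This is a minimal sketch with made-up function names. Unlike nslookup, it cannot query a specific DNS server or record type.

```python
import socket
from typing import Optional


def forward_lookup(fqdn: str) -> list[str]:
    """Resolve a name to its IP addresses; an empty list means no resolution."""
    try:
        infos = socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        return []
    # Each entry's sockaddr starts with the address string.
    return sorted({info[4][0] for info in infos})


def reverse_lookup(ip: str) -> Optional[str]:
    """Resolve an IP address back to a hostname, or None if there is no answer."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


if __name__ == "__main__":
    print(forward_lookup("localhost"))
```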

If necessary, a protocol analyzer such as Wireshark can be used to capture packets on the wire as transactions occur. Packet captures can be saved to file for analysis of protocol flows or uploaded to a support ticket. Zscaler recommends using such tools as a last resort, as capturing traces can be a labor-intensive process. You must first identify where the captures are needed, get an analyzer in place, and may need to do simultaneous captures at multiple points on the network path.