I’ve recently been involved in MOSS 2007 farm topology discussions with a client that was interested in using the Split back-to-back topology. After a lengthy troubleshooting and escalation process with Microsoft Tier 2 support, we’ve identified some problems with the TechNet extranet farm topology guidance. In short, the TechNet document identifies some supported topologies that span domains, but this incident has raised questions about:
- The acceptable placement of server roles in those topologies.
- Supported domain trust directions.
- Alternate Access Mappings requirements.
- Picking people from other domains.
This is an account of the relevant issues and the steps that we took to reach our conclusions.
The proposed topology
In this scenario we had two web front end servers and an index server inside the corporate network, and two additional web front end servers in the DMZ. The DMZ servers were members of an external domain, and an ISA Server firewall separated the networks. The DMZ web front end servers would respond to external requests, while the two internal servers would respond to internal requests. The DMZ servers were configured with the same services and application pools as the servers in the internal domain. The SharePoint web applications ran under internal domain accounts.
Before diving in, you might wonder why someone would want to use this topology. The primary benefits are that some physical farm resources are consolidated and that external users cannot authenticate against the internal domain, because the trust is 1-way: the external domain trusts the internal domain, but not the reverse. Internal users logged on to the corporate network would hit the load-balanced internal web front end servers directly, which improves performance by taking ISA out of the path as a potential bottleneck.
This topology was proposed based on the TechNet planning guidance mentioned above. While the topology diagrams only show web front end servers in the DMZ, the following paragraph suggests that additional web front end servers can reside in the internal domain:
You can place one or more Web servers inside the corporate network to serve internal requests. This results in splitting the Web servers between the perimeter network and the corporate network. If you do this, ensure that traffic from the Internet is load-balanced to the Web servers in the perimeter network and that traffic from inside the corporate network is independently load-balanced to the Web servers inside the corporate network. You must also set up different alternate access mapping zones and firewall publishing rules for each network segment.
This guidance is not very clear when it comes to the alternate access mapping zones requirement. In conversations with Microsoft technical support I was unable to get clarification on that guidance, although the problem that we encountered may hint at its meaning.
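One plausible reading of that requirement, sketched here in Python purely as an illustration, is that each network segment gets its own zone and public URL, while the network load balancers (not SharePoint) send each zone's host name to its own pool of web front ends. The URLs, server names and the choice of the Default and Internet zones are hypothetical. SharePoint uses the mapping to work out which zone a request arrived on and to render links with that zone's public URL; the pool routing lives entirely outside the farm.

```python
from urllib.parse import urlsplit

# Hypothetical public URLs for two of the five SharePoint 2007 AAM zones
# (Default, Intranet, Internet, Custom, Extranet).
AAM_ZONES = {
    "Default": "http://portal.corp.example",    # internal users, no ISA in the path
    "Internet": "https://portal.example.com",   # external users, published via ISA
}

# Hypothetical server pools; the load balancers, not SharePoint, route each
# zone's host name to its own pool.
SERVER_POOLS = {
    "Default": ["WFE-INT-01", "WFE-INT-02"],
    "Internet": ["WFE-DMZ-01", "WFE-DMZ-02"],
}


def zone_for_request(url: str) -> str:
    """Return the zone whose public URL has the same scheme and host as the request."""
    req = urlsplit(url)
    for zone, public_url in AAM_ZONES.items():
        pub = urlsplit(public_url)
        if (req.scheme, req.netloc.lower()) == (pub.scheme, pub.netloc.lower()):
            return zone
    raise LookupError(f"no alternate access mapping for {url}")


def rewrite_link(path: str, zone: str) -> str:
    """Build an absolute link for a site-relative path using the zone's public URL."""
    return AAM_ZONES[zone].rstrip("/") + "/" + path.lstrip("/")


zone = zone_for_request("https://portal.example.com/sites/team")
print(zone, SERVER_POOLS[zone])  # Internet ['WFE-DMZ-01', 'WFE-DMZ-02']
```

Under this split, the same site requested from inside the corporate network resolves to the Default zone and the internal pool, so each segment needs its own zone, URL and publishing rule.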
Picking people across domains
This deployment was escalated to me when we encountered some unexplained people picker behaviour. When we browsed to a web application on the DMZ web front end servers, we could search for users in both domains. When we browsed to a web application on the internal network’s web front end servers, we could only pick people from the internal domain. Searching for known users in the external domain (where there were no internal matches) returned either “No exact match was found” or “No results were found to match your search item. Please enter a new term or less specific term.” In short, the internal domain servers couldn’t pick from the external domain on any web application.
Our troubleshooting with Microsoft started with a methodical review of the stsadm peoplepicker configuration, ports, permissions, and so on. To cut a very long second chapter of this story short, the problem persisted through a number of different reconfigurations at the domain and forest level, with and without explicitly declared credentials for the external domain. We settled on a 1-way domain trust and configured the people picker with explicit credentials for the external domain.
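For reference, the explicit credentials are set per web application through the peoplepicker-searchadforests property with stsadm -o setproperty, after running stsadm -o setapppassword on every server in the farm so that the stored password can be encrypted. The Python sketch below only builds that command line and does not run it. The helper names, domain names and account are hypothetical, and the value format it assumes (forest: or domain: entries with an optional login and password, separated by semicolons) should be checked against the TechNet reference for the property before use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchTarget:
    """One forest or domain entry for peoplepicker-searchadforests (hypothetical helper)."""
    kind: str                       # "forest" or "domain"
    dns_name: str                   # e.g. "dmz.example.local" (placeholder)
    login: Optional[str] = None     # explicit account, e.g. for an untrusted domain
    password: Optional[str] = None

    def to_value(self) -> str:
        if self.kind not in ("forest", "domain"):
            raise ValueError("kind must be 'forest' or 'domain'")
        if (self.login is None) != (self.password is None):
            raise ValueError("login and password must be given together")
        fields = [f"{self.kind}:{self.dns_name}"]
        if self.login is not None:
            fields += [self.login, self.password]
        for field in fields:
            # Commas separate fields and semicolons separate entries in the value.
            if "," in field or ";" in field:
                raise ValueError(f"separator character in {field!r}")
        return ",".join(fields)


def setproperty_args(web_app_url: str, targets: List[SearchTarget]) -> List[str]:
    """Build the stsadm argument list (run stsadm -o setapppassword on every server first)."""
    value = ";".join(t.to_value() for t in targets)
    return ["stsadm", "-o", "setproperty",
            "-pn", "peoplepicker-searchadforests",
            "-pv", value,
            "-url", web_app_url]


args = setproperty_args("http://portal.corp.example", [
    SearchTarget("domain", "corp.example.local"),
    SearchTarget("domain", "dmz.example.local", r"DMZ\svc-picker", "placeholder"),
])
print(args[6])  # domain:corp.example.local;domain:dmz.example.local,DMZ\svc-picker,placeholder
```

Keeping the arguments as a list (for example for subprocess.run on a farm server) avoids hand-quoting the semicolon-separated value.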
As a brief aside, the Full Metal Architect blog is an excellent resource on this topic. Following the guidance in those articles and in a Microsoft support blog post, I started troubleshooting with PsGetSid, which gave the first indication that this might not be a SharePoint issue per se. PsGetSid behaved exactly as SharePoint did: we were unable to resolve the SIDs of external users from the internal domain servers. We also noticed that both SharePoint and PsGetSid started to work as soon as we briefly switched to a two-way domain trust (as a test).
However, I couldn’t reconcile these results with my network monitor captures, which showed SharePoint successfully retrieving precisely the same information in its LDAP query of the external domain from every server. On every server in the farm I could see the people picker returning five of six partial attributes for the user I hoped to pick: cn, distinguishedName, displayName, userAccountControl and sAMAccountName. The one blank value was objectSid, and we saw this same result both on the internal domain servers and in the DMZ.
PartialAttribute: objectSid=( )
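The blank objectSid is the telling detail. Active Directory returns objectSid as a binary value, and it is the decoded S-1-5-21-... string form that identifies an account. To illustrate what the capture was missing (this is not SharePoint's own code), here is a minimal Python decoder for the standard binary SID layout: a revision byte, a sub-authority count, a 48-bit big-endian identifier authority, then that many 32-bit little-endian sub-authorities.

```python
import struct


def sid_to_string(raw: bytes) -> str:
    """Decode a binary SID (as returned in objectSid) into its S-R-A-S1-S2... string form."""
    if len(raw) < 8:
        raise ValueError("too short to be a SID")  # e.g. an empty objectSid
    revision, count = raw[0], raw[1]
    if len(raw) != 8 + 4 * count:
        raise ValueError("length does not match the sub-authority count")
    authority = int.from_bytes(raw[2:8], "big")     # 48-bit identifier authority, big-endian
    # Authorities of 2**32 or more are conventionally written in hex.
    auth_text = str(authority) if authority < 2**32 else f"0x{authority:012X}"
    subs = struct.unpack(f"<{count}I", raw[8:])      # 32-bit sub-authorities, little-endian
    return "-".join(["S", str(revision), auth_text] + [str(s) for s in subs])


# Well-known SID of the built-in Administrators group
print(sid_to_string(bytes.fromhex("01020000000000052000000020020000")))  # S-1-5-32-544
```

The empty value in the capture would fail the length check, which loosely mirrors SharePoint's position on the internal servers: a user it could see but could not identify.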
So we knew that:
- SharePoint was issuing successful LDAP queries of both domains from every server in the farm, but on the internal domain’s servers the results from the untrusted external domain never made it into the people picker’s results.
- People picking worked on all servers as soon as we changed to a two-way trust.
At this point I was speaking with a technical lead on the 1st-line support team who suggested that this topology was unsupported. He backed this claim with two TechNet posts, one from Joel Oleson (in which he explicitly says all servers need to be in one domain) and one from Neil Hodgkinson (in which he also says “you cannot split a farm across multiple AD domains”). Normally I would take their guidance at face value, but since it directly contradicted the Microsoft guidance at the top of this article and the posts were so old, I pressed for a more official response from Microsoft. The next morning I was told that those posts were wrong and that a farm can have servers in multiple domains.
Microsoft passed the issue to a Tier 2 support technician, who escalated it to his own escalation point. Finally getting somewhere… He quickly explained that we absolutely won’t be able to pick external users from the web applications on internal domain servers, for the same reason that you can’t assign permissions to users from an untrusted domain. Although internal domain users can authenticate to the external domain’s resources, the internal domain cannot grant users from the untrusted domain access to its own resources. Because SharePoint uses SIDs to track all user activity, a web application looking up a user in an untrusted domain can’t obtain the SID it needs to assign a task or do anything else with that user, even though the initial LDAP query succeeds and even if the task has nothing to do with securing a resource. This was my fundamental stumbling block: since we weren’t granting access to anything, I couldn’t see why a trust would be required, but once you consider that SharePoint identifies every user by SID, it makes sense.
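The explanation reduces to a simple rule, which the toy Python model below encodes under deliberate simplifications: a server can resolve an account from its own domain or from a domain that its domain trusts, and nothing else. Trusts are modelled as hypothetical (trusting, trusted) pairs with no transitivity, forest trusts or Selective Authentication, and the domain names are placeholders. Plugging in our 1-way trust reproduces what we saw, and adding the reverse trust reproduces the two-way test.

```python
from typing import Iterable, List, Set, Tuple

Trust = Tuple[str, str]  # (trusting domain, trusted domain)


def can_resolve(server_domain: str, user_domain: str, trusts: Set[Trust]) -> bool:
    """A server resolves accounts from its own domain or from domains its domain trusts.
    Toy model: ignores transitive, forest and external-trust subtleties."""
    return server_domain == user_domain or (server_domain, user_domain) in trusts


def pickable_domains(server_domain: str, domains: Iterable[str], trusts: Set[Trust]) -> List[str]:
    """Domains whose users a server in server_domain could pick, in this model."""
    return sorted(d for d in domains if can_resolve(server_domain, d, trusts))


DOMAINS = ["INTERNAL", "EXTERNAL"]
ONE_WAY = {("EXTERNAL", "INTERNAL")}            # the external domain trusts the internal domain
TWO_WAY = ONE_WAY | {("INTERNAL", "EXTERNAL")}  # the temporary test configuration

print(pickable_domains("EXTERNAL", DOMAINS, ONE_WAY))  # ['EXTERNAL', 'INTERNAL']
print(pickable_domains("INTERNAL", DOMAINS, ONE_WAY))  # ['INTERNAL']
print(pickable_domains("INTERNAL", DOMAINS, TWO_WAY))  # ['EXTERNAL', 'INTERNAL']
```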
So… our options were to put a two-way trust in place or to create an alternate access mapping zone that would point exclusively at the DMZ servers (although the support contact agreed that using a different zone just for picking people would make for pretty poor usability). In the end, we simplified the topology and used ISA publishing, since the two-way trust carried security implications similar to inverting the 1-way trust for an ISA reverse proxy. I also asked whether enabling Selective Authentication might help to lock down the trust of the external domain, but the Microsoft support team were unable to get the people picker to work with Selective Authentication enabled. I believe that every user who might be picked would need to be enabled for Selective Authentication in this scenario, which would completely defeat the purpose of selectively authenticating.
Since then, in further discussions with Tier 2 support, we’ve been told that servers in different domains are not supported unless all web front end servers are in the external domain. They added that you can only have a web front end server in the internal domain if it is an index server, i.e., not serving requests to users. I must emphasise that this was the response to a support case and not official Microsoft guidance, although I have commitments from Microsoft support that the TechNet documentation will be updated in the near future. We’ll need to wait and see how the updated guidance pans out when it is released (this might take some time, as it involves multiple teams). For now, I would proceed with extreme caution when considering any topology with servers in multiple domains.