A couple of months ago I was happily building a client’s SharePoint Server 2010 farm when I stumbled at Search. The Service Application provisioned fine, but when I pushed out topology changes I started to have problems. Later, these problems returned in different forms, but the root cause appears to have been consistent. In this post I will review the symptoms, the single fix and the reason why this issue emerged in this environment. I’ll also look at some unexpected permission changes that occur when new servers receive Search Service Instances.
My difficulties started when I attempted to move a newly-provisioned Query Component to a web front end server. When it failed, I tracked the problem down to missing permissions on C:\Windows\Tasks. At this point I didn’t know why the permissions had been removed and this was actually the first time I’d noted these permission requirements. TechNet suggests WSS_ADMIN_WPG needs Full Control of %WINDIR%\Tasks, but the description of this requirement is “N/A”. Oddly, according to this TechNet article, the WSS_WPG group does not appear to need these same rights, although they are assigned by the SharePoint installation/configuration processes – or at least they are in the environments that I’ve built.
Adding to this confusion, I found this strange ULS event, in which the provisioning process tries to remove WSS_WPG access to %WINDIR%\Tasks and grant R/W access to the Search service account. This is pretty weird! It might explain why the WSS_ADMIN_WPG group needs Full Control rather than just R/W access, but I wouldn’t typically expect SharePoint to be modifying ACLs in the Windows directory.
Back to the provisioning problem at hand: once I added the missing permissions for both the WSS_WPG and WSS_ADMIN_WPG local groups on %WINDIR%\Tasks, the provisioning process completed successfully. You can also see that the “Modifying ACL” event directly precedes the failure to start the new Service Instance. While this event helped me track down the problem, and is clearly related to it, unfortunately I need to leave that mystery behind for now, as there are bigger issues to address in this post.
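For anyone wanting to check a server quickly, here is a minimal Python sketch of the kind of test I was doing by hand. It assumes the ACL is read with the built-in `icacls` tool and that each entry appears in its usual `ACCOUNT:(rights)` form; the function names (`missing_groups`, `read_acl`) are my own invention for illustration, not anything SharePoint ships.

```python
import re
import subprocess

# The two local groups discussed above.
REQUIRED_GROUPS = ("WSS_WPG", "WSS_ADMIN_WPG")


def missing_groups(icacls_output, required=REQUIRED_GROUPS):
    """Return the required groups that have no entry in the icacls text."""
    missing = []
    for group in required:
        # Assumption: an entry looks like "MACHINE\WSS_WPG:(OI)(CI)(F)".
        # The name must follow a backslash, whitespace, or a line start so
        # that one group name cannot match inside a longer one.
        pattern = re.compile(
            r"(?:^|[\\\s])" + re.escape(group) + r":\(",
            re.IGNORECASE | re.MULTILINE,
        )
        if not pattern.search(icacls_output):
            missing.append(group)
    return missing


def read_acl(folder):
    """Run icacls against a folder and return its text output (Windows only)."""
    result = subprocess.run(
        ["icacls", folder], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a SharePoint server, `missing_groups(read_acl(r"C:\Windows\Tasks"))` should return an empty list when both groups still have an entry. It only checks that an entry exists, not which rights it grants.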
Later, this client got back in touch and mentioned that their Search Service Application wasn’t working. In this case the Search Administration page was available but all Content Sources, Scopes, Crawl Logs, etc. pages failed with errors on the Admin Component.
Crawl status: The search service is not able to connect to the machine that hosts the administration component. Verify that the administration component <GUID> in search application ‘<Search Service Application name>’ is in a good state and try again.
To cut a long story short, my initial troubleshooting didn’t immediately lead me back to these missing permissions, due to a number of other concurrent infrastructure changes which led me astray. Additionally, when we tried to delete the Search Service Application to recreate it, the deletion failed after removing just one of the Search databases. Eventually we managed to re-provision the Service Application but the topology changes failed again, at which point we identified the missing %WINDIR%\Tasks permissions (again), and granting the missing permissions fixed these problems (almost).
In fact, we also needed to grant missing permissions on Program Files\Microsoft Office Servers\14.0\Data\Office Server, but I believe that was a one-off related to the failed Search Service Application deletion earlier. One way or the other it doesn’t appear to be a core issue here. However, I should also mention that I suspect the Search Service Application deletion failed because of the missing %WINDIR%\Tasks permissions – although I’m basing this entirely on the fact that the ULS events above suggest that a similar process takes place for deletion, by virtue of the “(un)provisioning” job.
With Search back up and running, we moved on to other things, but eventually Search started acting up again. Unfortunately I’ve lost track of the visible failure, but the application logs were full of 6398 and 6482 errors (which typically indicate the unavailability of the service rather than the cause). I vaguely recall that we had items in the index but that new crawls were failing to run. At the time, I was most focused on Gatherer Access Denied messages on the Portal_Content Catalog.
Again, to abbreviate other misguided efforts related to on-going infrastructure work, we eventually found out that the permissions on %WINDIR%\Tasks were missing. Obviously, at this point the most reasonable explanation for the change was a Group Policy setting, so we reviewed the event logs in between the last known good crawl and the first crawl failure. I quickly spotted a Group Policy change message. I recommended that we review the Resultant Set of Policy on this server, just to be absolutely certain the Group Policy wasn’t applying permission changes in this location. The client assured me this was very unlikely, because they don’t have an overly restrictive culture, but it turned out this was the one and only file system permission change and it was applied to the Default Domain Security Policy. Presumably the previous Search failures occurred after reboots or some other event that would re-apply this group policy. And presumably all of this strange behaviour can be accounted for by these missing permissions, given that we know they were getting removed and we know that adding them back in fixed the problem.
Later that night, curiosity got the better of me. I dug a bit deeper to see if I could identify anything that recommends these permission changes. I found Microsoft Support KB article KB962007, Virus alert about the Win32/Conficker worm. In this article, Microsoft recommends the following mitigation steps to prevent the virus from spreading:
Set the policy to remove write permissions to the %windir%\Tasks folder. This prevents the Conficker malware from creating the Scheduled Tasks that can reinfect the system. To do this, follow these steps:
- In the same GPO that you created earlier, move to the following folder:
Computer Configuration\Windows Settings\Security Settings\File System
- Right-click File System, and then click Add File.
- In the Add a file or folder dialog box, browse to the %windir%\Tasks folder. Make sure that Tasks is highlighted and listed in the Folder dialog box.
- Click OK.
- In the dialog box that opens, click to clear the check boxes for Full Control, Modify, and Write for both Administrators and System.
- Click OK.
- In the Add Object dialog box, click Replace existing permissions on all subkeys with inheritable permissions.
- Click OK.
In effect, this Group Policy removes the special Read/Write permissions assigned to Authenticated Users on the %WINDIR%\Tasks folder by default. Note: it replaces all permissions with those defined in the Group Policy. I suppose the moral of this story is not to apply security settings like this to the Default Domain Security Policy. But fair play to my client for the security diligence in the first place.
This issue raises a couple of other questions. What is the best way to handle this for SharePoint servers, given that there are legitimate reasons to harden this location? I suppose the best option would be to create another Group Policy for the SharePoint servers OU which will add the local WSS_WPG and WSS_ADMIN_WPG group permissions back on the %WINDIR%\Tasks folder. There will be other options, depending on how your domain/Group Policies are structured, but this illustrates an approach. It would be helpful to understand if the Search account should be added as well, but for now I’m going on what the installer/configuration wizard does rather than what TechNet fails to describe fully.
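While a proper Group Policy is being sorted out, the permissions can be put back by hand on each server. The Python sketch below only builds the `icacls /grant` commands rather than running them, so they can be reviewed first. The function name `build_grant_commands` is hypothetical, and the rights letters are assumptions: Full Control (`F`) for WSS_ADMIN_WPG follows the TechNet requirement mentioned earlier, while the right level for WSS_WPG is not documented anywhere I’ve found, so copy it from a healthy server rather than trusting any value here. Bear in mind the hardening policy will strip a manual grant again the next time it is applied.

```python
def build_grant_commands(folder, grants):
    """Return one icacls argument list per (group, rights) pair.

    `grants` maps a group name to an icacls simple right such as
    "F" (full control), "M" (modify) or "RX" (read and execute).
    """
    commands = []
    for group, rights in grants.items():
        # (OI)(CI) asks icacls to make the entry inheritable by the
        # files and subfolders inside the target folder.
        entry = f"{group}:(OI)(CI){rights}"
        commands.append(["icacls", folder, "/grant", entry])
    return commands


# Assumption: Full Control for WSS_ADMIN_WPG, per the TechNet requirement.
# Add WSS_WPG with whatever rights a freshly built server shows.
example = build_grant_commands(r"C:\Windows\Tasks", {"WSS_ADMIN_WPG": "F"})
for command in example:
    print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` from an elevated prompt once you are happy with it.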
Next question: why isn’t this issue more common, given that the virus first emerged over two years ago? I suppose the group policy might not have been taken up by many organisations, but it’s more likely that there are further wrinkles I’ve not uncovered. I tried to replicate the problem in my single server + DC development environment, but frustratingly, everything worked fine after applying this group policy. I rebooted and confirmed the permission changes, ran a full crawl, ran a query and reviewed event logs, but all seemed fine. I even re-provisioned my Search Service Application and that succeeded. To be perfectly honest I’m not sure what to make of this. Perhaps this is only an issue once the search topology takes a specific shape? That feels like the most likely explanation. I hope to do more testing on this in future, but for now I wanted to identify a fix that worked for me and which aligns with the settings applied by the SharePoint installer/configuration wizard, should this problem arise for others. I’m not the first person to discover this problem. I think it’s actually been around since MOSS 2007, based on some forum posts, but I haven’t seen it described in relation to this Conficker protection, which hopefully helps make the Group Policy modelling decisions a bit less obscure.
More broadly, I’d be really curious to hear if anyone has information about the mismatch between TechNet and SharePoint default permissions on %WINDIR%\Tasks, and the further mismatch between the “Modifying ACL” event, TechNet and the default settings. It may turn out that the WSS_WPG permissions are unnecessary or even undesirable, but given that SharePoint puts them there in the first place, I’m uncomfortable removing them until there’s better information to rely on.