Testing Manage Patch Status

In my last post I discussed how the Product Version Job timer job uses the Windows Installer Service to query the installed state of SharePoint 2010 servers and how the Manage Patch Status page in Central Administration displays this information. I also touched on my reservations about what we can infer from this data. In this post, I’m diving a bit deeper into that question.

A quick word about the DCOM Permissions

In my last post, I put off a discussion of the security implications of granting the Farm account DCOM Local Activation rights to the Windows Installer Service (in order to clear the DCOM 10016 event log errors). I was worried about this approach, since this DCOM Component opens up the Windows Installer, which represents a different type of security risk than say… IIS WAMREG. Following my last post, Spencer Harbar suggested that these worries were unfounded, or rather, that the risks are acceptable, since it’s only a risk if the Farm account gets compromised. He rightly pointed out that you’d be pretty stuffed at that point anyway. Fair enough. To this end, I’ll join him in not worrying about it.

How to fix it
If you want to clear the DCOM 10016 errors by granting these rights, you first need to assign ownership of the registry key HKCR\AppId\{000C101C-0000-0000-C000-000000000046} to Administrators, then grant the local Administrators group Full Control on it. Now you’ll be able to grant the DCOM Local Activation rights to the Farm account on this same {000C101C-0000-0000-C000-000000000046} component.

Despite carrying a lighter weight on my shoulders, I think it might be helpful to review what came out of my testing, as the job may not be detecting everything we’d expect at face value. I’ve also poked a few more holes in the Support response, which was the whole reason I started working on this in the first place.

Testing the Job

In these tests, I’m wilfully trying to do stuff you would never want to do in any farm – just to find out what the job “knows” about. To this end, I’ve tried some pretty foolhardy things like:

  • Manually updating DLLs in the GAC.
  • Manually updating DLLs in the Program Files directories.
  • Manually killing a Cumulative Update installation while it was half-way complete.
  • Deleting DLLs from the GAC and the Program Files directories.
  • Manually updating registry keys.

Are these the right tests? They certainly aren’t comprehensive. Suffice it to say I’m not the right person to comment on what the Windows Installer might be able to detect. In the process of researching this I’ve already become far more acquainted with Reflector and the Windows Installer than I ever hoped to be. I’ve even found out that there’s a Windows Installer blog and Windows Installer MVPs. Who knew? But are these changes the types of things that could cause disruption in a farm? Probably. And should we understand if the Manage Patch Status page in Central Admin accounts for problems like these? I think so. Thus, this imperfect testing by the wrong person.

Replacing DLLs

In the first two tests below, I copied DLLs out of an installed instance of the December Cumulative Update and replaced the installed June Cumulative Update versions of these DLLs in another machine with these newer copies. The DLLs I was looking at were for Microsoft Excel Services Components and Microsoft InfoPath Forms Services (this is how they are listed on the Manage Patch Status page).

Manually replacing a DLL in the GAC

When I manually deleted my June CU Microsoft.Office.Excel.Server DLL from the GAC using GACUtil (as you shouldn’t do), and replaced it with a newer version from the December CU, I broke my Excel Services Service Application. When I ran the Product Version Job timer job it failed to detect the change (the new version was never reflected in Manage Patch Status). Everything looked exactly as it normally would in the application event log, except for this message immediately after the normal 1015/1035 entries:

The Execute method of job definition Microsoft.SharePoint.Administration.SPProductVersionJobDefinition (ID 9bb9d31b-7c8b-4fd7-b52d-5fec40aa3607) threw an exception. More information is included below.

Failed to call GetTypes on assembly Microsoft.Office.Excel.Server.MossHost, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c. Method ‘IsEditEnabledForCurrentUser’ in type ‘Microsoft.Office.Excel.Server.MossHost.MossHost’ from assembly ‘Microsoft.Office.Excel.Server.MossHost, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c’ does not have an implementation.

This error is informative, and would probably help me track down the issue in due course, so the Product Version Job is earning its keep, but it’s unfortunate that this version change is not displayed in Manage Patch Status in any way. In short: this is a good reason to run the job but it’s also good to know this kind of problem won’t appear in Manage Patch Status.

Manually Replacing a DLL version in the Program Files directories

Next, I tried to manually replace DLLs in the Program Files directories with newer versions. I searched throughout the Hive and the C:\Program Files\Microsoft Office Servers\14.0 directories for other versions of these files. I was working on the assumption that the version in the GAC would be in use (thanks to Chris O’Brien for this advice), but I wanted to see if the job would successfully spot changes in these Program Files locations, since this is what the Microsoft Support response suggested.

I found the same InfoPath DLL and a differently-named Excel Services DLL in these locations:

  • C:\Program Files\Microsoft Office Servers\14.0\Bin\Microsoft.Office.InfoPath.Server.dll
  • C:\Program Files\Microsoft Office Servers\14.0\Bin\xlsrv.dll

I ran the Product Version Job after deleting these files and rebooting. Again, the job failed to detect the changes.
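Since the job didn't notice either kind of change in my tests, one way to catch manual DLL swaps or deletions yourself is to hash the files and compare against a known-good baseline. The following is only a minimal sketch of that idea, not anything SharePoint or the Windows Installer does: the function names `snapshot` and `diff_snapshots` are my own, and you would point it at whichever directory you care about (for example the Bin directory listed above).

```python
import hashlib
from pathlib import Path


def snapshot(root, pattern="*.dll"):
    """Map each matching file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }


def diff_snapshots(baseline, current):
    """Return the files added, removed, or changed since the baseline snapshot."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(
            name
            for name in baseline.keys() & current.keys()
            if baseline[name] != current[name]
        ),
    }
```

Take a snapshot straight after a clean patch run, store it somewhere safe, and diff it against a fresh snapshot later. A swapped DLL shows up under "changed" and a deleted one under "removed", which is exactly the kind of thing Manage Patch Status did not report for me.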

What happens with added DCOM Local Activation rights?

If the farm account has DCOM Local Activation rights on the Windows Installer Service, it resolves the DCOM error event log clutter, but these rights don’t impact whether the job can detect these changed DLLs.

Killing an installation part-way through

Next, I rolled back to a stable state and ran the December Cumulative Update against a June Cumulative Update installation. At a random point during the installation I killed the installer (not the Products Configuration Wizard). While the installer was running I wasn’t able to monitor activity in ULS Viewer because SharePoint was being patched. However, I was looking at the dbo.ServerVersionInformation table in SQL Management Studio and I could see new rows with updated versions appearing as it progressed. The Cumulative Update installer was writing to the same table that the Product Version Job updates.

Running the Products Configuration Wizard after fixing the failed installation

Later, I fixed up my December CU installation and ran the Products Configuration Wizard. When it was running, I could see that something very similar to the Product Version Job was logged. The same informational events (1035) appeared successfully in the application event logs, without any DCOM errors or “Failed to Connect to Server” (1015) application event log warnings. Presumably this succeeds (with or without the DCOM rights) because the Setup account that’s running the wizard is a local admin and therefore already has the DCOM Local Activation rights. However, I’m not sure what’s gained by updating Manage Patch Status at this point, since the dbo.ServerVersionInformation table was already updated by the installer. I won’t dwell on that thought too much though, since there may be a very good reason for the update at this time.

For those who are interested in the workings of this update, it’s worth noting that the Products Configuration Wizard appears to use the Microsoft.SharePoint.Administration.SPServerProductInfo.UpdateProductInfoInDatabase(Guid serverGuid) method. It effectively calls the same thing as the Product Version Job timer job, if I’m reading all of this right. A fuller glimpse of the ULS logs looks like this:

Updating SPPersistedObject SPServer Name=SPSQL. Version: 120278 Ensure: False, HashCode: 2459215, Id: 20c667df-1bc3-486b-869c-a3ba40f83af5, Stack:
at Microsoft.SharePoint.Administration.SPPersistedObject.BaseUpdate()
at Microsoft.SharePoint.Administration.SPServerProductInfo.UpdateProductInfoInDatabase(Guid serverGuid)
at Microsoft.SharePoint.PostSetupConfiguration.FinalizeTask.Run()
at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask()
at System.Threading.ExecutionContext.runTryCode(Object userData)
at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

It’s also worth noting that this log entry correlates with the MsiInstaller 1035 success events in the application event logs that I mentioned above.
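I did this correlation by eye, but if you have a lot of entries it can be scripted. Here is a rough sketch, assuming you have already pulled the event timestamps and IDs out of the application event log into a list of pairs (how you export them is up to you, and the helper name `events_near` is made up for illustration). It simply returns the events that fall within a few seconds of a given ULS entry.

```python
from datetime import datetime, timedelta


def events_near(events, moment, window_seconds=5):
    """Return the (timestamp, event_id) pairs within window_seconds of moment.

    events: iterable of (datetime, int) pairs, e.g. exported event log rows.
    moment: datetime of the ULS entry you want to correlate against.
    """
    window = timedelta(seconds=window_seconds)
    return [(ts, eid) for ts, eid in events if abs(ts - moment) <= window]
```

The five-second default window is an arbitrary choice on my part. Widen or narrow it depending on how tightly the two logs line up on your servers.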

What about deleting DLLs?

While investigating this, I ran all of this by my colleague Jalil Sear. He came up with an interesting idea: that I shouldn’t just update the DLLs, but I should try to delete them altogether. So I deleted Microsoft.Office.Excel.Server and Microsoft.Office.InfoPath.Server from the registry and the GAC and reset IIS. I re-ran the Product Version Job and it completed normally, with and without DCOM Local Activation rights. Nothing was detected, although my entire Manage Service Applications page was annihilated. Again, we might have expected this to be reported in Manage Patch Status.

Summary of Test Results

  • The Product Version Job reports “Success” in the Timer Job Status, regardless of all of these considerations. It may fail for other reasons, but all of these issues obtain when the job reports a successful status. In other words, the job reports “success” with or without DCOM rights.
  • It’s not clear to what extent the Product Version Job can account for problems while the installer runs, because the installer already makes updates to the dbo.ServerVersionInformation table as it goes.
    • One might reasonably wonder what would happen to whatever was being updated while the installer failed. Obviously it’s hard to make broad statements about that when we don’t know at which precise point it failed, but in any case the remedial action will be to run the installer again – potentially after fixing something else. One way or the other, if you have this problem, I don’t see how the timer job is going to help because it’s unlikely it will be able to run against this server until the installation is fixed.
  • It’s also not clear to what extent the Product Version Job can account for issues that occur while the Products Configuration Wizard is running – effectively for the same reasons as above. If you have a problem with that wizard, the remedial action will be to fix the problem and run the wizard again.
  • Manage Patch Status doesn’t seem to account for other issues in the GAC or the Program Files directories, such as manual changes to DLLs. Presumably this is because these actions have been taken without using the Windows Installer Service.
    • Obviously, if you’re running an environment where these sorts of changes are routinely possible, then this job is a lesser concern than Change Management processes that might prevent these things from happening in the first place, but it’s worth knowing that the job did not detect these changes in my tests.
  • It’s not clear in which cases the Product Version Job is useful for recording the difference between product versions on different servers, since the installer should have already updated the dbo.ServerVersionInformation table.
    • One example where the job might be useful is the case where a server is restored to a pre-upgrade state. However, it’s likely that this restore operation will prompt some other remedy, like reverting all of the other servers in the farm or upgrading this server again. So the usefulness feels limited to me. Still, this is probably sufficient reason to run the job absent any other considerations.
  • The Manage Patch Status page is still useful for tracking differences across servers where the servers are legitimately running at different patch levels, although typically that’s not a state you’d want to run in for long.
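The one thing the page is clearly good for, spotting servers at different patch levels, boils down to a simple comparison. As a sketch of that comparison only (this is not how SharePoint implements it, and the `find_mismatches` name and the input shape are my own assumptions), suppose you have the product versions per server as a nested dictionary:

```python
from collections import defaultdict


def find_mismatches(inventory):
    """Find products whose version differs, or is missing, across servers.

    inventory: {server_name: {product_name: version_string}}
    Returns {product_name: {server_name: version_or_None}} for mismatches only.
    """
    by_product = defaultdict(dict)
    for server, products in inventory.items():
        for product, version in products.items():
            by_product[product][server] = version

    mismatches = {}
    for product, versions in by_product.items():
        missing = set(inventory) - set(versions)
        if len(set(versions.values())) > 1 or missing:
            report = dict(versions)
            for server in missing:
                report[server] = None  # product not recorded on this server
            mismatches[product] = report
    return mismatches
```

An empty result means every server reports the same version for every product. Bear in mind that this only compares whatever version data you feed it, so it inherits the same blind spots described above: a manually swapped DLL won't change the recorded version.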

Putting this information to use

I wouldn’t suggest reading this as the full story, since I only ran these against a single SQL/SharePoint box. At a minimum the Product Version Job can detect product version mismatches when a server is restored, and servers in long-term mismatched states. As a plus, it will throw an error in your application logs to let you know if there’s something wrong with the DLL that it expects in the GAC. Unfortunately, that isn’t reported to Manage Patch Status. In any case, as teams/farms increase in size this job becomes more useful for shared understanding.

At the end of this review, I think the important thing is to recognise the limits of the data in Manage Patch Status. It’s not going to be bullet-proof. For any actions taken with the Windows Installer, this data should be pretty reliable, since it’s updated during install, with the Products Configuration Wizard and with the Product Version Job. For anything else – who knows? It doesn’t appear to have been designed for that, and I have no idea what a SharePoint timer job would look like that could offer these kinds of assurances. Presumably it would have to be a management agent of some sort. At that point you’re into Configuration or Operations Management territory and we already have different tools for that. Come to think of it, if you really want to know “the install state of the machine”, that’s probably what you’re really looking for. But if you want to know the current versions of successfully-installed SharePoint Products on all servers in your farm, then Manage Patch Status should be accurate in most cases, because of the Product Version Job.

15 thoughts on “Testing Manage Patch Status”

  1. Thanks for your post; this has been very helpful in identifying why i was getting the DCOM errors and how to fix them.

    After fixing my farm to prevent the DCOM errors, i still seem to get the MSI Installer Warning Event (EventID 1015) in my log each time i run the Product Version Job. By following your fix should this problem have been resolved?

    Any help or suggestions on this would be greatly appreciated.

    Many thanks in advance

    Mark

  2. Hello – I have this resolved on the DCOM error side, but I am still getting 1035 and 1015 errors on my Farm. They are definitely running at 00:45, and can be replicated when the Product Version Job is run.

    Has anyone found a solution to this? 200+ errors and informational errors every night is quite annoying. Thanks!

  3. Hi James. I’ve spent a lot of time looking at this now. I’ve got as far as figuring out that it’s something about the Windows Installer when called by a DCOM user that doesn’t have administrative rights that causes the warning. Whether this means that the warning is inevitable without administrative rights or if I’m just overlooking something, I can’t really say. I still hope to crack this nut someday, but at present I’ve spent far too long on this and I may have reached the limits of what I can figure out based on what I know today. The only two remaining clues I’ve got right now are as follows:

    1) There’s a ULS log entry for, “patchca not found. Falling back”, which seems to correlate with these events, but I can’t find anything in the reflected SharePoint code that triggers this message, which leads me to believe this is generated by one of the Windows Installer Service’s internal methods. As I say, this *seems* to correlate with these events, but I’ve stopped pursuing this line of enquiry with any vigour because I also found that these messages seem to be triggered with or without local admin rights.

    2) I cranked the ULS logging up to “Verbose” and uncovered another message, “Begin invoke timer job Product Version Job, id {9BB9D31B-7C8B-4FD7-B52D-5FEC40AA3607}, DB n/a”, which also seems to correlate in some way, since this is getting called by SPMsi.MsiDatabaseQuery. But again, this message seems to get logged when running the job with admin rights as well.

    a) Note: I spent a lot of time looking at the Windows Installer Service’s “database” as well. Whether this is a reference to the registry entries that point at packages as described in my original posts, or whether the “database” is the Windows Installer stuff that’s contained in an executable is something that I’m struggling to answer, although I’ve come around to the idea that it must be the latter. Unfortunately the Windows Installer stuff is all very murky and difficult for a non-developer to understand.

    b) I tried granting the Farm account Modify, then Full rights on C:\Windows\Installer to see if it might be as simple as File System permissions, but that didn’t change anything.

    c) I also tried tweaking AppLocker Windows Installer Policies but had no joy there either. Part of what I did there had to do with loosening security on “C:\Program Files\Common Files\Microsoft Shared\SERVER14\Server Setup Controller”, but again, I had no joy.

    I also tried picking all of this apart with Process Monitor but that is a truly overwhelming undertaking with this quantity of information. And basically I failed to pick out any meaningful patterns where the results were not “SUCCESS”.

    I have some more detail about other things that didn’t work to-date, and some screen shots to support this stuff, which I can document in another blog post if it’s useful. Please let me know if that would be of any use. I’m still kind of clinging to the hope that I might figure this out some day, while partly resigning myself to the possibility that this may just be how the Windows Installer works, since it kind of makes sense that it wasn’t really designed to work without admin rights.

  4. Hi Tristan,
    Thanks for your answer. I read that first before commenting on this article, but the confusing thing is that 2 of the 4 servers actually succeeded.

  5. Hmmm… Yeah that is confusing. At a guess, I would suggest something permissions-related, or maybe even to do with the Timer Service full-stop, but I’m afraid I don’t have any specific insight into this one.
