In a server administrator’s never-ending battle with log clutter, DCOM errors have proven to be some of the most persistent and poorly-understood events – especially with SharePoint. Our community has been building up remedial practices for the most common of these errors, but changes to the number and complexity of these fixes over the last few years call for a deeper look at what we’re changing, and the effects of these changes beyond a reduction in red and yellow icons in the event logs. In this post I’ll talk about some of the fundamental concepts from a Systems dude’s perspective and along the way I hope to convey a better understanding of Windows itself.
History
In the past, SharePoint DCOM errors were mostly confined to problems with the IIS WAMREG DCOM component and we didn’t need to spend much time considering the fix, because the errors merely generated clutter:
You can safely ignore the event ID error messages 10017 or 10016 that are logged in the System log. If you want to prevent the event ID error messages from being logged in the System log, use the Component Services snap-in to enable the Local Activation permission to the IIS Wamreg Admin Service for the domain user account that you specified as the Windows SharePoint Services 3.0 service account.
Specifically, these errors would occur when application pool identities or service accounts did not get added to the WSS_WPG or WSS_ADMIN_WPG local security groups properly, or in other cases these groups would not receive the correct rights on the DCOM component itself. A similar issue cropped up with the oSearch DCOM component at one point in the MOSS 2007 lifecycle as well (I think it was circa Service Pack 1 or the Infrastructure Update), although I’ve not come across this in a long time. This was all pretty easy to fix and has been well documented for many years.
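Before changing any permissions, it helps to pull out of the event which component and which account are involved. The following is a minimal Python sketch of that triage step. It assumes the event description follows the usual wording of a 10016 message; the sample text, the `CONTOSO\svc_spfarm` account and the SID are hypothetical, and the GUID is the one commonly reported for IIS WAMREG, so verify it against your own servers. Adjust the pattern if your events are worded differently.

```python
import re

# Pattern modelled on the usual wording of a DCOM 10016 event description.
# The exact wording is an assumption; adjust it to match your own event logs.
EVENT_10016 = re.compile(
    r"do not grant (?P<scope>Local|Remote) (?P<right>Activation|Launch) permission\s+"
    r"for the COM Server application with CLSID\s+(?P<clsid>\{[0-9A-Fa-f-]+\})\s+"
    r"and APPID\s+(?P<appid>\{[0-9A-Fa-f-]+\})\s+"
    r"to the user (?P<user>.+?) SID \((?P<sid>S-[0-9-]+)\)"
)


def parse_dcom_event(message):
    """Return the component and account named in the message, or None if it does not match."""
    match = EVENT_10016.search(message)
    return match.groupdict() if match else None


# Hypothetical sample: the account name and SID are made up for illustration.
sample = (
    "The application-specific permission settings do not grant Local Activation "
    "permission for the COM Server application with CLSID "
    "{61738644-F196-11D0-9953-00C04FD919C1} and APPID "
    "{61738644-F196-11D0-9953-00C04FD919C1} to the user CONTOSO\\svc_spfarm "
    "SID (S-1-5-21-1111111111-2222222222-3333333333-1105)."
)

details = parse_dcom_event(sample)
print(details["appid"], details["user"], details["scope"], details["right"])
```

The APPID identifies the component to look for in Component Services, and the user is the account that needs to be checked against the WSS_WPG and WSS_ADMIN_WPG groups.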
Then, not long before the SharePoint 2010 public beta was released, Windows Server 2008 R2 launched and people began to notice that the options for modifying these permissions were greyed out in the Component Services Snap-In. This is a Windows Server 2008 R2 issue (although for many, I think this fact may have been lost in the tempo of releases at the time). The Trusted Installer now owns these permissions, meaning that local administrators need to claim ownership of the component in the registry and add Full Control permissions there before the DCOM component’s Launch and Activation permissions can be modified. It’s worth looking at what TechNet has to say about the TrustedInstaller SID, to understand why this isn’t madness (and actually kind of sensible).
In the Windows Server® 2008 and Windows Vista® operating systems, most of the operating system files are owned by the TrustedInstaller security identifier (SID), which is the only SID that has full control over them. The purpose is to prevent a process that is running as an administrator or under the LocalSystem account from automatically replacing the operating system files. To delete an operating system file, you need to take ownership of the file and then add an access control entry (ACE) on the file that permits you to delete it. This helps protect against a process that is running as LocalSystem and has a System integrity label; a process that has lower integrity should not be able to elevate itself to change ownership. Some services, for instance, can run with medium integrity, even though they are running as LocalSystem. Such services cannot replace system files, thereby preventing an exploit that takes over a service from replacing operating system files.
In other words, this is a mitigation against increasingly sophisticated exploits. By default, the Trusted Installer is the only account with permission to modify most of the operating system. We can also infer that these DCOM components had not been reassigned to the Trusted Installer in Windows Server 2008, so we only encounter this requirement on Windows Server 2008 R2 machines. All told, this is easy enough to fix and it only adds one extra step. Yet that’s just the “how” and it doesn’t really speak to whether making these changes is sensible or not. As I read the SharePoint IT Pro community, there’s still a deficit of understanding when it comes to DCOM itself, so I’ll try to quickly fill that hole here.
What is DCOM?
First, we should ask, What is COM?
Microsoft COM (Component Object Model) technology in the Microsoft Windows-family of Operating Systems enables software components to communicate. COM is used by developers to create re-usable software components, link components together to build applications, and take advantage of Windows services.
The key is that we’re talking about components. These are objects that can be assembled together in order to build an application. Within this model, Microsoft specify DCOM, which is a set of Remote Procedure Call (RPC) extensions for communications.
The Distributed Component Object Model (DCOM) Remote Protocol is a protocol for exposing application objects by way of remote procedure calls (RPCs). The protocol consists of a set of extensions layered on Microsoft Remote Procedure Call Protocol Extensions as specified in [MS-RPCE].
Note: The DCOM Remote Protocol is also referred to as Object RPC or ORPC.
DCOM is how these COM objects communicate with each other, enabling applications to be assembled from distributed components. Distribution is important because in the SharePoint world we’re typically talking about multiple servers in a farm and they often need to initiate actions on another server. We’re always talking about modifying DCOM configuration for a single server, since that’s where the component lives, but these configuration settings need to be made on all machines that serve the component in question, and the settings need to be suitable for everything that each server does. These settings should also prevent the components from being called if doing so would be unsuitable. Remember that each DCOM component can be called by anything that has access to do so, locally or remotely. To this end, we need to take a closer look at how this access is controlled.
Security in DCOM
Before Windows Server 2003 SP1, COM applications with weak security settings were vulnerable to unauthenticated attacks, because there was no computer-wide access check when a component was called, activated or launched. Only the server-specific settings stood in the way.
“What new functionality is added to this feature in Windows Server 2003 Service Pack 1?”
A change has been made in COM to provide computerwide access controls that govern access to all call, activation, or launch requests on the computer. The simplest way to think about these access controls is as an additional AccessCheck call that is done against a computerwide access control list (ACL) on each call, activation, or launch of any COM server on the computer. If the AccessCheck fails, the call, activation, or launch request will be denied. (This is in addition to any AccessCheck that is run against the server-specific ACLs.) In effect, it provides a minimum authorization standard that must be passed to access any COM server on the computer. There will be a computerwide ACL for launch permissions to cover activate and launch rights, and a computerwide ACL for access permissions to cover call rights. These can be configured through the Component Services Microsoft Management Console (MMC).
These computerwide ACLs provide a way to override weak security settings specified by a specific application through CoInitializeSecurity or application-specific security settings. This provides a minimum security standard that must be passed, regardless of the settings of the specific server.
One thing that may not be evident from this description at face value is that these security settings allow a system administrator to grant access to these components from anything that generates the request. It puts some of the application security controls in the hands of the system administrator, which makes sense for a component that can be called by many applications. This is important to understand. Changing DCOM access rights has system-wide effects, and in the SharePoint world we tend to be poking around in the Component Services MMC to have a look at IIS settings or potentially something as broadly used as the Windows Installer Service. These are not settings that only affect a single SharePoint web application – they affect all current and future workloads on the system.
To put this in perspective, in most cases a SharePoint server isn’t doing a great deal beyond being a SharePoint server, so this stuff was a bit more important back when workloads were more commonly mixed. But we should still understand the implications of these changes when we make them – which is something I rarely see discussed in the SharePoint world.
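The two layers of checking described in the quoted passage can be modelled in a few lines. This is a toy Python sketch, not how Windows implements AccessCheck: real ACLs are built from SIDs, group membership and allow/deny entries, and the `Acl` class, the principal names and the right names here are all hypothetical. It only illustrates that a request must pass both the computer-wide ACL and the server-specific ACL.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """A toy ACL: maps each right to the set of principals allowed to use it."""
    allowed: dict = field(default_factory=dict)

    def permits(self, principal, right):
        return principal in self.allowed.get(right, set())


def access_check(principal, right, computer_acl, server_acl):
    """A request succeeds only if BOTH the computer-wide and server-specific ACLs allow it."""
    if not computer_acl.permits(principal, right):
        return False  # the computer-wide minimum standard was not met
    return server_acl.permits(principal, right)


# Hypothetical settings: the component's own ACL is weak and admits an anonymous caller,
# while the computer-wide ACL admits the app pool account and an installer account.
computer_acl = Acl({"local_activation": {"app_pool_account", "installer_account"}})
server_acl = Acl({"local_activation": {"app_pool_account", "anonymous"}})

for caller in ("app_pool_account", "anonymous", "installer_account"):
    print(caller, access_check(caller, "local_activation", computer_acl, server_acl))
```

In this model the anonymous caller is refused even though the component's own settings would admit it, and the installer account is refused because the component's own ACL does not list it. The computer-wide list sets a floor; it does not replace the per-component settings.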
At this point I could dive deeper into these inner workings – particularly when it comes to the IIS WAMREG DCOM Component – but I think this is the extent of the useful perspective that I can offer before I need to rely heavily on content that was written before I saw my first DCOM error. I did spend some time ingesting that stuff along the way though, so I’ve included some of those links at the bottom of this post in case they are of interest to anyone else. There’s some cool history in here, back to IIS 3.0!
In my next post, I’ll move on to DCOM errors in SharePoint 2010 that I haven’t written about (or at least not since it was in Beta), and consider how they relate to the way we respond to the Product Version Job errors.
Further Reading
- What is COM?
- The DCOM Protocol Specification
- The rest of the DCOM Security Enhancements article (linked above) is great for a deeper induction into COM security, but keep in mind that some of this has changed in the last six+ years.
- DCOM fundamentals from an AD/network perspective: RPC over IT/Pro.
- The best description of WAM that I can find, which may be interesting to people who want to know more about the most common DCOM IIS WAMREG errors, and where WAM comes from: Web Security: Part 2: Introducing the Web Application Manager, Client Authentication Options, and Process Isolation.
- As I said earlier, everything I’ve written here comes from a Systems dude’s perspective, focusing on what that means for SharePoint. If you’re interested in COM from a developer’s angle, Jeff Moser’s article is an introduction to the absolute minimum a COM developer needs to know. Please keep in mind that I am pretty far from the right person to assess the quality of developer content though. Finally Understanding COM After Changing a Light Bulb.
Thanks Tristan for sharing the link to Jeff Moser’s excellent post, what a time saver that is!
No worries. Glad it’s helpful.