I’m presently running some quite methodical SharePoint 2010 development environment performance tests, as we’re finding that the Dell XPS M1330 we’ve been using for the last few years doesn’t really cut it in some scenarios. This has been an ongoing issue where I work, but it’s only recently been moved to the top of my workload. That it is now my top priority should give some indication of how important these issues are for any company that spends significant time customising SharePoint. I’ll be discussing this wider project in more detail once I’ve finished my testing in the next couple of weeks, but for now I wanted to share a provisional finding about connecting Web Applications to the User Profile Service Application.
First Page Load After IIS Reset in SharePoint 2010
One of the key performance indicators I’m measuring is first page load after an IIS reset.
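For anyone wanting to take a similar measurement, here is a minimal Python sketch of one way to time the first request after a reset against the request that follows it. It is not the tooling used for the tests in this post. The function names are made up for illustration, and the default fetch uses plain urllib with no Windows authentication, so against a real farm you would most likely need to pass in your own fetch function. It also times only the server response, not browser rendering.

```python
import time
import urllib.request


def fetch(url, timeout=300):
    """Request the page and read the whole body; returns the byte count.

    Assumption: the site answers an unauthenticated request. A farm that
    requires Windows authentication needs a different fetch function.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def time_page_load(url, fetch_page=fetch, clock=time.perf_counter):
    """Return the seconds taken by a single request for url."""
    start = clock()
    fetch_page(url)
    return clock() - start


def compare_first_and_warm(url, fetch_page=fetch, clock=time.perf_counter):
    """Time the first (cold) request and the one straight after it (warm)."""
    cold = time_page_load(url, fetch_page, clock)
    warm = time_page_load(url, fetch_page, clock)
    return {"cold_seconds": cold, "warm_seconds": warm}
```

The idea would be to run IISRESET on the server, then call compare_first_and_warm with the URL of the site under test (for example a hypothetical http://sp2010dev/) and record the cold figure. Repeating the reset and the call a few times gives a feel for how much the timings vary.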
Why not just do an Application Pool Recycle?
Before going any further, I acknowledge that most developers will be able to save a lot of time by recycling application pools rather than resetting IIS – but there are still scenarios when a full IIS reset is required and we’re finding that first page load after an IIS reset is a great deal slower in SharePoint 2010 than it was in 2007. First page load has always been notably slow and people have written warm-up scripts to address this scenario post-reboot, but in SharePoint 2010 I’m noticing speeds are two or three times slower.
I’d initially hoped that I could use this long first page load time to my advantage, which partially explains the time I’ve spent working on this issue here. I was thinking, surely if it takes so long, that extra time will give me a more accurate measure of these performance indicators across different systems. However, as I started to test on server class hardware I was finding that the performance gains were by no means linear and much less than I would have expected. This also held true with i7 laptops, i7 desktops and Amazon EC2. Interestingly, it appeared that the CPU was in no way fully utilised on any of these systems when loading the page for the first time, and these timings did not improve by adding additional CPUs. Earlier tests suggested that disk speed was not a significant factor in first page load times and memory is in no way constrained during these tests.
The Speedy (but evil) White Wizard
The second thing we noticed was that not all farms were as slow as most of them seemed to be. We stumbled across this accidentally when testing performance in Amazon Web Services (AWS). A colleague did our initial AWS work and we were both very impressed by the initial performance results. A few days later I joined in the fun and built my first single-server instance. We immediately noticed that my first page load times were approximately double the times that my colleague was seeing. Eventually we identified that he had used the Farm Configuration Wizard while I had manually created a separate Application Pool for each of my Service Applications. This warranted further investigation.
A note about application pooling
My approach to creating a separate Application Pool for each Service Application is to some extent a hang-over from SharePoint 2007 least-privileged thinking. I was aware that this approach exceeded recommended Application Pool capacity limits, but I didn’t let this trouble me too much based on the single-user load; I’ve always prioritised adherence to the least-privileged model over minor performance degradation. However, based on these seemingly significant performance results, emerging community consensus and the best guidance available today, I decided to reconsider this approach.
A note about AWS
There are a number of broader architectural challenges to conquer when designing a SharePoint 2010 development environment in AWS, which is a topic that I hope to return to in a later post.
A note about the Farm Configuration Wizard
This is the page that greets you immediately after installing SharePoint. It takes care of a lot of Services and Service Applications in one go, but it does some pretty undesirable things as well. In short, for all but the most playful of applications, it’s not appropriate. Build the Service Applications properly.
Reconsidering Application Pooling
As mentioned above, my next step was to quantify the improvements that can be gained through pooling applications. My first test was to delete all of my Service Applications and re-create them in a single application pool. I also deleted all of the web applications and created them in a single, separate Application Pool.
Following my normal development environment build process, I created all of the Service Applications and the Web Applications before tackling the User Profile Service Application. Out of curiosity, I quickly tested first page load times and was happily surprised to find that they had been cut in half. So I took a snapshot and created the User Profile Service Application.
I Blame the User Profile Service Application
After creating the new User Profile Service Application and running an IISRESET, my first page load of Central Administration was almost exactly as slow as it had been with all the Service Applications in their own pools. This was before creating a synchronisation connection or doing anything else with the newly created Service Application. It was merely provisioned. At my wits’ end, I called it a night.
Having thought about it some the next morning, I decided to create a new web application with a Blank root Site Collection. I already had a similar web application in my farm but I made one key configuration change to the new one. When creating the web application I created a custom Application Proxy Group and removed the Service Connection to the User Profile Service Application. I then tested first page load times on my two blank sites. The new site without the Service Connection to the User Profile Service Application loaded as quickly as the sites did before I created the User Profile Service Application. The original site loaded in the same time as the old sites. The disconnected site was approximately twice as fast to load.
Validating the Results
After reaching this provisional finding, I fired up the Microsoft Information Worker Demo VM. I wanted to test this on a completely different virtual machine (but on the same hardware). I created two new web applications with two new Blank root Site Collections. Again, I omitted the Service Connection to the User Profile Service Application on the second web application. My timings were nearly identical to the timings on the first machine.
Next, I reverted to the earlier snapshot of my development environment – the one with each Service Application in a different Application Pool. I created a new web application with the Blank root Site Collection again and got nearly the same results. In this case, all of the results were slightly slower (a couple of seconds) than they were in my snapshot with all the Service Applications and Web Applications pooled together, but the Service Connection to the User Profile Service Application was a much bigger factor (~20 seconds).
What about the Farm Configuration Wizard Results?
You may be wondering why the sites on the Wizard-configured farm loaded quickly. While I’ve not spent any time revisiting that environment and I’ve never spent much time on servers configured by that wizard, I strongly suspect this is because the User Profile Synchronisation Service had never been successfully provisioned.
I am still in the process of further validating these results across various hardware configurations and within various virtualisation technologies. My tests should provide better data on the benefits of pooling the Service Applications as well. All of these findings are somewhat provisional, but I’d say the Service Connection results so far are the clearest findings I’ve got to date, by a considerable margin. In short, I think you can expect first page load times to be at least twice as quick when the Web Application is disconnected from the User Profile Service Application.
But I Kind of Need That Service Connection
Touché! You often will. In fact, let me back-track and say that I haven’t really considered how these findings can be applied in the real world yet. In previous development environment iterations, we found that we needed to abandon development in a Workgroup so we could connect to the User Profile Service Application. It may be that some web applications can live without this connection (for instance, many WCM apps), but as I say, my head is deep in performance considerations at the moment and I really haven’t had time to consider these implications yet. However, I will try to revisit the topic reasonably soon and I welcome comments! One way or the other, it’s good to have a better understanding of why 2010 first page load times are so much slower than 2007.
Does this slow anything else down?
At this point, I haven’t had a chance to test much else, but I have tested creating a new web application with and without this Service Connection. No impact. I also tried creating a Publishing Portal within those web applications, and again, no impact.
If you’re curious about the actual performance figures, I hope to publish them in the next couple of weeks. To give a high-level indication, in one environment the connected First Page Load times were ~35 seconds and the disconnected times were ~17 seconds. In slower environments this difference may be even greater.
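To put those two rough figures side by side, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical and the inputs are only the approximate ~35 and ~17 second timings quoted above, not the full results.

```python
def summarise(connected_seconds, disconnected_seconds):
    """Compare first page load times with and without the Service Connection."""
    saved = connected_seconds - disconnected_seconds
    return {
        "seconds_saved": saved,
        "speed_up_factor": connected_seconds / disconnected_seconds,
        "percent_saved": 100.0 * saved / connected_seconds,
    }


# Approximate figures from the environment described above.
summary = summarise(35, 17)
print(summary)
```

With those inputs the saving is 18 seconds, which is a speed-up of a little over two times, or roughly half of the connected load time.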
Hey Tristan,
Good post. Here are my 2 cents:
– Farm Configuration Wizard is evil. It should not be used on a production env, and I discourage it on dev machines as well. I prefer developers to not run services they don’t need, and learn about provisioning them when they need it.
– On a dev environment it makes sense to use the same App Pool for various things. It will use less RAM. You will get slowed down with different app pools as in many cases they fire up SEQUENTIALLY. You hit the first w3wp, and in code your request calls a SA, so just after the first app pool you must start the second and so on. It would be good if you can include these tests in your work.
– I’m very keen to know what UPS does once it is in the Web App proxy group. Please post any findings.
Good post, thanks for sharing.
Radi A.
Cheers Radi. Yeah, I don’t use the Farm Configuration Wizard for anything. I think I used it the first time I installed during the Technical Preview and never again.
I also noticed that the application pools load sequentially, but I’ve not noticed a big impact on first page load times as a result. There is *some* impact but we’re talking about two or three seconds on the same environment where the User Profile Service Connection adds eighteen seconds after an IISRESET. However, I am going to test for this across different hardware in the near future.
At a minimum, I imagine the User Profile Service Connection is going to be used to render the link to a user’s MySite but also to deliver any of the new social networking features. I think we’ll definitely inspect this issue in more detail, but it may need to happen as people begin to test without this Service Connection and start enumerating the specifics of the missing functionality. Unfortunately, a lot of what the User Profile Service Application provides is among the most compelling features in SharePoint 2010.
Hi Tristan,
Excellent blog! It’s been two years since your original post, but I am very curious as I am at this stage now.
Have you had any further findings on this one? In my case, I get a better page load when I uncheck my web app from both the Metadata service and the UPS (I have to uncheck both). This brings the first load response down to about 22 seconds; otherwise it’s about 90 seconds.
Second, do you feel that WCF could be causing this delay? Any ideas on your workflow-eventdelivery-throttle & workitem-eventdelivery-batchsize parameters?
Thanks,
Jai
Hi Jai,
I’m not fully understanding all of your questions. I haven’t unpicked all of these settings myself. What I would say is that it’s worth checking to see if the CRL check could be causing the delay for you. Also it might be worth looking to see if anti-virus exclusions need to be configured. I’ve seen both of those things contribute to extremely high response times like you’re seeing, so it’s worth checking those first. But if you’re getting down from 90 to 22 by disconnecting these Service Applications then I reckon there’s something else underlying like the CRL check that’s causing the slow response. I doubt it’s a configuration within SharePoint that’s making such a big difference.
Cheers,
Tristan