SharePoint 2010 Development Environment Performance Tests

As I indicated in my last post, I’ve been plumbing the depths of SharePoint development productivity in recent months. Understanding the context established in that post is essential to understanding what follows here. In a nutshell, I’m trying to improve system performance for current users of our SharePoint development environment. This is not as simple as examining the Windows Experience Index on a number of laptop models. I needed to consult with our users to identify which tasks are slow for them and devise tests that would allow me to measure system performance on different physical and virtual systems. In this post I will describe the systems, the tests and the testing process before reviewing the results.

The Tests

The 21 tests that we settled on were the result of discussions with a number of the core developers, consultants and architects at Content and Code, plus a few tests that I threw in to confirm/disconfirm some of my suppositions, such as the impact of the User Profile Service Connection on first page load time. All 21 tests were run three times for each permutation of hardware candidate and virtualisation technology. We also tested on Amazon EC2. I will discuss the testing process in more detail in a moment.

These tests have been selected for a few reasons:

  • They are tests that anyone can run, including Visual-Studio-allergic types like myself.
  • They re-enact real-world productivity loss. All tests needed to be significant on our current system or they were thrown out.
  • They needed to account for tasks that impact non-developers as well as people that have their head down in code 40 hours/week.
  • They needed to be examples of tests that would stress systems in different ways.

First page load tests
These tests were designed to examine what impact, if any, different sets of features, functionality and structure might have on first page load times after the application pool is recycled or IIS is reset (while gathering a large set of data to make comparisons across systems). I also wanted to fully validate my preliminary findings about the User Profile Service Connection.

I ran these tests against NTLM-authenticated web applications with the following root site collections:

  • Central Administration
  • Blank Site
  • MySite
  • Blank Site, with no User Profile Service Connection
  • The Content and Code website solution (structure, without content)
  • A custom intranet solution (structure, without content)

All of these first page load tests were repeated for application pool recycles and IIS resets.

End-to-end site creation to debugging tests
I hope these tests are fairly self-explanatory. I used the Content and Code website solution because it’s a public site that people can examine if they want to understand more about the structure of the solution and the scope of customisation tested here.

  1. Create new NTLM-authenticated web application from the GUI
  2. Create new Publishing Portal Site Collection from the GUI, at the root of the new web application
  3. Deploy Content and Code website solution from Visual Studio
  4. Delete the publishing site collection (this was a necessary step, but not a test that I timed)
  5. Create Content and Code website (structure, without content) from the GUI
  6. Debug Content and Code website solution in Visual Studio

Core development tests
These tests were added to account for pure development activity for large projects with lots of dependencies. We turned Code Analysis on for the first test because this is a feature that’s very useful but taxes systems pretty heavily. The code deployment times were all fairly small relative to other tests here, but we need to keep in mind that this task could be repeated literally hundreds of times per day. Note: full deployment is accounted for above in the end-to-end test.

  • Rebuild Large Project w/Code Analysis
  • Deploy Large Project to GAC/BIN

Disk/IO tests
These tests were thrown in because they have an impact on productivity even if they aren’t particularly routine. For the first test I measured the time from turning on the VM until the desktop rendered after logging on. The second test doesn’t really meet the “real world” criteria I name above, but it is a task that can be a productivity barrier in some cases.

  • Time to desktop
  • Run full crawl (three web apps, no content)
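For anyone wondering how the four groups above add up to 21, here is a quick tally in Python. It is only a sketch of my reading of the lists: it assumes each first page load site is timed once per reset type and that the untimed deletion step is excluded. The group labels and variable names are my own.

```python
# Tally of the timed tests described above (labels are illustrative).
first_page_load_sites = [
    "Central Administration",
    "Blank Site",
    "MySite",
    "Blank Site, no User Profile Service Connection",
    "Content and Code website solution",
    "Custom intranet solution",
]
reset_types = ["application pool recycle", "IIS reset"]

# (step, timed?) pairs for the end-to-end sequence
end_to_end_steps = [
    ("Create web application", True),
    ("Create Publishing Portal site collection", True),
    ("Deploy solution from Visual Studio", True),
    ("Delete publishing site collection", False),  # necessary step, not timed
    ("Create website from the GUI", True),
    ("Debug solution in Visual Studio", True),
]
core_development = ["Rebuild large project w/Code Analysis", "Deploy to GAC/BIN"]
disk_io = ["Time to desktop", "Run full crawl"]

counts = {
    "first page load": len(first_page_load_sites) * len(reset_types),
    "end-to-end": sum(1 for _, timed in end_to_end_steps if timed),
    "core development": len(core_development),
    "disk/IO": len(disk_io),
}
total = sum(counts.values())

for group, n in counts.items():
    print(f"{group}: {n}")
print(f"total timed tests: {total}")
```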

The Testing

The testing process was entirely subject to personal fallibility, as I carried these tests out myself using fairly imprecise methods like a browser-based stopwatch running on my host system (I made sure not to time things inside the guest, where time can slip occasionally). I also went to great lengths to carry out these tests when the systems were performing optimally; I would run through all of the tests once before recording the first set of results. I felt this approach was the best way to discount random variance. The test results were largely very consistent, so I believe these efforts paid off. Obviously the down-side to testing in this manner is that real work is not carried out in a vacuum, but I don’t see any other way to come up with repeatable tests aside from measures like these. It’s what works for science, after all.
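To make the recording approach concrete, here is a minimal Python sketch of how a set of stopwatch readings could be summarised: the warm-up pass is discarded and the remaining runs are reduced to a mean and a spread. The function name is hypothetical and the timings are made-up placeholders, not results from these tests.

```python
from statistics import mean


def summarise_runs(timings_s, warmup_runs=1):
    """Drop the leading warm-up run(s) and summarise the recorded runs."""
    recorded = timings_s[warmup_runs:]
    if not recorded:
        raise ValueError("no recorded runs left after discarding warm-up")
    avg = mean(recorded)
    spread = max(recorded) - min(recorded)
    return {
        "runs": len(recorded),
        "mean_s": round(avg, 2),
        "spread_s": round(spread, 2),
        # spread as a percentage of the mean, a rough consistency check
        "spread_pct": round(100 * spread / avg, 1),
    }


# Placeholder stopwatch readings in seconds: one warm-up, then three recorded runs.
example_timings = [41.0, 30.0, 31.0, 29.0]
summary = summarise_runs(example_timings)
print(summary)
```

A small spread relative to the mean is what I mean by "consistent" above; a large one would suggest the system was not in a steady state when the run was timed.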

The Virtualisation Technologies

As I mentioned in my last post, I chose to limit the virtualisation technologies to a single technology from each of the types I described. I had to postpone testing against “local systems” due to time pressures. It was the option that fell off because we are unlikely to ditch virtualisation any time soon. It works well for us.

To reiterate here, the candidate technologies were VMware Workstation 7.1, the Hyper-V role in Windows Server 2008 R2 and Amazon’s EC2 IaaS offering (a Red Hat implementation of the Xen hypervisor). Again, there’s background for all of this in my last post.

What About the Server Room?

One thing I haven’t discussed in any detail so far is VDI or Remote Desktop services. I briefly touched on shared development environments, but I’ve not talked about hosted, individualised development environments. The reason we ruled this out is cost. While this would probably be the best-performing option, all other things being equal, the costs associated with providing this level of performance in the server room would be pretty enormous. For our purposes we might have exceeded power, cooling and weight limitations before we considered the costs of new blade centres and SANs. These costs would probably be even greater in the datacentre. In short, the same criticism applies to individualised hosted development environments as to shared environments: redundancy and resilience at this level are overkill given the associated costs. The data is not critical and anything that needs to be backed up can be stored elsewhere (like TFS).

Basically, people opt for VDI or Remote Desktop services because a mass of underutilised desktop systems can be heavily consolidated. These systems are not underutilised.

The Hardware Candidates

Dell XPS M1330
This is our current laptop model, upgraded with a 320GB 7200 RPM local hard drive and 8GB RAM. One of the serious options we’re considering is a laptop refresh, due to the age and fail rate of the graphics cards and motherboards on these models.

Dell Studio XPS 1645
This was the least expensive decent i7 laptop I could find for testing purposes, and a leading candidate as a replacement laptop. With an £833 (ex-VAT) starting price it could be bumped up to 8GB RAM for a little over £100 more via Crucial. It’s a very heavy laptop and the glossy shell does it no favours, picking up fingerprints within seconds of use. However, it comes with a 1.6 GHz i7 processor, a 500GB 7200 RPM disk as standard, an eSATA port and HDMI. No USB3. Basically, nothing here was an absolute deal-breaker for us if performance was good.

This is a barebones system with the following configuration/cost (as priced at

  • ASUS V6-P7H55E barebones System = £121.67
  • Intel i7 870 (8M Cache, 2.93 GHz) = £217.57
  • 4GB Corsair XMS3 DDR3 PC3-10666 (1333) Dual Channel – 4x£56.59 = £226.36
  • 1TB Seagate Barracuda SATA 3Gb/s, 7200rpm, 32MB Cache, 8.5 ms, NCQ – 3x£41.94 = £125.82
  • Adaptec 1220SA PCI-E RAID Card = £46.40
  • ASUS 512MB GeForce G 210 DDR2 NVIDIA Graphics Card = £27.71
  • Total = £768.58 (VAT-inclusive)
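As a quick check on the pricing above, the line items can be summed in Python. This sketch uses only the figures listed; the shortened item names are mine. The items come to £765.53, which is £3.05 less than the stated total, and the list does not say what accounts for the difference (a delivery charge would be one guess, but that is purely an assumption).

```python
from decimal import Decimal

# (item, unit price in GBP, quantity) taken from the list above
line_items = [
    ("ASUS V6-P7H55E barebones system", Decimal("121.67"), 1),
    ("Intel i7 870", Decimal("217.57"), 1),
    ("4GB Corsair XMS3 DDR3", Decimal("56.59"), 4),
    ("1TB Seagate Barracuda", Decimal("41.94"), 3),
    ("Adaptec 1220SA RAID card", Decimal("46.40"), 1),
    ("ASUS GeForce G 210 graphics card", Decimal("27.71"), 1),
]
stated_total = Decimal("768.58")

# Decimal avoids binary floating-point rounding on currency values
items_sum = sum(price * qty for _, price, qty in line_items)
gap = stated_total - items_sum

print(f"sum of line items: £{items_sum}")
print(f"stated total:      £{stated_total}")
print(f"unexplained gap:   £{gap}")
```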

This system is configured with three internal 1TB hard drives and 16GB RAM. We needed to purchase the RAID card because the motherboard does not have an on-board RAID controller. The graphics card was necessary because there are no integrated graphics on desktop i7 processors (although there are for some i3 and i5 models). The disk configuration was variable, as this was one of the test scenarios. The assumption going in was that two disks would be configured in a RAID 0 stripe or a RAID 1 array, depending on performance outcomes. We would only stripe the disks if there was an obvious, significant performance gain. The third disk would be attached to the on-board SATA controller. I will discuss the recommended configuration in more detail later. Also note: the graphics card supports two monitors across any two of the three outputs, but not three concurrently. Finally, the ASUS V7-P7H55E is nearly identical in every respect. We went with the V6 based on availability.

Other laptop models
During preliminary testing we looked at the Lenovo W510, the Dell Precision 6500 and the Alienware M17x among others. All of these models were candidates that we never ruled out, but we didn’t have sufficient time with them to run the entire set of tests. However, these models had a reasonably similar configuration to the Dell Studio XPS 1645 and the Hyper-V tests we ran on these systems yielded similar results to our test model.

Other desktop models
Obviously a barebones system won’t appeal to everyone as a business solution, and it took me some time to persuade myself that it might be suitable for these environments. It wasn’t until I actually priced up this model and set it against the equivalent Dell T1500 (+~£600) and HP Z200 (slower than either model, and pricier) that I considered more seriously how it might work for us.

What am I examining, and not examining?

We have an old laptop, a new laptop, a new desktop and the cloud. Excepting the cloud (which is fixed), we’re pairing each of these hardware options with both VMware Workstation and Hyper-V and recording test results for every pairing. We’re then adding tests to examine the impact of spindle/bus speeds and the impact of adding/removing cores to these VMs. Ultimately, I wanted to quantify the productivity impacts of a change to our hardware and/or virtualisation technology as opposed to a change within our virtualisation technology, insofar as these tests could be decoupled.
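The core test matrix can be enumerated with a few lines of Python. This is a sketch under my own assumptions: every hardware candidate is run under both hypervisors, EC2 counts as one fixed configuration that runs all 21 tests, and the extra disk and core-count variations are left out. The configuration labels are illustrative.

```python
from itertools import product

hardware = [
    "Dell XPS M1330",
    "Dell Studio XPS 1645",
    "ASUS V6-P7H55E barebones desktop",
]
hypervisors = ["VMware Workstation 7.1", "Hyper-V (Windows Server 2008 R2)"]

# Every hardware candidate paired with every hypervisor, plus the fixed cloud option
configurations = [f"{hw} / {hv}" for hw, hv in product(hardware, hypervisors)]
configurations.append("Amazon EC2")

TESTS = 21  # timed tests per configuration
RUNS = 3    # recorded runs per test

timed_runs = len(configurations) * TESTS * RUNS

for config in configurations:
    print(config)
print(f"{len(configurations)} configurations, {timed_runs} timed runs")
```

On those assumptions the core matrix is seven configurations, before any of the disk or core-count variations are added.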

I am not examining every virtualisation solution nor every hardware permutation, but I do try to account for a number of these variables with these tests. I would love it if people carried out similar tests on their environments to help build knowledge in an area that’s largely unexamined today. These are some of the other tests that I hope to revisit next year:

  • The impact of application pooling on first page load times. Preliminary tests suggested there might be a small impact, but nowhere near as significant as the User Profile Service Connection. This warrants further inspection.
  • The performance of “local systems” on this same hardware. As I mention above, these tests had to be de-prioritised, but I feel it would be worth identifying whether there are any of these development-specific tasks where some or all of the virtualisation technologies suffer.
  • While I am running tests against a number of disk buses and configurations, I did not get the opportunity to test SSD performance. Obviously a lot of people will want to know the impact of SSD on these timings, but unfortunately I won’t have an opportunity to inspect that until early next year.
  • In some cases we work with deep snapshot trees. I want to gain an understanding of how differencing across ten or more files impacts performance for these tasks.
  • Compare performance of a higher-clocked i5 to a lower-clocked i7 at a similar price range and potentially explore over-clocking options.
  • Compare slower memory on an otherwise-identical system.
  • Run VirtualBox tests on an otherwise-identical system.
  • Assess the impact of virtualisation optimisations.

Obviously these tests say nothing about the usability of the system, power costs, mobility and more. For the purposes of this post I’m only concerned with outlining how I tested system performance for these real world tasks. In the next post, at long last, I will share the results.