I was recently working with a Hyper-V VM that had a large branch of snapshots that I wanted to clean up in order to conserve disk space. This was a SharePoint 2010 development VM which I’d configured specifically for a project, so I didn’t need all of the earlier snapshots. The environment has two VMs (one domain controller, everything else on the other), so I deleted the unwanted snapshots on the first VM, one by one. From previous experience I knew that I could delete multiple snapshots before the initial merge operation completed: Hyper-V queues the merge operations, which must complete before the virtual machine can be started again. I left myself with only the latest snapshot and moved on to the second virtual machine to do the same.
At this point I got a little too clever and started deleting the second snapshot before the first snapshot deletion had been queued. Queuing usually takes only a few seconds, but I jumped the gun and Hyper-V Manager threw two errors (4096 and 16410) regarding virtual machine file access when I tried to delete the second snapshot.
After that I tried to delete other snapshots but I kept getting errors and the VM entered a Saved-Critical state. This will happen when Hyper-V Manager cannot access a file system location or cannot find a file, for instance when a removable hard drive is pulled out.
Approximately 30 seconds later, Hyper-V reported that it had regained access to the location.
However, I couldn’t get any snapshots to delete and the virtual machine wouldn’t start. After a few minutes of panicked clicking I decided to restart the Hyper-V services. When they came back up my VM disappeared. The Virtual Machine configuration file was corrupted.
The next event suggests that a snapshot file was also corrupted.
These 16310 and 16330 errors repeated for a while. Panic continued. Eventually I rebooted. On reboot the VM was still missing and the 16310/16330 errors persisted.
On a hunch I decided to see what the AVHD files (the differencing disks that correlate with snapshot states) looked like.
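A quick way to get the same overview without clicking through Explorer is to list the disk files in the VM's folder, oldest first. The following is a minimal Python sketch rather than anything I ran at the time: the function name `list_disk_files` is made up for illustration, and it assumes that the VHD and AVHD files all sit in one folder. Last-modified time is only a heuristic for "most recent snapshot state", so treat the ordering as a hint.

```python
from pathlib import Path

DISK_SUFFIXES = {".vhd", ".avhd"}


def list_disk_files(folder):
    """Return (name, size_bytes, mtime) for each VHD/AVHD in folder, oldest first.

    Hypothetical helper for illustration; it only reads file metadata
    and never opens or modifies the disk files themselves.
    """
    disks = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in DISK_SUFFIXES
    ]
    # Sort by last-modified time so the most recently written disk comes last.
    disks.sort(key=lambda p: p.stat().st_mtime)
    return [(p.name, p.stat().st_size, p.stat().st_mtime) for p in disks]
```

Calling `list_disk_files` on the folder that holds the virtual hard disks returns the base VHD and each differencing disk in modification order; the last entry is the likeliest candidate for the latest state.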
This looked very much like what I would have expected if nothing had gone wrong (and if none of the snapshot deletions had completed). Sticking with this line of inquiry (and with what the 16310 error suggests), I created a new virtual machine and pointed it at the most recent AVHD file. All of my snapshots were missing but the virtual machine was created successfully. I started the virtual machine and it was clearly in the same state it had been in before I took the most recent snapshot, with a few caveats. In my panic I forgot to re-create my second NIC, so the VM started with only one (the one that I specified when I created the new VM). I also forgot to give it a second CPU. So I shut down the VM, made these changes, restarted, reconfigured the second NIC in the guest and tested that everything worked as expected. Recovery complete, so I shut down both VMs again.
At this point I’d recovered the VM but I still had a bunch of unnecessary data in my branch of differencing disks. In order to clean this up, I took a new snapshot of both VMs and exported the latest snapshot of each of them. This merged all the differences across the AVHD files into a new, self-contained VHD file. After the exports finished I deleted the old VMs, waited for the Destroy operations to complete, cleaned up lingering files on the file system and imported the new exports. I took a new snapshot, as this is my new stable starting point, and everything was (relatively speaking) back to normal. Phew!
With hindsight, I would have handled the recovery as follows:
- Create new VM, pointing at latest differencing disk (or whichever snapshot state you wanted to preserve).
- Reconfigure processors.
- Reconfigure network adapters in Hyper-V Manager.
- Start the virtual machine.
- Reconfigure NICs in the guest.
- Test everything is working as expected in the VM.
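- Shut down the virtual machine.
- Take a new snapshot and export it, which merges the AVHD chain into a single, self-contained VHD.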
- Delete old VM from Hyper-V Manager.
- Wait for the Destroy operation to complete.
- Delete any lingering files from the file system.
- Import the exported virtual machines.
As I hinted at above, having gone through this process, it occurred to me that you could probably point the new VM at whichever AVHD file you wanted if you didn’t want to use the latest snapshot, assuming none of the AVHD files were corrupted. In this case it was just the virtual machine XML file and possibly the snapshot file that were corrupted, rather than the VHD file and differencing disks (AVHD files) themselves. The problem would be identifying which AVHD file corresponds to the snapshot that you want to keep, but in principle I think this would work.
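One possible way to narrow down which AVHD is which is to read the parent file name that each differencing disk records about itself, and so rebuild the chain from the base VHD upwards. The Python sketch below is untested against real Hyper-V files and rests on my reading of Microsoft's published VHD image format: a copy of the footer in the first 512 bytes, the disk type at footer offset 60 (4 means differencing), and the parent name stored as UTF-16 big-endian at offset 64 of the dynamic disk header. The function names `read_parent_name` and `parent_map` are made up for illustration. It opens files read-only, but work on copies if you can.

```python
import ntpath
import struct
from pathlib import Path

FOOTER_COOKIE = b"conectix"   # marks a VHD footer (assumed per the VHD spec)
HEADER_COOKIE = b"cxsparse"   # marks a dynamic disk header
DISK_TYPE_DIFFERENCING = 4


def read_parent_name(path):
    """Return the parent name stored in a differencing disk, else None.

    Returns None when the file does not start with a footer copy
    (for example a fixed-size VHD) or is not a differencing disk.
    """
    with open(path, "rb") as f:
        footer = f.read(512)
        if len(footer) < 512 or footer[:8] != FOOTER_COOKIE:
            return None
        (disk_type,) = struct.unpack(">I", footer[60:64])
        if disk_type != DISK_TYPE_DIFFERENCING:
            return None
        # The footer's data offset points at the dynamic disk header.
        (data_offset,) = struct.unpack(">Q", footer[16:24])
        f.seek(data_offset)
        header = f.read(1024)
    if len(header) < 576 or header[:8] != HEADER_COOKIE:
        raise ValueError(f"{path}: dynamic disk header not found")
    # Parent name: 512 bytes of NUL-padded UTF-16 big-endian text.
    name = header[64:576].decode("utf-16-be", errors="replace")
    return name.split("\x00", 1)[0] or None


def parent_map(folder):
    """Map each .avhd file name in folder to its recorded parent file name."""
    result = {}
    for p in sorted(Path(folder).iterdir()):
        if p.is_file() and p.suffix.lower() == ".avhd":
            parent = read_parent_name(p)
            # The stored name may be a full Windows path; keep the file name only.
            result[p.name] = ntpath.basename(parent) if parent else None
    return result
```

The disk that no other disk names as its parent is the tip of the chain (the latest state); following parents from there walks back through the earlier snapshot states to the base VHD. Matching a disk to a named snapshot would still need the creation times or your own memory of when each snapshot was taken.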
I should note that this is probably unsupported, but you’re not really losing anything because otherwise you would have only been able to recover the first VHD file. This technique wouldn’t be much use if you didn’t know which snapshot you were after or if you wanted to recover the entire snapshot tree, but this fix gives you some recovery where the virtual machine file and the snapshot tree are corrupted but the disk data is not.