I recently encountered an issue where we ran an SRM Test Failover and afterwards it failed to clean up correctly.
When the cleanup operation fails, what I normally do is run the Force Cleanup and continue on with my life. How wrong I could be…
What happened next is I ran a planned migration and, because the force cleanup had not worked correctly, not all virtual machines were protected. When the storage failed over, only 3 of the 8 VMs powered up in the Recovery Site. We ended up in an SRM failed state and had to manually fail back the storage and reinstall SRM. It was a complete disaster and a big waste of a weekend.
So… this post outlines what you should do when a cleanup operation fails… As usual I learnt the hard way…!
If a cleanup operation fails:
- Run the Force Cleanup to try and finish the cleanup operation.
- Once the Force Cleanup completes, check the following components manually to confirm that it completed successfully.
- Open the Protection Group in SRM and open the protection group status for the virtual machines.
- Select refresh and confirm all VMs are still protected; their status should be ‘OK’
- If any are not OK, select Reprotect VMs to fix the issues and recreate the placeholder VMs
- Change to the vCenter datastore view
- Confirm the snap datastore for the Test Failover has been removed
- If the snap datastore still exists in italics or normal text, manually unmount and detach the snap datastore from all hosts.
- Once the datastore has been unmounted and detached from all hosts, right-click the datacenter (DC1 or DC2) and execute a ‘Rescan for Datastores’.
- On the next screen, untick ‘scan for new storage devices’
- Confirm the snap datastore has been removed.
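
If you would rather not trust your eyes after a long weekend, the two manual checks above can be sketched as a small Python script. This is only a rough sketch under a few assumptions: it does not talk to SRM or vCenter at all, you feed it the VM protection statuses and datastore names yourself (copied by hand or exported however you like), the function names are made up for this post, and it assumes the leftover datastore uses the usual `snap-<hex id>-<original name>` naming that resignatured VMFS copies get. Any status other than ‘OK’ is treated as a problem.

```python
import re

# Assumption: leftover test-failover datastores follow the usual
# "snap-<hex id>-<original name>" naming of resignatured VMFS copies.
SNAP_PATTERN = re.compile(r"^snap-[0-9a-f]+-", re.IGNORECASE)


def unprotected_vms(vm_status):
    """Return the names of VMs whose protection status is not 'OK'."""
    return sorted(name for name, status in vm_status.items() if status != "OK")


def leftover_snap_datastores(datastore_names):
    """Return datastore names that look like leftover snap datastores."""
    return sorted(name for name in datastore_names if SNAP_PATTERN.match(name))


def cleanup_is_clean(vm_status, datastore_names):
    """True only when every VM is 'OK' and no snap datastore remains."""
    return not unprotected_vms(vm_status) and not leftover_snap_datastores(datastore_names)


if __name__ == "__main__":
    # Hypothetical values, typed in from the SRM protection group view
    # and the vCenter datastore view.
    vm_status = {"vm01": "OK", "vm02": "Not Configured", "vm03": "OK"}
    datastores = ["DS_PROD_01", "snap-1a2b3c4d-DS_PROD_01"]

    print("VMs needing attention:", unprotected_vms(vm_status))
    print("Leftover snap datastores:", leftover_snap_datastores(datastores))
    print("Safe to run a planned migration:", cleanup_is_clean(vm_status, datastores))
```

If either list comes back non-empty, go back through the steps above before you even think about running a planned migration.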