I have to admit I was a little surprised when I found out from EMC that SRM does not support RecoverPoint point-in-time failover.
Funny how VMware couldn’t tell me this? They just passed the buck to EMC… typical!
What’s the point of purchasing a product like EMC RecoverPoint if you cannot use it to its full potential? Well, that’s not entirely true. You can, you just have to do it outside of SRM!
Maybe SRM 5 has spoilt me a little. It does what it is supposed to do extremely well, but I just assumed the SRM RecoverPoint SRA would integrate with RecoverPoint’s image access and allow you to pick a previous point in time if required during a failover.
Alas, this is not the case. You cannot pick a point in time with SRM; it always uses the latest image.
A feature request for SRM, if any VMware employees are reading this: when I perform a Test Failover, I would like the ability to pick a previous point in time, if required, before the failover commences.
What if you have to perform a Disaster Recovery failover and your latest image is corrupt? How do you then roll back to a previous journal entry in your Recovery Site?
These are some of the scenarios I don’t fully understand, and I’m going to do some testing to see if I can combine some SRM and RP steps to at least partially automate the process. The thought of using RP natively (enabling image access, mounting LUNs in the recovery site, rescanning hosts, registering VMs in vCenter, etc.) has really put me off using RecoverPoint’s point-in-time features.
More on this to follow.
I recently encountered an issue where we ran an SRM Test Failover and afterwards it failed to clean up correctly.
When the cleanup operation fails what I normally do is run the Force Cleanup and continue on with my life. How wrong I could be…
What happened next is I ran a planned migration and, because the force cleanup had not worked correctly, not all virtual machines were protected. When the storage failed over, only 3 of the 8 VMs powered up in the Recovery Site. We ended up in an SRM failed state and had to manually fail back the storage and reinstall SRM. It was a complete disaster and a big waste of a weekend.
So… this post outlines what you should do when a cleanup operation fails… As usual I learnt the hard way…!
If a cleanup operation fails:
- Run the force cleanup to try and finish the cleanup operation.
- Once Force Cleanup completes, check the following components manually to confirm that the force cleanup completed successfully.
- Open the Protection Group in SRM and open the protection group status for the virtual machines.
- Select refresh and confirm all VMs are still protected. Their status should be ‘OK’.
- If any are not OK, select Reprotect VMs to fix the issues and recreate the placeholder VMs
- Change to the vCenter datastore view.
- Confirm the snap datastore for the Test Failover has been removed
- If the snap datastore still exists in italics or normal text, manually unmount and detach the snap datastore from all hosts.
- Once the datastore has been unmounted and detached from all hosts, right-click the datacenter (DC1 or DC2) and execute a ‘Rescan for Datastores’.
- On the next screen, untick ‘scan for new storage devices’
- Confirm the snap datastore has been removed.
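
If you want to turn the checks above into something you can tick off, here is a minimal sketch in Python. It does not talk to SRM or vCenter at all: you type in the VM statuses and datastore names you see in the UI, and it lists what still needs fixing. The names `CleanupSnapshot` and `post_cleanup_issues` are made up for this example, and the `snap-` prefix is an assumption about how the leftover test datastore is named in your environment, so adjust it if yours looks different.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CleanupSnapshot:
    """State copied by hand from the SRM and vCenter UIs (hypothetical helper)."""
    vm_status: Dict[str, str]  # VM name -> protection status shown in the Protection Group
    datastores: List[str]      # datastore names shown in the vCenter datastore view


def post_cleanup_issues(snapshot: CleanupSnapshot, snap_prefix: str = "snap-") -> List[str]:
    """Return the problems that remain after a Force Cleanup.

    snap_prefix is an assumption about how the leftover snap datastore is named.
    """
    issues = []
    # Every protected VM should report a status of 'OK'.
    for vm, status in sorted(snapshot.vm_status.items()):
        if status != "OK":
            issues.append(f"VM {vm} is not protected (status: {status})")
    # No snap datastore from the Test Failover should remain.
    for name in snapshot.datastores:
        if name.startswith(snap_prefix):
            issues.append(f"Snap datastore {name} still exists: unmount, detach, rescan")
    return issues


if __name__ == "__main__":
    # Example values only; replace them with what you see in your own environment.
    state = CleanupSnapshot(
        vm_status={"vm1": "OK", "vm2": "Invalid"},
        datastores=["datastore1", "snap-1a2b3c4d-datastore1"],
    )
    for issue in post_cleanup_issues(state):
        print(issue)
```

An empty list means both checks passed. Anything else is a reason not to start a planned migration yet.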
And now you can carry on with your life…. and your planned migrations.