vBlock Tip: vSphere 5, SIOC, EMC FAST VP and Storage DRS


If you’ve got a vBlock, it’s most likely you’ve got an EMC array with EMC FAST VP, and hopefully by now you’ve upgraded to vBlock Matrix 2.5.0 and you’re using vSphere 5.

If not, what are you waiting for? Oh yeah, there are still a few outstanding issues. (My advice: wait for the Storage vMotion issues to be resolved; it’s a real pain.)

I wanted to post some best practices and recommended settings for leveraging VMware’s Storage IO Control with EMC Fast VP and Storage DRS.

First a quick recap:

  • FAST VP is EMC’s sub-LUN auto-tiering mechanism.
  • SIOC is VMware’s attempt to bring the idea of DRS (distributed resource prioritisation) to the storage layer. SIOC provides I/O performance monitoring and isolation of virtual machines in vSphere 5.
  • Storage DRS is a new feature in vSphere 5 which allows datastores to be pooled together as a single resource.
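
To make the SIOC bullet above concrete, here is a toy sketch of proportional-share throttling. This is an illustration of the idea, not VMware’s actual algorithm; the queue depth and latency threshold values are assumptions:

```python
# Toy sketch of SIOC-style proportional-share throttling (not VMware's
# actual implementation). When observed datastore latency exceeds the
# congestion threshold, each VM's share of the device queue is scaled
# in proportion to its configured I/O shares.

CONGESTION_THRESHOLD_MS = 30  # assumed default congestion threshold
TOTAL_QUEUE_SLOTS = 64        # illustrative total device queue depth

def allocate_queue_slots(vm_shares, observed_latency_ms):
    """Return a per-VM queue-slot allocation.

    vm_shares: dict of VM name -> configured I/O shares.
    When the datastore is congested, slots are split proportionally
    to shares; otherwise every VM sees the full queue depth.
    """
    if observed_latency_ms <= CONGESTION_THRESHOLD_MS:
        # No contention: no throttling is applied.
        return {vm: TOTAL_QUEUE_SLOTS for vm in vm_shares}
    total_shares = sum(vm_shares.values())
    return {vm: TOTAL_QUEUE_SLOTS * s // total_shares
            for vm, s in vm_shares.items()}

# A high-shares VM keeps more queue depth during contention.
shares = {"sql01": 2000, "web01": 1000, "web02": 1000}
print(allocate_queue_slots(shares, observed_latency_ms=45))
# → {'sql01': 32, 'web01': 16, 'web02': 16}
```

The point of the sketch is the trigger condition: nothing is throttled until latency crosses the threshold, which is why SIOC and FAST VP can coexist.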

The bottom line: EMC FAST VP and SIOC are not only compatible but can work together harmoniously because they serve different purposes.

EMC FAST monitors data usage hourly and moves data only once every 24 hours. Unlike SIOC, EMC FAST redistributes data at 1GB-slice granularity, promoting the busiest slices to faster tiers to lower their response time.
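
As a rough illustration of that relocation pass, here is a toy sketch. The real array weighs far more than raw I/O counts per slice, so treat the ranking logic as an assumption:

```python
# Toy sketch of a FAST VP-style relocation pass (illustrative only).
# Activity is tracked per 1GB slice over the day; once every 24 hours
# the busiest slices are promoted to the fastest tier that has room.

def plan_relocations(slice_iops, fast_tier_capacity):
    """Return the slice IDs to promote to the fast tier.

    slice_iops: dict of slice_id -> accumulated I/O count for the period.
    fast_tier_capacity: number of 1GB slices the fast tier can hold.
    """
    ranked = sorted(slice_iops, key=slice_iops.get, reverse=True)
    return ranked[:fast_tier_capacity]

activity = {"slice-01": 120, "slice-02": 9800, "slice-03": 430, "slice-04": 7600}
print(plan_relocations(activity, fast_tier_capacity=2))
# → ['slice-02', 'slice-04']
```

Note the timescale: this runs once a day, which is exactly why it does not clash with SIOC’s second-by-second throttling.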

Compared to EMC FAST, SIOC uses a relatively short sampling window and is designed to quickly deal with short term IO contention crises. It can act quickly to throttle IO to limit guest latency during times of IO contention.

SIOC and EMC FAST perform complementary roles to monitor and improve storage performance, therefore they should both be leveraged in your environment.

And lastly, Storage DRS: should it be used? Yes, but in what capacity?

My recommendation is to leverage Storage DRS in Automatic mode for initial placement, to balance VMs evenly across datastores. I would also enable SDRS to monitor free capacity and make VM relocation recommendations as datastores approach capacity. The default threshold is 90%, which should be adequate.
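
The capacity-based behaviour described above can be sketched as follows. This is a toy model, not the actual SDRS placement algorithm; the datastore names are made up, and only the 90% threshold comes from the defaults mentioned above:

```python
# Toy sketch of capacity-based Storage DRS behaviour (not VMware's
# actual algorithm): initial placement picks the datastore with the
# most free space, and crossing the utilisation threshold triggers
# a relocation recommendation.

SPACE_UTILIZATION_THRESHOLD = 0.90  # SDRS default: recommend moves at 90% full

def pick_datastore(datastores, vm_size_gb):
    """Place a new VM on the candidate datastore with the most free space."""
    candidates = [ds for ds in datastores if ds["free_gb"] >= vm_size_gb]
    return max(candidates, key=lambda ds: ds["free_gb"])["name"]

def needs_rebalance(ds):
    """True when the datastore has crossed the utilisation threshold."""
    used = ds["capacity_gb"] - ds["free_gb"]
    return used / ds["capacity_gb"] > SPACE_UTILIZATION_THRESHOLD

pool = [
    {"name": "vblock-ds01", "capacity_gb": 2048, "free_gb": 150},
    {"name": "vblock-ds02", "capacity_gb": 2048, "free_gb": 900},
]
print(pick_datastore(pool, vm_size_gb=100))                 # → vblock-ds02
print([ds["name"] for ds in pool if needs_rebalance(ds)])   # → ['vblock-ds01']
```

Crucially, nothing in this sketch looks at I/O latency, which is the point of the next recommendation.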

What should be disabled, though, is I/O metrics. EMC recommends that Storage DRS I/O metrics be disabled when using FAST VP, because the two would perform competing roles, potentially identifying similar relocations and causing inefficient use of storage system resources.

So there you have it: the best way to leverage these components in your vBlock.

Sources:

There is a great EMC document here which lists best practices with EMC VNX storage and vSphere, and an old but still relevant article from Virtual Geek on SIOC and auto-tiering.


SIOC is fully supported by SRM 5…er…except for planned migrations!


I was having some issues with SIOC and SRM 5 Planned Migrations. I noticed that my planned migrations were failing whenever SIOC was enabled.

This got me wondering whether SIOC is even supported with SRM 5, but I couldn’t find any documentation online confirming it either way. It looks like any mention of it has been omitted from the official documentation.

So after a bit of digging here is what I’ve found from VMware:

1) SIOC is supported for use with SRM: you can use SRM to protect SIOC-enabled datastores.

2) To execute a “planned migration” with SRM, you will need to disable SIOC first (on the datastores). You cannot do a “planned migration” with SIOC enabled.

Let’s start with the good news: SIOC is supported by SRM 5, so you can leave it enabled on all your replicated datastores.

This leads us to point 2, which comes with a few caveats:

As per KB2004605, you cannot unmount a datastore with SIOC enabled. If you are going to initiate a Planned Migration, you need to disable SIOC first on your protected site (active) LUNs. This is because SRM needs to unmount the active LUNs before it breaks the mirror, sets the read-only LUNs at your Recovery Site to read-write, and mounts them on all ESXi hosts.

If you attempt a Planned Migration without disabling SIOC, the unmounting of the LUNs, and therefore the Planned Migration, will fail.
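
The required ordering can be expressed as a sketch. The datastore model and helper names here are hypothetical; real automation would go through PowerCLI or the vSphere API rather than anything like this:

```python
# Sketch of the ordering constraint for a Planned Migration when SIOC
# is enabled (hypothetical model, not the real SRM workflow). As per
# KB2004605, a datastore with SIOC enabled cannot be unmounted, so
# SIOC must be switched off before SRM's unmount step runs.

class Datastore:
    def __init__(self, name, sioc_enabled):
        self.name = name
        self.sioc_enabled = sioc_enabled
        self.mounted = True

    def unmount(self):
        # Mirrors the behaviour described in KB2004605.
        if self.sioc_enabled:
            raise RuntimeError(f"{self.name}: cannot unmount while SIOC is enabled")
        self.mounted = False

def planned_migration(protected_datastores):
    """Disable SIOC on the active LUNs, then unmount them so the
    mirror can be broken and the recovery copies brought online."""
    for ds in protected_datastores:
        ds.sioc_enabled = False  # step 1: disable SIOC first
    for ds in protected_datastores:
        ds.unmount()             # step 2: now the unmount succeeds

ds = Datastore("dc1-lun01", sioc_enabled=True)
try:
    ds.unmount()                 # skipping the SIOC step fails
except RuntimeError as err:
    print(err)                   # → dc1-lun01: cannot unmount while SIOC is enabled
planned_migration([ds])
print(ds.mounted)                # → False
```

Remember to re-enable SIOC on the datastores once they are mounted and read-write at the other site.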

There are other instances where a mounted datastore would need to be unmounted. Consider the following scenario. I haven’t had a chance to test this, but here is what I think will happen:

  1. For whatever reason your protected site (DC1) goes offline.
  2. You log in to SRM at your Recovery Site (DC2) and initiate your Disaster Recovery plan.
  3. The protected site (DC1) array is unavailable, so SRM is unable to synchronise changes, but it continues the recovery.
  4. SRM instructs RecoverPoint/SRDF to break the mirror and convert the read-only Recovery Site (DC2) LUNs to read-write, and SRM mounts them in vCenter.
  5. SRM powers on your VMs. Job done!
  6. But wait: the old protected site (DC1) eventually comes back online.
  7. You log back in to SRM and hit Reprotect to start replicating back the other way.
  8. SRM tries to unmount the LUNs in vCenter at DC1 before it begins replication back the other way, but it cannot because SIOC is enabled.
  9. The reprotect fails.

It seems clumsy to me that SRM isn’t aware of SIOC. Whether it’s during a planned migration or a reprotect, having to keep disabling and re-enabling it is a pain in the arse.

Clearly this isn’t going to happen a lot once you go live, and it’s an annoyance at best, but this is the sort of minor issue that a polished product like SRM 5 shouldn’t have. Maybe I’m being so critical because it is such a good product now; they’ve raised my expectations!

I’ve raised a feature request with VMware to have this automated in a future release, and I’ve been told the documentation will be updated to ‘state the obvious’.

Maybe I am blissfully ignorant of the complexity involved, but as an enterprise end user it looks to me like a gap that needs fixing.

Manual steps introduce uncertainty and risk, and this looks like an issue that should be solved.