vBlock Tip: vSphere 5, SIOC, EMC FAST VP and Storage DRS


If you’ve got a vBlock, it’s most likely you’ve got an EMC array with EMC FAST VP, and hopefully by now you’ve upgraded to vBlock Matrix 2.5.0 and are running vSphere 5.

If not, what are you waiting for? Oh yeah, there are still a few outstanding issues. (My advice: wait for the Storage vMotion issues to be resolved; they’re a real pain.)

I wanted to post some best practices and recommended settings for leveraging VMware’s Storage IO Control (SIOC) with EMC FAST VP and Storage DRS.

First a quick recap:

  • FAST VP is EMC’s sub-LUN auto-tiering mechanism.
  • SIOC is VMware’s attempt to bring the idea behind DRS (prioritising contended resources) to the storage layer. SIOC provides I/O performance monitoring and isolation of virtual machines in vSphere 5.
  • Storage DRS is a new feature in vSphere 5 that allows datastores to be pooled together and managed as a single resource (a datastore cluster).

The bottom line: EMC FAST VP and SIOC are not only compatible but can work together harmoniously because they serve different purposes.

EMC FAST VP monitors data usage hourly and relocates data at most once every 24 hours. Unlike SIOC, it redistributes data at the 1 GB slice level, promoting the busiest slices to lower their response time.

Compared to EMC FAST VP, SIOC uses a relatively short sampling window and is designed to deal with short-term IO contention: it acts quickly to throttle IO and limit guest latency while the contention lasts.

SIOC and EMC FAST VP therefore perform complementary roles in monitoring and improving storage performance, and both should be leveraged in your environment.

And lastly, Storage DRS: should it be used? Yes, but in what capacity?

My recommendation is to use Storage DRS in automatic mode for initial placement, balancing VMs evenly across datastores. I would also enable SDRS to monitor free capacity and make VM relocation recommendations if datastores approach capacity; the default utilisation threshold of 90% should be adequate.

What should be disabled, though, is IO metrics. EMC recommends that Storage DRS IO metrics be disabled when using FAST VP, because the two perform competing roles: they could identify similar relocations and cause inefficient use of storage system resources.
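
To make that concrete, here is a rough pyVmomi sketch of those Storage DRS settings (automatic initial placement, 90% space threshold, IO metrics off). It assumes a connected ServiceInstance called si, and the datastore-cluster name is invented for illustration; this isn’t anything from VCE or EMC, just one way you might script the recommendation.

```python
from pyVmomi import vim

def configure_sdrs(si, pod_name="Vblock-DS-Cluster"):   # pod name is hypothetical
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.StoragePod], True)
    pod = next(p for p in view.view if p.name == pod_name)
    view.Destroy()

    pod_spec = vim.storageDrs.PodConfigSpec()
    pod_spec.enabled = True                     # turn Storage DRS on
    pod_spec.defaultVmBehavior = "automated"    # automatic mode for initial placement
    pod_spec.ioLoadBalanceEnabled = False       # IO metrics off, per EMC's FAST VP guidance
    space_cfg = vim.storageDrs.SpaceLoadBalanceConfig()
    space_cfg.spaceUtilizationThreshold = 90    # recommend moves once a datastore passes 90% used
    pod_spec.spaceLoadBalanceConfig = space_cfg

    spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_spec)
    return content.storageResourceManager.ConfigureStorageDrsForPod_Task(
        pod=pod, spec=spec, modify=True)
```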

So there you have it: the best way to leverage these components in your vBlock.

Sources:

There is a great EMC document here which lists best practices for EMC VNX storage with vSphere, and an old but still relevant article from Virtual Geek on SIOC and auto-tiering.

vBlock Tip: Set VMFS3.MaxHeapSizeMB to 256MB on all ESXi hosts


This is the sort of issue I came across this week that you would expect VCE to address with a standard setting in the vBlock.

Why? Because the performance hit is negligible (a slight increase in kernel memory of around 64 MB), vBlock customers are likely to hit this ceiling, and it’s one more setting we then don’t have to worry about.

I started running into issues when vMotioning two VMs. It turns out this is a known issue, as per KB1004424.

I was told by VMware: ‘ESXi 5.0 host (source on which you are trying to power on the VM) already has 18 virtual disks (.vmdk) greater than 256GB in size open and you are trying to power on a virtual machine with another virtual disk of greater than 256GB in size.’

The heap size effectively caps the amount of open VMDK storage, across all virtual machines, on a given host.

The default heap size is 80 MB. To work out how much open VMDK storage that supports, multiply the heap value by 256 GB: 80 × 256 GB = 20,480 GB, so an 80 MB heap covers roughly 20 TB of open VMDK storage per host.

Increasing the heap size to 256 MB gives 256 × 256 GB = 65,536 GB, so a 256 MB heap extends to 64 TB of open VMDK storage per host, which should be plenty.
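
As a quick sanity check of that arithmetic (using the 256 GB-per-MB-of-heap ratio implied by the figures above, not any official formula):

```python
# Back-of-envelope check of the heap maths above.
GB_PER_MB_OF_HEAP = 256  # ratio implied by the 80 MB -> 20 TB and 256 MB -> 64 TB figures

for heap_mb in (80, 256):
    open_vmdk_gb = heap_mb * GB_PER_MB_OF_HEAP
    print(f"{heap_mb} MB heap -> {open_vmdk_gb} GB (~{open_vmdk_gb // 1024} TB) of open VMDKs")
# 80 MB heap -> 20480 GB (~20 TB) of open VMDKs
# 256 MB heap -> 65536 GB (~64 TB) of open VMDKs
```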

I was given the following steps, which I repeated on each host to fix the issue (a scripted alternative is sketched after the steps):

  1. Log in to the vCenter Server or the ESXi host using the vSphere Client.
  2. Click the Configuration tab of the ESXi host.
  3. Click Advanced Settings under Software.
  4. Select VMFS3.
  5. Change the value of VMFS3.MaxHeapSizeMB to 256.
  6. Click OK.
  7. Reboot the host.
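
If you have a lot of hosts to touch, the same change can be scripted. This is only a rough pyVmomi sketch under my own assumptions (a connected ServiceInstance called si; depending on your pyVmomi version the option value may need an explicit numeric type), not something VMware or VCE provided:

```python
from pyVmomi import vim

def set_max_heap(si, heap_mb=256):
    """Set VMFS3.MaxHeapSizeMB on every host; a reboot is still required."""
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        opt_mgr = host.configManager.advancedOption   # per-host advanced settings
        opt_mgr.UpdateOptions(changedValue=[
            vim.option.OptionValue(key="VMFS3.MaxHeapSizeMB", value=heap_mb)])
        print(f"{host.name}: VMFS3.MaxHeapSizeMB set to {heap_mb}, reboot required")
    view.Destroy()
```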

After rebooting each host the problem was solved. That was an easy one, for once!

SIOC is fully supported by SRM 5…er…except for planned migrations!


I was having some issues with SIOC and SRM 5 planned migrations: I noticed that my planned migrations were failing when SIOC was enabled.

This got me wondering whether SIOC is even supported with SRM 5, but I couldn’t find any documentation online either way. It looks like any mention of it has been omitted from the official documentation.

So after a bit of digging, here is what I’ve found from VMware:

1) SIOC is supported for use with SRM – you can use SRM to protect SIOC-enabled datastores.

2) To execute a planned migration with SRM, you will need to disable SIOC on the datastores first. You cannot do a planned migration with SIOC enabled.

Let’s start with the good news: SIOC is supported by SRM 5, so you can leave it enabled on all your replicated datastores.

This leads us to point 2, where there are a few caveats:

As per KB2004605, you cannot unmount a datastore with SIOC enabled. If you are going to initiate a planned migration, you need to disable SIOC first on your protected-site (active) LUNs. This is because SRM needs to unmount the active LUNs before it breaks the mirror, sets the read-only LUNs in your recovery site to read-write and mounts them on all ESXi hosts.

If you attempt a planned migration without disabling SIOC, the unmounting of the LUNs, and therefore the planned migration, will fail.
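
If you end up doing this often, the disable/re-enable dance can be scripted. The sketch below is untested and based on my own assumptions (a connected ServiceInstance called si; the datastore names are invented); it just toggles SIOC on a list of datastores via the standard vSphere API:

```python
from pyVmomi import vim

def set_sioc(si, datastore_names, enabled):
    """Enable or disable SIOC on the named datastores."""
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    srm_mgr = content.storageResourceManager
    for ds in view.view:
        if ds.name in datastore_names:
            spec = vim.StorageResourceManager.IORMConfigSpec(enabled=enabled)
            srm_mgr.ConfigureDatastoreIORM_Task(datastore=ds, spec=spec)
            print(f"{ds.name}: SIOC {'enabled' if enabled else 'disabled'}")
    view.Destroy()

# Before the planned migration (datastore names are hypothetical):
# set_sioc(si, ["DC1-Replicated-01", "DC1-Replicated-02"], enabled=False)
# ...run the recovery plan in planned-migration mode, then re-enable afterwards.
```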

There are other instances where a mounted datastore would need to be unmounted. Consider the following scenario; I haven’t had a chance to test this, but here is what I think will happen (a quick pre-flight check is sketched after the steps):

  1. For whatever reason, your protected site (DC1) goes offline.
  2. You log in to SRM at your recovery site (DC2) and initiate your disaster recovery plan.
  3. The protected-site (DC1) array is unavailable, so SRM is unable to synchronise changes, but it continues the recovery.
  4. SRM instructs RecoverPoint/SRDF to break the mirror and convert the read-only recovery-site (DC2) LUNs to read-write, and SRM mounts them in vCenter.
  5. SRM powers on your VMs. Job done!
  6. But wait, the old protected site (DC1) eventually comes back online.
  7. You log back in to SRM and hit Reprotect to start replicating back the other way.
  8. SRM tries to unmount the LUNs in vCenter in DC1 before it begins replicating back the other way, but cannot because SIOC is enabled.
  9. The reprotection fails.
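
The pre-flight check I have in mind looks something like this (again a hedged pyVmomi sketch under the same assumptions as above, not an SRM feature): list any datastores that still have SIOC enabled before you hit Reprotect or run a planned migration.

```python
from pyVmomi import vim

def datastores_with_sioc(si):
    """Return the names of datastores that still have SIOC enabled."""
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    flagged = [ds.name for ds in view.view
               if ds.iormConfiguration and ds.iormConfiguration.enabled]
    view.Destroy()
    return flagged

# Anything returned here needs SIOC disabled before SRM can unmount it.
```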

It seems clumsy to me that SRM isn’t aware of SIOC. Whether it’s during a planned migration or a reprotect, having to keep disabling and re-enabling it is a pain in the arse.

Clearly this isn’t going to happen a lot once you go live, and at worst it’s an annoyance, but it is the sort of minor issue that a polished product like SRM 5 shouldn’t have. Maybe I’m being so critical because it is such a good product now: they’ve raised my expectations!

I’ve raised a feature request with VMware to have this automated in a future release, and I’ve been told the documentation will be updated to ‘state the obvious’.

Maybe I am blissfully ignorant of the complexity involved, but as an enterprise end user it looks like a gap that needs fixing.

Manual steps introduce uncertainty and risk, and this looks like an issue that should be solved.