HA / Distributed vSwitch problems after Storage vMotion: scripts available for KB2013639


Anyone with a vBlock who has upgraded to vSphere 5 should have noticed by now that some virtual machines do not get restarted by HA. You can find out why here.

We first raised this problem with VMware a few months ago and have been waiting patiently for a root cause analysis. Initially we thought it might only affect virtual machines that had been upgraded from VM v7 to VM v8, but as it turns out the issue is caused by SvMotion. Since we SvMotioned all our VMs from VMFS-3 to VMFS-5 datastores, it affected every one of them.

You’ve gotta love VMware’s short-term workaround: Do Not SvMotion. Not quite what I was expecting when I upgraded to vSphere 5. There have been what feels like a lot of schoolboy cockups in this release. Silly things, like virtual machine folders and files no longer getting renamed when you Storage vMotion. It just feels plain clumsy.

Anyway, one of the big issues with this problem was that you couldn’t really be sure which virtual machines (if any) were affected. Before the article KB2013639 (If that link doesn’t work try this one) was released, we used the following steps to manually fix the problem on all our virtual machines:

  1. Connect the VM to another port group on the vDS
  2. Connect the VM back to the old port group on the vDS
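
The two steps above can be sketched as a small helper. This is only an illustration under stated assumptions, not the script we ran: `bounce_port_group`, the `connect` callback and the port group names are all made up for the example, and the in-memory `fake_connect` stands in for whatever call your tooling uses to reassign a VM's network adapter.

```python
from typing import Callable, Dict, List, Tuple


def bounce_port_group(vm: str, original_pg: str, temporary_pg: str,
                      connect: Callable[[str, str], None]) -> None:
    """Move a VM to a temporary vDS port group and straight back again.

    `connect(vm, port_group)` is a hypothetical callback that performs the
    actual reassignment in your environment.
    """
    if original_pg == temporary_pg:
        raise ValueError("temporary port group must differ from the original")
    connect(vm, temporary_pg)  # step 1: connect to another port group
    connect(vm, original_pg)   # step 2: connect back to the old port group


# In-memory stand-in so the sketch runs without a vCenter (illustrative names).
assignments: Dict[str, str] = {"vm01": "dvPG-Prod"}
history: List[Tuple[str, str]] = []


def fake_connect(vm: str, port_group: str) -> None:
    assignments[vm] = port_group
    history.append((vm, port_group))


bounce_port_group("vm01", "dvPG-Prod", "dvPG-Temp", fake_connect)
print(history)
```

The VM ends up on the port group it started on; the point of the round trip is simply that the port is released and claimed again.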

Thankfully, there is now a script out to detect and fix virtual machines that are affected by the HA/vDS/Storage vMotion issue. You can find William Lam’s copy here and Alan Renouf’s copy here. I’ve tested Alan’s script and it worked great, without any VM downtime.

The script doesn’t stop a virtual machine from being affected again the next time you Storage vMotion it. It only identifies (and fixes) the virtual machines that, as things stand, will not restart correctly if an HA event is triggered.
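
To give a feel for what a detection pass might look like, here is a rough sketch. It rests on my reading of the KB, which is an assumption: that an affected VM is one whose vDS port file (under `.dvsData/<vDS UUID>/<port key>`) is missing from the datastore the VM now lives on. The `VmNic` record, the function names and the sample data are all hypothetical; the real scripts pull this information from vCenter.

```python
from typing import Iterable, List, NamedTuple, Set


class VmNic(NamedTuple):
    """One vDS-connected network adapter (all fields are illustrative)."""
    vm: str          # VM name
    datastore: str   # datastore holding the VM's files after the SvMotion
    dvs_uuid: str    # UUID of the distributed switch
    port_key: str    # vDS port the adapter is connected to


def expected_port_file(nic: VmNic) -> str:
    """Datastore path where the port file is assumed to live."""
    return f"[{nic.datastore}] .dvsData/{nic.dvs_uuid}/{nic.port_key}"


def find_affected(nics: Iterable[VmNic], existing_files: Set[str]) -> List[str]:
    """Return the sorted names of VMs with at least one missing port file."""
    affected = {nic.vm for nic in nics
                if expected_port_file(nic) not in existing_files}
    return sorted(affected)


# Made-up inventory: vm02's port file did not follow it to the new datastore.
nics = [
    VmNic("vm01", "VMFS5-DS01", "aa-bb", "100"),
    VmNic("vm02", "VMFS5-DS01", "aa-bb", "101"),
]
existing = {"[VMFS5-DS01] .dvsData/aa-bb/100"}
print(find_affected(nics, existing))
```

Because a later Storage vMotion can put a VM back on the list, a check like this would need to be re-run after every migration until the proper fix arrives.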

For a fix you have to wait for vCenter 5 Update 2, which I believe is out in June. If you have a vBlock, though, I don’t have word yet on which compatibility matrix this fix will be released with.

As usual we vBlock boys have to wait till last… and they’ll probably slip in some patches for Cisco and EMC in there too.

Wonderful!
