VCE vBlock – 1st Year in Review


Well, we have just passed a year of vBlock ownership, and it has been a rather painless year.

Our vBlock was one of the first out there, delivered in November 2011. I wanted to provide some pros and cons of vBlock ownership. Some of the themes are not vBlock specific, but they are worth bearing in mind because there will always be a gap between what you hear from pre-sales and the reality.

Pros:

VCE – The company has been constantly improving, which is good to see. Not content to rest on their laurels, they are innovating in a lot of areas.

vBlock – The concept of the vBlock itself deserves a mention. VCE are definitely on the right path… it’s like the first-generation Model T Ford. I’m sure old Henry had hundreds of suppliers providing the components for his Model T; he came along with assembly-line production and put it all together. That is essentially what is happening over at VCE. Over time I’m hoping the integration between components will become more and more seamless, as the demand for pre-configured virtualisation platforms grows and the designers behind each of the components are forced to work more closely together.

Management and Support – If you have a bloated IT support team in a large, sprawling organisation, a vBlock can help reduce your head count by simplifying your environment. One thing converged infrastructure platforms are good for is breaking down the traditional support silos around storage, network, compute and virtualisation. When all the components are so tightly integrated, your siloed operations teams morph into one.

Compatibility Matrix – This has to be the biggest selling point in my book: it takes away the pain of ensuring compatibility between so many different components. The VCE matrix is far more stringent than individual vendor product testing and therefore far more trustworthy. Try getting a complete infrastructure upgrade across storage, network, compute and virtualisation components through your change management team in a single weekend. It’s not going to happen unless it’s been pre-tested.

Single line of support – Being able to call a single number for any issue immensely simplifies fault finding and problem resolution. Worth it for this and the matrix alone.

Single pane of glass – This is where UIMp is starting to come into its own. It’s been a long road, but the future looks good. VCE’s goal is to replace each of the individual management consoles so that VCE customers can use UIMp for all their automated provisioning. When it works, it really does simplify provisioning.

Customer Advocate – In my experience the customer advocate offers great value: helping manage high-severity incidents, ensuring your environment remains up to date and in support through regular service reviews, and providing an easy path into VCE to organise training sessions, bodies to fill gaps in support, a direct line to escalation engineers, and answers to any queries you may have about your environment.

Cons:

The AMP – The major design flaw in the AMP for me is the 1Gb network. Data transfers between VMs in our 10Gb service cluster can achieve 300 Mbps; as soon as the AMP is involved it drops to 30 Mbps. Really annoying, and what is in the AMP? vCenter, which is used to import virtual machines. To put that in perspective, a 40 GB VM that copies in under twenty minutes at 300 Mbps takes around three hours at 30 Mbps. Now say you are doing a migration of 1000 VMs… that 30 Mbps is going to get really annoying, and it has.

Cost – The vBlock hardware isn’t so bad, but what really surprised me is the number and cost of the licenses. Want to add a UCS blade? No problem, that will be £5k for the blade and about £3k for the licenses – UCS, UIMp, VNX, vSphere, etc. It all adds up pretty quickly. Adequately sizing your UCS blades up front – plenty of memory and CPU – is really important.

Management & Support – Converged Infrastructure Platforms require a lot of ongoing support and management. This is an issue not limited to VCE. It’s just the nature of the beast. If you have  an immature IT organisation and have had a fairly piecemeal IT infrastructure and support team up until now, you will be in for a shock when you purchase a converged infrastructure platform. There’s no doubt a vBlock is an excellent product, but it’s excellent because it uses the latest and greatest, which can be complex. It also comprises multiple products  from 3 different vendors – EMC, Cisco and VMware, so you need the right skillset to manage it, which can be expensive to find and train. It takes at least a year for someone to become familiar with all components of the vBlock. You’re always going to have employees with core skills like virtualisation, storage, network, compute, etc, but you do want people to broaden their skills and be comfortable with the entire stack.

Integration between products – See above: multiple products from three different vendors. At the moment the VCE wrapper is just that – little more than a well-designed wrapper, lots of testing and a single line of support. OK, so EMC own VMware, but it seems to make little difference. EMC can’t even align products within their own company, so how on earth can they expect to align products with a subsidiary? If the vBlock is going to be a single-vendor product, then all three vendors need to invest in closer co-operation to align product lifecycles and integration. VMware release vCenter 5.1 and PowerPath has to release an emergency patch to support it? Going back to my Model T analogy, the vBlock is never going to become a real Model T until Cisco buys EMC, or EMC drops Cisco and starts making the compute/network components itself. Not so far-fetched.

Complexity – The VCE wrapper hasn’t changed the complexity. (The same goes for HP or FlexPod.) This is another myth: “We’ve made it simple!”. Er, no, you haven’t. You’ve just done all the design work and testing for us. Until the integration above takes place, which would allow the overall package to be simplified, it’s going to remain just a wrapper, and it’s still going to be an extremely complex piece of kit. VCE have focused efforts on improving UIMp to simplify vBlock provisioning and management through a single interface, but really these are just band-aids while the individual components are made by separate companies.

Patching – Even though there is a compatibility matrix, which does the integration and regression testing for you, it still doesn’t take away the pain and effort of actually deploying the patches. Having a vBlock doesn’t mean there is no patching required. This is a common pre-sales myth: ‘Don’t worry about it, we’ll do all the patching for you.’ Sure, but at what cost? Security patches, bug fixes and feature enhancements come out more or less monthly, and this has to be factored into your budget and ongoing costs.

Monitoring and Reporting – This is a pain, and I know there are plans afoot at VCE to simplify it, but currently there is no single management point you can query to monitor the vitals of a vBlock. If you want to know the status of UCS: UCS Manager; VNX: Unisphere; ESXi: vCenter, and so on. For example, you buy vCOps, but that only plugs into vCenter, so you are only aware of the resources vCenter has been assigned. Getting a helicopter view of the entire vBlock from a single console is impossible. UIMp gives you a bit of a storage overview – available vs provisioned – but not much more than that. So you end up buying tactical solutions for each of the individual components, like VNX Monitoring and Reporting. Hopefully soon we will be able to query a single device and get up-to-date health checks and alerting for all vBlock components.

Niggles – There have been a few small niggles, mainly issues between vCenter and the Cisco 1000V, and between vCenter and the VNX 7500, but overall, for the amount of kit we purchased, it has not been bad. I think a lot of these issues had to do with vCenter 5/ESXi 5. As soon as Update 1 came out, everything settled down. Note to self: don’t be quick to upgrade to vCenter 6/ESXi 6!


VCE release latest Compatibility Matrix with vSphere 5.1


With VCE announcing back in August that they were fully supporting VMware’s vCloud Suite 5.1, I guess it’s no surprise that the latest VCE matrix with vSphere 5.1 was released last week…

But the timing is a little surprising. I was not expecting VCE to release their compatibility matrix with vSphere 5.1 so soon after VMware’s GA release.

Normally they take a good couple of months after a major release to test product integration with all the vBlock components… Saying that, it does look like the number of changes is relatively minor for a major release. Normally VCE would take the opportunity to refresh all the components across the vBlock, especially field notices for the Nexus and VNX, but the emphasis on this release looks to have been getting vSphere 5.1 out ASAP. I’m guessing promises were made somewhere!

Saying that, I would not recommend deploying this if you are a VCE customer until vSphere 5.1 Update 1 is released, especially if you are in a large organisation where major upgrades can only be undertaken once or twice a year. New releases are always a little buggy, and I doubt this one will be any different… We had enough issues when vSphere 5.0 was released… the sort of minor problems I would rather not go through again. Not because of the business impact, just because of the time spent on the phone diagnosing and identifying faults.

Additionally major upgrades are disruptive and it makes sense to ensure the matrix you upgrade to won’t need to be upgraded for a good 6 months or so, especially since you know the next matrix release is going to contain all the updates they couldn’t squeeze into this one.

vBlock Tip: Increase Cisco 1000V Max Ports from default of 32


Another post in the vBlock tip series…

VCE use static binding on the Cisco 1000V, and this, combined with the default of 32 ports per port-profile (VLAN), means most people will soon run out of ports on their DV port groups.

Who knows why 32 is the default. It seems a bit conservative to me. Maybe there is a global port limit but I haven’t been able to confirm this.

Either way, 32 doesn’t seem nearly enough ports in most network designs. The good news is the maximum is 1024, so it makes sense to me to increase it substantially depending on the number of VLANs you have.

As soon as your vBlock lands I would definitely review each DV Port Group and increase the max ports assigned.

Static binding is a pain in the arse – it means that any VM, whether it’s a template or powered off, will use up a port if it is assigned to the DV port group. You may only have 5x running VMs on the VLAN, but you won’t be able to add and power on a 6th VM if you have 27x VMs/templates powered off and assigned to that same DV port group.

For that reason alone I am not sure why VCE don’t just use ephemeral binding. Anyway I am going off topic.

VMware KB1035819 has instructions on how to increase the max ports for each VLAN (port-profile).

These are the commands I use:

  1. show port-profile – to find the correct port profile name
  2. conf t – enter configuration
  3. port-profile <DV-port-group-name> – change configuration context to the correct port-profile
  4. vmware max-ports 64 – change max ports to 64
  5. copy run start – copy running config to startup config
  6. exit
  7. exit
  8. exit
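
Put together, a typical session on the VSM looks something like the sketch below. Treat it as exactly that – a sketch: the port-profile name VLAN100-Data is just an example, and 64 is whatever value suits your design. The final show command is simply there to confirm the new max-ports value has taken effect.

  show port-profile
  conf t
  port-profile VLAN100-Data
  vmware max-ports 64
  copy run start
  exit
  exit
  show port-profile name VLAN100-Data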

 

vBlock Tip: vSphere 5, SIOC, EMC FAST VP and Storage DRS


If you’ve got a vBlock, it’s most likely you’ve got an EMC array with EMC FAST VP, and hopefully by now you’ve upgraded to vBlock Matrix 2.5.0 and are using vSphere 5.

If not, what are you waiting for? Oh yeah, there are still a few outstanding issues. (My advice: wait for the Storage vMotion issues to be resolved; it’s a real pain.)

I wanted to post some best practices and recommended settings for leveraging VMware’s Storage IO Control with EMC Fast VP and Storage DRS.

First a quick recap:

  • FAST VP is EMC’s sub LUN auto-tiering mechanism.
  • SIOC is VMware’s attempt to bring the idea behind DRS (distributed resource prioritisation) to the storage layer. SIOC provides I/O performance monitoring and isolation of virtual machines in vSphere 5.
  • Storage DRS is a new feature in vSphere 5 which allows datastores to be pooled together as a single resource.

The bottom line: EMC FAST VP and SIOC are not only compatible but can work together harmoniously because they serve different purposes.

EMC FAST monitors data usage over an hourly period and only moves data once every 24 hours. Unlike SIOC, EMC FAST redistributes data based on the 1GB slice usage and lowers the response time of the busiest slices.

Compared to EMC FAST, SIOC uses a relatively short sampling window and is designed to quickly deal with short term IO contention crises. It can act quickly to throttle IO to limit guest latency during times of IO contention.

SIOC and EMC FAST perform complementary roles to monitor and improve storage performance, therefore they should both be leveraged in your environment.

And lastly, Storage DRS – should it be used? Yes, but in what capacity?

My recommendation is to leverage Storage DRS in automatic mode for initial placement, to balance VMs evenly across datastores. I would also enable SDRS to monitor free capacity and make VM relocation recommendations if datastores approach capacity. The default setting is 90%, which should be adequate.

What should be disabled, though, is the I/O metric – it is EMC’s recommendation that Storage DRS I/O metrics be disabled when using FAST VP. This is because the two would perform competing roles, potentially identifying similar relocations and causing inefficient use of storage system resources.

So there you have it. The best way to leverage these components in your vBlock.

Sources:

There is a great EMC document here which lists best practices with EMC VNX storage and vSphere, and an old but still relevant article from Virtual Geek on SIOC and auto-tiering.

vBlock Tip: Set VMFS3.MaxHeapSizeMB to 256MB on all ESXi hosts


This is the sort of issue I came across this week that you would expect VCE to make a de facto standard in the vBlock.

Why? Because the performance hit is negligible (a slight increase in kernel memory of around 64 MB), vBlock customers are likely to hit this ceiling, and it’s another setting that we then don’t have to worry about.

I started running into issues vMotioning two VMs. It turns out this is a known issue as per KB1004424.

I was told by VMware: ‘ESXi 5.0 host (source on which you are trying to power on the VM) already has 18 virtual disks (.vmdk) greater than 256GB in size open and you are trying to power on a virtual machine with another virtual disk of greater than 256GB in size.’

The heap size effectively caps the amount of open VMDK storage that can be hosted, across all virtual machines, on a given host.

The default heap size is 80 MB. To work out how much open VMDK storage that supports, multiply the heap size by 256 GB: an 80 MB heap extends to 80 x 256 GB ≈ 20 TB of open VMDK storage on a host.

Increasing the heap size to 256 MB gives 256 x 256 GB = 64 TB of open VMDK storage on a host, which should be plenty.

I was provided with the following instructions, which I repeated on each host to fix the issue:

  1. Login to the vCenter Server or the ESXi host using the vSphere Client.
  2. Click on the configuration tab of the ESXi Host
  3. Click on Advanced Settings under Software
  4. Select VMFS3
  5. Change the value of VMFS3.MaxHeapSizeMB to 256
  6. Click on OK
  7. Reboot host
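
If you would rather make the change from the command line (or script it across all your hosts), something like the following should do the same job from the ESXi 5.x shell – a sketch, assuming you have SSH or the local shell enabled:

  # Check the current value
  esxcli system settings advanced list -o /VMFS3/MaxHeapSizeMB

  # Set it to the 256 MB maximum
  esxcli system settings advanced set -o /VMFS3/MaxHeapSizeMB -i 256

Either way, the host still needs a reboot for the new heap size to take effect.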

After rebooting each host the problem was solved. That was an easy one, for once!

SIOC is fully supported by SRM 5…er…except for planned migrations!


I was having some issues with SIOC and SRM 5 planned migrations. I noticed that my planned migrations were failing when SIOC was enabled.

This got me wondering whether SIOC is even supported with SRM 5, but I couldn’t find any documentation online either way. It looks like any mention of it has been omitted from the official documentation.

So after a bit of digging here is what I’ve found from VMware:

1) SIOC is supported for use with SRM – you can use SRM to protect SIOC enabled datastores

2) To execute a “planned migration” with SRM, you will need to disable SIOC first (on the datastores). You cannot do a “planned migration” with SIOC enabled.

Let’s start with the good news — SIOC is supported by SRM 5, so you can leave it enabled on all your replicated datastores.

This leads us to Point 2 – There are a few caveats:

As per KB2004605, you cannot unmount a datastore with SIOC enabled. If you are going to initiate a planned migration, you need to disable SIOC first on your protected-site (active) LUNs. This is because SRM needs to unmount the active LUNs before it breaks the mirror, sets the read-only LUNs in your recovery site to read-write and mounts them on all ESXi hosts.

If you attempt a planned migration without disabling SIOC, the unmounting of the LUNs – and therefore the planned migration – will fail.

There are other instances where a mounted datastore would need to be unmounted. Consider the following scenario, I haven’t had a chance to test this, but this is what I think will happen:

  1. For whatever reason, your protected site (DC1) goes offline.
  2. You log in to SRM at your recovery site (DC2) and initiate your disaster recovery plan.
  3. The protected site (DC1) array is unavailable, so SRM is unable to synchronise changes, but it continues the recovery.
  4. SRM instructs RecoverPoint/SRDF to break the mirror and convert the read-only recovery site (DC2) LUNs to read-write, and SRM mounts them in vCenter.
  5. SRM powers on your VMs. Job done!
  6. But wait – the old protected site (DC1) eventually comes back online.
  7. You log back in to SRM and hit Reprotect to start replicating back the other way.
  8. SRM tries to unmount the LUNs in vCenter in DC1 before it begins replicating back the other way, but it cannot because SIOC is enabled.
  9. The reprotect fails.

It seems clumsy to me that SRM isn’t aware of SIOC. It doesn’t matter whether it’s during a planned migration or a reprotect – if you have to keep disabling and re-enabling it, it’s a pain in the arse.

Clearly this isn’t going to happen a lot once you go live, and it’s an annoyance at best, but this is the sort of minor issue that a polished product like SRM 5 shouldn’t have. Maybe I’m being so critical because it is such a good product now – they’ve raised my expectations!

I’ve raised a feature request with VMware to have this automated in a future release, and I’ve been told the documentation will be updated to ‘state the obvious’.

Maybe I am blissfully ignorant of the complexity involved but as an enterprise end user it looks like a gap to me that needs fixing.

Manual steps introduce uncertainty and risk and this looks like an issue that should be solved.

vSphere 5, vShield 5, Trend DS 8 (vBlock 300HX) Upgrade


Call this the perfect storm upgrade. If you have to perform a vSphere 5, vShield 5 and Trend DS 8 upgrade (whether or not you happen to have a vBlock 300HX), read the following for what TO do and what NOT to do!

The main caveats to remember when performing this upgrade are:

  • vShield Endpoint v3.x and vShield Endpoint v5.x are NOT compatible.
  • You cannot upgrade to the latest VMware Tools if you have the old endpoint thin agent installed on your Windows VMs. It has to be removed first.

Your final approach will depend on whether you are upgrading your hosts with VUM or rebuilding them via ISO. I took the ISO route as I thought it would be cleaner.

Before we get started, there is some documentation you should read:

  1. vSphere 5 Upgrade Guide including vCenter, ESXi
  2.  vShield 5 Quick Start guide
  3. Trend Manager 8 Getting Started Guide

Step-by-Step Deployment Guide:

I’ll tell you what you should do to avoid the pain and suffering I went through. If you prefer to test the upgrade on a single host first to make sure the process works, adjust the steps accordingly – it will still work.

  1. Upgrade Trend Manager to v8
  2. Power off all your VMs except the Trend appliances.
  3. De-activate your Trend Appliances from Trend Manager
    • You should see the Trend service account in Virtual Center updating the configuration (.vmx) files of all your VMs.
    • Confirm all VFILE line entries have been removed from the VMs’ .vmx files before continuing
  4. Power off and delete your Trend appliances from Virtual Center
  5. Put all hosts into Maintenance mode.
  6. Remove Virtual Center from Trend Manager.
  7. Login and un-register vShield Manager 4.1 from Virtual Center
    • Power off vShield Manager 4.1
  8. Disconnect and remove all hosts from cluster
  9. Upgrade Virtual Center to v5
    • If any of your hosts are disconnected during the upgrade, just reconnect them.
  10. Upgrade VMware Update Manager to v5
  11. Deploy vShield Manager v5
  12. Register vShield Manager v5 with Virtual Center
  13. Rebuild hosts manually with vanilla ISO
    • Setup management IP address on each host
  14. Add hosts back into the cluster
  15. Patch hosts with VUM and apply any host profiles
  16. Add hosts back to the 1000V if present
    • Setup all vDS virtual adapters
  17. Add virtual center back into the Trend Manager
  18. Deploy vShield Endpoint v5 driver to all hosts
    • Ensure vShield Manager is reporting Endpoint is installed before continuing
  19. Deploy Trend 8 dvfilter-dsa to all hosts via Trend Manager
    • Ensure Trend Manager is reporting hosts are prepared before continuing
  20. Deploy and activate all Trend 8 virtual appliances
    • Ensure all virtual appliances are reporting as ‘vShield Endpoint: Registered’
  21. Power on your VMs
  22. Remove vShield Endpoint Thin Agent from all your Windows VMs and reboot
  23. Upgrade VMware Tools on all your VMs, ensuring vShield option is selected. Reboot required.
  24. Confirm all VMs are protected by the local virtual appliance. Anti-malware should report ‘real time’.
  25. Update all your DRS groups as all the hosts and appliances will have been removed.
If you want to upgrade, rather than rebuild, do the following between steps 3 and 4:
  1. Uninstall Trend filter (dvfilter-dsa) from all hosts
  2. Uninstall Endpoint v3 filter (epsec_vfile) from all hosts
and upgrade vShield Manager instead of deploying new version. Refer to Page 29 of the vShield Quick Start Guide.
Things to Watch Out For:
Steps 2 and 3 are crucial.
Step 2 – vShield Endpoint v3 includes a loadable kernel module (LKM) called VFILE, which loads into the kernel on a vSphere 4.1 host at boot-up. Whenever a VM is powered on on a host running the VFILE LKM, the virtual machine’s .vmx file is updated with the following two line entries:

VFILE.globaloptions = "svmip=169.254.50.39 svmport=8888"
scsi0:0.filters = "VFILE"

vShield Endpoint v5 does not do this! No VFILE LKM is loaded and no VFILE line entries are added to the .vmx files of the VMs. Therefore, if you do not correctly decommission vShield Endpoint v3, your VMs will not power on on your vSphere 5 hosts.

This is implied in the vShield 5 Quick Start guide on Page 31 under ‘Upgrading vShield Endpoint’:

2. Deactivate all Trend DSVAs. This is required to remove vShield related VFILE filter entries from the virtual machines.

What they don’t tell you there, though, is that all your VMs must be powered off. If you de-activate your Trend appliances while your VMs are running – well, mine just had their .vmx files updated again immediately afterwards!

If you missed that step the first time around, you’ll have to manually update the .vmx file of every virtual machine to remove the VFILE line entries, as per KB1030463.
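
A quick way to find any VMs that still have the entries is to grep the .vmx files from an ESXi shell – a rough sketch, assuming the standard /vmfs/volumes/<datastore>/<vm folder>/ layout:

  # List any .vmx files that still contain VFILE entries
  grep -l VFILE /vmfs/volumes/*/*/*.vmx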

Step 3 – If you don’t remove and re-add Virtual Center from Trend Manager after you have installed vShield Manager 5, your DS virtual appliances will not register with vShield Endpoint.

Step 7 – The first time I deployed vShield Manager 5 I didn’t have any issues, although I did have to re-deploy it a second time as it stopped synchronising with vCenter. Unfortunately it then no longer recognised that vShield Endpoint was installed, and I had to rebuild all my hosts.

Besides these issues, things went relatively smoothly. It’s just a matter of time.

Good Luck!