VCE vBlock – 1st Year in Review


Well, we have just passed a year of vBlock ownership, and it has been a fairly painless twelve months.

Our vBlock was one of the first out there, delivered in November 2011. I wanted to provide some pros and cons of vBlock ownership. Some of the themes are not vBlock specific, but they are worth bearing in mind, because there will always be a gap between what you hear from pre-sales and the reality.

Pros:

VCE – The company has been constantly improving, which is good to see. Not content to rest on their laurels, they have grabbed the bull by the horns and are innovating in a lot of areas.

vBlock – The concept of the vBlock itself deserves a mention. VCE are definitely on the right path… it’s like the first-generation Model T Ford. I’m sure old Henry had hundreds of suppliers providing the components for his Model T; he came along with assembly-line production and put it all together. Something similar is happening over at VCE. Over time I’m hoping the integration between components will become more and more seamless, as demand for pre-configured virtualisation platforms grows and the designers behind each of the components are forced to work more closely together.

Management and Support – If you have a bloated IT support team in a large, sprawling organisation, a vBlock can help reduce your head count by simplifying your environment. One thing converged infrastructure platforms are good for is breaking down the traditional support silos around storage, network, compute and virtualisation. When all the components are so tightly integrated, your siloed operations teams morph into one.

Compatibility Matrix – This has to be the biggest selling point in my book: taking away the pain of ensuring compatibility between so many different components. The VCE matrix is far more stringent than individual vendor product testing and therefore far more trustworthy. Try getting a complete infrastructure upgrade across storage, network, compute and virtualisation components through your change management team for a single weekend. It’s not going to happen unless it’s been pre-tested.

Single line of support – Being able to call a single number for any issue immensely simplifies fault finding and problem resolution. Worth it alone just for this and the matrix.

Single pane of glass – This is where UIMp is starting to come into its own. It’s been a long road, but the future looks good. VCE’s goal is to replace each of the individual management consoles so that VCE customers can use UIMp for all their automated provisioning. When it works, it really does simplify provisioning.

Customer Advocate – In my experience the customer advocate offers great value: extremely useful when managing high-severity incidents, ensuring your environment remains up to date and in support through regular service reviews, providing an easy path into VCE to organise training sessions and bodies to fill gaps in support, giving a direct line to escalation engineers, and dealing with any queries you may have about your environment.

Cons:

The AMP – The major design flaw in the AMP, for me, is the 1Gb network. Data transfers between VMs in our 10Gb service cluster can achieve 300 Mbps; as soon as the AMP is involved it drops to 30 Mbps. Really annoying, and what sits in the AMP? vCenter, which is used to import virtual machines. If you are doing a migration of 1,000 VMs, that 30 Mbps is going to get really annoying – and it has.
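To give a feel for how much that bottleneck matters at scale, here is a quick back-of-the-envelope calculation in Python. The 30/300 Mbps figures are the ones above; the VM count and average VM size are purely illustrative assumptions.

# Rough bulk-migration time through the AMP's 1Gb path versus the
# 10Gb service cluster, using the throughput figures quoted above.
# VM count and average VM size are illustrative assumptions only.

vm_count = 1000
avg_vm_size_gb = 100                       # assumed average VM size
total_gb = vm_count * avg_vm_size_gb

def days_to_transfer(gb, rate_mbps):
    megabits = gb * 1024 * 8               # GB -> megabits
    return megabits / rate_mbps / 86400.0  # seconds -> days

print('At 300 Mbps: %.0f days' % days_to_transfer(total_gb, 300))
print('At  30 Mbps: %.0f days' % days_to_transfer(total_gb, 30))

# At 300 Mbps: ~32 days; at 30 Mbps: ~316 days.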

Cost – The vBlock hardware isn’t so bad, but what really surprised me is the number and cost of the licenses. Want to add a UCS blade? No problem, that will be £5k for the blade and about £3k for the licenses – UCS, UIMp, VNX, vSphere, etc. It all adds up pretty quickly, so adequately sizing your UCS blades up front (i.e. plenty of memory and CPU) is really important.
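As a quick illustration of how that scales, here is a trivial Python sketch using the rough per-blade figures above; both the figures and the blade counts are ballpark/illustrative, not a quote.

# Rough cost of scaling out UCS blades, using the ballpark figures quoted
# above: ~GBP 5k for the blade plus ~GBP 3k for the associated licenses.

blade_hw = 5000
blade_licenses = 3000        # UCS, UIMp, VNX, vSphere etc. combined

for blades in (2, 4, 8, 16):
    total = blades * (blade_hw + blade_licenses)
    print('%2d blades: GBP %s' % (blades, format(total, ',')))

# 2 blades: GBP 16,000 ... 16 blades: GBP 128,000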

Management & Support – Converged infrastructure platforms require a lot of ongoing support and management. This issue is not limited to VCE; it’s just the nature of the beast. If you have an immature IT organisation and have had a fairly piecemeal IT infrastructure and support team up until now, you will be in for a shock when you purchase a converged infrastructure platform. There’s no doubt a vBlock is an excellent product, but it’s excellent because it uses the latest and greatest, which can be complex. It also comprises multiple products from three different vendors – EMC, Cisco and VMware – so you need the right skillset to manage it, which can be expensive to find and train. It takes at least a year for someone to become familiar with all components of the vBlock. You’re always going to have employees with core skills like virtualisation, storage, network and compute, but you do want people to broaden their skills and be comfortable with the entire stack.

Integration between products – See above: multiple products from three different vendors. At the moment the VCE wrapper is just that – little more than a well-designed wrapper, lots of testing and a single line of support. OK, so EMC own VMware, but it seems to make little difference. EMC can’t even align products within their own company, so how on earth can they expect to align products with a subsidiary? If the vBlock is going to be a single-vendor product, then all three vendors need to invest in closer co-operation to align product lifecycles and integration. VMware releases vCenter 5.1 and PowerPath has to release an emergency patch to support it? Going back to my Model T analogy, the vBlock is never going to become a real Model T until Cisco buys EMC, or EMC drops Cisco and starts making the compute/network components itself. Not so far-fetched.

Complexity – The VCE wrapper hasn’t changed the complexity (the same is true of HP or FlexPod). This is another myth: “We’ve made it simple!”. Er, no, you haven’t. You’ve just done all the design work and testing for us. Until the integration above takes place, allowing genuine simplification of the overall package, it’s going to remain just a wrapper around an extremely complex piece of kit. VCE have focused their efforts on improving UIMp to simplify vBlock provisioning and management through a single interface, but really these are just band-aids while the individual components are made by separate companies.

Patching – Even though there is a compatibility matrix, which does the integration and regression testing for you, it still doesn’t take away the pain/effort of actually deploying the patches. Having a vBlock doesn’t mean there is no patching required. This is a common pre-sales myth: ‘Don’t worry about it, we’ll do all the patching for you.’ Sure, but at what cost? Security patches, bug fixes and feature enhancements come out more or less monthly, and this has to be factored into your budget and ongoing costs.

Monitoring and Reporting – This is a pain, and I know there are plans afoot at VCE to simplify it, but currently there is no single management point you can query to monitor the vitals of a vBlock. If you want to know the status of UCS, it’s UCS Manager; VNX, Unisphere; ESXi, vCenter; and so on. For example, you buy vCOps, but that only plugs into vCenter, so you are only aware of the resources vCenter has been assigned. Getting a helicopter view of the entire vBlock from a single console is impossible. UIMp gives you a bit of a storage overview – available vs provisioned – but not much more than that. So you end up buying tactical solutions for each of the individual components, like VNX Monitoring and Reporting. Hopefully soon we will be able to query a single device and get up-to-date health checks and alerting for all vBlock components.

Niggles – There have been a few small niggles, mainly issues between vCenter/Cisco 1000V and vCenter/VNX 7500, but overall, for the amount of kit we purchased, it has not been bad. I think a lot of these issues were down to vCenter 5/ESXi 5. As soon as Update 1 came out, everything settled down. Note to self: don’t be quick to upgrade to vCenter 6/ESXi 6!


VCE release latest Compatibility Matrix with vSphere 5.1


With VCE announcing back in August that they were fully supporting VMware’s vCloud Suite 5.1, I guess it’s no surprise that the latest VCE matrix with vSphere 5.1 was released just last week…

But the timing is a little surprising. I was not expecting VCE to release their compatibility matrix with vSphere 5.1 so soon after VMware’s GA release.

Normally they take a good couple of months after a major release to test product integration with all the vBlock components… Saying that, it does look like the number of changes is relatively small for a major release. Normally VCE would take the opportunity to refresh all the components across the vBlock, especially field notices for the Nexus and VNX, but it looks like the emphasis of this release has been to get vSphere 5.1 out ASAP. I’m guessing promises were made somewhere!

That said, I would not recommend deploying this if you are a VCE customer until vSphere 5.1 Update 1 is released, especially if you are in a large organisation where major upgrades can only be undertaken once or twice a year. New releases are always a little buggy and I doubt this will be any different… We had enough issues when vSphere 5.0 was released – the sort of minor problems I would rather not go through again. Not because of the business impact, just because of the time spent on the phone diagnosing and identifying faults.

Additionally, major upgrades are disruptive, so it makes sense to ensure the matrix you upgrade to won’t need to be upgraded again for a good six months or so, especially since you know the next matrix release is going to contain all the updates they couldn’t squeeze into this one.

EMC IONIX UIMp 3.1.1.2 Review


EMC Ionix Unified Infrastructure Manager/Provisioning, better known as UIMp, is the vBlock provisioning tool.

I must say I have been a big UIMp sceptic. When I got my hands on a vBlock in December 2011 UIMp was around version 2 and it was crap!

UIMp was only really fit for purpose during the vBlock deployment in Cork. It could provision multiple service offerings (ESXi clusters) automatically, automating a number of otherwise manual tasks across the UCS, VNX, Nexus and vSphere services and allowing VCE to meet their 30-day bare-metal-to-customer-install lead times, but once the service offerings were provisioned that was pretty much it.

The only practical feature available to customers was adding datastores to your service offering. Woah! Slow down, tiger! And VCE had the cheek to charge you a fortune for the licenses… It was a lot easier just to turn UIMp off and use the native management tools directly, which is what a lot of customers ended up doing.

Back in the day, if you wanted to add a blade to an existing ESXi cluster… no problem, just decommission and recreate the service offering – that means blowing away the cluster, UCS profiles, storage LUNs and ESXi hosts. No small feat, and if you are a single company you’re normally only going to have one or two service offerings, say Production and Test & Dev. Not exactly usable.

Well things have improved dramatically since then. Flexible service offerings were introduced in v3.0, if I remember correctly, and they allowed customers to add blades to (expand) an existing service offering. It was a big improvement and a step in the right direction.

As more and more customers have bought vBlocks, the pressure on the Ionix team to deliver a robust, mature product has increased, and they have risen to the challenge. UIMp keeps getting better and better. Their stated aim is to negate the need to use the native management tools (i.e. UCS Manager, MDS Fabric Manager, Unisphere) and automate vBlock provisioning and management tasks…

No small feat, and not easy to do without taking away some features found only in the native tools – so there has always been a big enough trade-off to put me off UIMp…

But I must say, having just installed UIMp 3.1.1.2 – the latest version, released in the last few weeks alongside the newest vBlock Compatibility Matrix – I am slowly being converted.

One of the reasons I am slowly being converted is that while UIMp was out of action I tried to manually provision some blades and I could not get the zoning and masking configured correctly… I ended up putting it off until I had completed this install, which made me appreciate how simple UIMp makes even the most difficult provisioning tasks.

The GUI is very slick now, so much more responsive. It was painless to install and configure. An hour’s WebEx was all it took and I had a new service offering configured. (That’s also due to VCE’s excellent support – reason enough to go ahead if you are thinking of getting a vBlock.)

As I deployed a fresh install, I ran the service adoption utility (more to come in another post), which is extremely slick and had our existing vBlock service offerings imported in a few minutes.

What’s missing? There are a couple of native features still on the to-do list, I believe. I would really like to be able to choose the LUN ID when deploying datastores. If you are replicating datastores between two different arrays with, for example, EMC RecoverPoint, it is extremely useful to have the same LUN ID in both datacentres.

Other than that, if you have a vBlock and are thinking of upgrading, I highly recommend it.


vBlock Tip: vSphere 5, SIOC, EMC FAST VP and Storage DRS


If you’ve got a vBlock, it’s most likely you’ve got an EMC array with EMC FAST VP, and hopefully by now you’ve upgraded to vBlock Matrix 2.5.0 and are using vSphere 5.

If not, what are you waiting for? Oh yeah, there are still a few outstanding issues. (My advice: wait for the Storage vMotion issues to be resolved; it’s a real pain.)

I wanted to post some best practices and recommended settings for using VMware’s Storage I/O Control (SIOC) with EMC FAST VP and Storage DRS.

First a quick recap:

  • FAST VP is EMC’s sub-LUN auto-tiering mechanism.
  • SIOC is VMware’s attempt to bring the idea of DRS (distributed resource prioritisation) to the storage layer. It provides I/O performance monitoring and isolation of virtual machines in vSphere 5.
  • Storage DRS is a new feature in vSphere 5 which allows datastores to be pooled together and managed as a single resource.

The bottom line: EMC FAST VP and SIOC are not only compatible but can work together harmoniously because they serve different purposes.

EMC FAST monitors data usage over an hourly period and only moves data once every 24 hours. Unlike SIOC, EMC FAST redistributes data based on the 1GB slice usage and lowers the response time of the busiest slices.

Compared to EMC FAST, SIOC uses a relatively short sampling window and is designed to deal quickly with short-term I/O contention, throttling I/O to limit guest latency while the contention lasts.

SIOC and EMC FAST perform complementary roles to monitor and improve storage performance, therefore they should both be leveraged in your environment.

And lastly, Storage DRS: should it be used? Yes, but in what capacity?

My recommendation is to use Storage DRS in Automatic mode for initial placement, to balance VMs evenly across datastores. I would also enable SDRS to monitor free capacity and make VM relocation recommendations as datastores approach capacity; the default threshold of 90% utilisation should be adequate.

What should be disabled, though, is I/O metrics: EMC’s recommendation is that Storage DRS I/O metrics be disabled when using FAST VP, because the two perform competing roles, potentially identifying similar relocations and causing inefficient use of storage system resources.
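For reference, this is roughly what that configuration looks like if you script it with pyVmomi rather than clicking through the vSphere Client. Treat it as a sketch only: the vCenter details and datastore cluster name are placeholders, and the Storage DRS property names are written from memory of the API, so verify them against the vSphere API reference before running anything.

# Sketch: Storage DRS in automated mode, 90% space-utilisation threshold,
# I/O metric collection disabled (per the FAST VP guidance above).
# Connection details and cluster name are placeholders; property names
# are recalled from the Storage DRS API and should be verified.

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local', user='administrator',
                  pwd='password')          # placeholders; add sslContext if needed
content = si.RetrieveContent()

# Find the datastore cluster (StoragePod) by name.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == 'vBlock-Datastore-Cluster')

pod_spec = vim.storageDrs.PodConfigSpec()
pod_spec.enabled = True
pod_spec.defaultVmBehavior = 'automated'   # automatic initial placement
pod_spec.ioLoadBalanceEnabled = False      # leave I/O balancing to FAST VP / SIOC
space_cfg = vim.storageDrs.SpaceLoadBalanceConfig()
space_cfg.spaceUtilizationThreshold = 90   # the 90% default mentioned above
pod_spec.spaceLoadBalanceConfig = space_cfg

spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_spec)
content.storageResourceManager.ConfigureStorageDrsForPod_Task(
    pod=pod, spec=spec, modify=True)

Disconnect(si)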

So there you have it. The best way to leverage these components in your vBlock.

Sources:

There is a great EMC document here which lists best practices for EMC VNX storage and vSphere, and an old but still relevant article from Virtual Geek on SIOC and auto-tiering.

vBlock Tip: Set VMFS3.MaxHeapSizeMB to 256MB on all ESXi hosts


This is the sort of issue I came across this week that you would expect VCE to make a de facto standard in the vBlock.

Why? Because the performance hit is negligible (a slight increase in additional kernel memory of 64 MB), vBlock customers are likely to hit this ceiling, and it’s another setting that we then don’t have to worry about.

I started running into issues vMotioning two VMs. It turns out this is a known issue as per KB1004424.

I was told by VMware: ‘The ESXi 5.0 host (the source on which you are trying to power on the VM) already has 18 virtual disks (.vmdk) greater than 256 GB in size open, and you are trying to power on a virtual machine with another virtual disk greater than 256 GB in size.’

The heap size effectively determines the amount of open VMDK storage that can be hosted, across all virtual machines, on a given host.

The default heap size is 80 MB. Each MB of heap lets a host address roughly 256 GB of open VMDK storage, so the default 80 MB value extends to about 20 TB of open VMDK storage on a host.

Increasing the value to 256 MB extends that to roughly 64 TB of open VMDK storage on a host, which should be plenty.
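The arithmetic, as a quick Python sanity check (the 256 GB-per-MB-of-heap ratio is simply the one implied by the 20 TB and 64 TB figures above):

# Open VMDK capacity addressable per host for a given VMFS3.MaxHeapSizeMB,
# using ~256 GB of open VMDKs per MB of heap.

GB_PER_HEAP_MB = 256

for heap_mb in (80, 256):
    print('Heap %3d MB -> ~%d TB of open VMDKs'
          % (heap_mb, heap_mb * GB_PER_HEAP_MB // 1024))

# Heap  80 MB -> ~20 TB of open VMDKs
# Heap 256 MB -> ~64 TB of open VMDKs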

I was provided with the following instructions, which I repeated on each host to fix the issue:

  1. Login to the vCenter Server or the ESXi host using the vSphere Client.
  2. Click on the configuration tab of the ESXi Host
  3. Click on Advanced Settings under Software
  4. Select VMFS3
  5. Change the value of VMFS3.MaxHeapSizeMB to 256
  6. Click on OK
  7. Reboot host

After rebooting each host the problem was solved. That was an easy one, for once!
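If you have more than a handful of hosts, the same change is easy to script. Below is a rough pyVmomi sketch; the vCenter details are placeholders and the OptionManager call is written from memory, so double-check it against the API reference (and note the hosts still need a reboot afterwards).

# Sketch: set VMFS3.MaxHeapSizeMB to 256 on every ESXi host in vCenter.
# Placeholder connection details; each host still needs a reboot for the
# new heap size to take effect.

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local', user='administrator',
                  pwd='password')           # placeholders; add sslContext if needed
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
for host in view.view:
    opt = vim.option.OptionValue(key='VMFS3.MaxHeapSizeMB', value=256)
    host.configManager.advancedOption.UpdateOptions(changedValue=[opt])
    print('Updated %s - reboot still required' % host.name)

Disconnect(si)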

SIOC is fully supported by SRM 5…er…except for planned migrations!


I was having some issues with SIOC and SRM 5 Planned Migrations: I noticed that my planned migrations were failing whenever SIOC was enabled.

This got me wondering whether SIOC is even supported with SRM 5, but I couldn’t find any documentation online either way. It looks like any mention of it has been omitted from the official documentation.

So after a bit of digging here is what I’ve found from VMware:

1) SIOC is supported for use with SRM – you can use SRM to protect SIOC enabled datastores

2) to execute a “planned migration” with SRM – you will need to disable SIOC first (on the datastores). You can not do a “planned migration” with SIOC enabled.

Let’s start with the good news: SIOC is supported by SRM 5, so you can leave it enabled on all your replicated datastores.

This leads us to Point 2 – There are a few caveats:

As per KB2004605, you cannot unmount a datastore with SIOC enabled. If you are going to initiate a Planned Migration, you need to disable SIOC on your protected-site (active) LUNs first. This is because SRM needs to unmount the active LUNs before it breaks the mirror, sets the read-only LUNs in your recovery site to read-write and mounts them on all the ESXi hosts there.

If you attempt a Planned Migration without disabling SIOC, the unmounting of the LUNs – and therefore the Planned Migration – will fail.
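If you have a lot of replicated datastores, toggling SIOC by hand before every planned migration gets old very quickly, so it is worth scripting. A rough pyVmomi sketch follows; the datastore names and vCenter details are placeholders, and the SIOC spec/method names are recalled from memory, so verify them against the vSphere API reference first. Re-enabling afterwards is the same call with enabled=True.

# Sketch: disable Storage I/O Control on the protected-site datastores
# ahead of an SRM planned migration. Names/credentials are placeholders;
# the IORM spec and task names are recalled from the API - verify them.

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

PROTECTED_DATASTORES = ['DC1-Repl-DS01', 'DC1-Repl-DS02']   # placeholders

si = SmartConnect(host='vcenter-dc1.example.local', user='administrator',
                  pwd='password')           # placeholders; add sslContext if needed
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.Datastore], True)
for ds in view.view:
    if ds.name in PROTECTED_DATASTORES:
        spec = vim.StorageResourceManager.IORMConfigSpec(enabled=False)
        content.storageResourceManager.ConfigureDatastoreIORM_Task(
            datastore=ds, spec=spec)
        print('SIOC disabled on %s' % ds.name)

Disconnect(si)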

There are other instances where a mounted datastore would need to be unmounted. Consider the following scenario – I haven’t had a chance to test this, but here is what I think would happen:

  1. For whatever reason your protected site (DC1) goes offline.
  2. You log in to SRM at your recovery site (DC2) and initiate your disaster recovery plan.
  3. The protected site (DC1) array is unavailable, so SRM is unable to synchronise changes, but it continues the recovery.
  4. SRM instructs RecoverPoint/SRDF to break the mirror and convert the read-only recovery site (DC2) LUNs to read-write, and SRM mounts them in vCenter.
  5. SRM powers on your VMs. Job done!
  6. But wait – the old protected site (DC1) eventually comes back online.
  7. You log back in to SRM and hit Reprotect to start replicating back the other way.
  8. SRM tries to unmount the LUNs in vCenter in DC1 before it begins replicating back the other way, but cannot because SIOC is enabled.
  9. The reprotect fails.

It seems clumsy to me that SRM isn’t aware of SIOC. It doesn’t matter whether it’s during a planned migration or a reprotect – if you have to keep disabling and re-enabling it, it’s a pain in the arse.

Clearly this isn’t going to happen a lot once you go live, and it’s an annoyance at best, but this is the sort of minor issue that a polished product like SRM 5 shouldn’t have. Maybe I’m being so critical because it is such a good product now – they’ve raised my expectations!

I’ve raised a feature request with VMware to have this automated in a future release, and I’ve been told the documentation will be updated to ‘state the obvious’.

Maybe I am blissfully ignorant of the complexity involved but as an enterprise end user it looks like a gap to me that needs fixing.

Manual steps introduce uncertainty and risk and this looks like an issue that should be solved.

vSphere 5, vShield 5, Trend DS 8 (vBlock 300HX) Upgrade


Call this the perfect storm upgrade. If you have to perform a vSphere 5, vShield 5 and Trend DS 8 upgrade (whether or not you happen to have a vBlock 300HX), read the following for what TO do and what NOT to do!

The main caveats to remember when performing this upgrade are:

  • vShield Endpoint v3.x and vShield Endpoint v5.x are NOT compatible.
  • You cannot upgrade to the latest VMware Tools if you have the old endpoint thin agent installed on your Windows VMs. It has to be removed first.

Your final approach will depend on whether you are upgrading your hosts with VUM or rebuilding them via ISO. I took the ISO route as I thought it would be cleaner.

Before we get started, there is some documentation you should read:

  1. vSphere 5 Upgrade Guide, including vCenter and ESXi
  2. vShield 5 Quick Start Guide
  3. Trend Manager 8 Getting Started Guide

Step-by-Step Deployment Guide:

I’ll tell you what you should do to avoid the pain and suffering I went through. If you prefer to test the upgrade on a single host first to make sure the process works, adjust the steps accordingly – it will still work.

  1. Upgrade Trend Manager to v8
  2. Power off all your VMs except the Trend appliances.
  3. De-activate your Trend Appliances from Trend Manager
    • You should see the Trend service account in Virtual Center updating the configuration (.vmx) files of all your VMs.
    • Confirm all VFILE line entries have been removed from the VMs’ .vmx files before continuing (there is a check for this sketched just after this list)
  4. Power off and delete your Trend appliances from Virtual Center
  5. Put all hosts into Maintenance mode.
  6. Remove Virtual Center from Trend Manager.
  7. Login and un-register vShield Manager 4.1 from Virtual Center
    • Power off vShield Manager 4.1
  8. Disconnect and remove all hosts from cluster
  9. Upgrade Virtual Center to v5
    • If any of your hosts are disconnected during the upgrade, just reconnect them.
  10. Upgrade VMware Update Manager to v5
  11. Deploy vShield Manager v5
  12. Register vShield Manager v5 with Virtual Center
  13. Rebuild hosts manually with vanilla ISO
    • Setup management IP address on each host
  14. Add hosts back into the cluster
  15. Patch hosts with VUM and apply any host profiles
  16. Add hosts back to the 1000V if present
    • Setup all vDS virtual adapters
  17. Add virtual center back into the Trend Manager
  18. Deploy vShield Endpoint v5 driver to all hosts
    • Ensure vShield Manager is reporting Endpoint is installed before continuing
  19. Deploy Trend 8 dvfilter-dsa to all hosts via Trend Manager
    • Ensure Trend Manager is reporting hosts are prepared before continuing
  20. Deploy and activate all Trend 8 virtual appliances
    • Ensure all virtual appliances are reporting as ‘vShield Endpoint: Registered’
  21. Power on your VMs
  22. Remove vShield Endpoint Thin Agent from all your Windows VMs and reboot
  23. Upgrade VMware Tools on all your VMs, ensuring vShield option is selected. Reboot required.
  24. Confirm all VMs are protected by the local virtual appliance. Anti-malware should report ‘real time’.
  25. Update all your DRS groups as all the hosts and appliances will have been removed.
If you want to upgrade, rather than rebuild, do the following between steps 3 and 4:
  1. Uninstall Trend filter (dvfilter-dsa) from all hosts
  2. Uninstall Endpoint v3 filter (epsec_vfile) from all hosts
and upgrade vShield Manager instead of deploying the new version. Refer to page 29 of the vShield Quick Start Guide.
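To make the VFILE check in step 3 less tedious, here is a rough pyVmomi sketch that lists any VMs still carrying VFILE entries. It assumes the old filter lines are visible through each VM’s extraConfig (check one VM you know is still ‘dirty’ before trusting the output), and the vCenter details are placeholders.

# Sketch: report VMs whose configuration still contains VFILE entries,
# i.e. vShield Endpoint v3 has not been fully decommissioned.
# Assumes the entries surface in extraConfig; connection details are placeholders.

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local', user='administrator',
                  pwd='password')           # placeholders; add sslContext if needed
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
dirty = []
for vm in view.view:
    extra = vm.config.extraConfig if vm.config else []
    if any('VFILE' in opt.key or 'VFILE' in str(opt.value) for opt in extra):
        dirty.append(vm.name)

Disconnect(si)
print('VMs still carrying VFILE entries: %s' % (', '.join(dirty) or 'none'))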
Things to Watch Out For:
Steps 2 and 3 are crucial.
Step 2 – vShield Endpoint v3 includes a loadable kernel module (LKM) called VFILE, which loads into the kernel of a vSphere 4.1 host at boot-up. Whenever a VM is powered on on a host running the VFILE LKM, the virtual machine’s .vmx file is updated with the following two line entries:

VFILE.globaloptions = “svmip=169.254.50.39 svmport=8888”
scsi0:0.filters = “VFILE”

vShield Endpoint v5 does not do this! No VFILE LKM is loaded and no VFILE line entries are added to the .vmx files of the VMs. Therefore, if you do not correctly decommission vShield Endpoint v3, your VMs will not power on on your vSphere 5 hosts.

This is implied in the vShield 5 Quick Start guide on Page 31 under ‘Upgrading vShield Endpoint’:

2. Deactivate all Trend DSVAs. This is required to remove vShield related VFILE filter entries from the virtual machines.

What they don’t tell you there, though, is that all your VMs must be powered off. When I deactivated my Trend appliances while the VMs were still on, their .vmx files were simply updated again immediately afterwards!

If you missed that step the first time around, you’ll have to manually update the .vmx file of every virtual machine to remove the VFILE line entries, as per KB1030463.
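If you are faced with doing that for a lot of VMs, the edit itself is easy to script once you have the .vmx files somewhere accessible (e.g. copied off the datastore). A minimal Python sketch, operating on a local copy of a .vmx file with a placeholder path; re-uploading and re-registering the VM is still a manual step:

# Sketch: strip the VFILE entries from a local copy of a .vmx file.
# The path is a placeholder; copying the file back and re-registering
# the VM is up to you.

def strip_vfile_entries(vmx_path):
    with open(vmx_path) as f:
        lines = f.readlines()
    cleaned = [l for l in lines
               if not l.startswith('VFILE.globaloptions')
               and '"VFILE"' not in l]
    with open(vmx_path, 'w') as f:
        f.writelines(cleaned)
    return len(lines) - len(cleaned)

removed = strip_vfile_entries('/tmp/myvm.vmx')   # placeholder path
print('%d VFILE line(s) removed' % removed)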

Step 3 – If you don’t remove and re-add Virtual Center from Trend Manager after you have installed vShield Manager 5, your DS virtual appliances will not register with vShield Endpoint.

Step 7 – The first time I deployed vShield Manager 5 I didn’t have any issues, although I did have to redeploy it a second time when it stopped synchronising with vCenter. Unfortunately it then no longer recognised that vShield Endpoint was installed, and I had to rebuild all my hosts.

Besides these issues, things went relatively smoothly. It’s just a matter of time.

Good Luck!