vCloud Director 5.1.1, vCloud Networking and Security 5.1.1, ESXi 5.1.0a, vCenter 5.1.0a Released!


Looks like my proof of concept environment is out of date already…

VMware released a couple of updates on Thursday 25/10/2012:

  • VMware ESXi 5.1.0a Build 838463

Looks like there are a couple of new features in vCloud Director, like Elastic vDCs, which will be worth looking into, but otherwise it’s all bug fixes.

I haven’t been having any issues per se, so I’m not sure how much value I will get out of these updates. I will hopefully get them installed next week, so that I am up to date with the latest patches and can try out the new vCloud Director features.

 


Configure and capture ESXi core dumps on a shared LUN


I had an issue recently where I couldn’t capture a valid core dump during ESXi PSODs.

As I was using virtual distributed switches I couldn’t configure a network dump collector.

This isn’t well documented, so I’m going to run through the commands I used, first to set up the shared LUN as a diagnostic partition and then to extract a core dump from the shared LUN after a crash.

  1. SSH (e.g. with PuTTY) in to the ESXi host.
  2. List your existing diagnostic partitions: esxcli system coredump partition list
  3. Navigate to the disk devices directory: cd /vmfs/devices/disks
  4. Identify the naa disk identifier that you are going to use for your shared LUN.
  5. Get the starting and ending sectors of the disk: partedUtil getptbl "naa.6006016031d02c00a468c9f88a31e111"
  6. Create the diagnostic partition: partedUtil setptbl "/vmfs/devices/disks/naa.xxx" gpt "1 <starting sector> <ending sector> 9D27538040AD11DBBF97000C2911D1B8 0"
  7. For example: partedUtil setptbl "/vmfs/devices/disks/naa.6006016031d02c00a468c9f88a31e111" gpt "1 2048 209705200 9D27538040AD11DBBF97000C2911D1B8 0"
  8. Confirm the partition has been created by running the getptbl command again: partedUtil getptbl "naa.6006016031d02c00a468c9f88a31e111"
  9. List the diagnostic partitions again: esxcli system coredump partition list. The new partition you have just created should appear, but still be set to False.
  10. Set the first partition on the LUN as Active (don’t forget the :1): esxcfg-dumppart --set "naa.xxx:1"
  11. List the diagnostic partitions one more time: esxcli system coredump partition list. The new partition should now be set to True and the old diagnostic partition should be set to False.
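The partition steps above could be wrapped in a small script along these lines. This is only a sketch, not something I ran as-is: the function names diag_partition_spec and create_diag_partition are made up for illustration, and it assumes you have already read the start and end sectors from partedUtil getptbl. Nothing runs until you call a function, and setptbl rewrites the partition table of the disk you point it at, so double-check the naa identifier first.

```shell
#!/bin/sh
# Sketch only: defines helper functions; nothing touches a disk until called.

# GPT type GUID used for the diagnostic partition in the steps above.
DIAG_GUID="9D27538040AD11DBBF97000C2911D1B8"

# Build the partition spec string passed to "partedUtil setptbl".
# Arguments: start sector, end sector (both read from "partedUtil getptbl").
diag_partition_spec() {
    echo "1 $1 $2 $DIAG_GUID 0"
}

# Hypothetical wrapper around steps 6 to 11.
# Arguments: naa identifier, start sector, end sector.
# WARNING: setptbl rewrites the partition table of the target disk.
create_diag_partition() {
    disk="/vmfs/devices/disks/$1"
    partedUtil setptbl "$disk" gpt "$(diag_partition_spec "$2" "$3")" || return 1
    partedUtil getptbl "$disk"             # confirm the partition exists
    esxcfg-dumppart --set "$1:1"           # make partition 1 the active dump target
    esxcli system coredump partition list  # new partition should now be Active
}

# Example call, using the values from the walkthrough:
# create_diag_partition naa.6006016031d02c00a468c9f88a31e111 2048 209705200
```

Keeping the spec string in its own function makes it easy to print and eyeball the sectors before anything is written to the LUN.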

After your first crash has been captured to the new partition:

  1. Reboot the ESXi host from the PSOD screen.
  2. SSH (e.g. with PuTTY) in to the ESXi host.
  3. Test whether a core dump was successfully generated: esxcfg-dumppart -T -D "/vmfs/devices/disks/naa.xxx:1"
  4. If the answer is ‘YES’, copy the core dump to the scratch partition: esxcfg-dumppart -C -D "/vmfs/devices/disks/naa.xxx:1"
  5. You should see output similar to ‘Created file /scratch/core/vmkernel-zdump.1’.
  6. Run cd /scratch/core/ and then ls. Your dump file should be there.
  7. You can also run esxcfg-dumppart -L vmkernel-zdump.1 to generate the vmkernel log.
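If you have to do this more than once, the copy and log steps can be scripted. The following is a rough sketch under a few assumptions: the function names created_file_path and collect_core_dump are invented here, and it assumes the copy command prints a ‘Created file <path>’ line like the one shown in step 5 (I have not checked whether that goes to stdout or stderr, so both are captured).

```shell
#!/bin/sh
# Sketch only: defines helper functions; nothing runs until called.

# Pull the file path out of a "Created file <path>" message read from stdin.
created_file_path() {
    sed -n 's/^.*Created file \([^ ]*\).*$/\1/p'
}

# Hypothetical wrapper around steps 4 to 7.
# Argument: dump partition, e.g. naa.xxx:1
collect_core_dump() {
    part="/vmfs/devices/disks/$1"
    # Copy the dump off the partition and keep the command output.
    out=$(esxcfg-dumppart -C -D "$part" 2>&1) || return 1
    echo "$out"
    dump=$(echo "$out" | created_file_path)
    [ -n "$dump" ] || return 1
    # Generate the vmkernel log from inside the dump's directory.
    (cd "$(dirname "$dump")" && esxcfg-dumppart -L "$(basename "$dump")")
}

# Example call (replace naa.xxx with your own identifier):
# collect_core_dump naa.xxx:1
```

The sketch skips the -T check from step 3, so run that by hand first if you are not sure a dump was written.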

All done!

Known Issue! PSOD with Deep Security 8 and ESXi 5.0


It looks like there is a known issue with Trend Deep Security 8 and ESXi 5.0 that causes PSOD. I cannot find a KB article yet, so am documenting it here. Hopefully this will help some people who don’t have a Trend support contract.

There is a known issue with ESXi 5.0 and Deep Security 8.0. A number of customers are experiencing ESXi system crashes (purple screens of death). By default the Deep Security Filter Driver will attempt to multiplex a single kernel timer across all virtual machines, to ensure they perform a maintenance task every 30 seconds.

This appears to be creating the instability issues and causing the system crashes as using a single timer across all VMs is complex to manage and implement.

The workaround is to disable this setting, so that the maintenance tasks execute without the timer. They run periodically anyway when the system processes packets, so there is no impact from making this change.

  1. SSH to the ESXi host and check whether you have modified the Filter Driver heap memory size (DSAFILTER_HEAP_MAX_SIZE): esxcfg-module -g dvfilter-dsa
  2. If you have not configured the DSAFILTER_HEAP_MAX_SIZE value, just set DSAFILTER_MOD_TIMER_ENABLED to 0 with the following command: esxcfg-module -s DSAFILTER_MOD_TIMER_ENABLED=0 dvfilter-dsa
  3. If you have configured the DSAFILTER_HEAP_MAX_SIZE value, use the following command to preserve your existing setting: esxcfg-module -s "DSAFILTER_HEAP_MAX_SIZE=<value that you got from the last query> DSAFILTER_MOD_TIMER_ENABLED=0" dvfilter-dsa
  4. Run esxcfg-module -g dvfilter-dsa again. The options value should now include DSAFILTER_MOD_TIMER_ENABLED=0.
  5. Reboot the ESXi server. The setting will not take effect until the driver is reloaded, which requires either a reboot of ESXi (the best option) or unloading and reloading the driver.
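Steps 2 and 3 differ only in whether an existing heap size has to be preserved, so they can be folded into one helper. This is a sketch rather than anything from Trend: the function names dsa_module_options and disable_dsa_timer are made up, and you still need to read any existing DSAFILTER_HEAP_MAX_SIZE value yourself from the step 1 output and pass it in.

```shell
#!/bin/sh
# Sketch only: defines helper functions; nothing runs until called.

# Build the option string for the dvfilter-dsa module.
# Optional argument: the existing DSAFILTER_HEAP_MAX_SIZE value, if one was set.
dsa_module_options() {
    if [ -n "$1" ]; then
        echo "DSAFILTER_HEAP_MAX_SIZE=$1 DSAFILTER_MOD_TIMER_ENABLED=0"
    else
        echo "DSAFILTER_MOD_TIMER_ENABLED=0"
    fi
}

# Hypothetical wrapper around steps 2 to 4.
# Optional argument: existing heap size to preserve.
disable_dsa_timer() {
    esxcfg-module -s "$(dsa_module_options "$1")" dvfilter-dsa || return 1
    esxcfg-module -g dvfilter-dsa   # confirm the options were stored
}

# Example calls:
# disable_dsa_timer            # no heap size configured
# disable_dsa_timer <value>    # preserve the heap size from step 1
```

Because esxcfg-module -s replaces the whole option string, passing the heap size back in is what stops it from being wiped. A reboot is still needed afterwards, as in step 5.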