Configure and capture ESXi core dumps on a shared LUN


I recently had an issue where I couldn't capture a valid core dump during ESXi PSODs.

Because I was using virtual distributed switches, I couldn't configure a network dump collector.

This isn't well documented, so I'm going to run through the commands I used, first to set up the shared LUN as a diagnostic partition and then to extract a core dump from the shared LUN after a crash.

  1. Connect to the ESXi host over SSH (for example with PuTTY).
  2. List your existing diagnostic partitions: esxcli system coredump partition list
  3. Change to the disk device directory: cd /vmfs/devices/disks
  4. Identify the naa disk identifier of the LUN that you are going to use as the shared dump LUN.
  5. Get the starting and ending sectors of the disk (the relative name works because you are already in /vmfs/devices/disks): partedUtil getptbl "naa.6006016031d02c00a468c9f88a31e111"
  6. Create the diagnostic partition: partedUtil setptbl "/vmfs/devices/disks/naa.xxx" gpt "1 <starting sector> <ending sector> 9D27538040AD11DBBF97000C2911D1B8 0"
  7. For example: partedUtil setptbl "/vmfs/devices/disks/naa.6006016031d02c00a468c9f88a31e111" gpt "1 2048 209705200 9D27538040AD11DBBF97000C2911D1B8 0"
  8. Run partedUtil getptbl "naa.6006016031d02c00a468c9f88a31e111" again to confirm that the partition has been created.
  9. Run esxcli system coredump partition list again. The new partition should now appear in the list, with Active set to false.
  10. Set the first partition on the LUN as the active dump partition (don't forget the :1, which selects partition 1): esxcfg-dumppart --set "naa.xxx:1"
  11. Run esxcli system coredump partition list once more. The new partition should now show Active as true, and the old diagnostic partition should show false.
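The partedUtil setptbl line in steps 6 and 7 is easy to mistype, so here is a minimal sketch of a helper that assembles it from a device name and the two sector numbers. The function name build_setptbl_cmd is made up for illustration and is not part of ESXi. It only prints the command so that you can review it before running it yourself, and it assumes the sector arguments are plain integers.

```shell
#!/bin/sh
# Partition type GUID used for the diagnostic partition in the steps above.
DIAG_GUID="9D27538040AD11DBBF97000C2911D1B8"

# build_setptbl_cmd DEVICE START END  (hypothetical helper, not an ESXi tool)
# Prints, but does not run, the partedUtil command that creates partition 1.
build_setptbl_cmd() {
    dev="$1"
    start="$2"
    end="$3"
    # Reject an inverted or empty sector range before anything touches the disk.
    if [ "$start" -ge "$end" ]; then
        echo "start sector must be lower than end sector" >&2
        return 1
    fi
    printf 'partedUtil setptbl "/vmfs/devices/disks/%s" gpt "1 %s %s %s 0"\n' \
        "$dev" "$start" "$end" "$DIAG_GUID"
}

# Reproduces the example from step 7.
build_setptbl_cmd "naa.6006016031d02c00a468c9f88a31e111" 2048 209705200
```

Printing instead of executing is deliberate: setptbl rewrites the partition table, so it is worth checking the device name against the output of step 4 before pasting the line into the shell.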

After the host has crashed and written its first core dump to the shared LUN:

  1. Reboot the ESXi host from the PSOD screen.
  2. Connect to the ESXi host over SSH (for example with PuTTY).
  3. Test whether a core dump was successfully generated: esxcfg-dumppart -T -D "/vmfs/devices/disks/naa.xxx:1"
  4. If the answer is 'YES', copy the core dump to the scratch partition: esxcfg-dumppart -C -D "/vmfs/devices/disks/naa.xxx:1"
  5. You should see output similar to 'Created file /scratch/core/vmkernel-zdump.1'.
  6. Run 'cd /scratch/core/' followed by 'ls'. Your dump file should be there.
  7. You can also run 'esxcfg-dumppart -L vmkernel-zdump.1' from that directory to extract the vmkernel log from the dump.
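If you want to script steps 4 to 7, the path of the copied dump has to be taken from the 'Created file ...' message. The following is a small sketch of that parsing step only. The function name extract_zdump_path is hypothetical, and it assumes the message looks like the one in step 5 and is written to standard output, which I have not verified on every ESXi version.

```shell
#!/bin/sh
# extract_zdump_path (hypothetical helper, not an ESXi tool)
# Reads command output on stdin and prints the path that follows
# "Created file " on the first matching line. Prints nothing if no line matches.
extract_zdump_path() {
    sed -n 's/^.*Created file \(.*\)$/\1/p' | head -n 1
}

# Demonstration with the sample message from step 5.
echo 'Created file /scratch/core/vmkernel-zdump.1' | extract_zdump_path

# Intended use on the host (assumption: the message goes to stdout):
#   esxcfg-dumppart -C -D "/vmfs/devices/disks/naa.xxx:1" | extract_zdump_path
```

The printed path can then be passed to esxcfg-dumppart -L as in step 7.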

All done!
