I recently had an issue where I couldn’t capture a valid core dump during ESXi PSODs.
As I was using virtual distributed switches, I couldn’t configure a network dump collector, so I used a shared LUN as the diagnostic partition instead.
This isn’t well documented, so I’m going to run through the commands I used, first to set up the shared LUN as a diagnostic partition and then to extract a core dump from it after a crash.
- Connect to the ESXi host over SSH (e.g. with PuTTY)
- Run 'esxcli system coredump partition list' to list your existing diagnostic partitions.
- Navigate to /vmfs/devices/disks
- Identify the naa disk identifier that you are going to use for your shared LUN.
- Run 'partedUtil getptbl "naa.6006016031d02c00a468c9f88a31e111"' to get the starting and ending sectors of the disk.
- Next, create the diagnostic partition: partedUtil setptbl "/vmfs/devices/disks/naa.xxx" gpt "1 <starting sector> <ending sector> 9D27538040AD11DBBF97000C2911D1B8 0"
- e.g. partedUtil setptbl "/vmfs/devices/disks/naa.6006016031d02c00a468c9f88a31e111" gpt "1 2048 209705200 9D27538040AD11DBBF97000C2911D1B8 0"
- Run 'partedUtil getptbl "naa.6006016031d02c00a468c9f88a31e111"' again to confirm the partition has been created.
- Run 'esxcli system coredump partition list' again. The new partition you have just created should be listed with Active set to false.
- Run 'esxcfg-dumppart --set "naa.xxx:1"' to set the 1st partition as active. Don't forget the :1 on the end.
- Run 'esxcli system coredump partition list' once more. The new partition should now show Active as true and the old diagnostic partition should show Active as false.
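The partition string in the setptbl step is easy to mistype, so here is a minimal shell sketch that assembles it from the disk identifier and the two sector values. The function name build_setptbl_cmd is hypothetical and not part of ESXi; it only prints the command for review and does not run partedUtil. The GUID and the example values are the ones used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not an ESXi tool): prints the partedUtil command
# for a diagnostic partition so it can be checked before being run.

# Diagnostic partition type GUID, as used in the steps above.
DIAG_GUID="9D27538040AD11DBBF97000C2911D1B8"

build_setptbl_cmd() {
    disk="$1"    # e.g. naa.6006016031d02c00a468c9f88a31e111
    start="$2"   # starting sector
    end="$3"     # ending sector

    # Refuse an empty or reversed sector range.
    if [ "$start" -ge "$end" ]; then
        echo "starting sector must be lower than ending sector" >&2
        return 1
    fi

    printf 'partedUtil setptbl "/vmfs/devices/disks/%s" gpt "1 %s %s %s 0"\n' \
        "$disk" "$start" "$end" "$DIAG_GUID"
}

# Example with the values from this post; the output is only printed.
build_setptbl_cmd "naa.6006016031d02c00a468c9f88a31e111" 2048 209705200
```

Printing the command rather than executing it is deliberate, since setptbl rewrites the partition table of whatever disk it is pointed at.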
After you have captured your first successful crash:
- Reboot the ESXi host from the PSOD screen
- Connect to the ESXi host over SSH (e.g. with PuTTY)
- Run this command to test whether a core dump was successfully generated: esxcfg-dumppart -T -D "/vmfs/devices/disks/naa.xxx:1"
- If the answer is 'YES', run this command to copy the core dump to the scratch partition: esxcfg-dumppart -C -D "/vmfs/devices/disks/naa.xxx:1"
- You should see output similar to 'Created file /scratch/core/vmkernel-zdump.1'
- Run 'cd /scratch/core/' and then 'ls'. Your dump file should be there.
- You can also run 'esxcfg-dumppart -L vmkernel-zdump.1' to extract the vmkernel log from the dump file.
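The test-then-copy sequence above can be sketched as a small shell script. This is only a sketch under assumptions: the helper name dump_present is hypothetical, the check simply looks for the text 'YES' anywhere in the output of the -T command (the exact output format may differ between ESXi versions), and naa.xxx:1 is the same placeholder used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text passed in contains YES.
# Assumes the -T test reports YES in upper case, as described above.
dump_present() {
    case "$1" in
        *YES*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Placeholder device, as in the steps above; substitute your own LUN.
DUMP_PART="/vmfs/devices/disks/naa.xxx:1"

if command -v esxcfg-dumppart >/dev/null 2>&1; then
    # Test whether a core dump was written to the partition.
    answer=$(esxcfg-dumppart -T -D "$DUMP_PART")
    if dump_present "$answer"; then
        # Copy the dump to the scratch partition and list the result.
        esxcfg-dumppart -C -D "$DUMP_PART"
        ls /scratch/core/
    else
        echo "No core dump found on $DUMP_PART"
    fi
else
    echo "esxcfg-dumppart not found; run this on the ESXi host"
fi
```

The script only copies when the test reports a dump, which avoids creating an empty or stale file in /scratch/core/.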