Thursday, 17 December 2015

How To Analyze PSOD

Written by Suhas Savkoor

The Purple Screen of Death, commonly known as PSOD, is something most of us run into sooner or later when running ESXi hosts.

Usually when we experience a PSOD, we reboot the host (which is a must), then gather the logs and upload them to VMware support for analysis (where I spend a good amount of time going through them).

Why not take a look at the dumps yourself?

Step 1:
I am going to simulate a PSOD on my ESXi host. You need to be logged in to the host over SSH. The command commonly used for this is:

vsish -e set /reliability/crashMe/Panic 1

When you open the DCUI of the ESXi host, you can see the PSOD.

Step 2:
Sometimes we miss the chance to take a screenshot of the PSOD. That's alright! If a core dump is configured for the ESXi host, we can extract the dump files to gather the crash logs.

Reboot the host if it is still on the PSOD screen. Once the host is back up, log in to it over SSH (for example, with PuTTY) and go to the core directory, typically /var/core. The core directory is where the PSOD dumps are written.

Then list the files in that directory (for example, with ls -lh).

Here you can see the vmkernel dump file, which is in the zdump format.
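If you have copied the core directory off the host, the listing step can also be scripted. The following is only a minimal Python sketch: the function name find_zdumps is hypothetical, and matching on "zdump" in the file name is an assumption based on the file name mentioned in this post.

```python
from pathlib import Path


def find_zdumps(core_dir):
    """Return files in core_dir whose names contain 'zdump', newest first.

    Assumption: dump files carry 'zdump' in their name, as the zdump.1
    file in this post does. Adjust the check if your files differ.
    """
    directory = Path(core_dir)
    if not directory.is_dir():
        # Nothing to list if the directory is missing.
        return []
    dumps = [p for p in directory.iterdir() if p.is_file() and "zdump" in p.name]
    # Sort by modification time so the most recent crash comes first.
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling find_zdumps with the path of your copied core directory returns a list of Path objects, or an empty list if no dump was written.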

Step 3:
How do we extract it?

We have a handy extraction tool that does the whole job: vmkdump_extract. Run it against the zdump.1 file, passing the dump file name as its argument.

It creates four files:
a) vmkernel-log.1
b) vmkernel-core.1
c) visorFS.tar
d) vmkernel-pci

All we need for the analysis is the vmkernel-log.1 file.
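To confirm that the extraction produced everything, a quick check like the one below can help. It is a sketch under assumptions: the four file names are taken from the list above and may differ on other ESXi builds, and missing_outputs is a hypothetical helper name.

```python
from pathlib import Path

# File names as listed in this post; other ESXi builds may name them differently.
EXPECTED_OUTPUTS = ("vmkernel-log.1", "vmkernel-core.1", "visorFS.tar", "vmkernel-pci")


def missing_outputs(directory, expected=EXPECTED_OUTPUTS):
    """Return the expected extraction outputs that are not present in directory."""
    base = Path(directory)
    return [name for name in expected if not (base / name).is_file()]
```

An empty list means all four files are there. If vmkernel-log.1 is in the returned list, there is nothing to analyze yet.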

Step 4:
Open the vmkernel-log.1 file in a pager such as less (less vmkernel-log.1).

Skip to the end of the file by pressing Shift+G, then work your way back up slowly by pressing PageUp.
You will come across a line that says @BlueScreen: <event>.

In my case, the dumps were:

  • The first line, @BlueScreen:, gives the crash exception, such as Exception 13 or Exception 14. In my case it is CrashMe, which indicates a manual crash.
  • The VMK uptime line gives the kernel uptime before the crash.
  • The logging after that is the information we need to look at: it points to why the crash occurred.
The crash dump varies from crash to crash. Causes range from hardware errors and driver issues to problems with the ESXi build, and a lot more.

Each dump analysis will be different, but the basics are the same.
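The manual search described in Step 4 can be sketched in Python as well. This is an illustration and not a VMware tool: summarize_crash is a hypothetical name, and the pattern for the uptime line (the text "VMK uptime:" followed by a value) is an assumption, so adjust it to what your log actually contains.

```python
import re

# Matches the "@BlueScreen: <event>" line described in this post.
BLUESCREEN = re.compile(r"@BlueScreen:\s*(?P<event>.*)")
# Assumed layout of the uptime line; tolerant of "VMKuptime" and "VMK uptime".
UPTIME = re.compile(r"VMK\s*uptime:\s*(?P<uptime>\S+)", re.IGNORECASE)


def summarize_crash(log_text, context=20):
    """Find the last @BlueScreen line and return the event, uptime, and following lines."""
    lines = log_text.splitlines()
    # Search from the end, mirroring the Shift+G then PageUp approach.
    for index in range(len(lines) - 1, -1, -1):
        match = BLUESCREEN.search(lines[index])
        if match is None:
            continue
        following = lines[index + 1 : index + 1 + context]
        uptime = None
        for line in following:
            found = UPTIME.search(line)
            if found:
                uptime = found.group("uptime")
                break
        return {
            "event": match.group("event").strip(),
            "uptime": uptime,
            "following": following,
        }
    # No @BlueScreen line in this log.
    return None
```

Read vmkernel-log.1 into a string and pass it to summarize_crash. The "following" entry holds the lines after the @BlueScreen line, which is where the cause of the crash is usually found.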

So, you can try analyzing the dumps by yourself. However, if you are entitled to VMware support, I will do the job for you.



  1. What about VMs which are running on that ESXi ??

    Nice Article !!!

    1. If you have an HA-enabled cluster, you can expect the VMs on the failed host to be restarted on another available host in the cluster. If not, your VMs are simply down and won't come up until the ESXi host is rebooted.

  2. Many thanks Mate :-))

    Well Explained .. Cheers

  3. Is there any good article on how to interpret these logs?

    1. For a complete PSOD analysis using an example, refer to the KB below: