Friday, 25 December 2015

Understanding VMkernel.log for vMotion Operation

Written by Suhas Savkoor

Let's decode the vMotion logging in VMkernel.log.

Open a SSH (Putty) to the host where this virtual machine currently resides. Change the directory to:

Capture the live logging of VMkernel using the following command:

Perform vMotion of a virtual machine residing on this host to any other available host with a shared storage. You will see the below logging:

I will break down the logging with " // " for comments.

2015-12-25T16:39:25.565Z cpu4:2758489)Migrate: vm 2758492: 3284: Setting VMOTION info: Source ts = 1451061663105920, src ip = <> dest ip = <> Dest wid = 1830931 using SHARED swap

//The first line Migrate vm 2758492 does not tell which virtual machine is being migrated. It tells the world ID of the virtual machine that is going to be migrated. To find the world ID of the virtual machine, before migrating run the command # esxcli vm process list This command lists all the virtual machines world IDs that is residing on the host.

// The Setting vMotion info 1451061663105920 is the vMotion ID. This vMotion ID is required because when you "grep" for this ID in the hostd.log or vmware.log (residing in the virtual machine directory) gives you further information of vMotion. In vmware.log you can see the transitioning states for vMotion, with each state performing a set of steps.

// The source ip  where this virtual machine currently resides is and the destination to where the virtual machine is being migrated to is

// The dest wid 1830931 is the world ID for this virtual machine once the vMotion is completed.

2015-12-25T16:39:25.567Z cpu4:2758489)Tcpip_Vmk: 1288: Affinitizing to world 2772001, Success
2015-12-25T16:39:25.567Z cpu4:2758489)VMotion: 2734: 1451061663105920 S: Set ip address '' worldlet affinity to send World ID 2772001
2015-12-25T16:39:25.567Z cpu4:2758489)Hbr: 3340: Migration start received (worldID=2758492) (migrateType=1) (event=0) (isSource=1) (sharedConfig=1)

// Here the host is being prepared for migration by taking it's IP address into consideration.

//The migration start received logs the vMotion type. The World ID 2758492 is recorded. MigrateType=1 is host migration,

//The host where I am logged in currently via the SSH is the source host which shows the isSource=1 and sharedConfig=1

2015-12-25T16:39:25.567Z cpu5:2771999)CpuSched: 583: user latency of 2771999 vmotionStreamHelper0-2758492 0 changed by 2771999 vmotionStreamHelper0-2758492 -1
2015-12-25T16:39:25.568Z cpu4:2772001)MigrateNet: 1186: 1451061663105920 S: Successfully bound connection to vmknic ''

//Here the connection from source host vmkernel port-group is established.

2015-12-25T16:39:25.570Z cpu5:33435)MigrateNet: vm 33435: 2096: Accepted connection from <::ffff:>

// Here the destination source has accepted the connection for vMotion

2015-12-25T16:39:25.570Z cpu5:33435)MigrateNet: vm 33435: 2166: dataSocket 0x410958a8dc00 receive buffer size is -565184049
2015-12-25T16:39:25.570Z cpu4:2772001)MigrateNet: 1186: 1451061663105920 S: Successfully bound connection to vmknic ''
2015-12-25T16:39:25.571Z cpu4:2772001)VMotionUtil: 3396: 1451061663105920 S: Stream connection 1 added.
2015-12-25T16:39:25.571Z cpu4:2772001)MigrateNet: 1186: 1451061663105920 S: Successfully bound connection to vmknic ''
2015-12-25T16:39:25.572Z cpu4:2772001)VMotionUtil: 3396: 1451061663105920 S: Stream connection 2 added.

//Both the surce and destination have established the connection and the vMotion process takes place. The VMkernel.log does not record the details of vMotion. If you check the vmware.log for this virtual machine, you can see the states and progress of vMotion in detail.

2015-12-25T16:39:25.848Z cpu3:2758492)VMotion: 4531: 1451061663105920 S: Stopping pre-copy: only 0 pages left to send, which can be sent within the switchover time goal of 0.500 seconds (network bandwidth ~2.116 MB/s, 52403100% t2d)

//In short how vMotion works is:

  • A shadow VM is created on the destination host.
  • Copy each memory page from the source to the destination via the vMotion network. This is known as preCopy.
  • Perform another pass over the VM’s memory, copying any pages that changed during the last preCopy iteration
  • Continue the pre-copy iteration until no changed page remains
  • Stun the VM and resume the destination VM
//Basically, the memory state of the virtual machine is being transferred to the shadow virtual machine created on the destination machine. The memory is nothing but pages.  The pages are transferred to the shadow VM over the vMotion network. Larger the VM I/Os, longer the vMotion process. 

//Towards the end of vMotion the source VM must be destroyed and the operations should continue at the destination end. For this, the ESXi should determine, that the last few memory pages can be transferred over to the destination quickly. Which is the switch-over goal of 0.5 seconds

//So here when it says only 0 pages left to send, which can be sent within the switchover time goal of 0.500 seconds it means that there are no more active memory pages left to be transferred. So the host declares that the source VM can be destroyed and the vMotion can be completed and the destination VM can resume. All this can happen within the feasible switch time. 

2015-12-25T16:39:25.952Z cpu5:2772001)VMotionSend: 3643: 1451061663105920 S: Sent all modified pages to destination (no network bandwidth estimate)

//Here it tells that the for the vMotion ID the "S" Source has sent all the memory pages to the destination.

2015-12-25T16:39:26.900Z cpu0:2758489)Hbr: 3434: Migration end received (worldID=2758492) (migrateType=1) (event=1) (isSource=1) (sharedConfig=1)
2015-12-25T16:39:26.908Z cpu3:32820)Net: 3354: disconnected client from port 0x200000c
2015-12-25T16:39:26.967Z cpu3:34039)DLX: 3768: vol 'Recovery_LUN', lock at 116094976: [Req mode 1] Checking liveness:

//Here the migration has completed for the world ID, the migration type. And the virtual machine in my case is residing on Recovery_LUN is locked by the new host that is residing on with a new world ID that was assigned during the vMotion.

So you know what a successful vMotion looks like in the vmkernel.log
In depth vMotion can be found in the vmware.log, which can be self explanatory once you know what to look at and where to look at.