Tuesday, 16 August 2016

VDP Backup Fails: PAX stream append at sector offset 128 length 128 sectors=65536 failed (32)

So I have ran into this issue a couple of times now and was able to finally drill down to a solution, and so thought of sharing this as there is nothing about this issue anywhere on the web.

In the few scenarios where I came across this issue, some were due to expanding the VMDK causes the backup to fail, backup for VM with a disk greater than 2 TB fails and sometimes without any reason it fails continuously.

This failure occurs prominently when:

1) VMDK is larger than 2 TB. 
2) VMDK is not an integral multiple of 1 MB. 
3) VMDK has changes in the last extent of the VMDK

Let's see what time means a little down the line.

To start off, when you open the backup job log to see for a failure, you will see the following: 

Location:
/usr/local/avamarclient/var

2016-07-25T20:50:48.091+04:00 avvcbimage Info <14700>: submitting pax container in-use block extents:
2016-07-25T20:50:48.221+04:00 avvcbimage Info <6688>: Process 9604 (/usr/local/avamarclient/bin/avtar) finished (code 176: fatal signal)
2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] PAX stream append ([Store_1] test/test.vmdk) at sector offset 128 length 128 sectors=65536 failed (32)
2016-07-25T20:50:48.221+04:00 avvcbimage Warning <6690>: CTL workorder "Test-backup-1469491200006" non-zero exit status 'code 176: fatal signal'
2016-07-25T20:50:48.221+04:00 avvcbimage Info <9772>: Starting graceful (staged) termination, PAX stream append failed (wrap-up stage)
2016-07-25T20:50:48.221+04:00 avvcbimage Info <40654>: isExitOK()=157 
2016-07-25T20:50:48.221+04:00 avvcbimage Info <16022>: Cancel detected(miscellaneous error), isExitOK(0).
2016-07-25T20:50:48.221+04:00 avvcbimage Info <9746>: bytes submitted was 65536
2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] pax_container::endfile(virtdisk-flat.vmdk,65536) returned problem:32
2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] pax_container::enddir returned problem:32
2016-07-25T20:50:48.222+04:00 avvcbimage Info <40654>: isExitOK()=157 
2016-07-25T20:50:48.222+04:00 avvcbimage Info <16022>: Cancel detected(miscellaneous error), isExitOK(0).
2016-07-25T20:50:48.222+04:00 avvcbimage Info <12032>: backup (Store_1] test/test.vmdk) terminated 3145728.00 MB
2016-07-25T20:50:48.222+04:00 avvcbimage Info <40654>: isExitOK()=157 
2016-07-25T20:50:48.222+04:00 avvcbimage Error <0000>: [IMG1003] Backup of [Store_1] test/test.vmdk failed
2016-07-25T20:50:48.222+04:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Close: Close disk.

The avtar.log for this backup job in the same directory has the following back trace.

2016-08-08T20:42:17.180+04:00 avtar FATAL <5889>: Fatal signal 11 in pid 28769
> 2016-08-08T20:42:17.186+04:00 [avtar]  FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3abe1
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3b957
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3bc7b
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3bd6e
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000a86530
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00007f9388482850
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000ac7364
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006762ef
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000678363
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c8db8
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006cc683
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006cd083
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000009a5b76
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000009ad0ed
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000612bce
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c1492
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c36b6
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004cfcfb
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004d6d4e
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004d82e0
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000a88129
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004ea627
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00007f93864b1c36
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 000000000048a029
> 2016-08-08T20:42:17.186+04:00 [avtar]  ERROR: <0001> uapp::handlefatal: aborting program pid=28769, sig=11
> 2016-08-08T20:42:17.186+04:00 avtar FATAL <5890>: handlefatal: Aborting program with code 176, pid=28769, sig=11

So I had one disk for this VM which was expanded beyond 2 TB and when converted into MB it was 3397386.24 MB (3.24 TB). In plain words, the format in MB is fractional and not a whole number and hence it is not an integral multiple of 1. 
When your VDP sees the disk as a fractional value and beyond 2 TB, the backup fails. This is a known issue in the avTar version running on VDP. 

To find your current avTar version you can run the following;

> This command from root of the VDP SSH
status.dpn

> These two commands from admin mode only of VDP SSH
gsan --version
mcserver.sh --version

A VDP 6.1 has an avTar version of 7.2 with a varying minor build. 

How do we resolve this? 

Step 1: For a quick backup to make sure everything is saved for this VM. Not recommended as a permanent solution. 

Disable CBT on the VM, so that incremental backups are disabled for this VM and all backups are always full backup. If you do not have a restore point for a long time and would badly need one, then you can implement this. Else, if you have a good known restore point, then this is not required as it unnecessarily consumes space. To disable CBT you can follow this VMware KB article here

Step 2: Recommended. Make sure a good backup is available before this.

Here, you will have to locate the sizes of all your disks for that VM and convert them to MB. If the converted value in MB is a whole number, then that disk is good to go. 

If not, like in my case, where the drive was 3397386.24 MB, then we will have to round off this drive to the next largest whole number. Here it would be 3397387 MB. You can do this from the edit settings of the virtual machine. There is no need of expanding the drive from within the guest. This is done on the VMDK level only so that the VDP can look at this drive as an integral multiple of 1. 

Post this, you can have incremental backups running. 

This is acknowledged as a known issue and the fix is in the future release of VDP. Release version and ETA are unknown currently.

If you have questions before implementation, let me know.


Thanks!