Friday, 30 December 2016

Understanding Partial Backups in vSphere Data Protection

If a backup fails midway or is cancelled manually, that backup instance is labelled as partial. Here we will see how to identify a partial backup and how to remove it. Partial backups generally do not affect the working of the server, but in some cases they can. Let's consider the two scenarios below.

Scenario 1:
Let's say our VDP is a 512 GB deployment, which means the GSAN capacity is 512 GB. Running mccli server show-prop shows the GSAN capacity.
Now suppose a 400 GB VM was being backed up, and the backup failed or was cancelled midway at around 300 GB. In this case the restore point for the VM will not be seen under the Restore tab. A restore point appears there only when a backup has completed successfully or completed with exceptions.
If you run the mccli server show-prop command at this point, it will show the server utilization at around 60-70 percent, as the partial backup is using 300 GB.
Why is this not an issue? Let's say you fixed the problem that was causing the backup to fail. The next time the backup runs, it will be faster, as 300 GB of its data is already in GSAN.
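
As a rough check of the numbers in this scenario, the arithmetic can be sketched in Python. The function name is made up for illustration, and the sketch assumes nothing else is stored in GSAN; the figure reported by mccli server show-prop will also reflect whatever else is on the appliance.

```python
def utilization_percent(used_gb, capacity_gb):
    """Return used space as a percentage of total capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used_gb / capacity_gb


capacity_gb = 512  # GSAN capacity of the example deployment
partial_gb = 300   # data written before the backup failed or was cancelled

print(f"{utilization_percent(partial_gb, capacity_gb):.1f}% used by the partial backup alone")
print(f"{capacity_gb - partial_gb} GB left for other backups")
```

The partial backup alone accounts for about 58.6 percent of capacity and leaves 212 GB free, which is in line with the 60-70 percent utilization mentioned above once other data is counted.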

Scenario 2:
The same example can also become an issue. Let's say we were unable to fix the VM backup issue in time and, meanwhile, other VMs have to be backed up as well. We will then run into a low space issue and will need to get rid of this partial backup.

By default, the retention period of a partial backup is 7 days. So if you can wait 7 days for this partial backup to go away, then great. Otherwise it has to be removed manually.

**Please note, the steps below were performed in a test environment, and it is highly recommended to involve EMC / VMware support for deleting any partial backups. Do not perform this on a production environment without VMware support. This is purely for informational purposes.**

The avmgr command is used to query the GSAN and pull the state of the backups from it.
The command avmgr getl --path=/ displays the following output:

root@vdp:~/#: avmgr getl --path=/
1  Request succeeded
1  AVI_BACKUPS  location: f074dd00a908ac6a609867b20b43e971c80649b8      pswd: 15b426de530782c5f693e374f5b2cafdc0ae150c
1  AVI_CLIENT_PACKAGES  location: 7dfd0087401cd49bf7f4186ed6143b8e6146261b      pswd: 0d4e85cd36cace14792900c971ccadfa848d6c9e
2  clients      location: b1cd4249e941ad657d6c59f76a8838da24ff8154      pswd: dda76d1efdcfcb6d328dc60f81e5c8cd1afe7028
1  EM_BACKUPS   location: 8d039aa5deabdc99b6351bf4fc1ea05017cf7e59      pswd: 04d5e386b31920e91939c9f59fb4b7a96edb8821
1  MC_BACKUPS   location: acf63aa36b24fcde6a2961c91f26d85add76799d      pswd: 8dc36744d05b4a99ec6340029a1bc2613c4def78
2  MC_DELETED   location: 1cdf93fb978e5163793f8d83ba530ebf432abfdd      pswd: 8489d1cfcc0731d639ebfdb27d594d84e32b68d7
2  MC_RETIRED   location: b53fde901de31b65186913b426e9464d67fe8c1a      pswd: 2ce5a94bbde633d7a7901892d2f88d2a6c30d242
2  MC_SYSTEM    location: 84996375bc7ec879dc80d12cf1b7d64315743251      pswd: ab47e5e43ef522c0c7c7551ad76a1753c2a28e9a
1  NETWORKER    location: 020cfc3e9794d089b724a19e57a404b7b680a43b      pswd: a118f64304f1b1bb8028395f052b0f499175403f
2  vcenter-dr.happycow.local    location: 4af3879e89ac5e9ea6a8110c23bcb6f46d21c7bd      pswd: effe5792c4781d770d5b640cf611ac837b16ea75

Here we are interested in vcenter-dr.happycow.local, so the command now becomes
avmgr getl --path=/vcenter-dr.happycow.local and this gives the following output:

root@vdp:~/#: avmgr getl --path=/vcenter-dr.happycow.local
1  Request succeeded
2  ContainerClients     location: 6f971b1c69953e3195fa6222caec1d7915b67705      pswd: 861934ff6a31c904e5a3bbecb785417e5b1856a8
1  vcenter-dr.happycow.local    location: 9c8a3f188118e45d945cd933731caffdd9f46899      pswd: 2d5e97fac3834756f4a9665d786178a9f16fe69c
2  VirtualMachines      location: 6992aaa96f0d58b8551f66fd3e2c4bcdf0106618      pswd: 830ba2c858464e58030d23c3f898d9bd544c9a47

All the VMs in vCenter are under the VirtualMachines domain, so to view them, the complete command is
avmgr getl --path=/vcenter-dr.happycow.local/VirtualMachines and the output would be similar to:

root@vdp:~/#: avmgr getl --path=/vcenter-dr.happycow.local/VirtualMachines
1  Request succeeded
1  Replication-DR_UDIAGRFNOfX78JZlfofi6Q        location: b2bbe58c661383f1a958c5ac66b582e5eb42204a      pswd: 7f2c82c450a94ba49ef30ef9fa5a24dbb2b974b5
1  Test_UDLiusDGKgqWLzJxSiw2uw  location: a08fd778bebdebfb0a35c355de66ffa7e0cea153      pswd: e8d4b29382181a5167257c340094aaee4a074145

There are two clients available here: one is Replication-DR and the other is Test.
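
If you need to pull the client names out of this listing in a script, a minimal Python sketch could look like the following. The regular expression is inferred only from the sample output above, and treating everything before the last underscore as the VM name is an assumption based on these two entries, not a documented format. The leading number on each line is kept as-is because its meaning is not covered here.

```python
import re

# Pattern inferred from the sample getl output; not an official format definition.
ENTRY = re.compile(r"^(\d+)\s+(\S+)\s+location:\s+([0-9a-f]+)\s+pswd:\s+([0-9a-f]+)\s*$")


def parse_getl(raw_output):
    """Return (leading_number, name) pairs for each entry line in getl output."""
    entries = []
    for line in raw_output.splitlines():
        match = ENTRY.match(line.strip())
        if match:  # the "Request succeeded" status line does not match and is skipped
            entries.append((match.group(1), match.group(2)))
    return entries


SAMPLE = """1  Request succeeded
1  Replication-DR_UDIAGRFNOfX78JZlfofi6Q        location: b2bbe58c661383f1a958c5ac66b582e5eb42204a      pswd: 7f2c82c450a94ba49ef30ef9fa5a24dbb2b974b5
1  Test_UDLiusDGKgqWLzJxSiw2uw  location: a08fd778bebdebfb0a35c355de66ffa7e0cea153      pswd: e8d4b29382181a5167257c340094aaee4a074145
"""

for _, client in parse_getl(SAMPLE):
    # Assumption: the VM name is the part before the last underscore.
    vm_name = client.rpartition("_")[0]
    print(vm_name, "->", client)
```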
Now, to view the backups for the Test VM, the command would be:
avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Test_UDLiusDGKgqWLzJxSiw2uw --format=xml

The output would be similar to:

root@vdp:~/#: avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Test_UDLiusDGKgqWLzJxSiw2uw --format=xml
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec flags="32768001" labelnum="1" label="Test-1483131013980" created="1483131261" roothash="6f085059351b2ee305690f7253ed54b1c85c21b9" totalbytes="42949689344.00" ispresentbytes="0.00" pidnum="3016" percentnew="0" expires="1488315013" created_prectime="0x1d262dee4f31200" partial="0" retentiontype="daily,weekly,monthly,yearly" backuptype="Full" ddrindex="0" locked="1" direct_restore="1"/>
</backuplist>

Notice that the partial attribute is 0, which means this is a complete backup.
The labelnum attribute indicates which backup it is: 1 stands for the first, 2 for the second, and so on.
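
Reading these attributes by eye gets tedious with many backups, so here is a minimal Python sketch that parses the XML and filters on the partial attribute. The function names are hypothetical, and the sample record is trimmed to a few of the attributes shown above. It assumes the output always starts with a status line followed by the XML document, as in the sample.

```python
import xml.etree.ElementTree as ET


def parse_backup_list(raw_output):
    """Return one attribute dict per <backuplistrec> in avmgr getb XML output."""
    start = raw_output.find("<?xml")  # skip the "1  Request succeeded" status line
    if start == -1:
        raise ValueError("no XML document found in output")
    root = ET.fromstring(raw_output[start:].strip().encode("utf-8"))
    return [dict(rec.attrib) for rec in root.iter("backuplistrec")]


def partial_backups(records):
    """Keep only the records whose partial attribute is "1"."""
    return [rec for rec in records if rec.get("partial") == "1"]


# Trimmed copy of the Test VM record shown above.
SAMPLE = """1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec labelnum="1" label="Test-1483131013980" created="1483131261" expires="1488315013" partial="0" backuptype="Full"/>
</backuplist>
"""

records = parse_backup_list(SAMPLE)
for rec in records:
    print(rec["labelnum"], rec["label"], "partial" if rec["partial"] == "1" else "complete")
print("partial backups found:", len(partial_backups(records)))
```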

Next, I manually cancelled a backup for the Replication-DR virtual machine. If I run the same command that we ran above for the Test VM, we see the output below:

root@vdp:~/#: avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --format=xml
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0"/>

This is basically saying that there is no backup. To check whether a partial backup exists, we have to include the --incpartials switch. The command and output now will be:
avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --incpartials --format=xml

root@vdp:~/#: avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --incpartials --format=xml
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec flags="32505873" labelnum="1" label="Job-B-1483131429441" created="1483131592" roothash="caf75c1299a0336d9787ff20b523e1336e83c6fa" totalbytes="13436470272.00" ispresentbytes="0.00" pidnum="1016" percentnew="10" expires="1483736392" created_prectime="0x1d262dfaac24956" partial="1" retentiontype="daily,weekly,monthly,yearly" backuptype="Full" ddrindex="0" locked="0" direct_restore="1"/>
</backuplist>

Notice that the partial attribute is 1, which confirms this is a partial backup. The created time is 1483131592, and if I convert this from epoch time to a readable date:

root@vdp:~/#: t.pl "1483131592"
local: Sat Dec 31 02:29:52 2016         gmt:Fri Dec 30 20:59:52 2016

And the expires time is 1483736392 which converts to:

root@vdp:~/#: t.pl "1483736392"
local: Sat Jan  7 02:29:52 2017         gmt:Fri Jan  6 20:59:52 2017

So the difference between created and expires is 7 days, which means this backup will be removed automatically in 7 days and the space will be reclaimed by garbage collection.
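
If t.pl is not at hand, the same conversion can be sketched with Python's standard library. This reproduces only the gmt value; the local value depends on the appliance's time zone. The helper name is made up for illustration.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds):
    """Convert an epoch value (string or int) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)


created = epoch_to_utc("1483131592")  # created attribute of the partial backup
expires = epoch_to_utc("1483736392")  # expires attribute of the partial backup
retention = expires - created

print("created (gmt):", created.strftime("%a %b %d %H:%M:%S %Y"))
print("expires (gmt):", expires.strftime("%a %b %d %H:%M:%S %Y"))
print("retention in days:", retention.days)
```

The difference works out to exactly 604800 seconds, that is, 7 days.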

To remove this manually (again, please do not perform this in your production environment), we will run the avmgr delb command. The delb command is highly destructive and should be run with extreme caution. The command would be:
avmgr delb --id=root --path=/vcenter-fqdn/VirtualMachines/Client --date="<created_prectime>"

So, for the partial backup of my Replication-DR virtual machine:
avmgr delb --id=root --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --date="0x1d262dfaac24956"

The output if successful will be seen as:
1  Request succeeded
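
Since a typo in the path or date is costly with delb, one way to reduce the risk is to build the command string from the parsed record instead of typing it. The sketch below is illustrative only: it prints the command and never executes it, the function name is hypothetical, and it uses only the flags shown above. It refuses to build a command for a record whose partial attribute is not 1.

```python
import shlex


def build_delb_command(vcenter, client, record):
    """Build (but do not run) the avmgr delb command for a partial backup record."""
    if record.get("partial") != "1":
        raise ValueError("refusing to build a delete command for a non-partial backup")
    path = f"/{vcenter}/VirtualMachines/{client}"
    args = ["avmgr", "delb", "--id=root", f"--path={path}", f"--date={record['created_prectime']}"]
    # shlex.quote leaves safe arguments untouched and quotes anything unusual.
    return " ".join(shlex.quote(arg) for arg in args)


# Attributes copied from the partial backup record shown earlier.
record = {"partial": "1", "created_prectime": "0x1d262dfaac24956"}

print(build_delb_command(
    "vcenter-dr.happycow.local",
    "Replication-DR_UDIAGRFNOfX78JZlfofi6Q",
    record,
))
```

Review the printed command by hand, ideally together with support, before running anything.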

Now if you run the avmgr command again to view partial backups, you will not see any backup list:
root@vdp:~/#: avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --incpartials --format=xml
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0"/>

After this, let the appliance run through its next maintenance window so that garbage collection can reclaim the space. When you then check the GSAN utilization, it will have dropped by an amount that depends on how large the partial backup was.

That's pretty much it.