Wednesday, 5 October 2016

Understanding VDP Backups In A Data Domain Mtree

This is a high-level look at how to find out which backups on a data domain relate to which clients on VDP. Previously, we saw how to deploy and configure a data domain and connect it to a vSphere Data Protection 5.8 appliance. Here I will discuss what an mtree is and cover some details about it, as this is needed for the next article, which covers migrating VDP from 5.8/6.0 to 6.1.

In a data domain file system, an mtree is created to store the VDP files and checkpoint data, along with data domain snapshots, under the avamar ID node of the respective appliance.

Now, on the data domain appliance, run the below command to display the mtree list.
# mtree list
The output is seen as:

Name                           Pre-Comp (GiB)   Status
----------------------------   --------------   ------
/data/col1/avamar-1475309625             32.0   RW
/data/col1/backup                         0.0   RW
----------------------------   --------------   ------

So the avamar node ID is 1475309625. To confirm this is the same mtree node created for your VDP appliance, run the following on the VDP appliance:
# avmaint config --ava | grep -i "system"
The output is:
  systemname="vdp58.vcloud.local"
  systemcreatetime="1475309625"
  systemcreateaddr="00:50:56:B9:54:56"

The system create time is the avamar ID. Using these two commands, you can confirm which VDP appliance corresponds to which mtree on the data domain.
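
If you have several VDP appliances pointing at one data domain, the comparison above can be scripted. The following is only a minimal Python sketch, assuming you have saved the output of the two commands as text. The function names and the sample strings are my own illustration and are not part of any VDP or data domain tooling. It also assumes that systemcreatetime is a Unix epoch value, which is a guess based on its name and magnitude.

```python
import re
from datetime import datetime, timezone

# Sample text copied from the two command outputs shown above.
MTREE_LIST = """\
Name                           Pre-Comp (GiB)   Status
/data/col1/avamar-1475309625             32.0   RW
/data/col1/backup                         0.0   RW
"""

AVMAINT_CONFIG = '''\
  systemname="vdp58.vcloud.local"
  systemcreatetime="1475309625"
  systemcreateaddr="00:50:56:B9:54:56"
'''

def avamar_ids(mtree_output):
    """Return the numeric suffix of every avamar-<ID> mtree in the listing."""
    return re.findall(r"/data/col1/avamar-(\d+)", mtree_output)

def system_create_time(avmaint_output):
    """Return the systemcreatetime value as a string, or None if absent."""
    match = re.search(r'systemcreatetime="(\d+)"', avmaint_output)
    return match.group(1) if match else None

ids = avamar_ids(MTREE_LIST)
create_time = system_create_time(AVMAINT_CONFIG)
print("mtree belongs to this VDP:", create_time in ids)

# Assumption: the value is seconds since the Unix epoch.
created = datetime.fromtimestamp(int(create_time), tz=timezone.utc)
print("appliance created (UTC):", created.isoformat())
```

A True on the first line means the mtree and the appliance belong together. Treat the second line as a sanity check only, since the epoch interpretation is an assumption.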

Earlier, I mentioned that the mtree is where your backup data, VDP checkpoints, and other related files reside. So, the next question is how to see which directories exist under this mtree. To list them, run the following command on the data domain appliance.
 # ddboost storage-unit show avamar-<ID>
The output is:

Name                Pre-Comp (GiB)   Status   User
-----------------   --------------   ------   ------------
avamar-1475309625             32.0   RW       ddboost-user
-----------------   --------------   ------   ------------
cur
ddrid
VALIDATED
GSAN
STAGING
  • The cur directory is also called the "current" directory, and it is where all your backup data is stored.
  • The VALIDATED folder is where the validated checkpoints reside.
  • The GSAN folder contains the checkpoint that was copied over from the VDP appliance. Remember, in the previous article we checked the option for "Enable Checkpoint Copy". This is what copies the daily generated checkpoints on VDP over to the data domain. Why is this required? We will look into this in much more detail during the migrate operation.
  • The STAGING folder is where all your "in-progress" backup jobs are saved. Once a backup job completes successfully, its data is moved to the cur directory. If the backup job fails, the data remains in the STAGING folder and is cleared out during the next GC on the data domain.
Now, as mentioned before, DDOS does not offer the full set of commands available in Linux, which is why you will have to enter SE (System Engineering) mode and enable the bash shell to get the superset of commands needed to browse and modify directories.

Please note: This is meant to be handled by an EMC technician only. All the information I am displaying here is purely from my lab. Try this at your own risk. If you are uncomfortable, stop now, and involve EMC support.

To enable the bash shell, we first have to enter SE mode. To do this, we need the SE password, which is your system serial number. This can be obtained with the below command:
# system show serialno
The output is similar to:
Serial number: XXXXXXXXXXX

Enable SE mode using the below command and enter the serial number as the password when prompted:
 # priv set se
Once the password is provided you can see the user sysadmin@data-domain has changed to SE@data-domain

Now, we need to enable the bash shell for your data domain. Run these commands in the same order:

1. Display the OS information using:
# uname
You will see:
Data Domain OS 5.5.0.4-430231

2. Check that the file system is enabled and running (fi st is the short form of filesys status):
# fi st
You will see:
The filesystem is enabled and running.

3. Run the below command to show the filesystem space:
 # filesys show space
You will see:

Active Tier:
Resource           Size GiB   Used GiB   Avail GiB   Use%   Cleanable GiB*
----------------   --------   --------   ---------   ----   --------------
/data: pre-comp           -       32.0           -      -                -
/data: post-comp      404.8        2.0       402.8     0%              0.0
/ddvar                 49.2        2.3        44.4     5%                -
----------------   --------   --------   ---------   ----   --------------
 * Estimated based on last cleaning of 2016/10/04 06:00:58.

4. Press "Ctrl+C" three times and then type shell-escape
This drops you into the bash shell, and you will see the following screen.

*************************************************************************
****                            WARNING                              ****
*************************************************************************
****   Unlocking 'shell-escape' may compromise your data integrity   ****
****                and void your support contract.                  ****
*************************************************************************
!!!! datadomain YOUR DATA IS IN DANGER !!!! #

Again, proceed at your own risk and, I cannot stress this enough, involve EMC when you do this.

You saw the mtree was located at the path /data/col1/avamar-ID. The data partition is not mounted by default and needs to be mounted and unmounted manually. 

To mount the data partition run the below command:
# mount localhost:/data /data
The command returns to the prompt without showing any output. Once the partition has been mounted successfully, you can use your regular Linux commands to browse the mtree.

So, a cd to /data/col1/avamar-ID followed by a long listing (ls -l) will show the following:

drwxrwxrwx  3 ddboost-user users 167 Oct  1 01:46 GSAN
drwxrwxrwx  3 ddboost-user users 190 Oct  2 02:40 STAGING
drwxrwxrwx  9 ddboost-user users 563 Oct  3 20:33 VALIDATED
drwxrwxrwx  4 ddboost-user users 279 Oct  2 02:40 cur
-rw-rw-rw-  1 ddboost-user users  40 Oct  1 01:43 ddrid

As mentioned before, the "cur" directory has all your successfully backed up data. If you change directory to cur and do an "ls", you will find the following:

drwxrwxrwx  4 ddboost-user users 229 Oct  2 07:30 5890c0677a03211b49a9cf08bf1dcebd2d7cd77d

This is the Client ID (CID) of the client (VM) that was successfully backed up by VDP.
To find which client on VDP corresponds to which CID on the data domain, we have two simple commands.

To understand this, I presume you have a fair idea of what MCS and GSAN are on vSphere Data Protection. Your GSAN node is responsible for storing all the actual backup data if you have local vmdk storage. If your VDP is connected to the data domain, then GSAN only holds the metadata of the backup and not the actual backup data (as this will be on the data domain).
The MCS, in brief, is what waits for the work order and calls avagent and avtar to perform the backup. If the MCS sees that a data domain is connected to it, then it uses the DD public-private key pair (also called SSH keys) to talk to the DD and perform the regular maintenance tasks.

So, first, we will run the avmgr command (avmgr command is only for GSAN and will not work if GSAN is not running), to display the client ID on the GSAN node. The command would be:
# avmgr getl --path=/VC-IP/VirtualMachines
The output is:

1  Request succeeded
1  RHEL_UDlVr74uB7JdXN8jgjRLlQ  location: 5890c0677a03211b49a9cf08bf1dcebd2d7cd77d      pswd: 0d0d7c6b09f2a2234c108e4f0647c277e8bf2562

The value after location: is the Client ID on the GSAN for the client RHEL (a virtual machine).

Then, we will run the mccli command (mccli command is only for MCS and needs MCS to be up and running) to display the client ID on the MCS server. The command would be:
# mccli client show --domain=/VC-IP/VirtualMachines --name="Client_name"
For example,
# mccli client show --domain=/192.168.1.1/VirtualMachines --name="RHEL_UDlVr74uB7JdXN8jgjRLlQ"
The output is pretty detailed; the line we are interested in is this one:
CID                      5890c0677a03211b49a9cf08bf1dcebd2d7cd77d

So, we see the client ID on data domain = client ID on the GSAN = client ID on the MCS

Here, if the client ID on GSAN does not match the client ID on MCS, then your full VM restores and File Level Restores will not work. In case of a mismatch, the CID has to be corrected to get the restores working.
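
The three-way comparison above is easy to get wrong by eye, since the IDs are 40 hex characters long. Here is a rough Python sketch that pulls the CID out of the saved avmgr getl and mccli client show output and compares them with the folder name seen under cur. The helper names are hypothetical, and the regular expressions assume the output looks exactly like the lab samples in this article.

```python
import re

# Sample text copied from the command outputs shown above.
AVMGR_GETL = """\
1  Request succeeded
1  RHEL_UDlVr74uB7JdXN8jgjRLlQ  location: 5890c0677a03211b49a9cf08bf1dcebd2d7cd77d      pswd: 0d0d7c6b09f2a2234c108e4f0647c277e8bf2562
"""

MCCLI_CLIENT_SHOW = """\
CID                      5890c0677a03211b49a9cf08bf1dcebd2d7cd77d
"""

# Folder name seen under cur on the data domain.
DD_FOLDER = "5890c0677a03211b49a9cf08bf1dcebd2d7cd77d"

def gsan_cids(getl_output):
    """Map client name -> CID for every line that carries a location: field."""
    pattern = re.compile(r"^\d+\s+(\S+)\s+location:\s*([0-9a-f]{40})", re.M)
    return dict(pattern.findall(getl_output))

def mcs_cid(show_output):
    """Return the value of the CID line, or None if the line is missing."""
    match = re.search(r"^CID\s+([0-9a-f]{40})\s*$", show_output, re.M)
    return match.group(1) if match else None

gsan = gsan_cids(AVMGR_GETL)
mcs = mcs_cid(MCCLI_CLIENT_SHOW)
client = "RHEL_UDlVr74uB7JdXN8jgjRLlQ"

all_match = gsan.get(client) == mcs == DD_FOLDER
print("CIDs match on GSAN, MCS and data domain:", all_match)
```

If this prints False for a client, that is the mismatch case described above, and the fix itself is still something to take up with EMC support.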

Now, back to the data domain end, we were under the cur directory, right? Next, I will change directory to the CID

# cd 5890c0677a03211b49a9cf08bf1dcebd2d7cd77d

I will then do another "ls" to list the sub directories under it, and you may or may not notice the following:

drwxrwxrwx  2 ddboost-user users 1.2K Oct  2 02:55 1D21C9327C2E4C6
drwxrwxrwx  2 ddboost-user users 1.4K Oct  2 07:30 1D21CB99431214C

If you see one folder with a sub client ID, it means only one backup has been executed and completed successfully for the virtual machine. If you see multiple folders, it means multiple backups have completed for this VM.

To find out which backup was done first and which were the subsequent backups, we have to query the GSAN because, as you know, the GSAN holds the metadata of the backups.

Hence, on the VDP appliance, run the below command:
# avmgr getb --path=/VC-IP/VirtualMachines/Client-Name --format=xml
For example:
# avmgr getb --path=/192.168.1.1/VirtualMachines/RHEL_UDlVr74uB7JdXN8jgjRLlQ --format=xml
The output will be:

<backuplist version="3.0">

  <backuplistrec flags="32768001" labelnum="2" label="RHEL-DD-Job-RHEL-DD-Job-1475418600010" created="1475418652" roothash="505f1aba07f19d64df74670afa59ed39a3ece85d" totalbytes="17180938240.00" ispresentbytes="0.00" pidnum="1016" percentnew="0" expires="1476282600" created_prectime="0x1d21cb99431214c" partial="0" retentiontype="daily,weekly,monthly,yearly" backuptype="Full" ddrindex="1" locked="1"/>
  
  <backuplistrec flags="16777217" labelnum="1" label="RHEL-DD-Job-1475401181065" created="1475402150" roothash="22dc0dddea797d909a2587291e0e33916c35d7a2" totalbytes="17180938240.00" ispresentbytes="0.00" pidnum="1016" percentnew="0" expires="1476265181" created_prectime="0x1d21c9327c2e4c6" partial="0" retentiontype="none" backuptype="Full" ddrindex="1" locked="0"/>
</backuplist>

Looks confusing? Maybe, let's look at specific fields:

labelnum field shows the order of the backups. 
labelnum=1 means first backup, 2 means second and so on.

roothash is the hash value of the backup job. The next time you run an incremental backup, it will check for the existing hashes, and ddboost will only back up the new hashes. The atomic hashes are then combined to form one unique root hash. So, the root hash for each backup is unique.

created_prectime is the main thing we need. This is what we called the sub client ID. On the data domain, the folder name is the same value without the 0x prefix and in upper case.
For labelnum=1, we see the sub CID is 0x1d21c9327c2e4c6
For labelnum=2, we see the sub CID is 0x1d21cb99431214c
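
To line up the avmgr getb output with the folders on the data domain, something like the Python sketch below can help. It parses the XML, sorts the backups by labelnum, and derives the folder name by dropping the 0x prefix and upper-casing the rest, which is the pattern seen in this lab. The function name is hypothetical, and the sample XML is trimmed to only the attributes the sketch reads.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the avmgr getb output shown above.
BACKUP_LIST = """\
<backuplist version="3.0">
  <backuplistrec labelnum="2" label="RHEL-DD-Job-RHEL-DD-Job-1475418600010" created_prectime="0x1d21cb99431214c" backuptype="Full"/>
  <backuplistrec labelnum="1" label="RHEL-DD-Job-1475401181065" created_prectime="0x1d21c9327c2e4c6" backuptype="Full"/>
</backuplist>
"""

def backup_directories(xml_text):
    """Return (labelnum, label, folder name) tuples, oldest backup first."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter("backuplistrec"):
        prectime = rec.get("created_prectime")
        # Drop the 0x prefix if present, then upper-case to match the folder.
        if prectime.lower().startswith("0x"):
            prectime = prectime[2:]
        rows.append((int(rec.get("labelnum")), rec.get("label"), prectime.upper()))
    return sorted(rows)

backups = backup_directories(BACKUP_LIST)
for labelnum, label, folder in backups:
    print(labelnum, folder, label)
```

The folder names this produces for the two records are 1D21C9327C2E4C6 and 1D21CB99431214C, which are the two directories listed under the CID earlier.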

Now, let's go further into the CID. For example, if I cd into the folder for 0x1d21c9327c2e4c6 (named 1D21C9327C2E4C6 on the data domain) and perform an "ls", I will see the following:

-rw-rw-rw-  1 ddboost-user users  485 Oct  2 02:40 1188BE924964359A5C8F5EAEF552E523FBA83566
-rw-rw-rw-  1 ddboost-user users 1.1K Oct  2 02:40 140A189746A6EC3C49D24EA43A7811205345F1F4
-rw-rw-rw-  1 ddboost-user users 3.8K Oct  2 02:40 2CE724F2760C46CB67F679B76657C23606C06869
-rw-rw-rw-  1 ddboost-user users 2.5K Oct  2 02:40 400206DF07A942C066971D84F0CF063D2DE50F08
-rw-rw-rw-  1 ddboost-user users 1.0M Oct  2 02:55 4F50E1E506477801D0A566DEE50E5364B0F04BF0
-rw-rw-rw-  1 ddboost-user users  451 Oct  2 02:55 79DDA236EEEF192EED66CF605CD710B720A41E1F
-rw-rw-rw-  1 ddboost-user users 1.1K Oct  2 02:55 AFB6C8621EB6FA86DD8590841F80C7C78AC7BEEC
-rw-rw-rw-  1 ddboost-user users 1.9K Oct  2 02:40 B17DD9B7E8B2B6EE68294248D8FA42A955539C4C
-rw-rw-rw-  1 ddboost-user users  16G Oct  2 02:55 B212DB46684FFD5AFA41B87FD71A44469B04A38C
-rw-rw-rw-  1 ddboost-user users   15 Oct  2 02:40 D2CFFD87930DAEABB63EAEAA3C8C2AA9554286B5
-rw-rw-rw-  1 ddboost-user users 9.4K Oct  2 02:40 E2FF0829A0F02C1C6FA4A38324A5D9C23B07719B
-rw-rw-rw-  1 ddboost-user users 3.6K Oct  2 02:55 ddr_files.xml

There is a main file (record file) called ddr_files.xml. This file has the information about what each of the other files in this directory is for.

So if I take the first hex file name and grep for it in ddr_files.xml, I see the following:
# grep -i 1188BE924964359A5C8F5EAEF552E523FBA83566 ddr_files.xml
The relevant output is:
clientfile="virtdisk-descriptor.vmdk"

So this is a vmdk file that was backed up.

Similarly,
# grep -i 400206DF07A942C066971D84F0CF063D2DE50F08 ddr_files.xml
The relevant output is:
clientfile="vm.nvram"

And one more example:
# grep -i 4F50E1E506477801D0A566DEE50E5364B0F04BF0 ddr_files.xml
The relevant output is:
clientfile="virtdisk-flat.vmdk"

So if your VM file IDs are not populated correctly in ddr_files.xml, then again your restores will not work. Engage EMC to get this corrected because, and I am stressing this again, you should not fiddle with this in your production environment.
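
The grep lookups above can be done for every file in one read-only pass. The Python sketch below works the same way grep does: it goes line by line and pairs any 40-character hex name with the clientfile value on the same line. I have not shown the real layout of ddr_files.xml here, so the element and attribute names in the sample (entry, name) are purely hypothetical. Only the clientfile attribute and the hex names come from the actual output above. Run it against a copy of the file, not in place.

```python
import re

# HYPOTHETICAL sample: only clientfile="..." and the hex names are real.
SAMPLE = """\
<entry name="1188BE924964359A5C8F5EAEF552E523FBA83566" clientfile="virtdisk-descriptor.vmdk"/>
<entry name="400206DF07A942C066971D84F0CF063D2DE50F08" clientfile="vm.nvram"/>
<entry name="4F50E1E506477801D0A566DEE50E5364B0F04BF0" clientfile="virtdisk-flat.vmdk"/>
"""

HEX_NAME = re.compile(r"\b[0-9A-F]{40}\b")
CLIENT_FILE = re.compile(r'clientfile="([^"]*)"')

def client_files(text):
    """Map each hex file name to the clientfile found on the same line."""
    mapping = {}
    for line in text.splitlines():
        name = HEX_NAME.search(line)
        client = CLIENT_FILE.search(line)
        if name and client:
            mapping[name.group(0)] = client.group(1)
    return mapping

files = client_files(SAMPLE)
for hex_name, client_file in files.items():
    print(hex_name, "->", client_file)
```

Any hex file in the directory that does not show up in the resulting mapping would be worth raising with EMC, in line with the warning above.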

That's pretty much it for this. If you have questions, feel free to comment or in-mail. The next article is going to be about migrating VDP 5.8/6.0 to 6.1 with a data domain.