Monday, 16 January 2017

Slow GUI Response On VDP 6.1.3

I recently ran into an issue where the Backup Job tab in the Web Client VDP GUI was extremely slow to load its jobs, to the point of being unusable. The other tabs in the VDP GUI were just as slow.

In the axis2.log under /usr/local/avamar/var/mc/server_log the following was logged:

2017-01-16 12:57:35,105 [1690894503@qtp-1786872722-26] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuffer.append(Unknown Source)
        at java.io.StringWriter.write(Unknown Source)
        at com.ctc.wstx.sw.BufferingXmlWriter.flushBuffer(BufferingXmlWriter.java:1103)
        at com.ctc.wstx.sw.BufferingXmlWriter.fastWriteRaw(BufferingXmlWriter.java:1114)
        at com.ctc.wstx.sw.BufferingXmlWriter.writeStartTagEnd(BufferingXmlWriter.java:743)
        at com.ctc.wstx.sw.BaseNsStreamWriter.closeStartElement(BaseNsStreamWriter.java:388)
        at com.ctc.wstx.sw.BaseStreamWriter.writeCharacters(BaseStreamWriter.java:446)
        at org.apache.axiom.util.stax.wrapper.XMLStreamWriterWrapper.writeCharacters(XMLStreamWriterWrapper.java:100)
        at org.apache.axiom.om.impl.MTOMXMLStreamWriter.writeCharacters(MTOMXMLStreamWriter.java:289)
        at org.apache.axis2.databinding.utils.writer.MTOMAwareXMLSerializer.writeCharacters(MTOMAwareXMLSerializer.java:139)
        at com.avamar.mc.sdk10.EventMoref.serialize(EventMoref.java:155)
        at com.avamar.mc.sdk10.EventMoref.serialize(EventMoref.java:78)
        at com.avamar.mc.sdk10.ActivityInfo.serialize(ActivityInfo.java:889)
        at com.avamar.mc.sdk10.ActivityInfo.serialize(ActivityInfo.java:198)
        at com.avamar.mc.sdk10.ArrayOfActivityInfo.serialize(ArrayOfActivityInfo.java:216)
        at com.avamar.mc.sdk10.TaskInfo.serialize(TaskInfo.java:630)
        at com.avamar.mc.sdk10.DynamicValue.serialize(DynamicValue.java:244)
        at com.avamar.mc.sdk10.DynamicValue.serialize(DynamicValue.java:152)
        at com.avamar.mc.sdk10.ArrayOfDynamicValues.serialize(ArrayOfDynamicValues.java:216)
        at com.avamar.mc.sdk10.ArrayOfDynamicValues.serialize(ArrayOfDynamicValues.java:160)
        at com.avamar.mc.sdk10.GetDynamicPropertyResponse.serialize(GetDynamicPropertyResponse.java:165)
        at com.avamar.mc.sdk10.GetDynamicPropertyResponse.serialize(GetDynamicPropertyResponse.java:109)
        at com.avamar.mc.sdk10.GetDynamicPropertyResponse$1.serialize(GetDynamicPropertyResponse.java:97)
        at org.apache.axis2.databinding.ADBDataSource.serialize(ADBDataSource.java:93)
        at org.apache.axiom.om.impl.llom.OMSourcedElementImpl.internalSerialize(OMSourcedElementImpl.java:692)
        at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:556)
        at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:874)
        at org.apache.axiom.soap.impl.llom.SOAPEnvelopeImpl.internalSerialize(SOAPEnvelopeImpl.java:230)
2017-01-16 12:58:20,038 [1690894503@qtp-1786872722-26] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space
2017-01-16 12:59:07,070 [486089829@qtp-1786872722-25] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space

Because of this, the backup jobs were not running and the mccli commands took a very long time to return their output.
The solution was to increase the Java heap memory. Back up each file before editing it. The steps are:

1. Browse to the following location:
# cd /usr/local/avamar/var/mc/server_data/prefs
2. Make a backup copy of the mcserver.xml file before editing it, then open mcserver.xml in the vi editor:
# vi mcserver.xml
3. Locate the maxJavaHeap entry and change its value from -Xmx1G to -Xmx2G:

Before Edit:
<entry key="maxJavaHeap" value="-Xmx1G" />
After Edit:
<entry key="maxJavaHeap" value="-Xmx2G" />

4. Save this file

5. Go to the following location
# cd /usr/local/avamar/lib/
6. Make a backup copy of the mcserver.xml file in this location, open the original in the vi editor and edit the same parameter in this file too:

Before Edit:
<entry key="maxJavaHeap" value="-Xmx1G" merge="newvalue" />
After Edit:
<entry key="maxJavaHeap" value="-Xmx2G" merge="newvalue" />

7. Save the file

8. Switch to the admin user on the VDP appliance using sudo su - admin and restart MCS using:
# mcserver.sh --restart

After this, the GUI should load noticeably faster and you should no longer see the Java heap space error in axis2.log.
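
The maxJavaHeap edit made in both copies of mcserver.xml could also be scripted. What follows is only a minimal Python sketch and not part of the procedure I actually ran: the helper name set_max_java_heap is hypothetical, and it only rewrites text handed to it, so reading, backing up and writing the real files is left to you.

```python
import re

# Matches the maxJavaHeap entry up to the end of its value attribute,
# regardless of any attributes (such as merge="newvalue") that follow it.
_HEAP_ENTRY = re.compile(r'(<entry\s+key="maxJavaHeap"\s+value=")[^"]*(")')


def set_max_java_heap(xml_text, new_value):
    """Return xml_text with the maxJavaHeap value replaced by new_value.

    Hypothetical helper for illustration. Raises ValueError when the entry
    is missing, so a silent no-op cannot be mistaken for success.
    """
    updated, count = _HEAP_ENTRY.subn(
        lambda m: m.group(1) + new_value + m.group(2), xml_text
    )
    if count == 0:
        raise ValueError("maxJavaHeap entry not found")
    return updated


if __name__ == "__main__":
    sample = '<entry key="maxJavaHeap" value="-Xmx1G" merge="newvalue" />'
    print(set_max_java_heap(sample, "-Xmx2G"))
```

The function leaves every other attribute on the line untouched, which matters for the copy under /usr/local/avamar/lib/ where the entry carries merge="newvalue".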

Hope this helps.

Tuesday, 10 January 2017

Avamar Virtual Edition 7.1: Failed To Communicate To vCenter During Client Registration

Once you set up your Avamar Server, the next step is to add the VMware vCenter as a client. You provide the vCenter IP/FQDN, the administrator user credentials and the default HTTPS port 443. However, the registration errors out stating:

Failed to communicate to vCenter. Unable to find valid certification path to the vCenter. 


The error occurs because Avamar is not able to acknowledge the VMware vCenter certificate warning. To get past this, we have to force Avamar to accept the vCenter certificate.

Log in to the Avamar Server over SSH and edit the mcserver.xml file:
# vi /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

Locate the vCenter certificate ignore parameter by searching for /cert in the vi editor. Change its value from false to true and save the file.

Switch to the admin user using sudo su - admin and run the below script to restart MCS:
# mcserver.sh --restart

After this, you should be able to add the vCenter as a client to Avamar successfully.


That should be it.

Sunday, 8 January 2017

Part 5: Creating Recovery Plans In SRM 6.1

Part 4: Creating protection groups for virtual machines in SRM 6.1

Once you create a protection group, it's time to create a recovery plan. The recovery plan is what you execute when you want to perform a DR test or a test recovery. It runs a set of steps in a fixed order to fail over the VMs, or to test the failover, to the recovery site. You cannot change the workflow of the recovery plan; however, you can customize it by adding your own checks and tasks in between.

Select the production site in the SRM inventory and, under the Summary tab, select Create a recovery plan.


Provide a name for the recovery plan and an optional description and click Next.


Select the recovery site that you want the VMs to fail over to and click Next.


Set the Group type to VM protection groups, then select the protection groups to add to this recovery plan. Only the VMs in the protection groups added to the recovery plan will be failed over in the event of a disaster. Click Next.


SRM has a feature called Test Recovery. A test recovery performs a test failover of the protected VMs to the recovery site without affecting the running production VMs or their network identity. A test network, or bubble network (a network with no uplinks), is created on the recovery site, and the VMs are placed on it and brought up to verify that the recovery plan works. Keep the default auto-create settings and click Next.


Review your recovery plan settings and click Finish to complete the create recovery plan wizard.


If you select the protected site and go to Related Objects > Recovery Plans, you will see this recovery plan listed.


If you select Recovery Plans in the Site Recovery inventory, you will see the status of the plan and its related details.


Before you test your recovery, you will have to configure this recovery plan. Browse to Recovery Plans > Related Objects > Virtual Machines. The VMs available under this recovery plan will be listed. Right-click the virtual machine and select Configure Recovery.


There are two options here, Recovery properties and IP customization.

The recovery properties cover the order of VM startup, VM dependencies and additional steps that have to be carried out during and after power on.

Since I have just one virtual machine in this recovery plan, the priority and the dependencies do not really matter. Set these options according to your requirements.


In the IP Customization option, you will provide the network details for the virtual machine in the Primary and the Recovery Site.


Select Configure Protection and you will be asked to configure the IP settings of the VM in the protected site. If you have VMware Tools running on this machine (recommended), click Retrieve and it will auto-populate the IP settings. Click the DNS option and enter the DNS IP and the domain name manually. Click OK to complete. The same steps have to be performed for the recovery site under Configure Recovery; however, all the IP details have to be entered manually (if DHCP is not used), since there is no VMware Tools instance or powered-on VM on the recovery site.


Once both are complete, you should see the below information in the IP Customization section. Click OK to finish configuring VM recovery.


Once this is performed for all the virtual machines in the recovery plan, the plan customization is complete and ready to be tested. You can also use the DR IP Customization tool to configure VM recovery settings.

In the next article, we will have a look at testing a recovery plan.

Part 4: Creating Virtual Machine Protection Groups In SRM 6.1

Part 3: Configuring Inventory Mappings in SRM 6.1

Once you configure inventory mappings, you will then have to configure protection groups. In a protection group, you add the virtual machines that SRM should fail over in case of a disaster.

Select the protected site and in the Summary tab under Guide to configuring SRM click Create a protection group.


Specify a name for the protection group and, optionally, a description.


Next, choose the direction for the protection group and the protection group type. Here, vcenter-prod is my production site and vcenter-dr is my recovery site.

I am using a vSphere Replication (vR) appliance, which provides host-based replication, so the protection group type is vSphere Replication. This lets us choose from the VMs being replicated by vR.
You cannot have vR and array-based replication VMs in the same protection group.


I have one virtual machine being replicated by vR. Check the required virtual machine and click Next.


Review the settings and click Finish to complete creating the protection group.


Under the production site, Related Objects, Protection Groups, you will be able to see the protection group listed.


Under the Protection Group option in the SRM inventory, you will see the same protection group listed.


And now in the recovery site, under the Recovered resource pool (selected during Inventory Mapping) you will be able to see the placeholder virtual machine.


With this, we have successfully created a protection group for virtual machines.

Part 5: Creating Recovery Plan in SRM 6.1

Thursday, 5 January 2017

Part 3: Configuring Inventory Mappings In SRM 6.1


Once the SRM sites are paired, you will then have to configure Inventory Mappings on the production site. Inventory mappings tell SRM which resources the virtual machines use on the protected site and which corresponding resources the same VMs should use on the recovery site when failed over. Placeholder virtual machines can be created only after inventory mappings are established successfully. If inventory mappings are not configured, each VM has to be configured individually for resource availability after a failover.

In the below screenshot you can see that inventory mappings can be created for resource pools, folders, networks and datastores. There is only a Green check for Site Pairing as this was done in the previous article.


First we will configure resource mappings. Click Create Resource Mappings. In the Production vCenter section, select the Production resource pool. The virtual machine Router, which we replicated earlier, resides in this resource pool. Select the appropriate resource pool on the recovery site as well. Click Add Mappings and you should now see the direction of the mapping in the bottom section. Click Next.


If you would like to establish re-protection, that is, to fail back from the recovery site to the protected site, check the option in Prepare reverse mappings. This is optional; right now I do not want reverse mappings, so I will leave it unchecked. Click Finish.


Back in the Summary tab, the resource mapping now has a green check. Next we will configure Folder Mappings. In the Summary tab, click Create Folder Mappings. I will configure this manually, so select Prepare mappings manually. Click Next.


As with resource mapping, select the folder on the protected site where the protected virtual machines are, select the appropriate folder on the recovery site and click Add Mappings. Once the direction is displayed, click Next.


I will not be configuring any reverse mappings, so will leave this option unchecked and click Finish.


Once the Folder mapping is green checked, click Create Network Mapping and again this will be a manual configuration. Click Next.


In the same way, select the network on the Protected site where the protected VMs will be and an appropriate network on the recovery site and click Add Mappings. Then click Next.


Test networks are created by default for testing your recovery plans. The production VMs keep running as they are, and a failed-over instance of these VMs is brought up on an isolated test network on the recovery site. Click Next.


Again, no reverse mappings here. I will simply click Finish.


With this, the inventory mappings for resources, networks and folders are complete.

Next we will be configuring the Placeholder Datastore.

For every virtual machine in the protected site, SRM creates a placeholder VM on the recovery site. This placeholder VM resides on a datastore, which is called the placeholder datastore. Once you specify the placeholder datastore, SRM creates VM files on that datastore in the recovery site and uses them to register placeholder VMs in the recovery site inventory.

Two key requirements for placeholder datastore:
1. If there is a cluster in recovery site, the placeholder datastore must be visible to all the hosts in that cluster
2. You cannot use a replicated datastore as a placeholder datastore.

Select the Recovery Site in the SRM inventory and click Configure Placeholder Datastore. From the list below, select the required datastore to be used for placeholder and click OK.


If you would like to establish re-protection and failback, you will have to select a placeholder datastore on the production site as well. To do so, click the protected site in the SRM inventory and configure the placeholder datastore again.

With this, the inventory mapping is complete.

Part 4: Creating virtual machine protection groups in SRM 6.1

Wednesday, 4 January 2017

VDP 6.1.3 - ESXi 5.1 Compatibility Issues

The VMware interoperability matrix says VDP 6.1.3 is compatible with ESXi 5.1. However, if you back up a VM when VDP is running on an ESXi 5.1 host, the backup fails. I tried this with the following setup.

I deployed a VDP 6.1.3 appliance on ESXi 5.1 Build 1065491 and created a virtual machine on the same host. Then I ran an on-demand backup for this VM and it failed immediately. The backup job log had the following entries:

2017-01-04T21:13:36.723-05:-30 avvcbimage Warning <16041>: VDDK:SSL: Unknown SSL Error
2017-01-04T21:13:36.723-05:-30 avvcbimage Info <16041>: VDDK:SSL Error: error:14077102:SSL routines:SSL23_GET_SERVER_HELLO:unsupported protocol
2017-01-04T21:13:36.723-05:-30 avvcbimage Warning <16041>: VDDK:SSL: connect failed (1)
2017-01-04T21:13:36.723-05:-30 avvcbimage Info <16041>: VDDK:CnxAuthdConnect: Returning false because SSL_ConnectAndVerify failed
2017-01-04T21:13:36.724-05:-30 avvcbimage Info <16041>: VDDK:CnxConnectAuthd: Returning false because CnxAuthdConnect failed
2017-01-04T21:13:36.724-05:-30 avvcbimage Info <16041>: VDDK:Cnx_Connect: Returning false because CnxConnectAuthd failed
2017-01-04T21:13:36.724-05:-30 avvcbimage Info <16041>: VDDK:Cnx_Connect: Error message:
2017-01-04T21:13:36.724-05:-30 avvcbimage Warning <16041>: VDDK:[NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error:
2017-01-04T21:13:36.742-05:-30 avvcbimage Warning <16041>: VDDK:SSL: Unknown SSL Error
2017-01-04T21:13:36.705-05:-30 avvcbimage Info <16041>: VDDK:NBD_ClientOpen: attempting to create connection to vpxa-nfcssl://[datastore1 (2)] Thick/Thick.vmdk@10.109.10.171:902
2017-01-04T21:13:36.723-05:-30 avvcbimage Warning <16041>: VDDK:SSL: Unknown SSL Error
2017-01-04T21:13:36.723-05:-30 avvcbimage Info <16041>: VDDK:SSL Error: error:14077102:SSL routines:SSL23_GET_SERVER_HELLO:unsupported protocol
2017-01-04T21:13:36.723-05:-30 avvcbimage Warning <16041>: VDDK:SSL: connect failed (1)
2017-01-04T21:13:36.743-05:-30 avvcbimage Info <16041>: VDDK:DISKLIB-DSCPTR: : "vpxa-nfcssl://[datastore1 (2)] Thick/Thick.vmdk@10.109.10.171:902" : Failed to open NBD extent.
2017-01-04T21:13:36.743-05:-30 avvcbimage Info <16041>: VDDK:DISKLIB-LINK  : "vpxa-nfcssl://[datastore1 (2)] Thick/Thick.vmdk@10.109.10.171:902" : failed to open (NBD_ERR_NETWORK_CONNECT).
2017-01-04T21:13:36.743-05:-30 avvcbimage Info <16041>: VDDK:DISKLIB-CHAIN : "vpxa-nfcssl://[datastore1 (2)] Thick/Thick.vmdk@10.109.10.171:902" : failed to open (NBD_ERR_NETWORK_CONNECT).
2017-01-04T21:13:36.743-05:-30 avvcbimage Info <16041>: VDDK:DISKLIB-LIB   : Failed to open 'vpxa-nfcssl://[datastore1 (2)] Thick/Thick.vmdk@10.109.10.171:902' with flags 0x1e NBD_ERR_NETWORK_CONNECT (2338).
2017-01-04T21:13:36.780-05:-30 avvcbimage Info <16041>: VDDK:NBD_ClientOpen: attempting to create connection to vpxa-nfcssl://[datastore1 (2)] Thick/Thick.vmdk@10.109.10.171:902

The VDDK (VixDiskLib) release on VDP 6.1.3 is:
2017-01-04T21:13:27.244-05:-30 avvcbimage Info <16041>: VDDK:VMware VixDiskLib (6.5) Release build-4241604

Cause:
Refer to "Backward compatibility of TLS" in the VDDK release article.

Recommended Fix:
Upgrade ESXi to 5.5 U3e or later.

Workaround:
On the VDP appliance, edit the below file:
# vi /etc/vmware/config
Add the below line:
tls.protocols=tls1.0,tls1.1,tls1.2

Save the file. There is no need to restart any services. Re-run the backup job and the backups should now complete successfully. 
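
If you need to apply this workaround on more than one appliance, the edit can be made repeatable. The following is a minimal Python sketch under stated assumptions: the helper name ensure_tls_protocols is hypothetical, it works on the file contents as a string, and it treats the file as plain key = value lines, which I have not verified beyond the single line added above.

```python
TLS_LINE = "tls.protocols=tls1.0,tls1.1,tls1.2"


def ensure_tls_protocols(config_text):
    """Return config_text with exactly one tls.protocols line.

    Hypothetical helper: any existing tls.protocols setting is dropped and
    the workaround line is appended, so running it twice changes nothing.
    """
    kept = [
        line
        for line in config_text.splitlines()
        if not line.strip().startswith("tls.protocols")
    ]
    kept.append(TLS_LINE)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # The existing key below is made up for the demonstration.
    print(ensure_tls_protocols('example.key = "value"\n'), end="")
```

Replacing an existing tls.protocols line instead of blindly appending avoids ending up with two conflicting settings in /etc/vmware/config.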

Part 2: Pairing Sites in Site Recovery Manager 6.1

Part 1: Installing Site Recovery Manager

In this article we will see how to pair the two SRM sites we installed previously. As you can see here, I have two SRM sites, a Production site and a DR site. Right after a fresh install these sites are not paired and you will see the message "Site is not paired" for both the Production and the Recovery site. For failover to take place, the sites need to be paired. Click the Pair Site option in the center of the screen.


Enter the Platform Services Controller of the DR site and click Next.


The SRM plugin extension will be com.vmware.vcDr unless you chose to create a custom Plugin ID during install. Once the PSC details are given in the previous step, the vCenter corresponding to it is populated automatically. Select the vCenter, provide the SSO user credentials for authentication and click Finish.


If you are presented with a certificate warning, click Yes to proceed. The pairing should now be completed.


Now you can see in the Summary tab that the sites have been paired. The paired site details are also populated in the "Paired Site" section.


That's it.

Part 3: Configuring Inventory Mappings in SRM 6.1

Tuesday, 3 January 2017

Part 3: Recover A VM Using vSphere Replication

Part 2: Pairing vR Sites and configuring replication for a virtual machine

In this article, we will recover a replicated virtual machine using vSphere Replication. To perform a recovery, select the target vCenter (vCenter-DR in my case), then select Monitor and Incoming replication.


You will see the below screen at this point, with a big red button with a play symbol. This is the recovery option. Select this icon.


You will then be presented with the type of recovery you would like to perform:


Recover with recent changes: This first option requires the source VM to be powered down. Before initiating the recovery process, it syncs the most recent changes from the source VM, so the recovered VM will be up to date.

Use latest available data: Choose this option if you would not like to power down the source, or if the source is unavailable or corrupted. It uses the most recently replicated data to recover the virtual machine.

We will be using the second option to recover the virtual machine. In this wizard you will have to choose a destination folder to restore this virtual machine to.


You will then have to select the ESXi host and (if available) a resource pool to recover this virtual machine to.


You can choose whether the recovered VM is powered on or left powered off after recovery. Select the option you need and click Finish to begin the recovery process.


Once the recovery is complete, the virtual machine will be available in the target site, and all the replicated VM files that were named hbr.UUID.vmdk will be renamed to the actual virtual machine file names.

The status of the replication will now switch to Recovered and there will be no more active replication for this virtual machine.



Resuming Replication After Recovery: Reprotect and Failback.

In most scenarios, once a VM is recovered you will want to re-establish replication in the opposite direction, so that a new replicated instance exists in case the recovered virtual machine fails at some point. This is called reverse replication or reprotection.

Initially, the replication was from vcenter-prod to vcenter-dr, with the virtual machine residing on vcenter-prod. After a recovery, the virtual machine is running on vcenter-dr, so the replication direction changes to vcenter-dr to vcenter-prod.

You will first have to stop the currently configured replication for the virtual machine. On the target site, under incoming replication (above screenshot), right-click the VM with the status Recovered and select Stop. Then the virtual machine has to be unregistered (Remove from inventory) on the source side. Once the replication is stopped and the source (old) virtual machine is unregistered, you will have to reconfigure the replication. The process is the same as discussed in Part 2 of this series.

The only difference is that when you select a destination datastore for the replication data, you will receive the following message. Select Use Existing. With this option, it informs you that a set of disks is already available on the target site and that they will be used as replication seeds. An initial full sync will still occur, but it does not copy the data; it only checks the hashes to ensure validity. Once this is done, the new data is replicated first, and after that replication follows your configured RPO.


Once the replication status goes to OK, you will have a valid replicated instance of the virtual machine at the new target site ready to be recovered.

Performing A Manual Recovery.

Until now, you saw vSphere Replication taking care of the whole recovery operation. But suppose the vCenter is down and you need to recover a critical virtual machine. If vCenter is down, you cannot manage vSphere Replication. In this case, we will perform a manual recovery.

From the SSH of the ESXi host, you can see the VM files that are replicated:
# cd /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Router

-rw-------    1 root     root        8.5K Jan  3 08:24 hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.nvram.8
-rw-------    1 root     root        3.1K Jan  3 08:24 hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.vmx.7
-rw-------    1 root     root       84.0K Jan  3 08:24 hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158-delta.vmdk
-rw-------    1 root     root         368 Jan  3 08:24 hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158.vmdk
You will have to rename these VM files so that they carry the vmdk, flat.vmdk, vmx and nvram extensions. First, create a new folder under the datastore directory:
# cd /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/
# mkdir Rec
Pause the replication and copy / clone the vmdk to the new location using vmkfstools -i:
# cd /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Router
# vmkfstools -i hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158.vmdk -d thin /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Rec/Rec.vmdk
You will see the following output:
Destination disk format: VMFS thin-provisioned
Cloning disk 'hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158.vmdk'...
Clone: 100% done.

Copy / Rename the vmx and nvram files using the below command:
# cp -a hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.vmx.7 /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Rec/Rec.vmx
# cp -a hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.nvram.8 /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Rec/Rec.nvram
Finally, register the VM from the command line using:
# vim-cmd solo/registervm /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Rec/Rec.vmx
If the registration was successful, a VM ID is returned as the output, and you can verify the registration in the vSphere client.
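
The renaming logic used in the manual recovery can be written down as a small mapping. This is a hedged Python sketch, not a tool that ships with vSphere Replication: the function name recovery_name is hypothetical, the patterns are inferred only from the four file names listed above, and it assumes a VM with a single disk (several disks would all map to the same name).

```python
import re

# hbrcfg.GID-<uuid>.<n>.<vmx|nvram>.<m> : replicated configuration files
_CFG = re.compile(r"^hbrcfg\.GID-[0-9a-f-]+\.\d+\.(vmx|nvram)\.\d+$")
# hbrdisk.RDID-<uuid>.<n>.<id>.vmdk : disk descriptor (the -delta extent is skipped)
_DISK = re.compile(r"^hbrdisk\.RDID-[0-9a-f-]+\.\d+\.\d+\.vmdk$")


def recovery_name(replica_file, vm_name):
    """Map a replicated file name to its recovered name, or None to skip it."""
    cfg = _CFG.match(replica_file)
    if cfg:
        return f"{vm_name}.{cfg.group(1)}"
    if _DISK.match(replica_file):
        return f"{vm_name}.vmdk"
    return None


if __name__ == "__main__":
    for name in (
        "hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.vmx.7",
        "hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158.vmdk",
    ):
        print(name, "->", recovery_name(name, "Rec"))
```

The -delta.vmdk extent deliberately maps to None, because the disk is cloned through its descriptor with vmkfstools -i rather than copied file by file.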

That's pretty much it.

Monday, 2 January 2017

When Nothing Is Left, Avtar Restore To The Rescue

There are multiple ways to restore a virtual machine in vSphere Data Protection.


When all of these fail, there is another option to restore a virtual machine. I am not sure what it is officially called; I refer to it as a command line restore of a virtual machine using avtar.

**Before proceeding: please do not perform this in your production environment, as the process is tricky and can cause data loss if not done right. This is the last of last resorts. If restores are failing, the first step should be to fix them. Involve a VMware resource to perform this. That's as much as I can say. Beyond that, it's your call and your risk.**

The steps are fairly simple; you just need to be sure and careful about what is being selected. I ran into this issue while working on one of the cases logged with us. I cannot use the output from that session, so I had to reproduce this in my lab.

Having said that, let's have a look at the setup. I have a virtual machine named Jump on one of my ESXi hosts. It is a Windows box with one 40 GB virtual hard disk on SCSI 0:0. Then I have a 512 GB VDP appliance, which has 4 disks. Their SCSI nodes are, by default, 0:0, 1:0, 2:0 and 3:0.

With this, let's have a look at the steps:

1. It is always better to restore this disk to a new VM rather than to an existing VM, because it reduces the complexity and risk by a large factor. Let's say your VM has 8 drives, drives 6 and 8 have gone corrupt and no other means of restore is available. If you perform the avtar-level restore onto the existing VM, it is easy to confuse which disk has to be chosen, and you might end up overwriting a different VMDK.

So, to be safe, create a new VM with a new hard disk that has the same type of provisioning as the old one. It is not a hard requirement for the new disk to have the same provisioning, but it reduces the post-restore work a great deal; for example, you would not have to Storage vMotion the disk to change its provisioning.

When you create this new VMDK, use a unique SCSI controller: it should not be the same as that of any existing disk on the original VM or the VDP VM. Also, the disk should be at least 1 GB larger than the source disk. If my source disk was 40 GB, I will create the new VMDK as 41 GB. Once the disk is created, keep the VM powered off and add the same disk to the VDP appliance as well.
Basically, you will Edit Settings on the VDP appliance > Add > Hard Disk > Use existing hard disk, browse the datastore where the new VM resides and add the hard disk. While adding the disk, use the same SCSI controller that was used on the newly created VM.

This completes step 1. Now switch to the command line of the VDP appliance for the rest of the process.

2. We have to obtain the LabelNum of the backup for this VM so that we can restore its contents. First, verify that the client is available in the GSAN by running the below command:
# avmgr getl --path=/vcenter-prod.happycow.local/VirtualMachines

The output will be similar to:
1  Request succeeded
1  Jump_UqrwzzeV6zMpRI8yfCBqgQ  location: c3d109f23e18075b48f680f0821730b417260427      pswd: e06a865b7bf4d0aadf90be28de519b8c0681354e

I just have one virtual machine in this VDP, hence one client in the output. Now, to get the labelNum of the backups for the Jump VM, run the below command:
# avmgr getb --path=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ --format=xml

The output will be similar to:
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec flags="32768001" labelnum="1" label="Jump-1483351122072" created="1483351973" roothash="a6711baf9a0db97be019109cb7ea177ec7a8035e" totalbytes="42949988352.00" ispresentbytes="0.00" pidnum="3016" percentnew="17" expires="1488535122" created_prectime="0x1d264e0c7ce7616" partial="0" retentiontype="daily,weekly,monthly,yearly" backuptype="Full" ddrindex="0" locked="1" direct_restore="1"/>
</backuplist>

LabelNum=1 specifies that this is the first backup of the virtual machine. If I back up this VM one more time and run the same command, there will be two <backuplistrec> entries in the output, and the labelNum counter will be incremented to 2, and so on.
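
When a client has many backups, picking the labelnum by eye gets error-prone. Here is a minimal Python sketch that extracts it from the avmgr getb XML; the function name list_backups is hypothetical, and the only attribute names it relies on (labelnum, label, backuptype) are the ones visible in the output above.

```python
import xml.etree.ElementTree as ET


def list_backups(avmgr_output):
    """Return (labelnum, label, backuptype) tuples from avmgr getb XML output."""
    # Skip status lines such as "1  Request succeeded" that precede the XML.
    start = avmgr_output.find("<?xml")
    if start == -1:
        start = avmgr_output.find("<backuplist")
    if start == -1:
        raise ValueError("no backuplist XML found in output")
    root = ET.fromstring(avmgr_output[start:].encode("utf-8"))
    return [
        (int(rec.get("labelnum")), rec.get("label"), rec.get("backuptype"))
        for rec in root.iter("backuplistrec")
    ]


if __name__ == "__main__":
    sample = (
        "1  Request succeeded\n"
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<backuplist version="3.0">\n'
        '  <backuplistrec labelnum="1" label="Jump-1483351122072" backuptype="Full"/>\n'
        "</backuplist>\n"
    )
    for backup in list_backups(sample):
        print(backup)
```

The sample string in the demo is a shortened copy of the output above, keeping only the attributes the function reads.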

3. Next, list the files available in this VM backup. It should list the vmx, vmdk, flat.vmdk and nvram files. The command is:
# avtar --list --labelnum=1 --path=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ

The output will be similar to:
avtar Info <5551>: Command Line: /usr/local/avamar/bin/avtar.bin --flagfile=/usr/local/avamar/etc/usersettings.cfg --password=**************** --vardir=/usr/local/avamar/var --server=vdp-dest --id=root --bindir=/usr/local/avamar/bin --vardir=/usr/local/avamar/var --bindir=/usr/local/avamar/bin --sysdir=/usr/local/avamar/etc --list --sequencenumber=1 --account=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ
avtar Info <7977>: Starting at 2017-01-03 00:25:06 IST [avtar Oct 14 2016 05:53:11 7.2.180-118 Linux-x86_64]
avtar Info <6555>: Initializing connection
avtar Info <5552>: Connecting to Avamar Server (vdp-dest)
avtar Info <5554>: Connecting to one node in each datacenter
avtar Info <5583>: Login User: "root", Domain: "default", Account: "/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ"
avtar Info <5580>: Logging in on connection 0 (server 0)
avtar Info <5582>: Avamar Server login successful
avtar Info <10632>: Using Client-ID='c3d109f23e18075b48f680f0821730b417260427'
avtar Info <5550>: Successfully logged into Avamar Server [7.2.80-118]
avtar Info <8745>: Backup from Linux host "/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ" (vdp-dest.happycow.local) with plugin 3016 - Windows VMWare Image
avtar Info <5538>: Backup #1 label "Jump-1483351122072" timestamp 2017-01-02 15:42:53 IST, 9 files, 40.00 GB
avtar Info <40113>: Backup #1 created by avtar version 7.2.180-118
VMConfiguration/
VMConfiguration/avamar vm configuration.xml
VMConfiguration/snapshot description.xml
VMConfiguration/vm.nvram
VMConfiguration/vm.ovf
VMConfiguration/vm.vmx
VMConfiguration/vss-manifest.zip
VMFiles/
VMFiles/1/
VMFiles/1/attributes.xml
VMFiles/1/virtdisk-descriptor.vmdk
VMFiles/1/virtdisk-flat.vmdk
avtar Info <5314>: Command completed (exit code 0: success)

4. The VMDK descriptor file listed by the above avtar command should be readable. To verify this, run the below command:
# avtar -x --path=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ --labelnum=1 -O VMFiles/1/virtdisk-descriptor.vmdk

The output would be similar to:
avtar Info <5551>: Command Line: /usr/local/avamar/bin/avtar.bin --flagfile=/usr/local/avamar/etc/usersettings.cfg --password=**************** --vardir=/usr/local/avamar/var --server=vdp-dest --id=root --bindir=/usr/local/avamar/bin --vardir=/usr/local/avamar/var --bindir=/usr/local/avamar/bin --sysdir=/usr/local/avamar/etc -x --account=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ --sequencenumber=1 -O VMFiles/1/virtdisk-descriptor.vmdk
avtar Info <7977>: Starting at 2017-01-03 00:28:29 IST [avtar Oct 14 2016 05:53:11 7.2.180-118 Linux-x86_64]
avtar Info <6555>: Initializing connection
avtar Info <5552>: Connecting to Avamar Server (vdp-dest)
avtar Info <5554>: Connecting to one node in each datacenter
avtar Info <5583>: Login User: "root", Domain: "default", Account: "/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ"
avtar Info <5580>: Logging in on connection 0 (server 0)
avtar Info <5582>: Avamar Server login successful
avtar Info <10632>: Using Client-ID='c3d109f23e18075b48f680f0821730b417260427'
avtar Info <5550>: Successfully logged into Avamar Server [7.2.80-118]
avtar Info <5295>: Starting restore at 2017-01-03 00:28:29 IST as "root" on "vdp-dest.happycow.local" (4 CPUs) [7.2.180-118]
avtar Info <40113>: Backup #1 created by avtar version 7.2.180-118
avtar Info <5949>: Backup file system character encoding is UTF-8.
avtar Info <8745>: Backup from Linux host "/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ" (vdp-dest.happycow.local) with plugin 3016 - Windows VMWare Image
avtar Info <5538>: Backup #1 label "Jump-1483351122072" timestamp 2017-01-02 15:42:53 IST, 9 files, 40.00 GB
avtar Info <5291>: Estimated size for "VMFiles/1/virtdisk-descriptor.vmdk" is 463 bytes
# comment this is an avamar backup
version=1
createType="vmfs"

# Extent description
RW 83886080 VMFS "virtdisk-flat.vmdk"

# The Disk Data Base
#DDB
dbb.adapterType = "lsilogic"
dbb.geometry.cylinders = "5221"
dbb.geometry.heads = "255"
dbb.geometry.sectors = "63"
dbb.longContentID = "9c70cdf008a0d44ace4aa9d83340427c"
dbb.thinProvisioned = "1"
dbb.toolsVersion = "10246"
dbb.uuid = "60 00 C2 98 b6 dc 27 49-cf 44 c6 73 c4 42 e2 84"
dbb.virtualHWVersion = "11"
avtar Info <5267>: Restore of "VMFiles/1/virtdisk-descriptor.vmdk" completed
avtar Info <7925>: Restored 463 bytes from selection(s) with 463 bytes in 1 files
avtar Info <6090>: Restored 463 bytes in 0.01 minutes: 4.271 MB/hour (9,673 files/hour)
avtar Info <7883>: Finished at 2017-01-03 00:28:30 IST, Elapsed time: 0000h:00m:00s
avtar Info <6645>: Not sending wrapup anywhere.
avtar Info <5314>: Command completed (exit code 0: success)

The RW line describes the size of the VMDK in 512-byte sectors: 83886080 x 512 = 42949672960 bytes, which is 41943040 KB, or 40960 MB, or 40 GB.
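The same arithmetic can be sketched in Python. This is only an illustration: it assumes the descriptor text has been captured into a string, and extent_size_bytes is a hypothetical helper name.

```python
SECTOR_BYTES = 512  # the RW extent line counts 512-byte sectors


def extent_size_bytes(descriptor_text):
    """Sum the sector counts of all RW extent lines and convert to bytes."""
    total_sectors = 0
    for line in descriptor_text.splitlines():
        parts = line.split()
        # Extent lines look like: RW <sectors> VMFS "<file name>"
        if len(parts) >= 2 and parts[0] == "RW" and parts[1].isdigit():
            total_sectors += int(parts[1])
    return total_sectors * SECTOR_BYTES


descriptor = 'RW 83886080 VMFS "virtdisk-flat.vmdk"'
size = extent_size_bytes(descriptor)
print(size, "bytes =", size // 1024 ** 3, "GB")
```

The result tells you the minimum size of the target drive to create for the restore.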

5. On the new VM, I created a 41 GB drive on SCSI 1:1 and attached it to the VDP appliance on the same SCSI 1:1 node. Rescan for storage using the below command:
# echo "- - -" > /sys/class/scsi_host/host1/scan

Here "- - -" stands for the three values that host*/scan accepts, i.e. channel number, SCSI target ID, and LUN. Writing a wildcard for all three makes the host scan every channel, target and LUN, so newly attached devices are detected on the Linux box. This procedure will add LUNs, but not remove them.
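To make the three fields concrete, here is a small Python sketch that builds the string written to the scan file. The helper name scan_request is made up for illustration; it only builds the text and does not touch /sys.

```python
def scan_request(channel=None, target=None, lun=None):
    """Build the string written to /sys/class/scsi_host/hostN/scan.

    None means "wildcard" and is written as "-".
    """
    fields = (channel, target, lun)
    return " ".join("-" if field is None else str(field) for field in fields)


print(scan_request())            # wildcard on channel, target and LUN
print(scan_request(0, 1, None))  # channel 0, target 1, any LUN
```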

6. Run fdisk -l. The new device should be listed without a partition table. You should see the below output:

Disk /dev/sde doesn't contain a valid partition table

7. The next command performs the restore. It starts as soon as you hit Enter, with no yes/no prompt to confirm, so double-check what you have entered before you proceed.

The command would be:
# avtar -x --nostdout --account=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ --labelnum=1 -O VMFiles/1/virtdisk-flat.vmdk > /dev/sde

The labelnum can differ depending on which backup you want to restore.
This command does not show any output or progress for the restore. If the VMDK was created as a thin provisioned drive, you can log in to ESXi and run the below command to watch it grow:
# watch -n1 stat vm-name-flat.vmdk

This refreshes the stat output for the VMDK every second.

The output should be similar to:
Every 1s: stat Jump1-flat.vmdk                                                                                                                                                                                           2017-01-02 20:41:32
File: Jump1-flat.vmdk
Size: 44023414784     Blocks: 305152     IO Block: 131072 regular file
Device: 61f328bc5c1ebe26h/7058029830484180518d  Inode: 226524548   Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-01-02 20:39:58.000000000
Modify: 2017-01-02 20:39:58.000000000
Change: 2017-01-02 20:41:30.000000000

The output keeps refreshing until the allocated blocks correspond to 40 GB. To calculate the allocated size, multiply Blocks by 512 to get the size in bytes.
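As a rough way to turn the Blocks value into a progress figure, here is a Python sketch. It assumes you paste the stat output into a string; allocated_bytes is a hypothetical helper, and the 40 GB target is the disk size from the descriptor in this example.

```python
import re

BLOCK_BYTES = 512  # per the text above: Blocks x 512 = size in bytes


def allocated_bytes(stat_output):
    """Extract the Blocks count from stat output and convert it to bytes."""
    match = re.search(r"Blocks:\s*(\d+)", stat_output)
    if match is None:
        raise ValueError("no Blocks field found in stat output")
    return int(match.group(1)) * BLOCK_BYTES


sample = "Size: 44023414784     Blocks: 305152     IO Block: 131072 regular file"
done = allocated_bytes(sample)
target = 40 * 1024 ** 3  # size of the backed-up disk in this example
print(f"{done} bytes allocated, {100 * done / target:.2f}% of 40 GB")
```

For the sample output above, 305152 blocks is still only a small fraction of the disk, so the restore has just started.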

8. Now, detach this drive from the VDP appliance. In the end, the new VM should be powered off with this drive attached to it. Power on the VM and you should be able to see the data.

Restore Virtual Machine From Command Line Using mccli

If you have worked with vSphere Data Protection, you will know that you can restore a virtual machine from the VDP GUI in the web client. If the VDP web client GUI is unavailable and vCenter is down, we use the Direct-Host (Emergency Restore) option. However, emergency restore requires disassociating the VDP's ESXi host from vCenter, and it can restore VMs only to the host where the VDP appliance resides.

Another, lesser-known option is to restore virtual machines from the command line of the VDP appliance. It took me quite a while to find the right switches and verify them against a couple of sources before I got this restore to complete successfully.

1. Check whether the client is registered to the VDP and, if it is, get the domain of the client. Both can be obtained from the below command:
# mccli client show --recursive=true

The command outputs:

0,23000,CLI command completed successfully.
Client                    Domain                                     Client Type
------------------------- ------------------------------------------ ------------------------------------
vdp.happycow.local        /clients                                   VMware Image Proxy with Guest Backup
Replication-DR            /vcenter-dr.happycow.local/VirtualMachines Virtual Machine
Test                      /vcenter-dr.happycow.local/VirtualMachines Virtual Machine
vcenter-dr.happycow.local /vcenter-dr.happycow.local                 vCenter

Here, I will be restoring the client called Test, and the domain for this VM is /vcenter-dr.happycow.local/VirtualMachines
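If you have many clients, looking up the domain by eye gets tedious. Below is a minimal Python sketch that maps client names to domains from the command output. It assumes client names and domains contain no spaces (true in this example, not guaranteed in general), and client_domains is a hypothetical helper name.

```python
SAMPLE = """\
0,23000,CLI command completed successfully.
Client                    Domain                                     Client Type
------------------------- ------------------------------------------ ------------------------------------
vdp.happycow.local        /clients                                   VMware Image Proxy with Guest Backup
Test                      /vcenter-dr.happycow.local/VirtualMachines Virtual Machine
"""


def client_domains(mccli_output):
    """Map client name -> domain from `mccli client show --recursive=true` output."""
    lines = mccli_output.splitlines()
    # Data rows start right after the dashed separator line.
    start = next(i for i, line in enumerate(lines) if line.startswith("---")) + 1
    domains = {}
    for line in lines[start:]:
        fields = line.split(None, 2)  # client, domain, client type
        if len(fields) < 3:
            continue  # blank or unexpected line
        client, domain, _client_type = fields
        domains[client] = domain
    return domains


print(client_domains(SAMPLE)["Test"])
```

Joining the domain and the client name with a slash gives the value used for --name in the later steps.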

2. The restored virtual machine will reside on a datastore. Run the below command to check whether the datastore you would like to restore this VM to is visible to the VDP appliance.
# mccli vcenter browse --name=/vcenter-fqdn --recursive --type=datastore

The sample command and output will be similar to:
root@vdp:~/#: mccli vcenter browse --name=/vcenter-dr.happycow.local --recursive --type=datastore

0,23000,CLI command completed successfully.
Name          Type Accessible Hosts         Datacenter
------------- ---- ---------- ------------- --------------
is-tse-d128-1 VMFS Yes        10.109.10.128 /Datacenter-DR
exit15_ISOs   NFS  Yes        10.109.10.128 /Datacenter-DR

3. Verify that the vCenter folder you would like to restore this VM to is visible to the VDP appliance.
# mccli vcenter browse --name=/vcenter-fqdn --recursive --type=container

The sample command and the output will be:
root@vdp:~/#: mccli vcenter browse --name=/vcenter-dr.happycow.local --recursive --type=container

0,23000,CLI command completed successfully.
Name    Location                   Protected Type
------- -------------------------- --------- ------
Restore /Datacenter-DR/vm/Restore/ No        Folder
FL      /Datacenter-DR/vm/FL/      No        Folder

4. List all the available backups for the client that you would like to restore:
# mccli backup show --name=/vcenter-fqdn/VirtualMachines/<client-name> --recursive=true

The sample command and output will be:
root@vdp:~/#: mccli backup show --name=/vcenter-dr.happycow.local/VirtualMachines/Test --recursive=true

0,23000,CLI command completed successfully.
Created                 LabelNum Size    Retention Hostname           Location
----------------------- -------- ------- --------- ------------------ --------
2017-01-01 20:08:01 IST 3        40.0 GB DWMY      vdp.happycow.local Local
2017-01-01 20:04:02 IST 2        40.0 GB DWMY      vdp.happycow.local Local
2016-12-31 02:24:21 IST 1        40.0 GB DWMY      vdp.happycow.local Local

Here, the LabelNum column gives the order of the backups: 1 is the first backup, 2 is the second, and so on. LabelNum=3 is the latest backup for this client in my example.
Note down which LabelNum you would like to restore your VM from. I will be choosing LabelNum=1.
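To pick a LabelNum without reading the table by hand, the output can be parsed roughly as follows. This is a sketch that assumes the row layout shown in my example (date, time, time zone, LabelNum, ...); backup_labelnums is a hypothetical helper name.

```python
SAMPLE = """\
0,23000,CLI command completed successfully.
Created                 LabelNum Size    Retention Hostname           Location
----------------------- -------- ------- --------- ------------------ --------
2017-01-01 20:08:01 IST 3        40.0 GB DWMY      vdp.happycow.local Local
2017-01-01 20:04:02 IST 2        40.0 GB DWMY      vdp.happycow.local Local
2016-12-31 02:24:21 IST 1        40.0 GB DWMY      vdp.happycow.local Local
"""


def backup_labelnums(mccli_output):
    """Return {labelnum: created timestamp} from `mccli backup show` output."""
    backups = {}
    for line in mccli_output.splitlines():
        fields = line.split()
        # Data rows start with a date (YYYY-...) and carry the LabelNum in the 4th field.
        if len(fields) >= 4 and fields[0][:4].isdigit() and fields[3].isdigit():
            backups[int(fields[3])] = " ".join(fields[:3])
    return backups


backups = backup_labelnums(SAMPLE)
print("latest:", max(backups), "oldest:", min(backups))
```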

5. Identify the plugin to be used during the restore. The plugin IDs are contained in the below file:
# less /usr/local/avamar/lib/plugin_catalog.xml
The plugin ID for a Windows VM is 3016 and for a Linux VM it is 1016.

The below command will output this for you:
# grep -i 'plugin-entry pid-number="1016"\|plugin-entry pid-number="3016"' /usr/local/avamar/lib/plugin_catalog.xml
The output:
    <plugin-entry pid-number="1016" pid="vmimage" description="Linux VMware Image">
    <plugin-entry pid-number="3016" pid="vmimage" description="Windows VMware Image">

6. Restore the VM using the below command:
# mccli backup restore --name=/vcenter-fqdn/VirtualMachines/<client-name> --labelNum=<which backup to be restored> --restore-vm-to=new --virtual-center-name=<your-vcenter-fqdn> --datacenter=<your-datacenter-name> --folder=<the folder to restore the vm> --dest-client-name=<name for restored VM> --esx-host-name=<name of esxi host where restored VM should reside> --datastore-name=<where VM file should reside> --plugin=<plugin number>
The sample command and the output will be:

root@vdp:~/#: mccli backup restore --name=/vcenter-dr.happycow.local/VirtualMachines/Test --labelNum=1 --restore-vm-to=new --virtual-center-name=vcenter-dr.happycow.local --datacenter=Datacenter-DR --folder=Restore --dest-client-name=Restored --esx-host-name=10.109.10.128 --datastore-name=is-tse-d128-1 --plugin=3016

0,22312,client restore scheduled.
Attribute   Value
----------- ----------------------------------------------------------------------
client      /vcenter-dr.happycow.local/VirtualMachines/Test_UDLiusDGKgqWLzJxSiw2uw
activity-id 9148334304676709

7. Monitor the restore status from the GUI or the command line using:
# mccli activity show --active
The output:

0,23000,CLI command completed successfully.
ID               Status  Error Code Start Time           Elapsed     End Time             Type    Progress Bytes New Bytes Client   Domain
---------------- ------- ---------- -------------------- ----------- -------------------- ------- -------------- --------- -------- ------
9148334304676709 Running 0          2017-01-02 13:14 IST 00h:00m:18s 2017-01-03 13:14 IST Restore 0 bytes        0%        Restored //N/A

Once the restore is completed, verify that the VM is available in the right location. The restored VM will be powered off by default.