Thursday, 23 February 2017

VDP FLR Does Not Expand Mount Points

File Level Recovery client or FLR is used when you want to restore certain files rather than an entire VM. To understand completely on login mode of FLR and how to perform a FLR you can refer this link here

Now, you might come across an issue where you are able to mount the required backup. However, you are unable to expand that mount point and view the drives and the folders. All you will see is the name of the VM image and an arrow that does not expand.

 Occasionally, you might see an error stating:

"Failed to get disks: Unable to browse as proxies are unavailable"

And you might also have persistent or intermittent login issues to FLR client (either basic login or advanced) with the below message.

There are few explanations for this, but the base cause is unresponsive proxy.

Case 1: When internal proxy is being used

By default VDP will be having an internal proxy which caters to the backup and restore tasks. If the internal proxy is down, then FLR authentication will fail persistently. You will not even have an opportunity to mount the backup.
Login in to the https://vdp-ip:8543/vdp-configure page and verify if the internal proxy is in a healthy state.

Restart the proxy service using the below command:
# service avagent-vmware restart

If this does not provide relief, disable and re-enable the internal proxy from the vdp-configure page.

Case 2: When multiple external proxies are used. 

There might also be a case where multiple external proxies (Maximum of 8) are being used. Out of these few might be responsive and the rest are unresponsive. In this case, the issue is intermittent. Let's say, you have 4 proxies, and 2 are responding and the rest 2 are not. When a login in request or a backup/restore comes in, it will utilize any one of these proxies. If the request goes to a proxy that is up and running, then your FLR login and expand restore point will work. If the request is handed to that unresponsive proxy, then the login / expand mount point fails.

To restart external proxy, login to the vdp-configure page, select the unresponsive proxy, then click the Gear Icon and click "Restart Proxy". Once all the proxies are confirmed to be ticked green, attempt the FLR again.

Hope this helps.

Wednesday, 8 February 2017

VDP Backup Fails With "Failed To Remove Snapshot"

There might be scenarios where you execute a backup for a virtual machine. It starts successfully, takes a snapshot successfully and completes the backup process, however at the very end, it fails to remove the snapshot for the VM. This would be seen persistently for one or more virtual machine.

At the very end of the backup job log you would see something like:

2017-02-06T11:53:39.552+04:00 avvcbimage Warning <16004>: Soap fault detected, Query problem, Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"Connection timed out"
Detail: connect failed in tcp_connect()"

2017-02-06T11:53:39.552+04:00 avvcbimage Error <17773>: Snapshot (snapshot-5656) removal for VM '[Suhas-Store-2] VM01/VM01.vmx' task failed to start

2017-02-06T11:53:39.552+04:00 avvcbimage Info <18649>: Removal of snapshot 'VDP-1486397576f70379edb62fb81285abbf68dfadc0bd0758ba83' is not complete, moref 'snapshot-5656'.

2017-02-06T11:53:39.552+04:00 avvcbimage Info <9772>: Starting graceful (staged) termination, Problem with the snapshot removal. (wrap-up stage)

If you see there is a Connection time out message once the snapshot remove call is handed down to the virtual machine. For this VDP-ID if you look into the vmware.log, you will notice the following:

2017-02-06T16:13:02.636Z| vmx| I125: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'VDP-1486397576f70379edb62fb81285abbf68dfadc0bd0758ba83': 55

2017-02-06T16:58:30.826Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-02-06T16:59:11.235Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-02-06T16:59:13.117Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.

We see there is a lot of timeout occurring from the VMtools. And at the same time if you notice the datastore where this VM resides, you will see that it is on a NFS storage:

2017-02-06T11:13:40.355+04:00 avvcbimage Info <0000>: checking datastore type for special processing, checking for type VVOL, actual type = NFS

And if you see the mode of backup you see it is a hot-add mode of backup:

2017-02-06T11:13:40.337+04:00 avvcbimage Info <9675>: Connected with hotadd transport to virtual disk [Suhas-Store-2] VM01/VM01.vmdk

Now, when the VM is residing on NFSv3 there are issues with timeout due to NFS lock during snapshot consolidation. This KB explains the cause of it. The workaround here is to disable hot-add mode of backup and switch to NBD or NBDSSL.

1. SSH into the VDP appliance and browse to the below directory:
# cd /usr/local/avamarclient/var
2. Edit the avvcbimageAll.cmd using a vi editor and enter the below line:

3. Save the file and restart avagent using:
# service avagent-vmware restart
Post this the backup should complete successfully. Hope this helps.

Tuesday, 31 January 2017

Unable To Send Test Email In VDP 6.x

In VDP, you can configure SMTP so that the appliance can forward the appliance status, backup job status periodically to the email configured. Once you configure all the settings in Configuration > Email, you will use the Send test email option to check if this configuration is working good. However, once you select this, you might receive the following message:

The troubleshooting to this would be to verify if the SSO, vCenter and VDP time are all synchronized. More about this is discussed in the article here.

Now there are two ways of looking at this issue:

1. If you had the NTP issue mentioned above and you corrected it. But still you are running into the same message over and over again though the time is synced. (This is the issue I worked on)

2. Perhaps you are receiving something else, a different error. But the vdr-server.log stack trace matches below:

2017-01-31 13:07:22,607 ERROR [http-nio-8543-exec-3]-email.EmailBase: On try 1to send the email, recieved a connection error from the external email server:null
2017-01-31 13:07:22,607 ERROR [http-nio-8543-exec-3]-services.EmailConfigServiceWS: Unable to trigger test email
        at javax.mail.Service.connect(
        at javax.mail.Service.connect(
        at javax.mail.Service.connect(
        at javax.mail.Transport.send0(
        at javax.mail.Transport.send(
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(
        at com.sun.jersey.spi.container.servlet.WebComponent.service(
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(
        at javax.servlet.http.HttpServlet.service(
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(
        at org.apache.catalina.core.StandardWrapperValve.invoke(
        at org.apache.catalina.core.StandardContextValve.invoke(
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(
        at org.apache.catalina.core.StandardHostValve.invoke(
        at org.apache.catalina.valves.ErrorReportValve.invoke(
        at org.apache.catalina.valves.AccessLogValve.invoke(
        at org.apache.catalina.core.StandardEngineValve.invoke(
        at org.apache.catalina.connector.CoyoteAdapter.service(
        at org.apache.coyote.http11.AbstractHttp11Processor.process(
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$ Source)
        at org.apache.tomcat.util.threads.TaskThread$
        at Source)

This is only specific to the error javax.mail.AuthenticationFailedException which was observed in the above log. If you do not see this, then this will probably not help you. 

Again two causes for this:

1. If you ran into the same NTP issue, then the credentials are not updated or synced prior and post to the time change
2. You are indeed entering a wrong password.

If the issue is (1), then Edit the SMTP settings and re-enter the password again post which the Test email should work. If the issue is (2) then you will have to get the right password. 

Hope this helps. 

Friday, 27 January 2017

VDP Restore Fails With "Operation Not Permitted"

Today, a restore of a virtual machine was failing. Infact, restore of all virtual machines were failing. The data was sent to a data domain, but regardless of what the backend storage is, this issue would still occur.

In the restore logs under /usr/local/avamarclient/var with the keyword MOD-EPOCH-vmimage1.log, the following was seen:

2017-01-26T13:11:53.027+04:00 avvcbimage Info <6688>: Process 13007 (/usr/local/avamarclient/bin/avtar) finished (code 1: Operation not permitted)
2017-01-26T13:11:53.027+04:00 avvcbimage Info <7238>: Stream complete: 0 bytes read from stream
2017-01-26T13:11:53.027+04:00 avvcbimage Warning <6690>: CTL workorder "MOD-1485454308259" non-zero exit status 'code 1: Operation not permitted'
2017-01-26T13:11:53.028+04:00 avvcbimage Error <0000>: [IMG0011] Avtar exited with 'code 1: Operation not permitted'
2017-01-26T13:11:53.028+04:00 avvcbimage Info <19647>: Restore Job Wrap-Up Stats:

The avtar log for the same had this:

2017-01-26T13:11:53.025+04:00 avtar Warning <10786>: Cannot use --verbose with --stream=&1, remove the verbose option flag to stop this error

The restore progress used to fail right at 0 percent. 

As the error message stated, we had some additional unwanted entry in the avtar.cmd file. To resolve this:

1. Go to /usr/local/avamarclient/var
2. Open the avtar.cmd file in a vi editor
3. Remove the " --verbose " flag from this and save the file
4. Restart the avagent using:
# service avagent-vmware restart

Post this, the restore should work without issues.

Hope this helps.

Wednesday, 25 January 2017

VDP Verification Job Leaves Undeleted Folders

There was an interesting case opened quite a while back which took almost few months to figure out the solution. The issue was, a backup verification job was performed for a set of VMs. The verification job used to complete successfully, however, it used to leave a folder behind with the name of the VM's verification job ID and in that folder was one file which was the vmxf file of that virtual machine.

If you took a look at the verification job logs on the VDP server, you will not see any errors or warnings. The VM will be test restored, powered on, tested for VM tools heartbeat and then powered off and deleted.

So, on the ESXi datastore, the folder left behind was something similar to this name, VDP_VERIFICATION_Windows_clone_2_1485298729541. And in this directory the file left behind was VDP_VERIFICATION_Windows_clone_2.vmxf

If you look at the hostd.log during this process, you will notice the following:

2017-01-24T23:44:04.127Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.nvram'.

2017-01-24T23:44:04.128Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/vmware.log'.

2017-01-24T23:44:04.128Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/564d7f1f-1606-518d-6b4e-d8689735d481.vmem.WRITELOCK'.

2017-01-24T23:44:04.128Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmsd~'.

2017-01-24T23:44:04.128Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmsd.lck'.

2017-01-24T23:44:04.129Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmsd'.

2017-01-24T23:44:04.129Z info hostd[234C2B70] [Originator@6876 sub=vm:SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFI
CATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmx opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] lck'.

2017-01-24T23:44:04.130Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteVMInt: Couldn't remove directory '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541'.

2017-01-24T23:44:04.130Z verbose hostd[234C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmx opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] Virtual machine delete completed.

If you notice here, the verification job is removing all the test restored files. However, it says it is unable to remove the VDP Verification directory. At this point, if you run a list command in the datastore of ESXi, you will notice there is a vmxf file present. And because of this VDP is unable to remove the verification folder.
During backup, VDP will only backup vmx, nvram, vmdk and flat files. It will not backup the vmxf file. So if it does not backup the vmxf file, it will neither restore the vmxf file. So the question is, what is this vmxf file? Vmxf file was initially created to contain the virtual machine meta data. Then later on, the tools information was added to this file. So what this means is, if you have a virtual machine with VM tools running, you should see a vmxf file associated with that VM. If there is no tools installed on a VM, then there will be no vmxf file.

So, if you perform a ABV of a VM without any tools, then you will not have this vmxf file left behind. So during the ABV, the VDP will test for VM tools heartbeat. So when the VM is powered On, the vmxf file gets created under the ABV VM folder. But since VDP did not create this file, it will neither delete this file. It will delete all the files that was restored by it, leaving this vmxf file under the verification folder. And the clean up in this case has to be done manually.

There is no much use of vmxf file in your production environment. If you delete this file manually and take a new backup of the VM and verify the restore point, the file is still coming back. This means that something in the "vmx" file is causing the "vmxf" file to appear.

The resolution:

1. Power off the virtual machine
2. If you vi the virtual machine's vmx file, you will see the following entry:
extendedConfigFile = "Windows_clone_2.vmxf"

3. Use the vi editor to hash this entry out.

Before edit:
extendedConfigFile = "Windows_clone_2.vmxf"
After edit:
 #extendedConfigFile = "Windows_clone_2.vmxf"

4. Save the file and reload the virtual machine's vmx. To reload, you will have to find the VM ID. To do this, run the below command:
# vim-cmd vmsvc/getallvms
5. Once you get the ID, reload it using:
# vim-cmd vmsvc/reload <vm-id>
6. Remove the vmxf file manually from the virtual machine directory

7. Power On the VM. Wait for the VM to be completely up and the tools status to be Running (Current) Now, if you browse the VM directory, the vmxf is no longer created.

8. Perform another backup for this VM and then perform the verification and this time there will be no stale files left behind.

If you make any changes to the VM tools, such as reinstall or upgrade, then another entry for vmxf will be added in the virtual machine's vmx file and this entire process has to be re-done.

Hope this helps!

Saturday, 21 January 2017

VDP 6.1.3 Backups Hang At 92 Percent When Backing Up VMs On vSAN

Earlier, we had seen an issue where the VDP appliance crashes when we backup a virtual machine residing on a vSAN datastore. This was documented in this VMware KB article here.

On VDP 6.1.3 we use VDDK version 6.5, VixDiskLib (6.5) Release build-4241604. The earlier issue in VDDK 6.0 as mentioned in the above KB article was unable to mount a vSAN sparse delta disk on a proxy server or the VDP server (when using internal proxy) when it was residing on a non vSAN datastore.

In VDDK 6.5, the behavior was changed. The issue still exists but rather than crashing the VDP appliance, the VDDK library crashes causing the backups to fail. The workaround as per the same article was to:

1. If internal proxy was being used, then deploy an external proxy and place it on the vSAN datastore.
2. If space usage is not an issue and internal proxy is being used, move the entire VDP appliance to the vSAN datastore.

However, in some cases this did not work and the backups still used to fail, however the task and events in vSphere would show the progress at 92 percent and mccli activity show --active would not show any progress bytes.

I do not have a vSAN storage in my setup, the back-trace I am using is from couple of cases that I worked on and a community discussion thread which can be found here.

If you view the backup failure vimagew.log under /usr/local/avamarclient/var, you will notice the following back-trace:

2017-01-17T04:02:16.101+04:00 avvcbimage FATAL <5889>: Fatal signal 11 in pid 15065
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006b93b1
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba127
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba44b
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba53e
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 0000000000628810
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56c90850
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5050193f
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5051c277
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5051c75e
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5052b142
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5052b678
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56ecab73
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56ecb123
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004284bc
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004291f9
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004993c4
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 000000000070ac4c
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56c88806
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e558cd9bd
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  ERROR: <0001> uapp::handlefatal: aborting program pid=15065, sig=11
2017-01-17T04:02:16.103+04:00 avvcbimage FATAL <5890>: handlefatal: Aborting program with code 176, pid=15065, sig=11
2017-01-17T04:02:16.103+04:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Exit: Unmatched Init calls so far: 1.

To fix this, change the protocol of backup to nbd. The steps would be:

1. SSH into VDP and browse to the below location:
# cd /usr/local/avamarclient/var
2. Edit the file called as avvcbimageAll.cmd using a vi editor

3. Add the below parameter into this file:

4. Save the file and restart the avagent service using:
 # service avagent-vmware restart

Now your backups should complete successfully. I will update this thread when a fix is released for this bug.
Hope this helps.

Monday, 16 January 2017

Slow GUI Response On VDP 6.1.3

I recently ran into an issue, where the Backup Job tab was extremely slow in loading the jobs and when I say extremely slow it was taking forever to load the jobs. This was the same with the other tabs as well in the Web Client VDP GUI.

In the axis2.log under /usr/local/avamar/var/mc/server_log the following was logged:

2017-01-16 12:57:35,105 [1690894503@qtp-1786872722-26] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuffer.append(Unknown Source)
        at Source)
        at com.ctc.wstx.sw.BufferingXmlWriter.flushBuffer(
        at com.ctc.wstx.sw.BufferingXmlWriter.fastWriteRaw(
        at com.ctc.wstx.sw.BufferingXmlWriter.writeStartTagEnd(
        at com.ctc.wstx.sw.BaseNsStreamWriter.closeStartElement(
        at com.ctc.wstx.sw.BaseStreamWriter.writeCharacters(
        at org.apache.axiom.util.stax.wrapper.XMLStreamWriterWrapper.writeCharacters(
        at org.apache.axis2.databinding.utils.writer.MTOMAwareXMLSerializer.writeCharacters(
        at org.apache.axis2.databinding.ADBDataSource.serialize(
        at org.apache.axiom.soap.impl.llom.SOAPEnvelopeImpl.internalSerialize(
2017-01-16 12:58:20,038 [1690894503@qtp-1786872722-26] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space
2017-01-16 12:59:07,070 [486089829@qtp-1786872722-25] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space

Because of this the backup jobs never used to run and the mccli commands took forever to report the outputs.
The solution was to obviously increase the Java heap memory. The steps would be to:
(Backup the files before editing them)

1. Browse to the following location:
# cd /usr/local/avamar/var/mc/server_data/prefs
2. Make a backup of the mcserver.xml file before editing it. Open the mcserver.xml file using a vi editor
# vi mcserver.xml
3. Locate the following line. <entry key="maxJavaHeap" value="-Xmx1G" /> and change the value from 1G to 2G

Before Edit:
<entry key="maxJavaHeap" value="-Xmx1G" />
After Edit:
<entry key="maxJavaHeap" value="-Xmx2G" />

4. Save this file

5. Go to the following location
# cd /usr/local/avamar/lib/
6. Make a copy of mcserver.xml file in this location and open it in vi editor and edit the same parameter in this file too:

Before Edit:
<entry key="maxJavaHeap" value="-Xmx1G" merge="newvalue" />
After Edit:
<entry key="maxJavaHeap" value="-Xmx2G" merge="newvalue" />

7. Save the file

8. Switch to admin mode of VDP sudo su - admin and restart MCS using:
# --restart

Post this, the GUI should show some relief in terms of loading and you should no longer see the java heap error in axis2.log

Hope this helps.

Tuesday, 10 January 2017

Avamar Virtual Edition 7.1: Failed To Communicate To vCenter During Client Registration

Once you setup your Avamar Server, you will proceed to add the VMware vCenter as a client to this. You will provide the vCenter IP/FQDN along with the administrator user credentials and the default https port 443. However, it errors out stating:

Failed to communicate to vCenter. Unable to find valid certification path to the vCenter. 

The error is due to Avamar not being able to acknowledge the VMware vCenter certificate warning. To do this, we will have to force the avamar to accept the vCenter certificate. 

Login to SSH of the Avamar Server and edit the mcserver.xml file:
# vi /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

Locate the vcenter certificate ignore parameter by searching for /cert in the vi editor. You will notice the below line.

Change this value from false to true and save the file.

Change the access to admin mode using sudo su - admin and run the below script to restart MCS.
# --restart

Post this, you should be able to add this vCenter as a client to avamar successfully

That should be it.

Sunday, 8 January 2017

Part 5: Creating Recovery Plans In SRM 6.1

Part 4: Creating protection groups for virtual machines in SRM 6.1

Once you create a protection group, it's time to create a recovery plan. When you want to perform a DR test or a test recovery, it is the recovery plan that you will execute. A recovery plan is tasked to run a set of steps in a particular order to fail over the VMs or test the failover to the recovery site. You cannot change the workflow of the recovery plan, however you can customize by adding your required checks and tasks in between.

Select the production site in SRM inventory and under Summary Tab select Create a recovery plan.

Provide a name for the recovery plan and an optional description and click Next.

Select the recovery site where you want the VMs to failover to and click Next.

The Group type will be VM protection groups and then select the required protection groups to be added to this recovery plan. Only the VMs in the protection group added to the recovery plan will be failed over in an event of disaster. Click Next.

We have something called as Test Recovery. Test recovery does a test failover of the protected VMs to the recovery site without impacting the production VMs working or network identity. A test network or a bubble network (A network with no uplinks) will be created on the recovery site and these VMs will be placed there and bough up to verify if the recovery plan is working good. Keep the default auto create settings and click Next.

Review your recovery plan settings and click Finish to complete the create recovery plan wizard.

If you select the protected site, Related Objects and Recovery plans you can see this recovery plan being listed.

If you select the Recovery Plans in the Site Recovery Inventory, you will see the status of the plan and their related details.

Before you test your recovery, you will have to configure this recovery plan. Browse to, Recovery Plans, Related Objects, Virtual Machines. The VMs available under this recovery plan will be listed. Right click the virtual machine and select Configure Recovery

There are two options here, Recovery properties and IP customization.

The recovery properties discusses the order of VM startup, VM dependencies and additional steps that has to be carried out during and after Power On.

Since I just have one virtual machine in this recovery plan, the priority and the dependencies does not really matter. Set these options as to your requirement.

In the IP Customization option, you will provide the network details for the virtual machine in the Primary and the Recovery Site.

Select Configure Protection and you will be asked to configure IP settings of the VM in protected site. If you have VM tools running on this machine (Recommended), then click Retrieve and it will auto populate the IP settings. Click DNS option and enter the DNS IP and the domain name manually. Click OK to complete. The same steps has to be performed in the Recovery Site too under Configure Recovery, however, all the IP details has to be entered manually (If DHCP is not used) since there are no VM tools or powered On VM on the recovery site.

Once both are complete, you should see the below information in the IP Customization section. Click OK to finish configuring VM recovery.

Once this is performed for all the virtual machines in the recovery plan, the plan customization is complete and ready to be tested. You can also use the DR IP Customization tool to configure VM recovery settings.

In the next article, we will have a look at testing a recovery plan.

Part 4: Creating Virtual Machine Protection Groups In SRM 6.1

Part 3: Configuring Inventory Mappings in SRM 6.1

Once you configure inventory mappings, you will then have to configure protection groups. In protection group you will add the required virtual machines to be failed over by SRM in case of a disaster event.

Select the protected site and in the Summary tab under Guide to configuring SRM click Create a protection group.

Specify a name for this protection group that you will be creating and a description (not mandatory)

You will get to choose the direction for the protection group and the protection group type. In this, vcenter-prod is my production site and vcenter-dr is my recovery site.

I am using a vSphere Replication appliance, host based replication appliance, hence the replication group type would be vSphere Replication so that we can choose the VMs being replicated by this.
You cannot have vR and array-based replication VMs in the same protection group.

I have one virtual machine being replicate by vR. Check the required virtual machine and click Next.

Review the settings and click Finish to complete creating the protection group.

Under the production site, Related Objects, Protection Groups, you will be able to see the protection group listed.

Under the Protection Group option in the SRM inventory, you will see the same protection group listed.

And now in the recovery site, under the Recovered resource pool (selected during Inventory Mapping) you will be able to see the placeholder virtual machine.

With this, we have successfully created a protection group for virtual machines.

Part 5: Creating Recovery Plan in SRM 6.1