Thursday, 26 April 2018

SRM Service Fails To Start: "Could not initialize Vdb connection Data source name not found and no default driver specified"

In a few cases, you might come across a scenario where the Site Recovery Manager service does not start, and in the Event Viewer you will notice the following backtrace for the vmware-dr service.

VMware vCenter Site Recovery Manager application error.
class Vmacore::Exception "DBManager error: Could not initialize Vdb connection: ODBC error: (IM002) - [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified"
[backtrace begin] product: VMware vCenter Site Recovery Manager, version: 6.5.1, build: build-6014840, tag: vmware-dr, cpu: x86_64, os: windows, buildType: release
backtrace[00] vmacore.dll[0x001F29FA]
backtrace[01] vmacore.dll[0x00067EA0]
backtrace[02] vmacore.dll[0x0006A85E]
backtrace[03] vmacore.dll[0x00024064]
backtrace[04] vmware-dr.exe[0x00107621]
backtrace[05] MSVCR120.dll[0x00066920]
backtrace[06] MSVCR120.dll[0x0005E36D]
backtrace[07] ntdll.dll[0x00092A63]
backtrace[08] vmware-dr.exe[0x00014893]
backtrace[09] vmware-dr.exe[0x00015226]
backtrace[10] windowsService.dll[0x00002BF5]
backtrace[11] windowsService.dll[0x00001F24]
backtrace[12] sechost.dll[0x00005ADA]
backtrace[13] KERNEL32.DLL[0x000013D2]
backtrace[14] ntdll.dll[0x000154E4]
[backtrace end]  

No logs are generated in vmware-dr.log, and the ODBC connection test completes successfully too.

However, when you open the vmware-dr.xml file located under C:\Program Files\VMware\VMware vCenter Site Recovery Manager\config and search for the <DBManager> tag, you will notice that the <dsn> name is incorrect.
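
For reference, the relevant section of vmware-dr.xml looks roughly like the sketch below; the DSN value here is only an example and should match the 64-bit System DSN configured for SRM in the ODBC Data Source Administrator:

<DBManager>
   ...
   <dsn>SRM_DB_DSN</dsn>
   ...
</DBManager>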

After providing the right DSN name within the <dsn> </dsn> tags, you will notice a new backtrace when you attempt to start the service again:

VMware vCenter Site Recovery Manager application error.
class Vmacore::InvalidArgumentException "Invalid argument"
[backtrace begin] product: VMware vCenter Site Recovery Manager, version: 6.5.1, build: build-6014840, tag: vmware-dr, cpu: x86_64, os: windows, buildType: release
backtrace[00] vmacore.dll[0x001F29FA]
backtrace[01] vmacore.dll[0x00067EA0]
backtrace[02] vmacore.dll[0x0006A85E]
backtrace[03] vmacore.dll[0x00024064]
backtrace[04] listener.dll[0x0000BCBC]

What I suspect is that something has gone wrong with the vmware-dr.xml file, and the fix for this is to re-install the SRM application pointing it to the existing database.

Post this, the service starts successfully. Hope this helps. 

Friday, 20 April 2018

Upgrading vCenter Appliance From 6.5 to 6.7

So as you know, vSphere 6.7 is now GA, and this article will walk through upgrading an embedded PSC deployment of the 6.5 vCenter appliance to 6.7. Once you download the 6.7 VCSA ISO installer, mount the ISO on a local Windows machine and use the UI installer for Windows to begin the upgrade.
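
On the mounted ISO, the Windows UI installer typically sits at a path along the lines of vcsa-ui-installer\win32\installer.exe (the exact folder layout can vary between builds); launch it to start the wizard.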

You will be presented with the below choices:


We will be going with the Upgrade option. The upgrade follows the same path as earlier releases: the process deploys a new 6.7 VCSA, migrates data and configuration from the older 6.5 appliance, and then powers down the old appliance once the upgrade is successful.


Accept the EULA to proceed further.


In the next step we will connect to the source appliance, so provide the IP/FQDN of the source 6.5 vCenter server.


Once the Connect To Source step goes through, you will be asked to enter the SSO details and the details of the ESX host where the 6.5 VCSA is running.


The next step is to provide information about the target appliance, the 6.7 appliance. Select the ESX host where the target appliance should be deployed.


Then provide the inventory display name for the target vCenter 6.7 along with a root password.


Select the appliance deployment size for the target server. Make sure this matches or is greater than the size of the source 6.5 appliance.


Then select the datastore where the target appliance should reside.


Next, we will provide a set of temporary network details for the 6.7 appliance. The appliance will inherit the old 6.5 network configuration post a successful migration.


Review the details and click Finish to begin the Stage 1 deployment process.


Once Stage 1 is done, you can click Continue to proceed further with Stage 2.



In Stage 2 we will perform a data copy from the source vCenter appliance to the target appliance deployed in Stage 1.


Provide the details to connect to the source vCenter server.


Select the type of data to be copied over to the destination vCenter server. In my case, I just want to migrate the configuration data.


Join the CEIP and proceed further


Review the details and click Finish to begin the data copy.


The source vCenter will be shut down after the data copy.


The data migration will take a while to complete and runs in 3 stages.


And that's it. If all goes well, the migration is complete and you can access your new vCenter from the URL.

Hope this helps.

Thursday, 12 April 2018

SRM Test Recovery Fails: "Failed to create snapshots of replica devices"

When using SRM with array-based replication, a test recovery operation takes a snapshot of the replica LUN, presents and mounts it on the ESX server, and brings up the VMs on an isolated network.

In many instances, the test recovery would fail at the crucial step, which is taking a snapshot of the replica device. The GUI would mention: 

Failed to create snapshots of replica devices 

In this case, always look into the vmware-dr.log on the recovery-site SRM server. In my case I noticed the below snippet:

2018-04-10T11:00:12.287+01:00 error vmware-dr[16896] [Originator@6876 sub=SraCommand opID=7dd8a324:9075:7d02:758d] testFailoverStart's stderr:
--> java.io.IOException: Couldn't get lock for /tmp/santorini.log
--> at java.util.logging.FileHandler.openFiles(Unknown Source)
--> at java.util.logging.FileHandler.<init>(Unknown Source)
=================BREAK========================
--> Apr 10, 2018 11:00:12 AM com.emc.santorini.log.KLogger logWithException
--> WARNING: Unknown error: 
--> com.sun.xml.internal.ws.client.ClientTransportException: HTTP transport error: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
--> at com.sun.xml.internal.ws.transport.http.client.HttpClientTransport.getOutput(Unknown Source)
--> at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.process(Unknown Source)
--> at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.processRequest(Unknown Source)
--> at com.sun.xml.internal.ws.transport.DeferredTransportPipe.processRequest(Unknown Source)
--> at com.sun.xml.internal.ws.api.pipe.Fiber.__doRun(Unknown Source)
=================BREAK========================
--> Caused by: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
--> at sun.security.ssl.InputRecord.handleUnknownRecord(Unknown Source)
--> at sun.security.ssl.InputRecord.read(Unknown Source)
--> at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)

2018-04-10T11:00:12.299+01:00 error vmware-dr[21512] [Originator@6876 sub=AbrRecoveryEngine opID=7dd8a324:9075:7d02:758d] Dr::Providers::Abr::AbrRecoveryEngine::Internal::RecoverOp::ProcessFailoverFailure: Failed to create snapshots of replica devices for group 'vm-protection-group-45026' using array pair 'array-pair-2038': (dr.storage.fault.CommandFailed) {
-->    faultCause = (dr.storage.fault.LocalizableAdapterFault) {
-->       faultCause = (vmodl.MethodFault) null, 
-->       faultMessage = <unset>, 
-->       code = "78814f38-52ff-32a5-806c-73000467afca.1049", 
-->       arg = <unset>
-->       msg = ""
-->    }, 
-->    faultMessage = <unset>, 
-->    commandName = "testFailoverStart"
-->    msg = ""
--> }
--> [context]

So here the SRA attempts to establish a connection with RecoverPoint over HTTP, which is disabled from RecoverPoint 3.5.x onwards. We need to allow RP and SRM to communicate over HTTPS instead.

On the SRM, perform the below:

1. Open CMD in admin mode and navigate to the below location:
c:\Program Files\VMware\VMware vCenter Site Recovery Manager\storage\sra\array-type-recoverpoint

2. Then run the below command:
"c:\Program Files\VMware\VMware vCenter Site Recovery Manager\external\perl-5.14.2\bin\perl.exe" command.pl --useHttps true

In 6.5 I have seen the path to be external\perl\perl\bin\perl.exe,
so verify what the correct perl.exe path is before running the second command.
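
If you are unsure, a quick way to find the bundled perl.exe is to search the SRM installation folder from the same admin command prompt (the path below assumes the default install location):

dir /s /b "c:\Program Files\VMware\VMware vCenter Site Recovery Manager\external\perl.exe"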

You should ideally see an output like:
Successfully changed to HTTPS security mode

3. Perform this on both the SRM sites. 

On the RPA, perform the below:

1. Log in to each RPA with the boxmgmt account.

2. [2] Setup > [8] Advanced Options > [7] Security Options > [1] Change Web Server Mode 
(option number may change)

3. You will be then presented with this message:
Do you want to disable the HTTP server (y/n)?

4. Disable HTTP and repeat this on both the production and recovery RPA clusters.

Restart the SRM service on both sites and re-run the test recovery; it should now complete successfully.

Hope this helps. 

Wednesday, 4 April 2018

Maintenance Task Fails On VDP When Connected To Data Domain

There are many instances where the maintenance task fails on VDP. This article is specific to VDP when integrated with Data Domain, and more so when the DD OS version is 6.1 or above.

The checkpoint and HFS tasks were completing fine without issues:
# dumpmaintlogs --types=cp | grep "<4"

2018/03/19-12:01:04.44235 {0.0} <4301> completed checkpoint maintenance
2018/03/19-12:04:17.71935 {0.0} <4300> starting scheduled checkpoint maintenance
2018/03/19-12:04:40.40012 {0.0} <4301> completed checkpoint maintenance

# dumpmaintlogs --types=hfscheck | grep "<4"

2018/03/18-12:00:59.49574 {0.0} <4002> starting scheduled hfscheck
2018/03/18-12:04:11.83316 {0.0} <4003> completed hfscheck of cp.20180318120037
2018/03/19-12:01:04.49357 {0.0} <4002> starting scheduled hfscheck
2018/03/19-12:04:16.59187 {0.0} <4003> completed hfscheck of cp.20180319120042

Garbage collection task was the one that was failing:
# dumpmaintlogs --types=gc --days=30 | grep "<4"

2018/03/18-12:00:22.29852 {0.0} <4200> starting scheduled garbage collection
2018/03/18-12:00:36.77421 {0.0} <4202> failed garbage collection with error MSG_ERR_DDR_ERROR
2018/03/19-12:00:23.91138 {0.0} <4200> starting scheduled garbage collection
2018/03/19-12:00:41.77701 {0.0} <4202> failed garbage collection with error MSG_ERR_DDR_ERROR

The ddrmaint.log located under /usr/local/avamar/var/ddrmaintlogs had the following entries:

Mar 18 12:00:31 VDP01 ddrmaint.bin[14667]: Error: gc-finish::remove_unwanted_checkpoints: Failed to retrieve snapshot checkpoints: LSU: avamar-1488469814 ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error

Mar 18 12:00:34 VDP01 ddrmaint.bin[14667]: Info: gc-finish:[phase 4] Completed garbage collection for data-domain.home.local(1), DDR result code: 0, desc: Error not set

Mar 19 12:00:35 VDP01 ddrmaint.bin[13409]: Error: gc-finish::remove_unwanted_checkpoints: Failed to retrieve snapshot checkpoints: LSU: avamar-1488469814 ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error

Mar 19 12:00:39 VDP01 ddrmaint.bin[13409]: Info: gc-finish:[phase 4] Completed garbage collection for data-domain.home.local(1), DDR result code: 0, desc: Error not set

It was basically failing to retrieve the checkpoint list from the Data Domain.
The get checkpoint list operation was failing as well:

Mar 20 11:16:50 VDP01 ddrmaint.bin[27852]: Error: cplist::body - auto checkpoint list failed result code: 0

Mar 20 11:16:50 VDP01 ddrmaint.bin[27852]: Error: <4750>Datadomain get checkpoint list operation failed.

Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: cplist::execute_cplist: Failed to retrieve snapshot checkpoints from LSU: avamar-1488469814, ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error

Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: cplist::body - auto checkpoint list failed result code: 0

Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: <4750>Datadomain get checkpoint list operation failed.

Looking at the MTree LSU for this VDP server on the Data Domain, we noticed that the checkpoints were not being expired:
# snapshot list mtree /data/col1/avamar-1488469814

Snapshot Information for MTree: /data/col1/avamar-1488469814
----------------------------------------------
Name                Pre-Comp (GiB)   Create Date         Retain Until   Status
-----------------   --------------   -----------------   ------------   ------
cp.20171220090039         128533.9   Dec 20 2017 09:00
cp.20171220090418         128543.0   Dec 20 2017 09:04
cp.20171221090040         131703.8   Dec 21 2017 09:00
cp.20171221090415         131712.9   Dec 21 2017 09:04
.
cp.20180318120414         161983.7   Mar 18 2018 12:04
cp.20180319120042         162263.9   Mar 19 2018 12:01
cp.20180319120418         162273.7   Mar 19 2018 12:04
cur.1515764908            125477.9   Jan 12 2018 13:49
-----------------   --------------   -----------------   ------------   ------
Snapshot Summary
-------------------
Total:          177
Not expired:    177
Expired:          0

Due to this, all the recent checkpoints on VDP were invalid:
# cplist

cp.20180228120038 Wed Feb 28 12:00:38 2018 invalid --- ---  nodes   1/1 stripes     76
.
cp.20180318120414 Sun Mar 18 12:04:14 2018 invalid --- ---  nodes   1/1 stripes     76
cp.20180319120042 Mon Mar 19 12:00:42 2018 invalid --- ---  nodes   1/1 stripes     76
cp.20180319120418 Mon Mar 19 12:04:18 2018 invalid --- ---  nodes   1/1 stripes     76

The case here is that the VDP version was 6.1.x and the Data Domain OS version was 6.1:
# ddrmaint read-ddr-info --format=full

====================== Read-DDR-Info ======================
System name        : xxx.xxxx.xxxx
System ID          : Bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx4
DDBoost user       : ddboost
System index       : 1
Replication        : True
CP Backup          : True
Model number       : DDxxx
Serialno           : Cxxxxxxxx
DDOS version       : 6.1.0.21-579789
System attached    : 1970-01-01 00:00:00 (0)
System max streams : 16

DD OS 6.1 is not supported with VDP 6.1.x; 6.0.x is the last DD OS version supported for VDP.

So if your DD OS is on 6.1.x, the choices would be to:
> Migrate the VDP to Avamar Virtual Edition (Recommended)
> Roll back the DD OS to 6.0.x

Hope this helps!

Tuesday, 27 March 2018

Embedded Replication Server Disconnected In vSphere Replication 5.8

A vSphere Replication appliance comes with an embedded replication server to manage all the replication traffic and vR queries, in addition to the option of deploying add-on servers. On 5.8 or older vSphere Replication servers, there are scenarios where this embedded replication server is displayed as disconnected. Since the embedded server is disconnected, the replications go into an RPO violation state because the replication traffic cannot be managed.

In the hbrsrv.log on the vSphere replication appliance, located under /var/log/vmware, we see the below:

repl:/var/log/vmware # grep -i "link-local" hbrsrv*

hbrsrv-402.log:2018-03-23T11:25:24.914Z [7F70AC62E720 info 'HostCreds' opID=hs-init-1d08f1ab] Ignoring link-local address for host-50: "fe80::be30:5bff:fed9:7c52"

hbrsrv.log:2018-03-23T11:25:24.914Z [7F70AC62E720 info 'HostCreds' opID=hs-init-1d08f1ab] Ignoring link-local address for host-50: "fe80::be30:5bff:fed9:7c52"

This is seen when the VMs being replicated are on an ESX host that has an IPv6 link-local address enabled while the host itself uses IPv4 addressing.

The logs here refer to the host by its MoID, so you can find the corresponding host name from the vCenter MOB page, https://<vcenter-ip>/mob

To navigate to the host MoID section:

Content > group-d1 (Datacenters) > (Your datacenter) under childEntity > group-xx under hostFolder > domain-xx (Under childEntity) > locate the host ID

Then, using this host name, disable IPv6 on the referenced ESX host (a CLI alternative is shown after these steps):
> Select the ESXi 
> Select Configuration
> Select Networking
> Edit Settings for vmk0 (Management) port group
> IP Address, Un-check IPv6
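
Alternatively, the same change can be made from an SSH session on the host; a minimal sketch, assuming a build where the esxcli IPv6 toggle is available (note that this disables IPv6 for the whole host and takes effect only after a reboot):

# esxcli network ip set --ipv6-enabled=false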

Then reboot that ESX host. Repeat the steps for the remaining ESX hosts too, and finally reboot the vSphere Replication appliance.

Now there should no longer be any link-local entries logged in hbrsrv.log, and the embedded server should show as connected, allowing the RPO syncs to resume.

Hope this helps!

Friday, 16 March 2018

Unable To Make Changes On A Virtual Machine - The operation is not allowed in the current state of the datastore

Recently, while working on a case we noticed that we were unable to make changes to any virtual machine on a particular NetApp NFS datastore. We were unable to add disks, increase existing VMDKs or create virtual machines on that datastore.

The error we received was:

The operation is not allowed in the current state of the datastore

When we logged in to the host directly via the host UI client, we were able to perform all of the above-mentioned changes. This pointed out that the issue seemed to be with the vCenter server and not the datastore.

So looking into the vpxd.log for vCenter, this is what we saw:

2018-03-16T09:11:48.485Z info vpxd[7FB8F56ED700] [Originator@6876 sub=Default opID=VmConfigFormMediator-applyOnMultiEntity-93953-ngc:70007296-f3] [VpxLRO] -- ERROR task-252339 -- vm-44240 -- vim.VirtualMachine.reconfigure: vim.fault.InvalidDatastoreState:
--> Result:
--> (vim.fault.InvalidDatastoreState) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    datastoreName = "dc1_vmware01"
-->    msg = ""
--> }
--> Args:
-->
--> Arg spec:
--> (vim.vm.ConfigSpec) {
-->    changeVersion = "2018-03-15T08:04:06.266218Z",
-->    name = <unset>,

To fix this, we had to change the thin_prov_space_flag from 1 to 0 in the vCenter server database. In my case, the vCenter was an appliance (the process remains more or less the same for a Windows-based vCenter as well).

The fix:

1. Always have a snapshot of the vCenter server before making any changes within it.

2. Stop the vCenter server service using:
# service-control --stop vmware-vpxd

3. Connect to the vCenter database using:
# /opt/vmware/vpostgres/current/bin/psql -d VCDB -U vc

The password for the vCenter DB can be found in the below file:
/etc/vmware-vpx/vcdb.properties
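
A quick way to pull it out is a simple grep against that file (assuming the default properties format):

# grep -i password /etc/vmware-vpx/vcdb.properties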

Run the below query to list the datastores known to this vCenter:
select * from vpx_datastore where name='<enter-your-datastore-name>';

The output would be something like:
   11 | dc1_vmware01        | ds:///vmfs/volumes/7a567de9-3e3c0969/                   | 5277655814144 | 1745005346816 | NFS  |      | <obj xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns="urn:vim25" versionId="6.5" xsi:type="DatastoreCapability"><directoryHierarchySupported>true</directoryHierarchySupported><rawDiskMappingsSupported>false</rawDiskMappingsSupported><perFileThinProvisioningSupported>true</perFileThi
nProvisioningSupported><storageIORMSupported>true</storageIORMSupported><nativeSnapshotSupported>true</nativeSnapshotSupported><topLevelDirectoryCreateSupported>true</topLevelDirectoryCreateSupported><seSparseSupported>true</seSparseSupp
orted><vmfsSparseSupported>true</vmfsSparseSupported><vsanSparseSupported>false</vsanSparseSupported></obj> |             2 |            0 |                        30 |                0 |                    1 | automatic
  |                        90 |                  1

That is too much information, so we can filter it using the below query:
select id,thin_prov_space_flag from vpx_datastore;

Now you can see:

VCDB=> select id,thin_prov_space_flag from vpx_datastore;
  id  | thin_prov_space_flag
------+----------------------
 5177 |                    0
 6449 |                    0
 5178 |                    0
   12 |                    0
  795 |                    0
  149 |                    0
  793 |                    0
   11 |                    1

Now, we need to change the thin_prov_space_flag from 1 to 0 for id=11

So run this query:
update vpx_datastore set thin_prov_space_flag=0 where id=<enter-your-id>;
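
Before quitting, it is worth re-running the earlier select to confirm the change; in this example the datastore id is 11:

VCDB=> select id,thin_prov_space_flag from vpx_datastore where id=11;

The thin_prov_space_flag column for that row should now read 0.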

Quit the psql session using \q

Start the vCenter server service using:
# service-control --start vmware-vpxd

Log back in to the vCenter and you should now be able to make the necessary changes.

Hope this helps!

Tuesday, 13 March 2018

Creating A VMFS Volume From Command Line

An alternative to formatting a new VMFS volume from the GUI is to create it from an SSH session on the ESXi host.

The process is quite simple and you can follow the steps below:

1. Make sure the device is presented to the ESX and is visible. If not, perform a Rescan Storage and check whether the device shows up.
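
If you want to do the rescan from the command line as well, something like the below should work from the host's SSH session (it rescans all storage adapters):

# esxcli storage core adapter rescan --all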

2. You can get the device identifier from an SSH session on the ESX by navigating to:
# cd /vmfs/devices/disks

In my case, the device I was interested in was mpx.vmhba1:C0:T3:L0

3. Next, we need to create a partition on this device. We no longer use fdisk on ESX as it is deprecated, so we will use partedUtil.

We will create a partition (number 1) starting at an offset of 128 sectors. The partition type identifier is 0xfb, which is a VMFS partition; 0xfb in decimal is 251. Along with this we will specify the ending sector.

To calculate the ending sector:
The disk has 512 bytes per sector. In my case the device is 12 GB, which is 12884901888 bytes.
Dividing this by 512 gives 25165824 sectors.

Do not use the full sector count, as partedUtil might complain about an out-of-bounds sector value; use one sector less (25165823 here).
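
If you would rather not do the math, partedUtil can report the usable sector range of the device directly (a quick check, using the device from my example; the second number printed should be the last usable sector):

# partedUtil getUsableSectors /vmfs/devices/disks/mpx.vmhba1:C0:T3:L0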

The command would then be:
# partedUtil set /vmfs/devices/disks/<device-name> "1 128 <ending-sector> 251 0"

Sample command:
# partedUtil set /vmfs/devices/disks/mpx.vmhba1:C0:T3:L0 "1 128 25165823 251 0"

A successful output would be:
0 0 0 0
1 128 25165823 251 0


4. Next, you format a VMFS volume using the vmkfstools -C command. 

The command would be:
# vmkfstools -C <vmfs-version> -b <block-size> -S <name-of-datastore> /vmfs/devices/disks/<device-name>:<partition-number> 

So the command for me would be (for a VMFS5 volume with a 1 MB block size):
# vmkfstools -C vmfs5 -b 1m -S Test /vmfs/devices/disks/mpx.vmhba1:C0:T3:L0:1

A successful output would be:
Checking if remote hosts are using this device as a valid file system. This may take a few seconds...
Creating vmfs5 file system on "mpx.vmhba1:C0:T3:L0:1" with blockSize 1048576 and volume label "Test".
Successfully created new volume: 5aa7d4e8-1e99a608-f609-000c292cd901

Now, back in the GUI, just refresh the storage section and the volume will be visible on the host.

Hope this helps!

Friday, 2 March 2018

SRM CentOS 7.4 IP Customization Fails

If you are using SRM 6.0.x or SRM 6.1.x and you try to test failover a CentOS 7.4 machine with IP customization, the Customize IP step of the recovery fails with the message:

The guest operating system '' is not supported


In the vmware-dr.log on the DR site SRM, you will notice the following:

2018-03-02T02:10:43.405Z [01032 error 'Recovery' ctxID=345cedf opID=72d8d85a] Plan 'CentOS74' failed: (vim.fault.UnsupportedGuest) {
-->    faultCause = (vmodl.MethodFault) null, 
-->    property = "guest.guestId", 
-->    unsupportedGuestOS = "", 
-->    msg = ""
--> }

This is because CentOS 7.4 is not a supported guest in the imgcust binaries of the 6.0 release. For CentOS 7.4 customization to work, SRM needs to be on a 6.5 release. In my case, I upgraded vCenter to 6.5 Update 1 and SRM to 6.5.1, post which the test recovery completed without issues.

If there is no plan for an immediate upgrade of your environment, but you would still like the customization to complete, then use the following workaround.

If you look at the redhat-release file
# cat /etc/redhat-release

The contents are:
CentOS Linux release 7.4.1708 (Core)

So you replace that line with:
Red Hat Enterprise Linux Server release 7.0 (Maipo)
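
One minimal way to apply the workaround inside the guest, assuming you are fine editing the file directly (back it up first so you can revert after the test):

# cp /etc/redhat-release /etc/redhat-release.bak
# echo "Red Hat Enterprise Linux Server release 7.0 (Maipo)" > /etc/redhat-release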

Since RHEL 7.0 is a supported guest in imgcust for the 6.0 release, the test recovery then completes fine. Hope this helps!

Friday, 23 February 2018

VDP Backup Fails With Time Out - End

So you might be aware that a backup job in VDP can run up to 24 hours, after which it times out with a Timed Out - End status. If this is what you are facing, then you can extend the backup window using this KB article here

In some cases, this might not be the situation: you might not hit the 24-hour window, but the backup still fails with Timed Out - End.

ID:             9151917120002509
Status:         Timed Out - End
Error Code:     0
Start Time:     2018-02-21 00:01 IST
Elapsed:        02h:59m:31s
End Time:       2018-02-21 03:00 GMT
Type:           Scheduled Backup
Progress Bytes: 179.9 GB
New Bytes:      15.4%
Client:         Test-VM
Domain:         /vcenter.southpark.local/VirtualMachines

The backup logs will have the same logging that one would expect to see when the time out issue occurs.

The avtar.log will have something similar to:

2018-02-21T03:00:23.253-01:00 avtar Info <7061>: Canceled by '3016-vmimagew' - exiting...
2018-02-21T03:00:23.258-01:00 avtar Info <9772>: Starting graceful (staged) termination, cancel request (wrap-up stage)
2018-02-21T03:00:23.347-01:00 [avtar] ERROR: <0001> backstreamdir::childdone error merging history stream data during phase_do_hidden while processing directory 'VMFiles/2'
2018-02-21T03:00:23.392-01:00 [avtar] ERROR: <0001> backstreamdir::childdone error merging history stream data during phase_do_hidden while processing directory 'VMFiles'
2018-02-21T03:00:23.438-01:00 avtar Info <7202>: Backup CANCELED, wrapping-up session with Server

The backup.log will have:

2018-02-21T03:00:23.169-01:00 avvcbimage Info <9740>: skipping 2048 unchanged sectors in PAX stream at sector offset 241468416
2018-02-21T03:00:23.196-01:00 avvcbimage Info <9772>: Starting graceful (staged) termination, MCS cancel (wrap-up stage)
2018-02-21T03:00:23.212-01:00 avvcbimage Info <19692>: Cancelled by MCS with timeout=0 sec
2018-02-21T03:00:23.647-01:00 avvcbimage Info <40654>: isExitOK()=169
2018-02-21T03:00:23.647-01:00 avvcbimage Info <16022>: Cancel detected(externally cancelled by Administrator), isExitOK(0).
2018-02-21T03:00:23.657-01:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Close: Close disk.


The avagent.log will have the following:

2018-02-21T03:00:23.137-01:00 avagent Info <5964>: Requesting work from 127.0.0.1
2018-02-21T03:00:23.162-01:00 avagent Info <8636>: Starting CTL cancel of workorder "BackupTest-BackupTest-1519171200009".
2018-02-21T03:00:31.454-01:00 avagent Info <6688>: Process 127319 (/usr/local/avamarclient/bin/avvcbimage) finished (code 169: externally cancelled by Administrator)
2018-02-21T03:00:31.454-01:00 avagent Warning <6690>: CTL workorder "BackupTest-BackupTest-1519171200009" non-zero exit status 'code 169: externally cancelled by Administrator'

When I checked the backup window duration, it was set to only 3 hours; I am not sure how that changed.

root@vdp:/usr/local/avamar/var/vdr/server_logs/#: mccli schedule show --name=/vcenter.southpark.local/VirtualMachines/BackupTest | grep "Backup Window Duration" 
Backup Window Duration 3 hours and 0 minutes

So we had to flip it back to 24 hours:

root@vdp:/usr/local/avamar/var/vdr/server_logs/#: mccli schedule edit --name=/vcenter.southpark.local/VirtualMachines/BackupTest --duration=24:00
0,22214,Schedule modified

root@vdp:/usr/local/avamar/var/vdr/server_logs/#: mccli schedule show --name=/vcenter.southpark.local/VirtualMachines/BackupTest | grep "Backup Window Duration" 
Backup Window Duration 24 hours and 0 minutes

The backup should no longer time out before the 24-hour window. Hope this helps!