Tuesday, 1 May 2018

VDP Backup And Maintenance Tasks Fail With "DDR result code: 5004, desc: nothing matched"

I had come across a scenario where the backups on VDP appliance connected to a data domain used to fail constantly and the backup logs for the avtar service had the following snippet:

2018-04-24T20:01:16.180+07:00 avtar Info <19156>: - Establishing a connection to the Data Domain system with encryption (Connection mode: A:3 E:2).
2018-04-24T20:01:26.473+07:00 avtar Error <10542>: Data Domain server "data-domain.vcloud.local" open failed DDR result code: 5004, desc: nothing matched
2018-04-24T20:01:26.473+07:00 avtar Error <10509>: Problem logging into the DDR server:'', only GSAN communication was enabled.
2018-04-24T20:01:26.474+07:00 avtar FATAL <17964>: Backup is incomplete because file "/ddr_files.xml" is missing
2018-04-24T20:01:26.474+07:00 avtar Info <10642>: DDR errors caused the backup to not be posted, errors=0, fatals=0
2018-04-24T20:01:26.474+07:00 avtar Info <12530>: Backup was not committed to the DDR.
2018-04-24T20:01:26.475+07:00 avtar FATAL <8941>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2018-04-24T20:01:26.475+07:00 avtar Info <6149>: Error summary: 4 errors: 10542, 8941, 10509, 17964
2018-04-24T20:01:26.476+07:00 avtar Info <8468>: Sending wrapup message to parent
2018-04-24T20:01:26.476+07:00 avtar Info <5314>: Command failed (4 errors, exit code 10008: cannot establish connection with server (possible network or DNS failure))

Whenever we have issues with a VDP system with data domain we usually have a look into the ddrmaintlogs located under /usr/local/avamar/var/ddrmaintlogs

Under this, I saw the following:

Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: Data Domain Engine (3.1.0.1 build 481386)
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::enumerate_ddrconfig ddr_info version is 5.
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::enumerate_ddrconfig found 1 ddrconfig records in ddr_info
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::open_all_ddrs: dpnid 1494515613 from flag
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::open_all_ddrs: LSU: avamar-1494515613
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::open_all_ddrs: server=data-domain.vcloud.local(1),id=4D752F9C9A984D026520ACA64AA465388352BAB1,user=vmware_vdp
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: Establishing a connection to the Data Domain system with basic authentication (Connection mode: A:0 E:0).
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Error: cplist::open_ddr: DDR_Open failed: data-domain.vcloud.local(1) lsu: avamar-1494515613, DDR result code: 5004, desc: nothing matched
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Error: cplist::body - ddr_info file from the persistent store has no ddr_config entries
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Error: <4750>Datadomain get checkpoint list operation failed.
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Info: ============================= cplist finished in 11 seconds
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Info: ============================= cplist cmd finished =============================

Out of curiosity, when I try to list data domain checkpoints, it fails too.

root@vdp:~/#: ddrmaint cplist
<4750>Datadomain get checkpoint list operation failed.

The DDoS seemed to be supported for the VDP node:

root@vdp:/usr/local/avamar/var/ddrmaintlogs/#: ddrmaint read-ddr-info --format=full
====================== Read-DDR-Info ======================

 System name        : data-domain.vcloud.local
 System ID          : 4D752F9C9A984D026520ACB64AA465388352BAB1
 DDBoost user       : vmware_vdp
 System index       : 1
 Replication        : True
 CP Backup          : True
 Model number       : DD670
 Serialno           : 3FA0807274
 DDOS version       : 6.0.1.10-561375
 System attached    : 1970-01-01 00:00:00 (0)
 System max streams : 16

You can SSH into the Data Domain and view the ddfs logs using the below command:
# log view debug/ddfs.info

I noticed the following in this log:

04/25 12:08:57.141 (tid 0x7fbf4fa68490): exp_find_export: mount failed from (10.67.167.96/slc2pdvdp01.take.out): export point /backup/ost does not exist
04/25 12:08:57.141 (tid 0x7fbf4fa68490): nfsproc3_ost_mnt_3_svc: connection failed mount error 2 from plugin 10.67.167.96 version 3.1
04/25 12:08:58.147 (tid 0x6314fb0): nfs3_fm_lookup_by_path("/data/col1/backup/ost") failed: 5183
04/25 12:08:58.147 (tid 0x6314fb0): exp_find_export: mount failed from (10.67.167.96/slc2pdvdp01.take.out): export point /backup/ost does not exist
04/25 12:08:58.147 (tid 0x6314fb0): nfsproc3_ost_mnt_3_svc: connection failed mount error 2 from plugin 10.67.167.96 version 3.1
04/25 12:08:59.153 (tid 0x7fc24501bd40): nfs3_fm_lookup_by_path("/data/col1/backup/ost") failed: 5183
04/25 12:08:59.153 (tid 0x7fc24501bd40): exp_find_export: mount failed from (10.67.167.96/slc2pdvdp01.take.out): export point /backup/ost does not exist
04/25 12:08:59.153 (tid 0x7fc24501bd40): nfsproc3_ost_mnt_3_svc: connection failed mount error 2 from plugin 10.67.167.96 version 3.1

The VDP server relies upon the ost folder to perform the maintenance tasks and this folder was missing from the data domain causing maintenance and backup tasks to fail. 

To fix this we need to recreate the ost folder and export the NFS mount

1. Login to the Bash shell of Data Domain. You can view this article for the steps.

2. Navigate to the below directory:
# cd /data/col1/backup

3. Verify that there is no ost folder when you list for files and folders

4. Create the directory and set the ownership and permissions with the below commands:
# mkdir ost
# chmod 777 ost
# chown <your_ddboost_user> ost

5. Exit the bash shell with exit

6. Create the NFS mount with:
# nfs add /backup/ost *

You will see the below message:
NFS export for "/backup/ost" added.

Exit the data domain and back in VDP putty, run ddrmaint cplist and this should return the checkpoint lists successfully. This should proceed with a successful maintenance and backup tasks.

Hope this helps!