Wednesday, 30 November 2016

VDP 6.1: Unable To Expand Storage

So, there have been a few tricky issues with expanding VDP storage drives. This post talks specifically about the OS kernel not picking up the partition extents.

A brief intro about what's going on here. From vSphere Data Protection 6.x onward, the dedupe storage drives can be expanded. If your backup data drives are running out of space and you do not wish to delete restore points, this feature allows you to extend your data partitions. To do this, log in to the https://vdp-ip:8543/vdp-configure page, go to the Storage tab and select the Expand Storage option. The wizard expands the existing partitions successfully. Post this, if you run df -h from an SSH session on the VDP, it should pick up the expanded sizes. With this issue, however, either none of the partitions are expanded or a few of them report inconsistent information.

So, in my case, I had a 512 GB VDP deployment, which by default deploys 3 drives of ~256 GB each.

Post this, I expanded the storage to 1 TB, which would ideally give 3 drives of ~512 GB each. In my case the expansion wizard completed successfully; however, the data drives were inconsistent when viewed from the command line. In the GUI, under Edit Settings of the VM, the correct sizes were displayed.



When I ran df -h, the below was seen:

root@vdp58:~/#: df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        32G  5.8G   25G  20% /
udev            1.9G  152K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda1       128M   37M   85M  31% /boot
/dev/sda7       1.5G  167M  1.3G  12% /var
/dev/sda9       138G  7.2G  124G   6% /space
/dev/sdb1       256G  2.4G  254G   1% /data01
/dev/sdc1       512G  334M  512G   1% /data02
/dev/sdd1       512G  286M  512G   1% /data03

/dev/sdb1 was not expanded to 512 GB, whereas the data partitions sdc1 and sdd1 were successfully extended.

If I run fdisk -l, I see the partitions have been extended successfully for all three data0? mounts, with the updated space reflected.

**If you run the fdisk -l command and do not see the partitions updated, then raise a case with VMware**

Disk /dev/sdb: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1  1073736404   536868202   83  Linux

Disk /dev/sdc: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1  1073736404   536868202   83  Linux

Disk /dev/sdd: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1  1073736404   536868202   83  Linux

If this is the case, run the partprobe command. This makes the SUSE kernel aware of the partition table changes. Post this, run df -h to verify whether the data drives are now updated with the correct size. If yes, stop here. If not, proceed further.
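For reference, a minimal sequence would look like the below, assuming /dev/sdb is the disk that did not pick up the new size (running partprobe with no arguments rescans all disks):

# partprobe /dev/sdb
# df -h /data01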

**Make sure you do this with the help of a VMware engineer if this is a production environment**

If partprobe does not work, we will have to grow the XFS file system. To do this:

1. Power down the VDP appliance gracefully
2. Change the data drives from Independent Persistent to Dependent
3. Take a snapshot of the VDP appliance
4. Power On the VDP appliance
5. Once the appliance is booted successfully, stop all the services using the command:
# dpnctl stop
6. Grow the mount point using the command:
# xfs_growfs <mount point>
In my case:
# xfs_growfs /dev/sdb1
If successful, you will see the below output: (Ignore the formatting)

root@vdp58:~/#: xfs_growfs /dev/sdb1
meta-data=/dev/sdb1              isize=256    agcount=4, agsize=16776881 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=67107521, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32767, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 67107521 to 134217050

Run df -h and verify if the partitions are now updated.

root@vdp58:~/#: df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        32G  5.8G   25G  20% /
udev            1.9G  148K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda1       128M   37M   85M  31% /boot
/dev/sda7       1.5G  167M  1.3G  12% /var
/dev/sda9       138G  7.1G  124G   6% /space
/dev/sdb1       512G  2.4G  510G   1% /data01
/dev/sdc1       512G  334M  512G   1% /data02
/dev/sdd1       512G  286M  512G   1% /data03

If yes, then stop here.
If not, raise a support request with VMware, as this would need an engineering fix.

Hope this helps. 

Saturday, 19 November 2016

VDP SQL Agent Backup Fails: Operating System Error 0x8007000e

While the SQL agent was configured successfully for the SQL box, the backups were failing continuously for this virtual machine. Upon viewing the backup logs, located at C:\Program Files\avp\var\, the following errors were observed.

2016-11-13 21:00:27 avsql Error <40258>: sqlconnectimpl_smo::execute Microsoft.SqlServer.Management.Common.ExecutionFailureException: An exception occurred while executing a Transact-SQL statement or batch. ---> System.Data.SqlClient.SqlException: BackupVirtualDeviceSet::SetBufferParms: Request large buffers failure on backup device '(local)_SQL-DB-1479099600040-3006-SQL'. Operating system error 0x8007000e(Not enough storage is available to complete this operation.).

BACKUP DATABASE is terminating abnormally.

   at Microsoft.SqlServer.Management.Common.ConnectionManager.ExecuteTSql(ExecuteTSqlAction action, Object execObject, DataSet fillDataSet, Boolean catchException)

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteWithResults(String sqlCommand)

   --- End of inner exception stack trace ---

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteWithResults(String sqlCommand)

   at SMOWrap.SMO_ExecuteWithResults(SMOWrap* , UInt16* sql_cmd, vector<std::basic_string<unsigned short\,std::char_traits<unsigned short>\,std::allocator<unsigned short> >\,std::allocator<std::basic_string<unsigned short\,std::char_traits<unsigned short>\,std::allocator<unsigned short> > > >* messages)

This is caused by SQL VDI timeouts. The default timeout for a VDI transfer is 10 seconds.

To fix this:

1. Browse to the below directory:
C:\Program Files\avp\var

2. Add the below two lines to the avsql.cmd file. In my case, this file did not exist, so I had to create it as a text file and save it with a .cmd extension (set the save type to All Files):

--max-transfer-size=65536
--vditransfertimeoutsecs=900

If the SQL virtual machine is a large deployment, it may be necessary to increase the vditransfertimeoutsecs parameter further (see the example after these steps).

3. Run the backup again and it should complete successfully this time. 
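As referenced in step 2, for a larger SQL deployment the finished avsql.cmd might simply look like the below. The 1800-second value is only an illustrative number; tune it to the size of your databases:

--max-transfer-size=65536
--vditransfertimeoutsecs=1800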

Hope this helps.

VDP 6.1.3 First Look

So over the last week, multiple products from VMware have gone live, and the most awaited one was vSphere 6.5.
With vSphere 6.5, the add-on VMware products have to be on their compatible versions. This post is specifically dedicated to vSphere Data Protection 6.1.3. I will try to keep it short and mention the changes I have seen after deploying this in my lab. There are already release notes for this, which cover most of the fixed issues and known issues in the 6.1.3 release.

This article covers only the issues that are not included in the release notes.

Issue 1: 

While using the internal proxy for this appliance, the vSphere Data Protection configure page comes up blank for the Name of the proxy, the ESX host where the proxy is deployed and the Datastore where the proxy resides.

The vdr-configure log does not have any information on this, and a reconfigure / disable and re-enable of the internal proxy does not fix it.



Workaround:
A reboot of the appliance populates the right information back in the configure page.

Issue 2:

This is an intermittent issue, seen only during a fresh deployment of the 6.1.3 appliance. The vSphere Data Protection plugin is not available in the web client. The VDP plugin version for this release is com.vmware.vdp2-6.1.3.70, and this folder is not created in the vsphere-client-serenity folder on the vCenter Server. The fix is similar to the one in this link here.

Issue 3:

vDS Port-Groups are not listed during deployment of external proxies. The drop-down shows only standard switch port-groups.

Workaround:
Deploy the external proxy on a standard switch and then migrate it to the distributed switch. The standard switch port group you create does not need any uplinks, as the proxy will be migrated to the vDS soon after the deployment is completed.

Issue 4:

A VM that is added to a backup job shows up as unprotected after the VM is renamed. VDP does not sync its naming with the vCenter inventory automatically.

Workaround:
Force-sync the vCenter and Avamar names using the proxycp.jar utility. The steps can be found in this article here.

Issue 5:

Viewing logs for a failed backup from the Job Failure tab does not return anything. The below message is seen while trying to view the logs:


This was seen in all versions of 6.x, even when none of the mentioned reasons are true. EMC has acknowledged this bug; however, there is no fix for it currently.

Workaround:
View the logs from the Task Failure tab or the command line, or gather logs from the vdp-configure page.

Issue 6:
Backups fail when the environment is running ESXi 5.1 with VDP 6.1.3.
The GUI error is somewhat similar to: Failed To Attach Disk

The jobs fail with:

2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:2016-11-29T16:03:04.600+01:00 error -[7FF94BA98700] [Originator@6876 sub=Default] No path to device LVID:56bc8e94-884c9a71-4dee-f01fafd0a2c8/56bc8e94-61c7c985-1e2d-f01fafd0a2c8/1 found.
2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:2016-11-29T16:03:04.600+01:00 error -[7FF94BA98700] [Originator@6876 sub=Default] Failed to open new LUN LVID:56bc8e94-884c9a71-4dee-f01fafd0a2c8/56bc8e94-61c7c985-1e2d-f01fafd0a2c8/1.
2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:-->
2016-11-29T16:03:05.096-02:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Unable to locate appropriate transport mode to open disk. Error 13 (You do not have access rights to this file) at 4843.

There is no fix for this. I suggest you upgrade your ESXi to 5.5, as there are compatibility issues with 5.1.


Fixed Issue: Controlling concurrent backup jobs. (Not contained in Release Notes)

In VDP releases after 6.0.x and prior to 6.1.3, the vdp-configure page provided an option to control how many backup jobs should run at a time. The throttling was set under the "Manage Proxy Throughput" section. However, this never worked for most deployments.

This is fixed in 6.1.3.

The test:

Created a backup job with 5 VMs.
Set the throughput to 1. 5 iterations of backup were executed - Check Passed
Set the throughput to 2. 3 iterations of backup were executed - Check Passed

Set 2 external proxies and throughput to 1
The throughput would be 2 x 1 = 2
3 Iterations of backups were executed - Check Passed.

Re-registered the appliance. Same test - Check Passed
vMotioned the appliance. Same test - Check Passed
Rebooted the VDP. Same test - Check Passed


I will update this article as and when I come across new issues / fixes that are NOT included in the vSphere Data Protection 6.1.3 release notes.

Friday, 18 November 2016

VDP Backups Stuck In "Waiting-Client" State

I was recently working on a case where VDP 5.5.6 never started its backup jobs. When I select the backup job and click Backup Now, it shows the job has started successfully; however, no task is created at all for it.

And when I log in to SSH on the VDP appliance to view the progress of the backup, the state shows "Waiting-Client". So, in SSH, the below command is executed to view the backup status:
# mccli activity show --active

The output:

ID               Status            Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client   Domain
---------------- ----------------- ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- -------- --------------------------
9147944954795109 Waiting-Client    0          2016-11-18 07:12 IST 07h:40m:23s 2016-11-19 07:12 IST On-Demand Backup 0 bytes        0%        VM-1 /10.0.0.27/VirtualMachines
9147940920005609 Waiting-Client    0          2016-11-17 20:00 IST 18h:52m:51s 2016-11-18 20:00 IST Scheduled Backup 0 bytes        0%        VM-2 /10.0.0.27/VirtualMachines

The backups were always stuck in this status and never moved further. If I look at the vdr-server.log, I do see the job being issued to MCS:

2016-11-18 14:33:06,630 INFO  [http-nio-8543-exec-10]-rest.BackupService: Executing Backup Job "Job-A"

However, if I look at the MCS log, mcserver.log, I see the job is not executed by MCS, as MCS thinks the server is in a read-only state:

WARNING: Backup job skipped because server is read-only

If I run status.dpn, I see the server is in the Full Access state. I checked the dispatcher status using the below command:

# mcserver.sh --status

You will have to be logged in as the admin user to run the mcserver.sh script. The output of this command was:

Backup dispatching: suspended

This is a known issue on the 5.5.6 release of VDP. 

To fix this:

1. Cancel any existing backup jobs using the command:
# mccli activity cancel --id=<job-id>

The job ID is the first column in the above mccli activity show output.

2. Browse to the below location:
# cd /usr/local/avamar/var/mc/server_data/prefs/

3. Open the mcserver.xml file in a vi editor.

4. Locate the parameter "stripeUtilizationCapacityFactor" and edit its value to 2.0 (a quick way to verify the change is shown after these steps).
**Do not change anything else in this file at all. Only this value needs to be changed**

5. Save the file and restart the MCS using the below command:
# dpnctl stop mcs
# dpnctl start mcs

6. Run mcserver.sh again to check the dispatcher status:
# mcserver.sh --status | grep -i dispatching

This time the output should be:

Backup dispatching: running

7. Run the backup again and it should start immediately now.
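For reference, a quick way to confirm that the edit from step 4 took effect before restarting MCS (a simple check, assuming the default file location):

# grep stripeUtilizationCapacityFactor /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

The returned line should show the value 2.0.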

If this does not work, raise a support request with VMware. Hope this helps.

Thursday, 17 November 2016

Dude, Where's My Logs?

Well, what do we have here? Let's say we created one backup job in the vSphere Data Protection appliance and added 30 virtual machines to it. Next, we started the backup for this job and let it run overnight. The next morning when you log in, you see that 29 VMs completed successfully and 1 VM failed its backup.


You log in to the SSH of the VDP, browse to /usr/local/avamarclient/var and notice that there are 30 logs for the same backup job with different IDs, which makes no sense. You don't know which log contains the logging for this failed VM.


Here's a sneaky way to get this done.

I have created 3 VMs: Backup-Test, Backup-Test-B and Backup-Test-C
I have a backup Job called Job-A

I initiated a manual backup and all 3 VMs completed successfully. If I go to the backup job logs folder, I see the following:

-rw-r----- 1 root root 1.7K Nov 18 04:33 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:34 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel.alg
-rw-r----- 1 root root  40K Nov 18 04:34 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel.log
-rw-r----- 1 root root  18K Nov 18 04:33 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel_avtar.log
-rw-r----- 1 root root  383 Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew.alg
-rw-r----- 1 root root  40K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew.log
-rw-r----- 1 root root  19K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew_avtar.log
-rw-r----- 1 root root 1.7K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.alg
-rw-r----- 1 root root  41K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.log
-rw-r----- 1 root root  18K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel_avtar.log

Looking at this, neither you nor I have any idea which log here contains the logging for Backup-Test-C.

Step 1 of the sneaky trick:

Run the following command:
# mccli activity show --verbose=true

The output:

0,23000,CLI command completed successfully.
ID               Status                   Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client        Domain                               OS            Client Release Sched. Start Time    Sched. End Time      Elapsed Wait Group                                      Plug-In              Retention Policy Retention Schedule                 Dataset                                    WID                 Server Container
---------------- ------------------------ ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- ------------- ------------------------------------ ------------- -------------- -------------------- -------------------- ------------ ------------------------------------------ -------------------- ---------------- --------- ------------------------ ------------------------------------------ ------------------- ------ ---------
9147942378903409 Completed w/Exception(s) 10020      2016-11-18 04:33 IST 00h:00m:22s 2016-11-18 04:33 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test   /vc65.happycow.local/VirtualMachines windows9Guest 7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:11s  /vc65.happycow.local/VirtualMachines/Job-A Windows VMware Image Job-A            D         Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

9147942378902609 Completed                0          2016-11-18 04:33 IST 00h:00m:21s 2016-11-18 04:33 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test-C /vc65.happycow.local/VirtualMachines rhel7_64Guest 7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:21s  /vc65.happycow.local/VirtualMachines/Job-A Linux VMware Image   Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

9147942357262909 Completed w/Exception(s) 10020      2016-11-18 04:29 IST 00h:00m:43s 2016-11-18 04:30 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test   /vc65.happycow.local/VirtualMachines windows9Guest 7.2.180-118    2016-11-18 04:29 IST 2016-11-19 04:29 IST 00h:00m:12s  /vc65.happycow.local/VirtualMachines/Job-A Windows VMware Image Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423572598 Avamar N/A

9147942378903009 Completed                0          2016-11-18 04:33 IST 00h:00m:22s 2016-11-18 04:34 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test-B /vc65.happycow.local/VirtualMachines centosGuest   7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:31s  /vc65.happycow.local/VirtualMachines/Job-A Linux VMware Image   Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

Again, all too confusing? Well, we need to look at two fields here: the Job ID field and the Work Order ID field.

The Job ID is the first column (ID) in the output and the Work Order ID is the WID column towards the end.

The Work Order ID will match the Name-ID prefix of the files in the log directory, but this will still not help if there are too many VMs in the same backup job, as they will all share the same Work Order ID.

Step 2 of the sneaky trick:

We will use the Job ID to view the logs. The command would be:
# mccli activity get-log --id=<Job-ID> | less

The first thing you will see as the output is:

0,23000,CLI command completed successfully.
Attribute Value

Followed by tons of blank spaces and dashes. 

Step 3 of the sneaky trick:

The first event of any backup log is the one that announces where the logging is going, and this event ID is 5008.

So as soon as you run the get-log command, search for this event ID and you will be taken directly to the start of the client's backup log. You will see:

2016-11-18T04:33:29.903-05:-30 avvcbimage Info <5008>: Logging to /usr/local/avamarclient/var/Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.log

Not only can you view the logs from here, you also have the complete log file name.
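If you prefer to skip scrolling in less, the same thing can be done by piping the get-log output through grep (a small shortcut, relying on the 5008 line format shown above):

# mccli activity get-log --id=<Job-ID> | grep "<5008>"

This prints the 'Logging to ...' line directly, which gives you the full path of the log file for that client.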

That's it. Sneaky tricks in place. 

vSphere 6.5: Installing vCenter Server Appliance Via Command Line

Along with the GUI method of deploying the vCenter appliance, there is a command-line path as well, which I would say is quite fun and easy to follow. There is a set of pre-defined templates available on the vCenter 6.5 appliance ISO. Download and mount the VCSA 6.5 ISO and browse to:

CD Drive:\vcsa-cli-installer\templates\install

You will see the following list of templates:

You can choose the required template for your deployment from here. I will be going with an embedded VCSA deployed on an ESXi host, so my template will be embedded_vCSA_on_ESXi.json

Open the required template using Notepad. The template has a list of details that you need to fill out. In my case, these were the ESXi host details, appliance details, networking details for the appliance and Single Sign-On details. The file looks somewhat similar to the below image after the edit:


Save the file as .json onto your desktop. Next, you will have to call this file using the vcsa-deploy.exe utility.
In the CD drive, browse to vcsa-cli-installer\win32 (or lin64 / mac, depending on the OS you are running this from) and run the below command from PowerShell:

vcsa-deploy.exe install --no-esx-ssl-verify --accept-eula --acknowledge-ceip "Path to the json file"
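If you are running the installer from the lin64 or mac directory instead of win32, the equivalent invocation should look something like the below (the template path is just an example location):

./vcsa-deploy install --no-esx-ssl-verify --accept-eula --acknowledge-ceip /tmp/embedded_vCSA_on_ESXi.json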

If there are errors in the edited file, it will display what the error is and which line of the file contains it. The first step is template verification, and if this completes successfully, you should see the below message:


The next step, which starts automatically, is the appliance deployment, and you will see the below task in progress:


Once the appliance is deployed, the final stage is the configuration of the services. At this point you will see the below task in progress:


That's pretty much it; you can go ahead and log in to the web client (Flash / HTML).

Hope this helps!

Wednesday, 16 November 2016

vSphere 6.5: Login To Web Client Fails With Invalid Credentials

So, today I was making certain changes to my password policies on vSphere 6.5 and ran into an interesting issue. I had created a user in the SSO domain (vmware.local) called happycow@vmware.local and tried to log in to the web client with this user. However, the login failed with the error: Invalid Credentials.


In the vmware-sts-idmd log, located at C:\ProgramData\VMware\vCenterServer\logs\sso, the following was noticed:

[2016-11-16T12:51:22.541-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a ERROR] [IdentityManager] Failed to authenticate principal [happycow@vmware.local]. User password expired. 
[2016-11-16T12:51:22.542-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a INFO ] [IdentityManager] Authentication failed for user [happycow@vmware.local] in tenant [vmware.local] in [115] milliseconds with provider [vmware.local] of type [com.vmware.identity.idm.server.provider.vmwdirectory.VMwareDirectoryProvider] 
[2016-11-16T12:51:22.542-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a ERROR] [ServerUtils] Exception 'com.vmware.identity.idm.PasswordExpiredException: User account expired: {Name: happycow, Domain: vmware.local}' 
com.vmware.identity.idm.PasswordExpiredException: User account expired: {Name: happycow, Domain: vmware.local}
at com.vmware.identity.idm.server.provider.vmwdirectory.VMwareDirectoryProvider.checkUserAccountFlags(VMwareDirectoryProvider.java:1378) ~[vmware-identity-idm-server.jar:?]
at com.vmware.identity.idm.server.IdentityManager.authenticate(IdentityManager.java:3042) ~[vmware-identity-idm-server.jar:?]
at com.vmware.identity.idm.server.IdentityManager.authenticate(IdentityManager.java:9805) ~[vmware-identity-idm-server.jar:?]
at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323) ~[?:1.8.0_77]
at sun.rmi.transport.Transport$1.run(Transport.java:200) ~[?:1.8.0_77]
at sun.rmi.transport.Transport$1.run(Transport.java:197) ~[?:1.8.0_77]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_77]
at sun.rmi.transport.Transport.serviceCall(Transport.java:196) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) ~[?:1.8.0_77]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) [?:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]

The account and its password were coming up as expired. I was able to log in to the Web Client with the default SSO administrator account without issues.

This issue occurs when the SSO password expiration lifetime has a larger value than the maximum value permitted.

Under Administration > Configuration > Policies, the password expiration was set to 36500 days. KB 2125495, which covers a similar issue, states that this value should be less than 999999.

I changed this value to 3650 days (10 years) and the other users under the SSO domain were able to log in. The same is seen on 6.0 as well, with a different error: Authentication Failure.

Tuesday, 15 November 2016

vSphere 6.5: Installing vCenter Appliance 6.5

With the release of vSphere 6.5, the installation of the vCenter appliance just got a whole lot easier. Earlier, we required the Client Integration Plugin to be available, and the deployment was then done through a browser. And as we know, the Client Integration Plugin had multiple compatibility issues. Well, the Client Integration Plugin is no longer used. The deployment is now done via an ISO, which has an installation wizard that can be executed on Windows, Mac or Linux.

The vCenter Server Appliance consists of a 2-stage deployment.
1. Deploying VCSA 6.5
2. Configuring VCSA 6.5

Deploying VCSA 6.5

Download the vCenter Server appliance from this link here. Once the download is complete, mount the ISO on any machine and run the installer. You should see the below screen.


We will be choosing the Install option as this is a fresh deployment. The description then shows that there are two steps involved in the installation. The first step will deploy a vCenter Server appliance and the second step will be configuring this deployed appliance.


Accept the EULA


Choose the type of deployment that is required. I will be going with an embedded Platform Services Controller deployment.


Next, choose the ESXi host where you would like to have this vCenter appliance deployed and provide the root credentials of the host for authentication.


Then, provide a name for the vCenter appliance VM that is going to be deployed and set the root password for the appliance.


Based upon your environment size, select the sizing of the vCenter appliance.


Select the datastore where the vCenter appliance files need to reside.


Configure the networking of the vCenter appliance. Make sure you have a valid IP that resolves both forward and reverse before this step, to prevent any failures during installation.


Review and finish the deployment, and the progress for stage 1 begins.


Upon completion, you can click Continue to proceed to configuring the appliance. If you close this window, you will need to log in to the VCSA appliance management page at https://vcenter-IP:5480 to continue with the configuration. In this scenario, I will choose the Continue option to proceed further.

Configuring VCSA 6.5


The stage 2 wizard begins at this point. The first section is to configure NTP for the appliance and enable shell access.


Here, we will mention the SSO domain name, the SSO password and the site name for the appliance.
In the next step, if you would like to enable the Customer Experience Improvement Program, you can do so; else, skip it and proceed to completion.


Once the configuration wizard is completed, the progress for stage 2 begins.


Once the deployment is complete, you can log in to the web client (https://vCenter-IP:9443/vsphere-client) or the HTML5 client (https://vCenter-IP/ui). The HTML5 web client is available only with the vCenter Server appliance.

vSphere 6.5: What is vCenter High Availability

In 6.0 we had the option to provide high availability for the Platform Services Controller by deploying redundant PSC nodes in the same SSO domain and using a manual repoint command or a load balancer to switch to a new PSC if the current one went down. However, for the vCenter node there was no such option, and the only way to have HA for it was to either configure Fault Tolerance or place the vCenter virtual machine in an HA-enabled cluster.

Now with the release of vSphere 6.5, a new, much-awaited feature has been added to provide redundancy, or high availability, for your vCenter node too. This is VCHA, the vCenter High Availability feature.

The design of VCHA is somewhat similar to a regular clustering mechanism. Before we get to how this works, here are a few prerequisites for VCHA:

1. Applicable to vCenter Server Appliance only. Embedded VCSA is currently not supported.
2. Three unique ESXi hosts. One for each node (Active, Passive and Witness)
3. Three unique datastores to contain each of these nodes.
4. Same Single Sign On Domain for Active and Passive nodes
5. One public IP to access and use vCenter
6. Three private IPs in a subnet different from that of the public IP. These will be used for internal communication to check node state.

vCenter High Availability (VCHA) Deployment:
There are three nodes deployed once your vCenter is configured for high availability: the Active node, the Passive node and the Witness (quorum) node. The Active node is the one that has the public IP vNIC in the up state. This public IP is used to access and connect to your vSphere Web Client for management purposes.

The second node is the Passive node, which is an exact clone of the Active node. It has the same memory, CPU and disk configuration as the Active node. The public IP vNIC is down for this node and the vNIC used for the private IP is up. The private network between Active and Passive is used for cluster operations. The Active node has its database and files updated regularly, and these have to be kept in sync on the Passive node; this information is synced over the private network.

The third node, also called the quorum node, acts as a witness. This node is introduced to avoid the split-brain scenario that arises from a network partition. In the case of a network partition we cannot have two active nodes up and running, so the quorum node decides which node stays active and which has to become passive.

vPostgres replication is used for database replication between the Active and Passive nodes, and this replication is synchronous. The vCenter files are replicated using native Linux rsync, which is asynchronous.

What happens during a failover?

When the Active node goes down, the Passive node becomes active and assumes the public IP address. The VCHA cluster enters a degraded state, since one of the nodes is down. The failover is not transparent and there will be an RTO of around 5 minutes.

Also, your cluster can enter a degraded state when your Active node is still running healthily but either the Passive or the Witness node is down. In short, if any one node in the cluster is down, VCHA is in a degraded state. More about VCHA states and deployment will come in a later article.

Hope this was helpful.

Friday, 11 November 2016

VDP / EMC Avamar Console: Login Incorrect Before Entering Password

Today, while deliberately simulating failed root login attempts, I ran into an issue. I had a VDP 6.1.2 appliance installed and opened a console to it. When prompted for the "root" credentials, I simply went ahead and entered a wrong password multiple times. Then, at one point, as soon as I entered "root" at the VDP login prompt, it complained Login Incorrect.


Even before entering the password, the login was failing. If I SSH into the appliance, I am able to access the VDP. If I use admin to log in to the VDP, either via console or SSH, it works and I can sudo to root from there.

From the messages log, the following was recorded during the failed root login from the VM console:

Nov  6 09:58:40 vdp58 login[26056]: FAILED LOGIN 1 FROM /dev/tty2 FOR root, Authentication failure

If I check which terminal I am on in the SSH session of the VDP using the # tty command, I see that the terminal used is /dev/pts/0.
And when I logged in to the VDP as admin, sudo'd to root and checked the terminal name, as expected, it was /dev/tty2.

The Fix:

1. Change your directory to:
# cd /etc
2. Edit the securetty file using vi:
# vi securetty
The contents of this file were "console" and "tty1".
Go ahead and add the terminal from which you are trying to access the root user on the appliance, in my case tty2, and save the file.
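If you prefer doing this from the shell in one shot, the below appends the terminal and shows the result (assuming tty2 is the terminal your console session reported):

# echo "tty2" >> /etc/securetty
# cat /etc/securetty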

Post this, I was able to log in as root from the VDP console.

This also applies to EMC Avamar Virtual Edition.


Thursday, 3 November 2016

VDP Status Error: Data Domain Storage Status Is Not Available

Today, when I logged into my lab to do a bit of testing on my vSphere Data Protection 6.1 appliance, I noticed the following in the VDP plugin in the Web Client:

It was stating, "The Data Domain storage status is not available. Unable to get system information."
So, I logged into the vdp-configure page, switched to the Storage tab and noticed a similar error message.

So the first instinct was to go ahead and use Edit Data Domain Settings to try re-adding it. But that failed too, with the below message.


When a VDP is configured with a Data Domain, these configuration errors are logged in the ddrmaint logs located under /usr/local/avamar/var/ddrmaintlogs

Here, I noticed the following:

Oct 29 01:54:27 vdp58 ddrmaint.bin[30277]: Error: get-system-info::body - DDR_Open failed: 192.168.1.200, DDR result code: 5040, desc: calling system(), returns nonzero
Oct 29 01:54:27 vdp58 ddrmaint.bin[30277]: Error: <4780>Datadomain get system info failed.
Oct 29 01:54:27 vdp58 ddrmaint.bin[30277]: Info: ============================= get-system-info finished in 62 seconds
Oct 29 01:54:27 vdp58 ddrmaint.bin[30277]: Info: ============================= get-system-info cmd finished =============================
Oct 29 01:55:06 vdp58 ddrmaint.bin[30570]: Warning: Calling DDR_OPEN returned result code:5040 message:calling system(), returns nonzero

Well, this tells part of the problem: it is unable to fetch the Data Domain system information. So the next thing is to check the vdr-configure logs located under /usr/local/avamar/var/vdr/server_logs
All operations done in the vdp-configure page are logged in the vdr-configure logs.

Here, the following was seen:

2016-10-29 01:54:28,564 INFO  [http-nio-8543-exec-9]-services.DdrService: Error Code='E30973' Message='Description: The file system is disabled on the Data Domain system. The file system must be enabled in order to perform backups and restores. Data: null Remedy: Enable the file system by running the 'filesys enable' command on the Data Domain system. Domain:  Publish Time: 0 Severity: PROCESS Source Software: MCS:DD Summary: The file system is disabled. Type: ERROR'

This is a much more detailed error. So, I logged into my Data Domain system using the "sysadmin" credentials and ran the below command to check the status of the file system:
# filesys status

The output was:
The filesystem is enabled and running.

The Data Domain was reporting that the file system was already up and running. Perhaps it was in a non-responding / stale state. So, I re-enabled the file system using:
# filesys enable
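To confirm from the VDP side that the next get-system-info call goes through cleanly, you can search the same ddrmaint logs referenced earlier (the exact file names inside the directory may vary):

# grep -i "get-system-info" /usr/local/avamar/var/ddrmaintlogs/*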

Post this, the Data Domain automatically reconnected to the VDP appliance and the right status was displayed.

Tuesday, 1 November 2016

Avigui.html Shows Err_Connection_Refused in Avamar Virtual Edition 7.1

Recently I started deploying and testing EMC Avamar Virtual Edition, and one of the first issues I ran into was with the configuration. The deployment of the appliance is pretty simple. Avamar Virtual Edition 7.1 ships as a 7-Zip archive, which when extracted provides the OVF file. Using the Deploy OVF Template option I was able to get the appliance deployed. Post this, as per the AVE (Avamar Virtual Edition) installation guide, I added the data drives, configured the networking for the appliance and rebooted after a successful configuration.

However, when trying to access https://avamar-IP:8543/avi/avigui.html, I received the Err_Connection_Refused message. No matter what I tried, I was unable to get into the actual configuration GUI to initialize the services.

It looks like there were a couple of steps I had to run first. There is a package called AvInstaller, which is responsible for package installations, and it had to be installed. To do this, SSH into the Avamar appliance as root (default password changeme) and change to the below directory:
# cd /usr/local/avamar/src/
Run the AvInstaller bootstrap with the below command (the version string in the file name will vary with your build):
# ./avinstaller-bootstrap-<version>.sles11_64.x86_64.run
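Once the bootstrap finishes, a quick way to confirm that something is actually listening on the port used by the configuration page (assuming netstat is available on the appliance) is:

# netstat -tlnp | grep 8543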

Post this, log back in to the same avigui.html URL and you should be able to see the below login screen.
That's pretty much it.