Sunday, 29 November 2015

Automatic Shell and SSH Session Logout

Written by Suhas Savkoor



We have all used SSH sessions to ESXi hosts. We use PuTTY or another client to access the ESXi Shell so that we can perform certain operations on the host from the command line.

One additional thing is required for this: the SSH service must be started on the host. It is found under Configuration > Security Profile > Services and is disabled by default for security reasons.

That works well, and we can make the service even better by configuring time-outs.

There are two time-outs that we can configure.

1. ESXi Shell Interactive Time Out - This applies to SSH sessions opened after the configuration is done. Say we configure this time-out to 60 seconds. Once this is done, a new PuTTY session is closed automatically after 60 seconds of inactivity: if you don't run any commands in the SSH session for 60 seconds, you are logged out.

2. ESXi Shell Time Out - Remember the SSH service we were talking about? We can configure a time-out for it as well. Setting this to, for example, 60 seconds causes the SSH service to stop automatically after 60 seconds, regardless of whether there is any activity in the PuTTY terminal. This time-out prevents new PuTTY sessions from being opened; however, sessions that are already open continue to work just fine.

How do we configure this? 

Method 1: GUI

Select the host > Configuration > Advanced Settings (under Software).
Scroll down to UserVars, locate ESXiShellInteractiveTimeOut and ESXiShellTimeOut, and set them to the required values (in seconds).

Method 2: Command Line

Open an SSH session to the host that needs to be configured and run these two commands:

ESXi Shell Interactive Time Out


ESXi Shell Time Out


Restart the services for the changes to be applied.
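The two settings can be sketched as a small shell function, assuming the esxcli advanced-settings namespace with the option paths /UserVars/ESXiShellInteractiveTimeOut and /UserVars/ESXiShellTimeOut (the same UserVars shown in Method 1). The function names set_shell_timeouts and show_shell_timeouts are hypothetical; values are in seconds, as in the GUI.

```shell
#!/bin/sh
# Sketch: set the two ESXi Shell time-outs (in seconds) via esxcli.
# set_shell_timeouts / show_shell_timeouts are hypothetical helper names.

set_shell_timeouts() {
  interactive=$1   # idle time before an open shell/SSH session is logged out
  service=$2       # time after which the ESXi Shell/SSH service is stopped

  # Reject anything that is not a whole number of seconds (0 disables a time-out).
  for v in "$interactive" "$service"; do
    case "$v" in
      ''|*[!0-9]*) echo "invalid time-out: '$v'" >&2; return 1 ;;
    esac
  done

  esxcli system settings advanced set -o /UserVars/ESXiShellInteractiveTimeOut -i "$interactive" || return 1
  esxcli system settings advanced set -o /UserVars/ESXiShellTimeOut -i "$service"
}

# Print the current values, e.g. to confirm the change.
show_shell_timeouts() {
  esxcli system settings advanced list -o /UserVars/ESXiShellInteractiveTimeOut
  esxcli system settings advanced list -o /UserVars/ESXiShellTimeOut
}
```

For example, `set_shell_timeouts 900 3600` would log idle sessions out after 15 minutes and stop the SSH service an hour after it is started. The input check is there because both settings accept whole seconds only.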



esxcli with time-outs, folks!




Adding Active Directory User To VCSA

Written by Suhas Savkoor



So, I have my new vCenter Server Appliance 5.5 set up, and now I want a domain user to be able to log in to vCenter and manage the environment.

Here my domain is "vcloud.local" and I have created a user Suhas2 in this domain. Now I want Suhas2 to be able to log in to the vCenter Appliance with full Administrator permissions and perform all vCenter tasks.



1. First, I will make sure that my vCenter Appliance is running and make a note of the IP address assigned to it.


2. Let's log in to the vCenter Appliance management page (commonly called the VAMI page). To access this page:
                      https://<Appliance_IP_or_FQDN>:5480

Once the web page loads, log in to the appliance with the "root" credentials.

3. Let's add the vCenter Appliance to a domain. Here I will join my appliance to the vcloud.local domain.

Navigate to the vCenter Server tab and select Authentication. Check Active Directory Enabled, enter the domain name and its credentials, and click Save Settings.

We need to restart the appliance for the changes to be applied. Navigate to the System tab and select Reboot.

4. Once the appliance has finished rebooting, log in to the Web Client for the VCSA.
                      https://<Appliance_IP_or_FQDN>:9443

Once the web page loads, log in to the client with the SSO credentials.

5. Select Administration and, under Single Sign-On, click Configuration. Here we need to add an Identity Source so that users in the domain can be added to vCenter and assigned the appropriate permissions.

In the Identity Sources tab, click the Add button. There are multiple Identity Source types, and information regarding each of them can be found here. In my case, I am going to choose Active Directory as an LDAP Server.

Fill in the Identity Source settings.




Once done, click Test Connection and verify that the connection is established successfully. The Identity Source should now be listed in the table. I will make this Identity Source the default domain so that I do not need to specify the domain name for the user every time I log in to vCenter.

Select the Identity Source and click the Set as Default Domain option (in the toolbar of the Identity Sources tab).

6. Now it is time to add the Active Directory user to vCenter and assign Administrator permissions to it.

Log in to the vSphere Client with the SSO credentials.

Select the vCenter and click the Permissions tab.




7. Right-click and select Add Permission; an Assign Permissions window appears.


8. Click Add and, from the Domain drop-down, select the domain that was recently added. Under Users and Groups, locate the required user, select Add, and click OK.


9. On the right-hand side, from the Assigned Role drop-down, select the required role. Here I am choosing the Administrator role for the user. Click OK.

10. That's it, we are done. Now verify the procedure by opening a client to vCenter and logging in with the AD user we just assigned permissions to. You can confirm which user is logged in to vCenter in the bottom-right corner, which displays this information.



And there we go, success!
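If either of the pages used in steps 2 and 4 does not load, a quick reachability check with curl from any Linux or macOS machine shows whether the appliance is answering on those ports. This is a sketch: check_vcsa_pages is a hypothetical helper, and -k skips certificate validation because the appliance normally presents a self-signed certificate.

```shell
#!/bin/sh
# Sketch: check that the VCSA answers on the VAMI (5480) and
# Web Client (9443) ports. check_vcsa_pages is a hypothetical helper.

check_vcsa_pages() {
  host=$1      # appliance IP or FQDN
  failed=0
  for port in 5480 9443; do
    # -k: accept the self-signed certificate
    # -w '%{http_code}': print only the HTTP status (000 if no connection)
    code=$(curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' "https://${host}:${port}/")
    if [ -z "$code" ] || [ "$code" = "000" ]; then
      echo "port $port: no response"
      failed=1
    else
      # Any HTTP status, even a redirect, means the service is listening.
      echo "port $port: HTTP $code"
    fi
  done
  return $failed
}
```

Usage: `check_vcsa_pages <Appliance_IP_or_FQDN>` prints one line per port and returns non-zero if either port does not respond.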

Saturday, 28 November 2015

Virtual Machine Disk Consolidation Is Needed

Written by Suhas Savkoor



We have all backed up virtual machines, either with VMware backup solutions or with third-party products such as Veeam or NetApp VSC. Most of the time the backup jobs go well. In some cases, however, we see the annoying message "Virtual machine disk consolidation is needed". We do not see any snapshots in Snapshot Manager, but when we right-click the VM, select Edit Settings, and choose the hard disk, we notice that the disk is actually running on a snapshot file, vm-name-00000x.vmdk.



Most of the time, we right-click the virtual machine displaying this message, select Snapshot > Consolidate, and it works. In the Tasks and Events section we can see the consolidation progress to success.

Then there are those sticky situations where it does not work, and we receive a ton of errors when we click Consolidate, especially "Cannot consolidate file since it is locked" and "Unable to access file <unspecified filename>".

There are two steps to troubleshoot this:

Step 1:

Make sure that no backup job is actively running for this VM. If there is an active snapshot job for the VM, or a backup job is configured for it, the backup application may be holding a lock on the virtual machine's VMDK file, which causes the consolidation to fail.

So first, power off the backup appliance and retry the consolidation.

If this works, great!
If not, we have some more in-depth troubleshooting to do, which takes us to Step 2.

Step 2:

In this step we need to verify the integrity of the snapshot chain. So what is snapshot chain integrity?

Let's break it down:

I have a CentOS7 VM here with one VMDK, and it is running on the base disk, not a snapshot disk.



Next, I take a snapshot of the VM, and this time you can see that it is running on a snapshot disk.



With this in mind, let's look at the snapshot chain. To check it we need two things:


  • SSH (Putty) Access to the host where this VM is residing
  • A very interesting command to generate the chain structure


Log in with PuTTY, change to the virtual machine's directory, and run this command; we will receive output like this:


Rearranging this output so that the base VMDK information is displayed first and the snapshot VMDK information next, we get:

CentOS7.vmdk
CID=70b0b210
parentCID=ffffffff
RW 33554432 VMFS "CentOS7-flat.vmdk"


CentOS7-000001.vmdk
CID=71a5b396
parentCID=70b0b210
parentFileNameHint="CentOS7.vmdk"
RW 33554432 VMFSSPARSE "CentOS7-000001-delta.vmdk"


Chain Structure Analysis:

For CentOS7.vmdk (the base disk), the parentCID is eight f's (ffffffff), and this is always the same no matter which VM we use.

The same disk has a CID (content ID), which is a random 8-digit hexadecimal value.

Now, the CID of the base disk must equal the parentCID of the first snapshot disk.
As a simple formula:
CID(CentOS7.vmdk) = parentCID(CentOS7-000001.vmdk)
which in our case is true.

For CentOS7-000001.vmdk, the CID is again a random 8-digit hexadecimal value, and it must equal the parentCID of the next snapshot (CentOS7-000002.vmdk); the structure continues this way down the chain.

Also, CentOS7-000001.vmdk points to its corresponding CentOS7-000001-delta.vmdk, with a parent file of CentOS7.vmdk.

The next link in the chain would be CentOS7-000002.vmdk, pointing to CentOS7-000002-delta.vmdk, with a parent file of CentOS7-000001.vmdk.

This structure must always hold. If there is a mismatch anywhere in the snapshot chain, the consolidation fails.
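The CID/parentCID rule above is mechanical enough to script. The following is a minimal sketch, not a VMware tool: check_vmdk_chain and descriptor_value are hypothetical helpers that read the descriptor files (never the -flat or -delta data files) in chain order, base disk first, and report each link.

```shell
#!/bin/sh
# Sketch: verify the CID/parentCID chain of VMDK descriptor files.
# Pass descriptors in chain order, base disk first. Read-only.
# check_vmdk_chain and descriptor_value are hypothetical helpers.

# Print the value of KEY=VALUE from a descriptor, with quotes stripped.
descriptor_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1 | tr -d '"'
}

check_vmdk_chain() {
  prev_cid=""
  prev_file=""
  rc=0
  for desc in "$@"; do
    cid=$(descriptor_value "$desc" CID)
    pcid=$(descriptor_value "$desc" parentCID)
    if [ -z "$prev_file" ]; then
      # Base disk: its parentCID must be ffffffff.
      if [ "$pcid" = "ffffffff" ]; then
        echo "OK     $desc (base, CID=$cid)"
      else
        echo "BROKEN $desc: base disk has parentCID=$pcid"
        rc=1
      fi
    else
      # Snapshot disk: parentCID and parentFileNameHint must match the previous link.
      hint=$(descriptor_value "$desc" parentFileNameHint)
      if [ "$pcid" != "$prev_cid" ]; then
        echo "BROKEN $desc: parentCID=$pcid, expected $prev_cid (CID of $prev_file)"
        rc=1
      elif [ "${hint##*/}" != "${prev_file##*/}" ]; then
        echo "BROKEN $desc: parentFileNameHint=$hint, expected $prev_file"
        rc=1
      else
        echo "OK     $desc (CID=$cid, parent $prev_file)"
      fi
    fi
    prev_cid=$cid
    prev_file=$desc
  done
  return $rc
}
```

Run from the VM's directory as `check_vmdk_chain CentOS7.vmdk CentOS7-000001.vmdk`; it returns non-zero if any link is broken. It only reads the descriptors, so it does not change the chain; any correction is still done by hand.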

If the chain is long and there are many corruptions, the feasible workaround is to clone the VM, as it would be very tedious to go through multiple links in the chain and make the necessary corrections.


What do we learn from this?

1. If you receive the message saying virtual machine disk consolidation is needed, DO NOT go and delete the snapshot files directly.
2. Verify whether the VM is running on a snapshot file; in most cases it is.
3. If it is, perform a consolidation.
4. If that works, good! If not, check the chain structure, correct what is necessary, and run the consolidation again.


Make good use of the command, folks!



Additional Tip!

Use Notepad++ to verify chain integrity, as it highlights matching text, making it easier to compare CIDs and parentCIDs.

Friday, 27 November 2015

Reclaim Space For Thin Provisioned VMDKs

Written by Suhas Savkoor



Quite often we come across an issue where the disk space reported by the guest OS does not match the virtual machine's used storage. This is seen with VMs that have thin-provisioned disks.
When we delete data at the guest OS level, the free space inside the guest is updated accordingly. However, since VMware does not perform automatic space reclamation, the change is not reflected in vSphere, which is why we need to perform the space reclamation task manually.

So how does this process work? 

Data is written in blocks on the disk. When data is deleted, it is not actually removed from the blocks; it is only removed from the file allocation table. The data is truly removed only when the blocks are zeroed out, which does not happen here.

So the first step is to zero out the free blocks from within the guest OS. There are two paths here.

Path 1: Windows

We have to use SDelete. The guest OS sees the updated free space, but since the blocks aren't zeroed out, VMware does not recognize this free space. SDelete finds these unused blocks and writes zeroes to them. Download SDelete from here.

Open a Command Prompt with elevated permissions and run the command:



Replace the drive letter with the drive that should be zeroed. Run this on multiple disks if required.

Path 2: Linux

Shut down all services that make read-write changes to the disk while the zeroing operation runs. If you have multiple disks, this has to be run on each of them.



Replace the mounted volume with the one that needs to be zeroed out.
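One common way to zero free space on Linux is to fill the volume with a file of zeroes and then delete it. A sketch, assuming a hypothetical helper zero_free_space run as root once per mounted volume; the optional size cap exists only to make dry runs safe.

```shell
#!/bin/sh
# Sketch: zero the free space of one mounted filesystem in a Linux guest.
# zero_free_space is a hypothetical helper; run as root, once per volume.

zero_free_space() {
  mnt=$1       # mount point to zero, e.g. / or /data
  max_mb=$2    # optional cap in MiB; empty means fill all free space
  fill="$mnt/zero.fill.$$"

  if [ ! -d "$mnt" ]; then
    echo "not a directory: $mnt" >&2
    return 1
  fi

  # Write zeroes until the volume is full (or max_mb is reached). When the
  # volume fills up, dd stops with "No space left on device"; that is the
  # expected outcome, so dd's exit status is not checked.
  if [ -n "$max_mb" ]; then
    dd if=/dev/zero of="$fill" bs=1M count="$max_mb" 2>/dev/null
  else
    dd if=/dev/zero of="$fill" bs=1M 2>/dev/null
  fi

  if [ ! -f "$fill" ]; then
    echo "could not write to $mnt" >&2
    return 1
  fi

  sync            # make sure the zeroes reach the virtual disk
  rm -f "$fill"   # free the space again; the blocks stay zeroed
  sync
}
```

For example, `zero_free_space /data` fills and then releases all free space on /data.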


Once the operation is done in the guest OS, we run the space reclamation command from the ESXi side.

Before proceeding, power off the virtual machine. If you perform the space reclamation while the VM is powered on, you will receive an error saying the disk is locked or that the lock failed to be released.

Once the VM is powered off, open an SSH (PuTTY) session to the host where this virtual machine resides.
Then run the following command:



VM.vmdk has to be the path to this virtual machine's VMDK. The command must be run against the .vmdk (descriptor file), not the -flat.vmdk (data file).
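On the ESXi side, space reclamation for a thin disk is commonly done with vmkfstools -K (long form --punchzero), which deallocates blocks that contain only zeroes. The sketch below wraps it with a check for the descriptor-versus-data-file rule above; reclaim_thin_vmdk is a hypothetical helper name and the path in the comment is a placeholder.

```shell
#!/bin/sh
# Sketch: reclaim zeroed blocks of a thin VMDK from the ESXi shell.
# reclaim_thin_vmdk is a hypothetical helper; power off the VM first.

reclaim_thin_vmdk() {
  disk=$1   # descriptor path, e.g. /vmfs/volumes/<datastore>/VM/VM.vmdk
  case "$disk" in
    *-flat.vmdk|*-delta.vmdk)
      # These are data files; vmkfstools must be given the descriptor.
      echo "use the descriptor .vmdk, not the data file: $disk" >&2
      return 1 ;;
    *.vmdk) ;;
    *)
      echo "not a .vmdk file: $disk" >&2
      return 1 ;;
  esac
  [ -f "$disk" ] || { echo "no such file: $disk" >&2; return 1; }

  # -K (--punchzero) deallocates blocks that contain only zeroes.
  vmkfstools -K "$disk"
}
```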

Once this is done, the updated space is shown in the vSphere Client. 

Thursday, 26 November 2015

Deploying vCenter Orchestrator Appliance

Written by Suhas Savkoor



Why vCenter Orchestrator? Because I want to try automation.

Let's start with the simplest thing first: deploying a vCO appliance in vCenter. Here I am using Orchestrator 5.5.
You can download the 5.5 Orchestrator Appliance .ova file here

1. First, let's start by importing the ova file of Orchestrator



2. Browse to the location of the vCO .ova file and import it.


3. Accept the EULA


4. Give the appliance a name and select the datacenter for it.


5. Provide a datastore for the appliance.


6. Provide the networking details for the appliance.


Review the final configuration and complete the deployment. 
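The wizard steps above can also be done from the command line with VMware OVF Tool, if it is installed on your workstation. A sketch, assuming ovftool is on the PATH; deploy_vco_ova and every argument value are placeholders. It does not set the appliance's own OVF properties (such as its IP settings); ovftool's --prop:key=value option covers those, but the property names depend on the OVA.

```shell
#!/bin/sh
# Sketch: deploy the Orchestrator .ova with OVF Tool instead of the wizard.
# deploy_vco_ova and all argument values are hypothetical placeholders.

deploy_vco_ova() {
  ova=$1         # path to the downloaded vCO appliance .ova
  name=$2        # VM name (wizard step 4)
  datastore=$3   # target datastore (wizard step 5)
  network=$4     # port group for the appliance NIC (wizard step 6)
  target=$5      # vi:// locator for the datacenter/cluster or host

  if [ ! -f "$ova" ]; then
    echo "OVA not found: $ova" >&2
    return 1
  fi

  # --acceptAllEulas answers the EULA page (wizard step 3) non-interactively.
  ovftool --acceptAllEulas \
    --name="$name" \
    --datastore="$datastore" \
    --network="$network" \
    "$ova" "$target"
}
```

For example: `deploy_vco_ova ./vco-appliance-5.5.ova vco01 datastore1 "VM Network" 'vi://administrator%40vsphere.local@vcenter.example.local/Datacenter/host/Cluster'`. The @ in the SSO user name is written as %40 because the locator itself uses @ to separate the user from the server.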

Tuesday, 24 November 2015

503 Service Unavailable during vMotion

Written by Suhas Savkoor



When you vMotion any VM in your vCenter environment, the vMotion fails immediately, even before the task shows any progress in the Tasks and Events view.

With Windows 2008 R2/Vista/7, this was a known issue and was fixed with a Microsoft hotfix.
This can be found in this KB Article.

However, we might run into this 503 Service Unavailable issue on a Windows Server 2012 machine as well.
When we run netstat -ano, we see a lot of ports in the TIME_WAIT state. vCenter cannot find any free ports to perform its operations, which causes the vMotion to fail.

There is a two-step fix for this.

Step 1: Restart all the vCenter services in the following order:

1. VMware Directory Service
2. VMware KDC Service
3. VMware Certificate Service
4. VMware Identity Management Service
5. VMware Secure Token Service
6. VMware vCenter Inventory Service
7. VMware vCenter Server

This fixes the issue, and you will be able to perform a vMotion. Restarting the services releases all the ports, so they are available again to establish connections successfully.

Step 2: A more permanent fix is to increase the JVM heap size for the Management Web Services, SSO, and Inventory Service.

This can be found in this KB Article.

This should not be a common issue with 5.5 on Windows Server 2012, but sometimes, you never know.

Monday, 23 November 2015

Active Coredump partition while Using Coredump to file

Written by Suhas Savkoor



If you have configured a coredump location for an ESXi host, you will have come across two methods: configuring coredump to a partition and configuring coredump to a file.

Take a scenario where we initially configured coredump to a partition and later decide to migrate to coredump to a file. When we do this, we see that coredump to partition is still Active and Configured.

When you disable and unconfigure coredump to partition, it gets activated again automatically when the ESXi host is rebooted.
We then have coredump to partition active and configured, as well as coredump to file configured and active.

I performed the test below and found a workaround for this situation:

1. I had coredump configured to a partition initially.
   


2. I unconfigured coredump to partition.
   


3. I configured coredump to file
   


4. I activated the coredump to file
   


5. Next, we can see that coredump to partition is not active and not configured, whereas coredump to file is active and configured.


6. I rebooted the ESXi host, and when I viewed the configuration, both coredump to partition and coredump to file were active and configured.


7. Next, I force-disable coredump to partition on every boot of the ESXi host. This way, when coredump is configured to a file, coredump to partition is never reactivated when the host reboots. This is done by adding the coredump partition disable and unconfigure commands to the local.sh file (/etc/rc.local.d/local.sh), which runs at the end of the ESXi boot process.

   

The configuration file looks similar to this:

    # local configuration options

    # Note: modify at your own risk!  If you do/use anything in this
    # script that is not part of a stable API (relying on files to be in
    # specific places, specific tools, specific output, etc) there is a
    # possibility you will end up with a broken system after patching or
    # upgrading.  Changes are not supported unless under direction of
    # VMware support.

    exit 0

Add the following commands to the file so that it reads:

    # local configuration options

    # Note: modify at your own risk!  If you do/use anything in this
    # script that is not part of a stable API (relying on files to be in
    # specific places, specific tools, specific output, etc) there is a
    # possibility you will end up with a broken system after patching or
    # upgrading.  Changes are not supported unless under direction of
    # VMware support.

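    # Disable the coredump partition
    /bin/esxcli system coredump partition set -e 0
    # Unconfigure the coredump partition
    /bin/esxcli system coredump partition set -u
    # Disable it once more so it stays inactive
    /bin/esxcli system coredump partition set -e 0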

    exit 0

Save this configuration file.

8. Reboot the ESXi host. This time, when you check coredump to partition, it is no longer configured or active; only coredump to file is configured and active, as required.
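The local.sh edit from step 7 can also be scripted so that running it twice does no harm. A sketch, assuming the file is /etc/rc.local.d/local.sh (its location on ESXi 5.1 and later) and that it ends with an exit 0 line; add_coredump_lines is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: add the coredump-partition commands from step 7 to local.sh
# just before its final "exit 0", unless they are already there.
# add_coredump_lines is a hypothetical helper; keep a copy of local.sh first.

add_coredump_lines() {
  file=${1:-/etc/rc.local.d/local.sh}
  marker='/bin/esxcli system coredump partition set -u'

  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  if grep -qF "$marker" "$file"; then
    echo "already present in $file"
    return 0
  fi

  tmp="$file.tmp.$$"
  # Print the three commands before the first line that is just "exit 0".
  awk '
    !done && /^[ \t]*exit 0[ \t]*$/ {
      print "/bin/esxcli system coredump partition set -e 0"
      print "/bin/esxcli system coredump partition set -u"
      print "/bin/esxcli system coredump partition set -e 0"
      print ""
      done = 1
    }
    { print }
  ' "$file" > "$tmp"

  if ! grep -qF "$marker" "$tmp"; then
    echo "no 'exit 0' line found in $file" >&2
    rm -f "$tmp"
    return 1
  fi
  # Copy back over the original so its permissions are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```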

About

I am Suhas, also fondly mistaken for Susha.

In August 2014, I started out as an L1 engineer supporting the core VMware products, ESXi and vCenter. At the end of 2015, I started the epic journey of supporting the backup and recovery solution, vSphere Data Protection. At the end of Q3 2016, I started supporting Site Recovery Manager.

As of the end of 2016, I am in the VMware Storage and Availability BU, working on SRM and VDP issues. I act as a lead supporting L1 and escalated L2 issues on vSphere Data Protection.

Apart from this, I work on vRealize Orchestrator and a little on vRealize Log Insight, with a great deal of interest in shell scripting. I am increasingly drawn to EMC Avamar and Data Domain and grab every opportunity to learn more about them.

Say Hi! if I have worked on your open tickets with VMware.

Off work, I play a lot of Grand Theft Auto 5 and am an enthusiast of spoken word poetry. Gorillaz for music and Matthew Reilly for fiction.




[Certified]
VMware Certified Associate 5 - Data Center Virtualization
VMware Certified Professional 5 - Data Center Virtualization
VMware Certified Associate 6 - Data Center Virtualization
VMware Certified Professional 6 - Data Center Virtualization
VMware Certified Advanced Professional 5 - Data Center Administration
VMware Certified Advanced Professional 5 - Data Center Design




[Honours and Awards]
VMware vExpert 2016
VMware vExpert 2017

[Accreditations]
Certified - MicrosoftX R Programming
Certified - MITx Python Programming

[Qualification]
Bachelor of Engineering - Electronics and Communications

[Interests]
VMware, VHDL/Verilog Programming

[Disclaimer]
What is shared on this blog is a product of my own perspective, troubleshooting, and configuration. As always, take a backup before performing any workaround.


Sunday, 22 November 2015

Using vCenter Converter Standalone for Virtual to Virtual Conversion

Written by Suhas Savkoor



Let's say we want to resize the VMDK assigned to a virtual machine. Through Edit Settings of the VM, we cannot decrease the disk size; changing the value there only works if you want to increase the guest's disk space. If we want to shrink the VMDK, it is best to use vCenter Converter Standalone.

VMware vCenter Converter Standalone is used to perform physical-to-virtual or virtual-to-virtual conversions.
Another important use is migrating a VM from Oracle VM Manager to vCenter; Converter Standalone can get this job done.

In this example, I am using vCenter 5.5 Update 3 with an ESXi 5.5 Update 3 host. The Converter version used is 6.0, and the guest OS being converted is Windows 2008 R2. Check the Converter release notes to make sure it supports the guest OS of the machine you are about to convert.



Release Notes for Converter 6.0:
https://www.vmware.com/support/converter/doc/conv_sa_60_rel_notes.html

1. Open vCenter Converter Standalone and select Convert machine.


2. Under Source System:
The source type is a powered-on machine.
In my case I am converting a remote machine residing in vCenter, so I select a remote machine.
Enter the IP address of the machine you want to convert,
the username and password of this machine,
and the OS family of the machine you want to convert; here it is Windows.


3. The next step is the destination system.
Destination type: vCenter machine
Server: IP address of the vCenter Server
Username and password for the vCenter Server


4. Next, provide a name for the converted machine and select the datacenter in your vCenter where it should reside.


5. Next, provide the ESXi host where the converted machine should reside, the datastore where it should be placed, and the virtual hardware version of the machine.


6. In the next step you have multiple options. Under Data to copy, you can resize the required guest OS drives. You can resize a single disk or all disks, and even exclude certain disks from the converted machine.


7. Under Advanced options, you can choose whether or not to power on the converted machine after conversion, along with other options.


8. The last step is to review the configuration and begin the conversion. Once the conversion begins, the converted machine is visible in your vCenter.


The conversion will take some time, depending on the number and size of the hard disks attached to the virtual or physical machine.

Network Up-link redundancy lost/restored alarms not working as expected

Written by Suhas Savkoor



While messing around in my lab, a friend and I came across a rather weird issue involving the Network Redundancy Lost alarm.

What this alarm is all about:
When defined at the vCenter level, this alarm is triggered when any vSwitch on any ESXi host loses its network uplink redundancy (that is, the vSwitch has 2 NICs and one of them goes down). When the uplink or NIC is given back to the vSwitch, redundancy is restored and the alert is cleared automatically.

With the vCenter SMTP settings, we can send the alerts generated in vCenter to the required email address. We configured this alert to send email notifications.

Here comes the interesting part:
When redundancy was lost for the vSwitch (in my lab, vSwitch 1), the alert was generated and seconds later we received an email notification stating the same. All went well. Next, we added the NIC back to vSwitch 1, and the uplink redundancy lost alarm was cleared accordingly. However, this time we did not receive any email.

We spent a good 30 minutes troubleshooting this issue. It started off with verifying:

1. SMTP settings: Administration > vCenter Server Settings > SMTP. Looked good.
2. Under the alarm definitions for the vCenter, we opened Edit Settings for the Network uplink redundancy lost alarm.
3. Here the parameters under the Triggers tab were:

1. Lost Network Redundancy - Alert
2. Restored uplink redundancy to portgroups - Normal
3. Lost Network Redundancy on DVPorts - Alert
4. Restored Network Redundancy to DVPorts - Normal

Looked good!

4. The settings under the Actions tab were: Send a notification email, the email address, and all the alerts set to Once. Looked good too!

Out of nowhere, we decided to create a similar new alarm and see how that worked.
Under vCenter, we defined a new alarm named "NIC Redundancy Lost" and replicated in it all the settings of the predefined "Network Uplink Redundancy Lost" alarm.

We simulated the same issue, first by removing the NIC. An email notification was sent as soon as the alert was triggered in vCenter.
We re-added the NIC, the alert was cleared, and seconds later an email notification was sent stating that uplink redundancy was restored.

Bottom line: something fishy is going on with this predefined alarm. If you run into this situation, you might as well create a custom alarm and replicate the required parameters.


Sunday, 8 November 2015

Reset Forgotten VMware ESXi Password

Written by Suhas Savkoor




We don't always remember everything, correct? We have a password for Facebook, passwords for our email IDs, and passwords for our ESXi hosts.
Naturally, we tend to forget the ESXi password.

Now, if we ask VMware how to reset a forgotten ESXi password, the recommended solution is to re-install. As our Knowledge Base article states, any other means might result in host failure due to the complex nature of the ESXi architecture.
kb.vmware.com/kb/1317898

Of course, the workaround below is not recommended for production environments running critical VMs, where the supported solution is to re-install. However, why not play around in our lab? Why re-install your test ESXi hosts?

There are multiple ways to reset a forgotten password, but my favorite is using Host Profiles.
Why? It's simple! And I like GUI.

Let's take a look at the steps to do this:

1. Log in to your vCenter and right-click the host that requires a password reset.
2. Select Host Profile and choose the Create Profile from Host option. This ESXi host becomes the reference host for your profile, which can later be applied to multiple hosts if required (because you have forgotten the password for all of them!).
3. In the window that pops up, give it a name, say ESXi_Password_Reset, and a description if required.
4. Click Next > Review and Finish.

A host profile is now ready. Time to configure it.

1. In your vCenter, select Home > Host Profiles.
2. Right-click the newly created profile and select Edit Profile.
3. There are a bunch of options that can be configured here, but in our case we select Security Configuration (passwords, duh!) and expand it. Then choose the Administrator password option.
4. Under the option, 'What should the administrator password be', select Configure a fixed administrator password.
5. Enter and confirm the new password, then click OK.
6. Right-click the profile again and select the Enable/Disable Profile Configuration option.
7. Out of the big list of settings, choose the required option, which is? You are right! Security Configuration. Click OK.

We got the configuration done. Time to apply it:

1. Put the host into maintenance mode. This means powering off all the VMs on it or migrating them to a different host.
2. Go back to the Hosts and Clusters view and right-click the ESXi host > Host Profile > Manage Profile.
3. If you have multiple host profiles, carefully select the ESXi_Password_Reset profile and click OK
4. Right-click the host again and select Host Profile > Apply Profile.
5. Review the changes, click OK, and there we go: we have a new password.

Time to verify:

1. Open an SSH session to the host.
2. Log in as root with, of course, our new password.

esxcli away.

Excluding certain VMDKs from Snapshot in VMware

Written by Suhas Savkoor



How to Perform Selective Snapshot of a VM.

Sometimes we have a VM with multiple VMDKs and need to take a snapshot of it, but only want certain VMDKs to be included in the snapshot.

How do we do this?

1. Power off the virtual machine
2. Delete any existing snapshots before you change the disk mode (Snapshot Manager > Delete All)
3. Right-click the virtual machine and select Edit Settings
4. Select the hard disk that you want to exclude and change the mode to Independent Persistent

Independent mode is of two types:

Persistent:
When a VMDK is configured in Independent Persistent mode, no delta file is associated with this disk during a snapshot operation. In other words, the VMDK continues to behave as if no snapshot were being taken of the virtual machine, and all writes go directly to disk. So no delta file is created for it when a snapshot of the VM is taken, and all changes to the disk are preserved when the snapshot is deleted.

Non Persistent:
When a VMDK is configured in Independent Non-persistent mode, a redo log is created to capture all subsequent writes to that disk. However, if the snapshot is deleted or the virtual machine is powered off, the changes captured in that redo log are discarded for that Independent Non-persistent VMDK.

Here we want the data on the excluded VMDKs to be retained after a power off. Hence, we choose Persistent mode.
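The disk mode chosen in Edit Settings is stored in the VM's .vmx file as a scsiX:Y.mode entry, so it can also be set there directly. A sketch, assuming the excluded disk is scsi0:1, the VM is powered off with no snapshots (steps 1 and 2 above), and the VM is reloaded afterwards, for example with vim-cmd vmsvc/reload <vmid>; set_disk_mode is a hypothetical helper, and the Edit Settings route above remains the simpler option.

```shell
#!/bin/sh
# Sketch: set the disk mode of one virtual disk in a powered-off VM's .vmx.
# set_disk_mode is a hypothetical helper; back up the .vmx before editing.

set_disk_mode() {
  vmx=$1    # path to the VM's .vmx file
  dev=$2    # disk device node, e.g. scsi0:1
  mode=$3   # persistent | independent-persistent | independent-nonpersistent

  case "$mode" in
    persistent|independent-persistent|independent-nonpersistent) ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
  [ -f "$vmx" ] || { echo "no such file: $vmx" >&2; return 1; }

  tmp="$vmx.tmp.$$"
  # Drop any existing mode line for this device, then append the new one.
  grep -v "^$dev\.mode *=" "$vmx" > "$tmp"
  echo "$dev.mode = \"$mode\"" >> "$tmp"
  # Copy back over the original so its permissions are kept.
  cat "$tmp" > "$vmx" && rm -f "$tmp"
}
```

For example, `set_disk_mode VM.vmx scsi0:1 independent-persistent` leaves exactly one `scsi0:1.mode = "independent-persistent"` line in the file.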


Enjoy Snapshot'ing