Sunday, 22 November 2015

Network Up-link redundancy lost/restored alarms not working as expected

Written by Suhas Savkoor



When messing around in my lab, my friend and I came across this rather weird issue involving the Network Redundancy Lost Alarm.

What this alarm is all about:
When defined at the vCenter level, this alarm is triggered when any vSwitch on any ESXi host loses its network uplink redundancy (that is, the vSwitch has 2 NICs and one of them goes down). When the uplink or NIC is given back to the vSwitch, redundancy is restored and the alert is cleared automatically.

Now, with the vCenter SMTP settings, alerts generated in vCenter can be sent to a chosen email address. We configured this alarm to send email notifications.

Here comes the interesting part:
When redundancy was lost for the vSwitch (in my lab it was vSwitch 1), the alert was generated and seconds later we received the email notification stating the same. All went well. Next, we added the NIC back to vSwitch 1, and the NIC uplink redundancy lost alarm was cleared accordingly. However, this time we did not receive any email.

We spent a good 30 minutes troubleshooting this issue. It started off with verifying:

1. SMTP Settings: Administration > vCenter Server Settings > SMTP. Looked good.
2. The alarm itself: under the alarm definitions for the vCenter, Edit Settings for the "Network uplink redundancy lost" alarm.
3. Here, the parameters under the Triggers tab were:

   1. Lost Network Redundancy - Alert
   2. Restored uplink redundancy to portgroups - Normal
   3. Lost Network Redundancy on DVPorts - Alert
   4. Restored Network Redundancy to DVPorts - Normal

Looked good!

4. The settings under the Actions tab were: Send Notification Email, the email address, and all the alerts set to Once. Looked good too!
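To make the expectation concrete, here is a minimal Python sketch of what those settings should do: each trigger event maps to an alarm status, and the "Once" email action should fire one notification per status change. This is only a model for reasoning about the behavior, not vCenter code. The `AlarmModel` class and its `handle` method are hypothetical names invented for this illustration; only the trigger labels and statuses come from the Triggers tab above.

```python
from dataclasses import dataclass, field

# Trigger table as listed on the Triggers tab: event label -> alarm status.
TRIGGERS = {
    "Lost Network Redundancy": "Alert",
    "Restored uplink redundancy to portgroups": "Normal",
    "Lost Network Redundancy on DVPorts": "Alert",
    "Restored Network Redundancy to DVPorts": "Normal",
}


@dataclass
class AlarmModel:
    """Hypothetical stand-in for one alarm definition (not a vCenter API)."""
    name: str
    status: str = "Normal"
    sent: list = field(default_factory=list)  # notifications we expect to be emailed

    def handle(self, event: str) -> None:
        new_status = TRIGGERS.get(event)
        if new_status is None or new_status == self.status:
            return  # unrelated event, or no status change
        old, self.status = self.status, new_status
        # "Once" action: one notification per status transition.
        self.sent.append(f"{self.name}: {old} -> {new_status} ({event})")


alarm = AlarmModel("Network uplink redundancy lost")
alarm.handle("Lost Network Redundancy")                   # NIC removed
alarm.handle("Restored uplink redundancy to portgroups")  # NIC added back
for line in alarm.sent:
    print(line)
```

With these settings we would expect two emails, one for the change to Alert and one for the change back to Normal. In our lab the pre-defined alarm only ever delivered the first.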

Out of nowhere, we decided to create a similar new alarm and see how that works. 
Under vCenter, we defined a new alarm, named it "NIC Redundancy Lost", and replicated all the settings from the pre-defined "Network Uplink Redundancy Lost" alarm in the newly created one.

We simulated the same issue, first by removing the NIC. An email notification was sent as soon as the alert was triggered in vCenter.
We then re-added the NIC, the alert was cleared, and seconds later an email notification was sent stating that uplink redundancy was restored.

Bottom line: something fishy is going on with the pre-defined alarm, which does not send the email when redundancy is restored even though its settings look correct. If you run into this situation, you might as well create a custom alarm and replicate the required parameters.