This latest tip comes courtesy of the Exchange Team at Microsoft. I’d like to think that somewhere deep in the Microsoft internal archives they have a file called “The Case of the Sleeping NIC.” Troubleshooting DAG failovers and other random drops on Exchange, or on other applications running on Windows, may come down to finding, and waking up, a sleepy NIC.
Problem:
Database Availability Groups seemingly fail over at random from one member server to another. This can also affect other servers, which appear to go offline at random intervals or for no apparent reason. Console access remains up and functional.
Cause:
Sleeping NICs. No, seriously: NICs on servers going “to sleep” as a result of power-saving settings can cause DAGs to fail over. They can also cause other application failovers for clusters, or just plain fails for standalone systems. Many NICs have a power-saving option, found in the GUI on the Power Management tab, called “Allow the computer to turn off this device to save power.” It makes perfect sense for this option to be enabled on laptops, and even on desktops, but on servers? Whether it makes sense or not, the option appears to be enabled frequently on servers, where it causes seemingly random drops in connectivity, DAG and cluster failovers, and other interruptions. When a server is completely idle, such as in the middle of the night after backups are done, there are no updates to deploy, and users are all asleep, the operating system can shut down the NIC to save power. This in turn leads to failovers and outages that seem to have no clear cause.
Resolution:
There are a few different ways you can fix this issue if it is happening to you. Frankly, you may want to fix it proactively now, before it does happen to you. There’s really no good reason for a server’s NIC to go to sleep, any more than for the server’s operating system to go to sleep. Here are three ways to fix this.
PowerShell to the rescue
You can download a PowerShell script from TechNet called DisableNetworkAdapterPnPCapabilities that will take care of this for you. Consider combining it with Get-Content, reading a file that lists all your servers, and a foreach loop to apply the change to all of your servers at once. The script is available at http://gallery.technet.microsoft.com/scriptcenter/Disable-turn-off-this-f74e9e4a.
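If you would rather see the “loop over a server list” idea spelled out, here is a minimal Python sketch of it. It does not call the TechNet script, whose parameters aren’t described here. Instead, it assumes it is enough to write the PnPCapabilities registry value (covered under Manual intervention below) on each server using the built-in `reg add` command with a `\\ComputerName\HKLM\...` key path. The file name `servers.txt`, the server names, the helper names, and the default device number are all hypothetical, and the sketch assumes you have admin rights and remote registry access on the targets.

```python
import subprocess
from pathlib import Path

# Network adapter class key quoted in KB 2740020; the final four-digit
# subkey (the DeviceNumber) is appended per adapter.
CLASS_KEY = (
    r"SYSTEM\CurrentControlSet\Control\Class"
    r"\{4D36E972-E325-11CE-BFC1-08002bE10318}"
)


def read_servers(path):
    """Return server names from a text file, one per line, skipping blanks."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_reg_add(server, device_number=1, value=24):
    """Build the `reg add` argument list for one remote server.

    Assumption: 24 is meant as a decimal DWORD, and the adapter lives
    under DeviceNumber 0001 unless told otherwise.
    """
    key = "\\\\" + server + "\\HKLM\\" + CLASS_KEY + "\\" + format(device_number, "04d")
    return [
        "reg", "add", key,
        "/v", "PnPCapabilities",
        "/t", "REG_DWORD",
        "/d", str(value),
        "/f",  # overwrite an existing value without prompting
    ]


def apply_to_all(servers, dry_run=True):
    """Print each command (dry run) or execute it on every server."""
    for server in servers:
        cmd = build_reg_add(server)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical server names; the dry run only prints the commands.
    apply_to_all(["EXCH01", "EXCH02"], dry_run=True)
```

With a real list, something like `apply_to_all(read_servers("servers.txt"), dry_run=False)` would apply the change. Keeping the dry run as the default lets you review the commands before touching any registry.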
Manual intervention
You can run a “Microsoft Fix it” from http://support.microsoft.com/kb/2740020 to fix individual systems, or you can set the registry key yourself by following these steps, also from the KB above:
- Click Start, click Run, type regedit in the Open box, and then click OK.
- Locate and then click the following registry subkey:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002bE10318}\DeviceNumber
Note: DeviceNumber is the network adapter number. If a single network adapter is installed on the computer, the DeviceNumber is 0001.
- Click PnPCapabilities.
- On the Edit menu, click Modify.
- In the Value data box, type 24, and then click OK. Note: By default, a value of 0 indicates that power management of the network adapter is enabled. A value of 24 prevents Windows from turning off the network adapter and from letting the network adapter wake the computer from standby.
- On the File menu, click Exit.
See http://support.microsoft.com/kb/2740020 for more about the options available to you when manipulating this key.
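The registry steps above can also be sketched in code for a single local machine. This is a rough Python illustration using the standard-library `winreg` module (Windows only), not the KB’s own method. The function names are hypothetical, the value 24 is treated as decimal (an assumption, since the steps don’t say which base to pick in the editor), and it must run from an elevated prompt.

```python
# Network adapter class key from the steps above; the DeviceNumber
# subkey (0001, 0002, ...) is appended per adapter.
CLASS_KEY = (
    r"SYSTEM\CurrentControlSet\Control\Class"
    r"\{4D36E972-E325-11CE-BFC1-08002bE10318}"
)


def adapter_subkey(device_number):
    """Return the registry path (relative to HKLM) for one adapter."""
    return f"{CLASS_KEY}\\{device_number:04d}"


def set_pnp_capabilities(device_number=1, value=24):
    """Set PnPCapabilities for one adapter and return the previous value.

    Returns None if the value did not exist before. Assumes 24 is a
    decimal DWORD and that the caller has administrative rights.
    """
    import winreg  # Windows-only module, imported lazily on purpose

    access = winreg.KEY_QUERY_VALUE | winreg.KEY_SET_VALUE
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, adapter_subkey(device_number), 0, access
    ) as key:
        try:
            previous, _ = winreg.QueryValueEx(key, "PnPCapabilities")
        except FileNotFoundError:
            previous = None  # value not present yet
        winreg.SetValueEx(key, "PnPCapabilities", 0, winreg.REG_DWORD, value)
    return previous
```

Calling `set_pnp_capabilities(1)` would target the adapter under subkey 0001. Returning the previous value makes it easy to note what the server was set to before you changed it.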
Group Policy settings
You can use a GPO to configure power settings for your systems. Create a PowerManagement GPO and link it to each OU that contains servers in your environment. While there are a ton of power management settings in Computer Configuration | Policies | Administrative Templates | System | Power Management, none of them apply to network interfaces. You will have to use your GPO to push a registry key, such as the one detailed above. Configure your power management settings on a model server, then see http://technet.microsoft.com/en-us/library/cc753092.aspx for the steps to take that registry entry and create a GPO that will push the same out to all the other servers you want to configure.
Since you cannot provide your servers with NoDoz or a daily dose of Red Bull, the best thing you can do if you think sleepy NICs are causing you problems is to make sure they stay awake. If you are seeing random drops and failovers, check your NIC Power Management settings. The NIC is probably going to sleep. Fix that, and you will probably resolve the bigger issue.
The post Troubleshooting DAG Failovers and Other Random Drops appeared first on Email management, storage and security for business email admins.