Think this one is resolved. *fingers crossed*
For several months, an SBS 2011 Server kept hanging and freezing. The freezing happened every few days, at random times. The Server never crashed or "blue screened", but it would get so slow that it was unbearable to even log in. RWA / OWA stopped working. Connecting via Terminal Services would take forever. The immediate fix was a hard reboot, after which performance and functionality were restored.
So we monitored the fuck out of it. Kept tabs on the LabTech stats as well as the Event Logs. Each time the Server froze or became unresponsive, LabTech would time out and trigger an alert. What sucked was that there was nothing related in the logs.
Next, started investigating the configuration of the Server in general. This is an SBS 2011 Server running in Hyper-V, on a dynamic VHD, on a Server 2008 R2 Std. host with a single RAID 5 array and two other virtual machines, neither having any issue. The SBS 2011 VM's RAM was increased, CPUs were added, the VHD was converted to fixed, and it was moved to a new RAID 10 array. While this helped, the issue remained.
Stumped, so let's get real smart. Using Performance Monitor, a custom counter set was created to help track the performance. It was set up to capture pretty much the same data as LabTech (CPU, Memory, Threads, Disk I/O, etc.), but now the data could be saved. Once the monitor was enabled, a memory dump file was needed. Wouldn't you know it, after a few minutes on the Internet, BING even, a command was found to force a Blue Screen and generate the dump file. Applied this clever trick to "crash" the Server when it became unresponsive.
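For the Performance Monitor side, here is a rough sketch of how a similar counter log could be scripted with the built-in `logman` tool instead of clicking through the GUI. This is not the exact setup used on this Server: the collector name `SBS-Hang-Trace`, the output path, the 15 second interval and the specific counter paths are all assumptions picked to match the "CPU, Memory, Threads, Disk I/O" list above. The script only builds the command line; running it is left to you.

```python
# Sketch: build a `logman create counter` command for a perfmon counter log.
# The collector name, output path, interval and counter list are assumptions.

# Counter paths chosen to roughly match "CPU, Memory, Threads, Disk I/O".
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
    r"\System\Threads",
    r"\PhysicalDisk(_Total)\Disk Transfers/sec",
]


def build_logman_command(name, output_path, interval_seconds=15):
    """Return the argument list for creating a CSV counter log with logman."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # logman expects the sample interval as hh:mm:ss.
    hours, remainder = divmod(interval_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    interval = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return [
        "logman", "create", "counter", name,
        "-f", "csv",          # save as CSV so the data can be reviewed later
        "-si", interval,      # sample interval
        "-o", output_path,    # where the log file is written
        "-c", *COUNTERS,      # counters to capture
    ]


if __name__ == "__main__":
    # Hypothetical name and path; adjust for your own server.
    cmd = build_logman_command("SBS-Hang-Trace", r"C:\PerfLogs\sbs-hang")
    print(" ".join(f'"{part}"' if " " in part else part for part in cmd))
```

After creating the collector, `logman start SBS-Hang-Trace` would begin logging (again, using the hypothetical name from the sketch).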
In order to force a BSOD, you need to Regedit:
- HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters
- New DWORD value named CrashOnCtrlScroll, set to 1.
- Reboot.
Now we can "crash" the Server and generate a dump file. Hold the right CTRL key and press Scroll Lock twice. Doing this in Hyper-V via a Remote Desktop session worked.
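If you'd rather not click through Regedit by hand, the same change can be put in a .reg file and imported. Below is a minimal sketch that just generates the .reg text; it assumes the key path from the steps above (written with backslashes between each part) and the standard "Windows Registry Editor Version 5.00" file format. The function name `build_reg_file` is made up for this example.

```python
# Sketch: generate the text of a .reg file that sets CrashOnCtrlScroll = 1.
# build_reg_file is a hypothetical helper, not part of any Windows tooling.

KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters"


def build_reg_file(key_path, value_name, dword):
    """Return .reg file contents that set one DWORD value under key_path."""
    if not 0 <= dword <= 0xFFFFFFFF:
        raise ValueError("dword must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        # .reg files store DWORDs as 8 hex digits.
        f'"{value_name}"=dword:{dword:08x}',
        "",
    ]
    # Windows tools expect CRLF line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(KEY_PATH, "CrashOnCtrlScroll", 1))
```

Save the output as something like crash.reg, import it on the Server, and reboot as in the steps above.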
Gathered two dump files and the performance logs, and sent them away to Microsoft for review. (Being a partner, it's easier and more accurate to have them review.) A day later, they told me to remove old printer drivers and disable a printer-related Service. ...seriously? Well, I found two unnecessary HP Printer Services and two (long removed) HP LaserJet 5 printers. Disabled the Services and uninstalled the printers and drivers. Since then, the Server has not had one issue.
While I know the hardware changes were ultimately the resolution here and dump files are useless, it's important to know why they exist in the first place....just in case that rare issue pops up. 😉