I spoke with a friend this past weekend. He is an intelligent guy, and we got to talking about an issue he was having with an HP server: the motherboard kept failing. I asked him whether he had checked the power supply. He replied that his HP guy had come in and should know what he is doing. I said OK, but I persisted. He explained that the server had multiple power supplies, a regular one and a backup, so that could not be the problem. I listened and let it go, but later I sat down and thought about it.
My troubleshooting strategy for a situation like that would be to perform the following tests. Most of them require very little investment, except for the power line monitor (Dranetz) test.
Do a CPU Stress Test
Check Power Supply (PSU)
Check power supply for voltage issues on the pins
Check power supply ripple with a basic oscilloscope…
Step-by-step test of individual parts of the power supply (PSU)
Testing High Voltage Section and Power Regulator Section of Power Supply
Check Power Cable
Check power cable for continuity
Check the Uninterruptible Power Supply (UPS)
Ground Fault Theory
Ground Fault Theory and Testing for a Server or Datacenter Environment
Testing UPS Battery
Check Power Outlet
Check power outlet for power issues
Check the actual power over time with a power line monitor, preferably a Dranetz. A Dranetz measures and stores a graph of power activity over time, so you can tell whether there are any spikes or moments of unusual activity and then correlate them with unusual machine activity. You could also use the Dranetz on a UPS.
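To make the "correlate it to unusual machine activity" step concrete, here is a rough Python sketch of how I might go about it once the readings are out of the monitor. It assumes the readings have already been exported as plain (timestamp, voltage) pairs; I am not assuming anything about the actual Dranetz export format. The 120 V nominal value, the 10% tolerance band, the 5-minute window, the function names, and the sample readings are all made up for illustration, so adjust them to your own site and equipment.

```python
from datetime import datetime, timedelta

# All three values are assumptions for illustration, not vendor specifications.
NOMINAL_VOLTS = 120.0           # use 230.0 or your local nominal voltage
TOLERANCE = 0.10                # accept readings within +/-10% of nominal
WINDOW = timedelta(minutes=5)   # how far back from a failure to look


def find_excursions(samples, nominal=NOMINAL_VOLTS, tolerance=TOLERANCE):
    """Return (timestamp, volts, kind) for every reading outside the band."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    events = []
    for timestamp, volts in samples:
        if volts < low:
            events.append((timestamp, volts, "sag"))
        elif volts > high:
            events.append((timestamp, volts, "spike"))
    return events


def correlate(events, failures, window=WINDOW):
    """Map each failure time to the power events seen within `window` before it."""
    matches = {}
    for failure in failures:
        matches[failure] = [
            event for event in events
            if timedelta(0) <= failure - event[0] <= window
        ]
    return matches


# Hypothetical readings and one hypothetical server failure time.
readings = [
    (datetime(2024, 1, 1, 10, 0), 119.8),
    (datetime(2024, 1, 1, 10, 1), 104.2),
    (datetime(2024, 1, 1, 10, 2), 120.1),
    (datetime(2024, 1, 1, 11, 30), 133.5),
]
server_failures = [datetime(2024, 1, 1, 10, 3)]

power_events = find_excursions(readings)
for failure, nearby in correlate(power_events, server_failures).items():
    print(failure, "->", nearby)
```

If a failure keeps showing up with a sag or spike just before it, that points toward the outlet, UPS, or building power rather than the server itself. If the failures never line up with a power event, the power feed becomes a less likely suspect and the PSU and motherboard tests move back to the top of the list.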