I usually have between eight and a dozen computers in my home office, all of them running 24/7. People sometimes question that practice and tell me they think I am wasting power. A related question comes up, too: Is it better to run a computer 24/7 or to turn it off when it is not needed?
In my case, the power is not wasted, because my computers are always working on various projects for the IBM World Community Grid, which puts the otherwise idle CPU cycles of home and office computers around the world to use. It treats these computers as nodes in a volunteer distributed supercomputer based on the Berkeley Open Infrastructure for Network Computing (BOINC). The projects include medical, genomic, meteorological, and other types of calculations. I also perform backups and install updates at night, and the computers need to be on for that.
But these functions are almost irrelevant to the question at hand.
The big question is: Doesn't that reduce the computer's life by wearing it out? The short answer is not necessarily; it actually may extend the life of the machine and save energy in the long run. ...Wait, what? How can that be?
The light bulb
Have you ever noticed that light bulbs—especially the ancient incandescent ones—seem to burn out most frequently at the instant when they are turned on? Or that electronic components like home theater systems or TVs worked fine yesterday but don't today when you turn them on? I have, too.
Have you ever wondered why that happens?
Many factors affect the longevity of electronic equipment. One of the most ubiquitous sources of failure is heat. In fact, the heat generated by devices as they perform their assigned tasks is the very heat that shortens their electronic lives.
When I worked at IBM in Boca Raton at the dawn of the PC era, I was part of a group that was responsible for the maintainability of computers and other hardware of all types. One task was to ensure that equipment broke very infrequently and that, when it did, it was easy to repair. I learned some interesting things about the effects of heat on the life of computers while I was there.
Let's go back to the light bulb because it is an easily visible, if somewhat infrequent, example.
Every time a light bulb is turned on, an electric current surges into the filament and heats its surface very rapidly from room temperature to about 4,600°F (the exact temperature depends upon the wattage of the bulb and the ambient temperature). This thermal shock stresses the filament in two ways: it vaporizes a bit of the filament's metal, and the sudden heating makes the metal expand rapidly. When the bulb is turned off, the thermal shock is repeated, though less severely, as the filament cools and contracts. The more times a bulb is cycled on and off, the more the effects of this thermal shock accumulate.
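To get a feel for the size of that power-on surge, consider that cold tungsten has very roughly one-tenth the resistance of tungsten at operating temperature, so the inrush current is roughly ten times the steady-state current. The calculation below is purely illustrative; the one-tenth ratio and the 60 W, 120 V bulb are assumptions for the example.

```shell
#!/bin/sh
# Illustrative only: estimate the inrush current of an incandescent bulb.
# Assumes the cold filament has roughly 1/10 the resistance of the hot one.
watts=60
volts=120
awk -v w="$watts" -v v="$volts" 'BEGIN {
    hot_r  = v * v / w     # hot filament resistance in ohms (P = V^2/R)
    cold_r = hot_r / 10    # approximate cold resistance
    printf "steady current: %.2f A\n", v / hot_r
    printf "inrush current: %.2f A\n", v / cold_r
}'
```

For a 60 W bulb, that works out to a brief surge of several amps through a filament that carries only half an amp once it is hot.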
The primary effect of thermal shock is that some small parts of the filament—usually due to minute manufacturing variances—tend to become hotter than other parts. This causes the metal at those points to vaporize faster, making the filament even weaker at that point and more susceptible to rapid overheating in subsequent power-on cycles. Eventually, the last of the metal vaporizes when the bulb is turned on, and the filament dies in a very bright flash.
The electrical circuitry in computers is much like the filament in a light bulb. Repeated heating and cooling cycles damage a computer's internal electronic components just as they damage the light bulb's filament. Over many years of testing, researchers have found that repeated power-on and power-off cycles do more damage than leaving the devices on all the time.
The energy cost of manufacturing
The cost of a computer damaged by thermal shock includes the energy cost of building a new one or of replacing the damaged parts. In 2011, Network World published an article, "Computer factories eat way more energy than running the devices they build." It cited a study finding that as much as 70% of the total energy a laptop consumes is used in manufacturing it, and the other 30% in running it. Note: The links to the cited paper are no longer valid.
Extending the useful life of a computer increases the overall return on investment. Reducing the thermal stress on the device by running it 24/7 can significantly extend its useful life.
So what breaks?
I have been running computers of various types 24/7 for over 30 years, and some have broken. The interesting thing, however, is not that they broke, but how infrequently they did and which parts failed.
I cannot even remember how many computers I have worked with. I have supported so many that the number is impossible to calculate. But I can do a quick estimate of the hardware failures I have experienced.
I have had only one motherboard memory DIMM fail. Those components just don't fail. Well, except for that one.
Two motherboards failed when the Parallel ATA port stopped working, but I circumvented that by installing a PATA adapter. One motherboard just quit working with no response of any kind.
I have replaced five CD/DVD drives due to broken moving mechanical parts.
I have had, at my best estimate, at least 14 power supplies fail. The frequency of these failures diminished once I started installing power supplies with significantly higher capacity than the load actually required. Running a power supply near its design limit stresses it and causes it to fail sooner.
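That headroom idea can be made concrete with a quick back-of-the-envelope calculation. The wattage figures and the 60% loading target below are assumptions for illustration, not measured values or an industry rule:

```shell
#!/bin/sh
# Illustrative PSU sizing sketch. The component wattages and the 60%
# loading target are assumptions for the example, not a hard rule.
cpu=125
gpu=220
drives=30
board_and_fans=75
total=$((cpu + gpu + drives + board_and_fans))

# Size the supply so this load is no more than 60% of its rating,
# i.e., rating >= total / 0.60 (integer ceiling arithmetic).
needed=$(( (total * 100 + 59) / 60 ))

echo "estimated load: ${total} W"
echo "choose a supply rated at least: ${needed} W"
```

With those numbers, a roughly 450 W load calls for a supply rated around 750 W, which is a far cry from running a 500 W unit at its limit.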
I have replaced about 30 failed hard drives over the years. Hard drives have moving mechanical parts, and those are usually, but not always, the failure points. I have not used SSDs long enough to have one fail.
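Hard drives at least tend to give warning before they die, and smartmontools' `smartctl` is the usual way to read those warnings on Linux. Below is a minimal sketch that scans `smartctl -a` output for two common danger signs; the exact output lines and attribute names vary by drive, so treat this as a sketch rather than a complete health check.

```shell
#!/bin/sh
# Minimal sketch: scan `smartctl -a /dev/sdX` output (read from stdin)
# for two common early signs of hard drive trouble.
check_smart() {
    awk '
        # Overall health line, e.g.:
        #   SMART overall-health self-assessment test result: PASSED
        /SMART overall-health self-assessment test result:/ {
            if ($NF != "PASSED") print "WARNING: health check " $NF
        }
        # Reallocated sectors attribute; the raw value is the last field.
        /Reallocated_Sector_Ct/ {
            if ($NF + 0 > 0) print "WARNING: " $NF " reallocated sectors"
        }
    '
}
```

Run it as, for example, `smartctl -a /dev/sda | check_smart` (`smartctl` usually requires root). No output means neither warning sign was found.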
Fans. Dozens of fans. Fans fail all the time. Sometimes they make noise as the bearings start to fail; other times they just slow down as the gunk in the bearings hardens. This includes case fans, CPU cooling fans, and GPU onboard cooling fans. I have not yet had a liquid cooling pump fail. Fortunately, my CPU fans failed noisily, so I knew to replace them before they stopped turning completely. Bad bearings seem to be the most common cause of fan failures, and those failures are at least loud enough to notice. Gunk failures are quiet and may not be noticed until something more significant fails.
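The quiet slowing-down failures can be caught by watching fan speeds. On Linux, the `sensors` command from the lm_sensors package reports them; the sketch below flags any fan spinning below a threshold. The 300 RPM floor is my own assumption, healthy idle speeds vary widely by fan, and `sensors` output formats vary by chip, so adjust both for your hardware.

```shell
#!/bin/sh
# Sketch: warn about fans that have slowed below a threshold, based on
# `sensors` output (lm_sensors). The 300 RPM floor is an assumption;
# tune it to your hardware.
low_fans() {
    awk -v min=300 '
        # Typical sensors lines look like: "fan1:        1250 RPM"
        /^fan[0-9]+:/ && / RPM/ {
            if ($2 + 0 < min) {
                sub(/:$/, "", $1)
                print "WARNING: " $1 " at " $2 " RPM"
            }
        }
    '
}
```

A cron job that runs `sensors | low_fans` and mails any output is one simple way to catch a slowing fan before it stops turning entirely.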
What should I do?
I have read many recommendations about whether to run computers 24/7. Most of them seem to be aimed at home users, and the guidance they provide is appropriate for that environment.
Based on my experiences, and a Lifewire article, "Should You Shut Down a Computer When It's Not in Use?", it is clear that most devices fail either very early in life or after a very long one. The "bathtub" curve shown in the article illustrates this nicely. The author recognizes the role of power cycles in causing failures, so he recommends power cycling a device early in its life to ensure that, if it is going to fail early, it fails within the warranty period. Once past the warranty period, he says, computers should be run 24/7 because that is less stressful on the components than a power cycle each day.
Near the end of that article, the author lists ways to extend the computer's life if it is run 24/7. That advice includes preventing the machine from entering hibernation or sleep modes, since these can be almost as damaging as power cycles (in effect, that is what they are). I would add that using "wake on LAN" for updates and backups means the computer was sleeping or hibernating, so each wake-up counts as a power cycle. It is better to leave the computer fully on than to let it sleep or hibernate.
Be aware that not all failures are caused by heat and power cycling. The graph in the article also shows a nearly flat line along the bottom of the curve representing "random" failures, which occur at a roughly constant rate throughout the device's life.
I run my computers the same way that data centers do—24/7 for their entire lifetimes—full-on power with no sleep or hibernation. I have had only one or two early-life failures. Here is what I recommend: Turn it on, turn off hibernation and sleep modes, and just let it run.
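On a systemd-based Linux distribution, one way to make "no sleep or hibernation" stick is to mask the sleep-related targets so nothing can activate them. This is just one approach, and it assumes systemd; your desktop environment may have its own idle-suspend setting that needs to be disabled as well.

```shell
# Disable sleep and hibernation on a systemd-based system (run as root).
# Masking the targets prevents anything from activating them.
systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

# To undo this later:
# systemctl unmask sleep.target suspend.target hibernate.target hybrid-sleep.target
```

You can confirm the change with `systemctl status sleep.target`, which should report the unit as masked.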