Oh boy, where to begin? Hard drives, man. They be killin' me sometimes. They hold all your precious data, programs, and in many cases, your entire digital life. They are also the most likely component to fail in your computer.
They fail for any number of reasons. Impact. Vibration. Heat. Moisture. They'll eventually go out from normal wear and tear. Sometimes, they fail for no reason at all, with no warning whatsoever. Failures can range from a total, unrecoverable loss to just a partial, recoverable loss, where the data can still be salvaged but the drive needs replacing and software may or may not be recovered. The partial kind is the most common type of failure.
So what can you do to identify, prevent, and recover from a hard drive dying on you? Well, the obvious answer is back up. Back up early, back up often, back up to at least two different locations, and once in a while, actually verify those backups. If you do at least two of the above, you're ahead of at least 99.5% of typical computer users, so feel free to pat yourself on the back! For the rest of you, you should at least get yourself some insight as to how your drive is doing health-wise. Your drive actually can give you fair warning when it's about to fail - that's where S.M.A.R.T. comes into play.
S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology, and if you really want to get into the nitty gritty of it, I suggest you read the Wiki page. Basically, it's here to tell you what's going on inside your hard drive. In the most imminent of failures, you can get an on-screen warning message, which means SMART is warning your computer that the drive is going to fail. Be forewarned though, those only happen in the most extreme cases, and I've seen drives that were completely unusable that SMART did not give any heads up on either.
The hard drive stores this information for a wide variety of criteria, and if you keep an eye on it, it can act as an early warning system for drive failure. The things we really keep an eye out for are the "Reallocated Sector Count" and the "Uncorrectable Sector Count." Now drives can function with bad sectors. It's not an ideal situation by any means, but something in the neighborhood of under 10 bad sectors seems to be tenable. Occasionally you can have quite a few more and not run into any trouble at all. Sometimes a single bad sector at the very start of the drive can render the whole drive unusable. In any case, being able to identify the problem is the first step.
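If you'd rather not eyeball those two numbers every time, here's a minimal sketch of how you might pull them out of a SMART attribute table with a few lines of Python. It assumes text in the layout printed by `smartctl -A` from the smartmontools package (a tool this post doesn't otherwise use), and it assumes the two counts live under attribute IDs 5 and 198, which is typical for ATA drives but not guaranteed. The sample table is made up for illustration, and the cutoff of 10 is just my rule of thumb from above, not any kind of official threshold.

```python
import re

# Attribute IDs to watch (an assumption: typical for ATA drives).
WATCHED = {5: "Reallocated Sector Count", 198: "Uncorrectable Sector Count"}

# Made-up sample in the layout printed by `smartctl -A`, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       12
"""


def parse_raw_values(table_text):
    """Map attribute ID -> raw value from smartctl-style table text."""
    raw_values = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        leading_number = re.match(r"\d+", fields[9])
        if leading_number:
            raw_values[int(fields[0])] = int(leading_number.group())
    return raw_values


def bad_sector_report(table_text, tolerable=10):
    """Return {label: (raw_value, status)} for the watched attributes."""
    raw_values = parse_raw_values(table_text)
    report = {}
    for attr_id, label in WATCHED.items():
        raw = raw_values.get(attr_id)
        if raw is None:
            status = "not reported"
        elif raw == 0:
            status = "ok"
        elif raw < tolerable:
            status = "watch"      # rule of thumb: under 10 seems tenable
        else:
            status = "replace"
        report[label] = (raw, status)
    return report


if __name__ == "__main__":
    for label, (raw, status) in bad_sector_report(SAMPLE).items():
        print(f"{label}: {raw} ({status})")
```

Keep in mind that a "watch" result is not a clean bill of health. As noted above, a single bad sector in the wrong spot can take out the whole drive.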
After having diagnosed literally thousands of bad hard drives, as soon as there's even a suspicion that it may be a hard drive error, I want to get a look at the drive's SMART data. The easiest way I've found to do this is to use a bootable Linux disk on the suspect computer. Most computers of the last half decade or so can easily boot off a USB drive, and putting Ubuntu Linux on a USB flash drive is as simple as following these instructions. If you cannot boot from a USB flash drive, Linux can also be loaded from a CD. Once you're booted into Linux, you can easily launch its Disk Utility, click on SMART Data, and you'll see the drive's SMART data.
In this example, the SMART data shows just a single bad sector, but the drive is ready to fail completely. Conversely, I've seen a drive reporting upwards of 3.5 million bad sectors that SMART didn't flag as failing, but both of these drives needed replacement (and both were fully recovered, I might add). This whole process takes just a few minutes to identify a hard drive in danger of failure, doesn't rely on the suspect hard drive to boot into the operating system (if it even still can at this point), and minimizes the risk of any permanent data loss.
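Since the overall SMART verdict and the bad sector count can disagree, it helps to treat either one as enough reason to replace the drive. Here's a rough sketch of that logic in Python. It assumes you're using the `smartctl` command-line tool from smartmontools rather than the Disk Utility screen, that you run it with root privileges, and that the drive shows up as something like `/dev/sda` (all assumptions on my part). It relies on `smartctl` setting bit 3 of its exit status when the drive reports itself as failing. The function names and the cutoff of 10 bad sectors are mine, for illustration only.

```python
import shutil
import subprocess

# Bit 3 (value 8) of smartctl's exit status means the SMART health
# check returned "DISK FAILING".
DISK_FAILING_BIT = 0x08


def smart_says_failing(exit_status):
    """True if smartctl's exit status flags the drive as failing."""
    return bool(exit_status & DISK_FAILING_BIT)


def should_replace(exit_status, bad_sectors, tolerable=10):
    """Either signal alone is enough: SMART's verdict or too many bad sectors."""
    return smart_says_failing(exit_status) or bad_sectors >= tolerable


def run_smartctl(device):
    """Run `smartctl -H -A` on a device.

    Returns (exit_status, output_text), or None if smartctl is not installed.
    The exit status is a bitmask, so a nonzero value is not treated as an error.
    """
    if shutil.which("smartctl") is None:
        return None
    result = subprocess.run(
        ["smartctl", "-H", "-A", device],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    outcome = run_smartctl("/dev/sda")  # hypothetical device path
    if outcome is None:
        print("smartctl not found; install smartmontools first")
    else:
        status, text = outcome
        print(text)
        print("SMART says failing:", smart_says_failing(status))
```

With this logic, both drives described above come out as "replace": `should_replace(8, 1)` covers the drive with one bad sector that SMART flagged, and `should_replace(0, 3_500_000)` covers the one SMART never flagged.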
Now you've probably seen Windows' ability to check a hard drive for errors, but this process is far from ideal. A full exhaustive disk check can certainly rectify some limited errors on a drive, but in many cases can serve to exacerbate them. When a drive has shown ANY signs of failure, the last thing you want to put it through is an extended period of read/write activity, unless, of course, you don't care about the data on the drive. In that case scan away, but as this article from InterData states, when trying to preserve the data on a suspected failing drive, the LAST thing you should be doing is putting it through extensive reads/writes.
So where does this leave us? OK, so now you know to back up your data. Every drive will fail eventually. What else? Oh, so here's a nifty way to monitor SMART information from within Windows - a cool little app called Speedfan you can download here. It's got a number of features, and a SMART tab that'll let you view that information on any of your installed hard drives. It's a great way to keep tabs on your disk's health, and it's totally free too. A lot of times if we can't get Linux to load on a customer's PC, we'll remove the hard drive, hook it up to our recovery computer, and use Speedfan to get the SMART data. Thanks Speedfan!
Like anything, none of this is foolproof or 100% applicable. I haven't known a SMART reading to be incorrect, but drives with even hundreds of errors can be scanned and made to function normally for the life of the computer. In other cases, drives with a relatively small number of errors end up needing a clean room recovery. It's unpredictable, and personally, I like to err on the side of caution because even the most expensive hard drive costs less than the cheapest data recovery.
I'm not a Geek. I'm your friend. And I'm here to help.
A very special thanks to C.C. for inspiring this post!