So last week was a bit of a nightmare computer-wise. I wanted to get things here off to a good start, but my computer had other ideas. Lemme explain:
I bought my current hard drive (Seagate 1.5TB) around the beginning of last year. It was shiny and had received great reviews at the time. (Yes, Seagate’s reputation has been unsavory of late, but this acquisition was made before it started circling the drain).
For the past year I’ve experienced infrequent (but persistent) lock-ups whereby my system will just… hang. Mouse will stop moving, keyboard will stop responding, and the song I’m listening to starts cycling over the last few notes. Anywhere from 30 seconds later to 5 minutes later, it suddenly comes back to life as if nothing happened.
Now, I’m a pretty tech-savvy guy, so my first stop is the Event Viewer application to see what the heck just happened to my System. To my chagrin, I saw Warnings from a device called nvstor64 that looked like:
Reset to device, \Device\RaidPort0, was issued.
What’s more, I filtered the list and saw a bunch of these dating back several months to when I installed Windows 7. Now, I’m not about to blame Win7, since I know these issues were happening on Vista. Also, this is clearly an attempt by the Operating System to fix what is most likely a hardware problem (Windows talks to drive, drive doesn’t talk back, Windows waits patiently… then resets the drive to try to "wake it up"). I’m also experienced enough to know that "nvstor64" is the 64-bit NVIDIA Storage Driver, i.e., the thing that Windows uses to talk to the storage devices (hard drives, optical disc drives, etc.), which gives me a very strong hint as to who has a problem… but not why.
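If you’d rather count these warnings than eyeball them, here’s a rough Python sketch of the kind of filtering I did by hand in Event Viewer. Fair warning: the sample rows and the field names ("time", "source", "message") are made up for illustration and are not Event Viewer’s real export format, so you’d need to adapt them to however you get the log out.

```python
from collections import Counter
from datetime import datetime

# Hand-made sample rows standing in for System log entries.
# The field names and timestamps are assumptions for this sketch,
# not Event Viewer's actual export schema or my real log.
SAMPLE_EVENTS = [
    {"time": "2009-10-25 21:14:05", "source": "nvstor64",
     "message": r"Reset to device, \Device\RaidPort0, was issued."},
    {"time": "2009-10-30 08:02:11", "source": "disk",
     "message": "Some unrelated warning."},
    {"time": "2009-11-03 19:40:52", "source": "nvstor64",
     "message": r"Reset to device, \Device\RaidPort0, was issued."},
    {"time": "2009-11-17 22:05:30", "source": "nvstor64",
     "message": r"Reset to device, \Device\RaidPort0, was issued."},
]


def count_resets_by_month(events, source="nvstor64"):
    """Count 'Reset to device' warnings from one source, grouped by YYYY-MM."""
    counts = Counter()
    for event in events:
        # Skip entries from other drivers/devices.
        if event["source"] != source:
            continue
        # Only count the device-reset warning itself.
        if "Reset to device" not in event["message"]:
            continue
        stamp = datetime.strptime(event["time"], "%Y-%m-%d %H:%M:%S")
        counts[stamp.strftime("%Y-%m")] += 1
    return dict(counts)


if __name__ == "__main__":
    for month, total in sorted(count_resets_by_month(SAMPLE_EVENTS).items()):
        print(month, total)
```

Grouping by month is just one choice; the point is to see whether the resets are a recent thing or have been quietly piling up for ages.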
In the past this was happening only once in a while and I wound up sending my old motherboard (nForce 780i) in for a warranty replacement. (I would like to give a shout out to EVGA for their amazing customer service). I installed the 750i motherboard as a hold-over until the 780i arrived… but when it did I was a few months away from getting married and was barely looking at my computer. No time to do heart surgery. So I’d been using the 750i faithfully since then.
I thought the lock-ups were a thing of the past, but last week I also endeavored to move the PC from the basement to the upstairs for the winter. All was going well and I considered my task finished until the lockups came back with a vengeance. To make matters worse, I was 3 days into a programming project for work and I hadn’t backed up to my Live Mesh or any external physical media yet. The problem needed to be fixed and fixed right then!
First step: Update the Motherboard (nForce 750i) drivers.
A bug in the driver could cause wonky hard drive controller issues, after all. I applied them (including the Storage Driver), rebooted, and hoped for the best. I was not prepared for what I was met with when the reboot finished. I opened the System Event Log and could almost watch the nvstor64 warnings roll in. Without fail, every 30 seconds a new one. Thankfully they weren’t stun-locking my user experience, but there they were, in spades.
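For the curious, here’s a small Python sketch of how you could check whether warnings really are arriving on a regular beat rather than trusting your gut. The function names, the timestamp format, and the 5-second tolerance are my own assumptions for illustration, not anything Windows or NVIDIA provides.

```python
from datetime import datetime


def gaps_in_seconds(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps (in seconds) between consecutive timestamps."""
    parsed = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


def looks_periodic(gaps, expected=30.0, tolerance=5.0):
    """True if there is at least one gap and every gap is near `expected`."""
    return bool(gaps) and all(abs(gap - expected) <= tolerance for gap in gaps)


if __name__ == "__main__":
    # Made-up timestamps, roughly 30 seconds apart.
    stamps = [
        "2009-11-20 20:00:00",
        "2009-11-20 20:00:30",
        "2009-11-20 20:01:01",
    ]
    gaps = gaps_in_seconds(stamps)
    print(gaps, looks_periodic(gaps))
```

A steady interval like that smells more like a driver timeout firing on schedule than random hardware flakiness, though that’s a hunch and not proof.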
"That can’t be good." I tell myself, and start poking around the ‘net. I run across this post (and several others like it) that let me know I am not alone. Eventually I wound up thinking that the increase in errors meant that the drivers weren’t as compatible with Windows 7 as NVidia and Microsoft seemed to think they were when they got WHQL signed. So I set about removing them.
Second step: Remove the updated NVIDIA nForce Storage Driver.
This was accomplished via six steps. If you’re following along at home, here’s what you do:
- Download the latest NVIDIA nForce Drivers (or, if you want to roll back to a previous version, have it handy).
- Download & install Driver Sweeper (you gotta do this first because after step 3, your internet won’t worky).
- Uninstall everything that says ‘NVidia’ via the Windows built-in Add/Remove Programs wizard. Don’t reboot until you’re all done uninstalling them all.
- Reboot the PC and go into Safe Mode (spam F8 as the system is starting to get the menu that lets you choose this).
- Once in Safe Mode, log in as an Administrator-privileged account and run Driver Sweeper. Tell it to clean out anything from NVidia. Once it’s done, reboot.
- When the reboot is complete, re-install the NVIDIA nForce Drivers, but don’t check the Storage Driver. This should get your network ports active & functional again.
You’ll also need to re-install your graphics card drivers, but I make no assumptions that you’re using an NVIDIA card. I’m running a pair of 8800GTs myself, and they were happy to get squeaky-clean drivers after the total driver strip-down described above.
At this point, I figured my troubles would be over. Or at least, things would go back to "normal" (stun-locks and all). I was right, except that I was also wrong. Things were much as before, except they were much, much worse. I tried resuming my programming, but after hitting save 20 minutes into my work, I experienced a heart-stopping stun-lock that lasted for a good five minutes (but it felt like 5 hours). Eventually, the system recovered and my data was saved, but that was a highly-anxious handful of minutes I dared not repeat.
For sanity’s sake, I packaged up my code and backed it up to my Live Mesh. At least there it would be safe until I could figure out what the heck was causing this. After completing the backup, the machine locked up again. "Screw this!" I said under my breath and flicked the Reset switch on my computer’s case (a cold, black behemoth known as the CoolerMaster CMStacker).
Following the POST and the initial BIOS rigamarole, I was presented with every geek’s worst nightmare:
BOOT DISK NOT FOUND. PLEASE INSERT BOOT DISK AND PRESS ANY KEY TO CONTINUE.
Exsqueeze me? I assure you, my good sir, the boot disk is in the computer. I didn’t see any black smoke escaping or smell the crisp pungency of ozone accompanying my reboot, from which I deduce that the disk and its contents are in fact intact and still remain in the machine! I press the Power button this time and give the system a chance to cool. Some things you just can’t solve on a warm reboot.
After sending my PC into a power-free time-out to pore over its recent bad behavior, I head downstairs to locate my copies of SeaTools and SpinRite.
Third step: Scan the drive for errors.
Yeah, the controller chip on the motherboard might be going bad, but it’s hard to prove that without verifying that the physical drive is fine. So I ran the SeaTools Short Test to see if there were any glaring issues. Passed 100%, and the SMART status hadn’t been tripped. To be extra sure, I scheduled the Long Test and went to enjoy some time with Becca. When the test was finished, it also reported 100% pass. So, probably not the drive.
Fourth step: Replace the motherboard.
Rebooted the PC. This time I got a different error when Windows tried to start. It said something to the effect of "A disk read error occurred. Press Ctrl+Alt+Del to restart." Happened twice in a row. A full power-cycle fixed it, but it gave me the heebie jeebies. Felt like the drive (or the controller) was going downhill fast.
To rule out the controller, I finally unboxed my replacement 780i board and swapped it in for the 750i that was having the panic attacks. After an hour of trouble-free computing, I thought all was well… and then the system locked up on me again. Reason? You guessed it: nvstor64, device reset. Grrrr….
Back to the Internets for more helps. I changed tactics and started searching for my drive’s serial number in conjunction with the nvstor64 error and the disk read error and I eventually ran across this article which led me to this AnandTech article which indicated that flashing my ailing drive would woo it back to health.
Fifth step: Flash the drive’s firmware!
So I ran the utility for my ST31500341AS drive. Quick, easy, painless (though it is kinda cool — uses some kind of Acronis dynamic boot-partition creation magic that lets you do low-level things without floppies or CDs by creating a tiny temporary bootable partition on your hard drive, then removes all trace that it was ever there. Sneaky. I like it… just as long as it doesn’t fail and leave you stranded in no-OS land. Hah!)
Once Windows was back up, I cleaned out the Windows System Event Log and ran the best stress test I could think of: I played Borderlands for a few hours. (It was locking up regularly before).
Sixth step: Rejoice and give thanks!
Seagate’s firmware update seems to have fixed the issue for good. It’s been several days and I’ve had nary a lock-up despite my best efforts to force one. I’d like to thank:
- God for getting my data and my sanity through this ordeal intact.
- Seagate for responding to the problems of their own making.
- The Internet community at large and AnandTech in particular for being doggedly persistent with them over the issues with these drive firmwares!
I guess you really do learn something new every day. First time in 15+ years of building systems that I’ve ever had a hard drive need to have its firmware flashed. And I hope I never have to again! Like Kaylee said, "Sometimes a thing gets broke, can’t be fixed." I’m glad this wasn’t that time.