Ddrescue to the rescue

September 20, 2014 Leave a comment

A few weeks back, thanks to the blue screen caused by Microsoft’s batch of faulty updates, I formatted a teacher’s class computer and redid it from scratch – this was before I managed to find the work around to fix the blue screen issues. The computer was running fine since then, until this past week. The teacher started complaining bitterly about how slow the PC had become. I checked for malware, as well as for any other crappy software that may have been causing the slow down. I found nothing. I asked the teacher to monitor the PC, while I investigated further.

A few days later, the teacher was even more frustrated with the machine. Now it was taking forever to start up, shut down and was hanging on applications. I looked through Event Viewer, only to discover ATAPI errors were being logged. Not just one either, there were dozens of errors. The moment I saw this, I knew that the hard drive was on the way out. While the SATA port could be faulty or even the cable, the odds of those being the culprits were rather low. Too many bad experiences in the past have taught me that it is almost always the drive at fault.

I procured a spare drive and decided the quickest fix was to simply clone one drive to the other. Using Clonezilla I tried to do the clone. On my first pass, about 75% of the way through the PC looked like it went to sleep and I couldn’t see any output on the monitor. I couldn’t revive the PC, so I rebooted and tried the procedure again. This time, it got up to about 97.5% before it crashed out. Based on what I saw, Clonezilla was hitting bad sectors, corrupt files or the mechanical weakness in the drive. Now I was getting worried, because any more cloning attempts could hasten the end of the faulty drive. Not only that, it was wasting time. Setting up the PC from scratch again was my last resort, since it would take hours. Before I gave up and did that, I remembered Ddrescue.

I had tried to use Ddrescue on my home computer more than a year ago when the hard drive holding my Windows 8 install died. Sadly, that drive was too damaged even for Ddrescue to be able to save. I was hoping that this hard drive of the teacher hadn’t yet hit that stage.

I ran Ddrescue and then waited as the drive literally copied itself sector by sector over to the new drive. What I wasn’t aware of is that Ddrescue doesn’t understand file systems – it just copies raw data from one drive to another. This means it will copy any file system, but in order to do so, it must copy every block on the disk. A tool like Clonezilla will understand a file system and only copy used data blocks, therefore saving lots of time by not copying essentially blank space.

Ddrescue did hit one patch of bad data, but was able to continue going, then came back at the end to try and pull out what it could. Thankfully, whatever bad data there was wasn’t too major, and Ddrescue completed successfully. Booting from the new drive was a success, and best of all, the speed was back again. I did run a sfc /scannow at the command prompt to check for any potential corrupt system files. SFC did say it fixed some errors, and I rebooted. Apart from that, it looks like I managed to save this system in the nick of time. The old hard drive was still under warranty, and has been returned to the supplier. He can return that drive and get a replacement for us, which will become a new hot spare for some other classroom.

When Windows Update goes wrong

Windows Update is usually a very reliable method of keeping Windows based computers up to date. Rough in the early days, it’s come a long way since then. Smooth and mostly transparent in the background, it isn’t often that bad updates slip through.

Unfortunately, during after August’s Patch Tuesday, such an event occurred. After a number of updates were either automatically approved or approved by myself, we had some computers blue screen and go into a reboot loop. Thankfully, out of almost 180 computers, only 5 have suffered the problem seen below:

WP_20140814_001

All of the affected computers were running Windows 7 x64 SP1 with all updates applied. The first 3 times this happened, I couldn’t find a cure for the problem and ended up wiping and redoing the computer from scratch. Later in the week, I found some instructions online on how to get out of the loop and get back into working order.

  1. Get into the Recovery Console either from install media or by letting the Repair your Computer wizard run after a number of crashes.
  2. Open up a Command Prompt and delete the FNTCACHE.DAT file located in C:\Windows\System32
  3. Reboot the computer, and you should now be able to get back into Windows.
  4. Delete the FNTCACHE.DAT file again, as it will have been recreated by Windows.
  5. Lastly, go to Windows Update in the Control Panel, then view Installed Updates. Remove KB2982791 and optionally KB2970228. The other 2 updates mentioned out there on the web only apply to Windows 8.1/Server 2012 and so are irrelevant to Windows 7 computers.
  6. Reboot after the patches are removed.
  7. As I said, it’s not often anymore that bad updates slip through all of Microsoft’s testing, but it does happen. Although it’s frustrating, I don’t intend to modify how I approve patches. I’d rather take the risk of something like this happening than get hammered by Alureon or Conficker or some other nasty because I ignored security patches.

Firmware update fun

A couple of years ago, flashing any device’s firmware was often a difficult, frustrating and sometimes downright dangerous task. Always hoping that the device wouldn’t get bricked due to some unknown bug in the firmware, or worse still, a power failure right in the middle of the flash.

These days for things like motherboards, it can be as easy as flashing inside Windows, or using the built in feature on the motherboard. Generally speaking, you no longer have to use MS-DOS and try to find floppy disks or use an alternative, it just works. Intel motherboards in particular are usually very straight forward when it comes to this: run the Express Update inside Windows. Windows reboots, the motherboard flashes itself, reboots and you are back into Windows. No other intervention required.

Thus it was a bit irritating a few weeks ago when I decided to flash some of Z68 motherboards to their latest (and last) BIOS version. I ran the Express Update inside Windows as I’ve done countless other times. Computer reboots, fails to flash the firmware and then goes back into Windows. No matter what I tried, the firmware would not update. My next step was to download the *.bio file from Intel’s website, place it on a flash drive and press F7 during boot, so that I can update the BIOS. This didn’t work as well:

WP_20140715_002

That leaves me only one option – use Intel’s Iflash tool. I don’t have a copy of MS-DOS lying around, and I didn’t feel like going through many hoops just to get a flash drive set up correctly. I discovered that Iflash works with FreeDOS, so I simply placed the files on a flash drive I have set up with Ultimate Boot CD, which includes FreeDOS. Run Iflash, the computer reboots, but then sits for a while doing nothing. I was about to reset the computer when I noticed the power led on the computer doing a slow pulse. I remembered that Intel motherboards generally do this when updating the firmware or when in sleep mode, so I let the process go on. Sure enough, after about 3 minutes, the computer rebooted by itself. The latest BIOS was now installed and working correctly.

Thankfully there was only about 5 computers to do this on. I’m not sure why this model motherboard was so fussy, but it’s done now.

Fixing Windows Update issues

About three weeks ago, I approved a number of updates to be downloaded into WSUS for distribution on the school network. Among those updates was an update for the Windows Update client itself. I watched the WSUS console as the computers started reporting back and after a while I began to notice an odd pattern. 36 out of 39 computers in our main computer lab were not reporting in.

Taking a look at one of the affected computers in the lab, the cause of the computer not reporting in became clear: Windows Update Agent 7.6.7600.320 was failing to install repeatedly. Since this new version was required to download and install updated from WSUS, the computers would not be able to patch themselves until this Agent issue was fixed.

I tried numerous approaches to get the issue fixed: Uninstall anti-virus software, try installing updates at shutdown instead of through Windows Update in the Control Panel, run the System Update Readiness checker tool, run System File Check from the command prompt. Nothing worked. I was on the verge of preparing to wipe the lab and reimage the computers when I came across the answer.

Thanks to some vigorous internet scouring, I came across this Knowledge Base article on Microsoft’s website: http://support.microsoft.com/kb/2887535. Thankfully, the latest update agent was available to download right there from the article. I downloaded the 64 bit version and attempted to install the update manually on my affected lab computer. After a required reboot, I had success. Windows Update connected again and proceeded to download the now missing 17 updates and installed them. With this proving to be the solution, I went to each computer and installed the new update agent by hand. One by one, these computers were cured of the issue.

One computer however refused to install the updated agent. Checking the CBS Log file found in C:\Windows\Logs\CBS revealed that it thought it needed to be rebooted before updates could be installed. However, rebooting did not solve the problem. I’ve had issues in the past with Server 2008 where it got stuck on updates and needed a certain XML file to be deleted before it would boot again. Going to the location of the XML file, I couldn’t find the usual XML file. I did however find a reboot.xml file, which I viewed. This file pointed to a registry key that I assume was supposed to be deleted after the last round of updates. Since the key wasn’t deleted, the computer still thought it needed to be restarted. Deleting this key and rebooting solved the issue – I could now install the updated agent and install updates again.

At this point in time, I’m still not exactly sure why this lab of computers failed to install the update agent while the rest of the school did so without much fuss. About the only thing I can think of is that it’s somehow related to how the lab was cloned which was somehow causing an issue. Reading through the CBS logs doesn’t shed much light on the issue, since I don’t fully understand everything that’s captured in those log files.

I suppose this serves as a good reminder that while WSUS and Windows Updates in general normally just work, sometimes things can go wrong.

Cabling and De-cabling

The school I work at is 57 years old this year. This means that over the course of the life of the school buildings, lots and lots of cables have been installed for various systems. Discounting electrical cable, there are cables for the computer network, the burglar alarm system, the classroom and corridor intercom system, the old analogue and later PABX phone systems and even some analogue CCTV cables laying around somewhere in the roof.

After a recent venue reshuffle, my colleague and I have had the fairly rare chance to really go wild and rip out as much of the useless cable as we can find in the areas that were reshuffled. We decided not to just cut off the cable at the most convenient spot, but to follow it as much as is practically possible all the way back to its source. This method takes a lot more time but is a much more thorough cleansing than if we just snipped when it vanished from view.

As we’ve followed the cables and opened trunking, it’s become abundantly clear that cable was never ever removed in the past. The phone system in particular is an example. From what I can figure out and from talking to long time members of staff, it appears we once had an analogue phone system from our national phone company. There were no internal extensions, only direct lines all over the show. This meant that the phone company ran a lot of thick multi core cabling all over the show before using one or two pairs for the end jacks. We’ve discovered that many of these thick multi core cables simply taper off to a dead end point.

When the school got a Samsung analogue/digital PABX, the telecoms company that installed it simply ran their floor cables all over the show without removing the older cables first. Hundreds of meters of cable were run to support the Samsung PABX. Often cables were glued on the walls, in door frames or wherever else, often leading to a very messy appearance.

Two years ago, we moved to an Avaya VoIP system, which runs on the existing network cables. Although the telecoms company responsible for that system did rip out quite a bit, they didn’t bother going after the floor cables or anything like that. So for the last 2 years, we’ve been sitting with a lot of dead cable all over the show. When we eventually get to ripping it out, you end up with a pile like this:

20140704_113931

That whole pile is either multi core cable from the original phone system, or from the Samsung PABX. We had about 2 other piles of similar size when we cleaned up other venues, though those did include network cables and some power cable as well.

With the removal of so much cable, it becomes possible to install smaller and more compact trunking which not only looks neater, it also leads to easier cable management. Getting cables to stay in place while you hammer the cover back on of a 100mmx40mm piece of trunking is not an easy feat.

When I head back to work after my break, we’ll no doubt continue the hunt for cables and rip out as much of the dead stuff as we can find. I just wish it was easier to recycle these cables than it currently is.

Categories: General Tags: ,

Fixing Windows Update on Windows 7

Generally speaking, the Windows Update mechanism usually just works. Updates are downloaded and installed usually without much fuss. In the home environment, it’s pretty rare that things will go wrong, since computers at home are less likely to come under heavy use and abuse. Most of the time at work I have no issues with Windows Update, despite the pounding the computers take in the school environment.

Sometimes however Windows Update gets broken. Malware infection, powering a computer off during the update process and hard disk corruption are some of the most likely culprits. I’ve found myself fixing a few computers in the last week at work that have developed faulty update mechanisms.

To fix the problem, there’s two tools I’ve used. System File Check is built into Windows, while the System Update Readiness (SUR) tool can be downloaded from Microsoft’s website. The first port of call is to simply run sfc /scannow from an elevated command prompt and let it scan the system. I’ve found this to fix some problems, and it serves as a good stepping stone for step 2.

Step 2 involves running the SUR tool. The SUR tool looks like a stand alone Windows Update, though it is actually scanning your computer’s Component Base Store for corruption and either fixing the issues, or logging the issues it can’t fix into a very useful log file. Depending on the speed of your computer and the number of faults, the process could take up to 20 minutes to complete.

If the computer still refuses to install updates after step 2, it’s time to check the SUR log to find out exactly what is wrong. Navigate to C:\Windows\Logs\CBS\CheckSUR.log and find out exactly what files or packages are causing the problem.

In the case of the computers at work, all of them were missing certain manifest files out of the C:\Windows\WinSxS directory. To fix the issue, I copied the same manifest files from a working PC and placed them into the C:\Windows\Temp\CheckSur\WinSxS\Manifests folder and reran the SUR tool. Checking the log file after the tool had run indicated that all the remaining problems had been fixed. After that, the problematic updates installed without issue.

Research on the internet indicates that things can be a whole lot more corrupt that what I experienced, but thankfully I had it easy. There’s a nice long article on the SUR tool as well as how to analyse the logs on here.

Although it is frustrating to deal with Windows Update issues, the mechanism is largely robust enough that with a little time and effort, you can fix just about any problem. Certainly a far cry from Windows Update errors and troubleshooting in Windows XP!

Creating bootable USB drives using Rufus

With the seemingly slow decline of optical drives in computers, it’s becoming more and more common to install the OS via a bootable USB flash drive. I outlined a method of doing so using built in Windows tools way back in 2010. However, that method is little tedious and doesn’t make the flash drive capable of an UEFI based install, only legacy BIOS.

Enter a better way of doing things: Rufus.

Rufus

With an easy to use graphical interface, you can select all the options you’ll need to make a bootable flash drive. In particular, under “Partition scheme and target system type” you can select GPT as the partition type for an UEFI based install. At work, our brand new server doesn’t have a DVD drive, so this was the only way to install Windows Server onto the server in UEFI mode. No other tool could do that.

Make sure you have an ISO image of the disk you want to put onto the flash drive – Rufus doesn’t do a live capture from a physical disk unfortunately. You can even make a bootable MS-DOS based flash drive if you have the MS-DOS files, useful if you need to be able to flash an older computer’s BIOS or RAID card for example.

Add Rufus to the list of essential tools any administrator or technician should have in their toolkit.

Follow

Get every new post delivered to your Inbox.