Wednesday, January 3, 2007

The key to fixing problems quickly

In an ideal world, problems are solved instantaneously, or don't even arise in the first place. Here in the real world though, the key to fixing problems quickly is having enough information. I'm particularly frustrated with developers who, when an error occurs and they need to inform the user, come up with a really useless error message, something about as useful as

"error MSB3152: The install location for prerequisites has not been set to 'component vendor's web site' and the file 'DotNetFX\instmsia.exe' in item '.NET Framework 2.0' can not be located on disk. See Help for more information."

See help for more information.

To cap it all off, the online help is about as useful as matches to an astronaut. It actually says: "To correct this error - Determine whether the file exists on disk". The file does indeed exist on disk. It exists on disks all over the world. It even exists on both disks in the machine in question. It just doesn't appear to exist in the place where the program is looking for it.

I wouldn't need to see help for more information if the stupid program would just tell me where it was looking for the files. This is happening on my build server - the machine I'm setting up to monitor our Subversion repository and republish this particular program every time someone checks something into the source tree. Up until now, this has been a manual process triggered by my colleague Karin using an existing NAnt script which I threw together a couple of months ago. What this basically means is that all the hard work has been done already; I just need to hook up the monitor part (CruiseControl.NET) to the source control part (Subversion), the build automation part (NAnt) and the ClickOnce technology (MSBuild). But somewhere along the line, the build machine is configured differently to Karin's machine, and MSBuild can't find the prerequisite files.

No problem, easy to fix, right? I'll just copy the files off Karin's machine onto the build server, presumably to the same path (they're just install files for the .NET Framework 2.0 after all), and away we go. Not so. Apparently there's some other place on this particular machine that those files need to be for MSBuild to pick them up - which in theory I'm fine with. There could be any one of a number of reasons for this - the source files and the MSBuild files are on different drive letters, Windows Server 2003 vs. Windows XP, etc. - and I don't really care where the files are stored on the build server, as long as I get my automated build published every time someone changes something.

What does bug me is that when MSBuild fails inside my NAnt task, it fails with that wonderful error message that I copy/pasted above. This would literally be a 5-minute fix if they just put all the locations where they search for the files into the error message. Then all I'd have to do is paste the files into one of those locations and I'm good to go. Or, at least, I could start to hunt around, see where those locations are stored, and try to figure out why they're different on the build server and Karin's machine. Instead, I've spent the last two hours moving files all over my hard drives, trying to get MSBuild and ClickOnce to pick them up, to no avail - and I'm still no closer to a solution than I was when I started.
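To make the complaint concrete, here is a minimal sketch of a file lookup that reports every location it tried. It is not how MSBuild works internally (I have no idea where it looks, which is the whole problem); the function name, the exception name and the directory list are all made up for illustration.

```python
from pathlib import Path


class PrerequisiteNotFoundError(FileNotFoundError):
    """Raised when a file is missing from every directory that was searched."""


def find_prerequisite(relative_path, search_dirs):
    """Return the first existing match, or fail listing every path tried.

    Both the name and the idea of a configurable list of search
    directories are assumptions made for this sketch.
    """
    tried = []
    for directory in search_dirs:
        candidate = Path(directory) / relative_path
        if candidate.is_file():
            return candidate
        tried.append(str(candidate))

    # The important part: the message names every location that was checked.
    searched = "\n".join(f"  - {path}" for path in tried)
    if not searched:
        searched = "  (no search directories were configured)"
    raise PrerequisiteNotFoundError(
        f"Could not find '{relative_path}'. Looked in:\n{searched}"
    )
```

With a message like that, the fix is to compare the listed paths against where the files actually are, instead of shuffling files around and hoping.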

So my advice to programmers: don't be scared to put as much information as you have into error messages - stack traces especially, if you have them, but any other useful information as well. If you provide a decent message, then a competent user (which I know probably only accounts for about 3% of all your users) can at least do basic troubleshooting himself, without having to either:

  • Trouble your tech support with a problem that they will probably be no better equipped to solve than he is (because tech support also might not know where this particular configuration is looking for files).
  • Spend hour after frustrating hour scouring the web for solutions - this is NOT a great way to build customer relations and loyalty.
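As a rough sketch of what "as much information as you have" could look like in practice, the helper below bundles the failed action, whatever context the caller knows, and the stack trace into one message. The helper name, the file name and the context keys are hypothetical; only the standard `traceback` module is used.

```python
import traceback


def describe_failure(action, exc, **context):
    """Build one message with the action, the known context and the stack trace."""
    details = "\n".join(
        f"  {key} = {value!r}" for key, value in sorted(context.items())
    )
    trace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return (
        f"{action} failed: {exc}\n"
        f"Context:\n{details}\n"
        f"Stack trace:\n{trace}"
    )


# Example usage with a made-up file name and line number.
try:
    int("not a number")
except ValueError as exc:
    print(describe_failure(
        "Reading build number", exc, source="build.properties", line=12
    ))
```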

As an aside (and proof that I'm not overly anti-Microsoft), the system we're currently replacing (that we wrote) has an error message which pops up when something goes wrong on login and it looks like this:


This is an exact quote, even down to SHOUTING AT OUR USERS in ALL-CAPS. What this excellent piece of information basically means is that either you typed the password in incorrectly, you don't have rights to send in a login message, or the date you passed through in your login message doesn't correspond with the server's date. Why all three error conditions are reported together I'll never know, but clearly providing too much information is almost as bad as providing too little (although at least the problem can still be accurately troubleshot).

Every time this error comes up in our test environment, we need to open the management system and check the password, check the rights, and then log into the server (which is Novell, not Windows) and check that the server's date is the same as the client's date. Why not, as the developer who wrote this code, just take 5 minutes out of your busy schedule to return three different error codes (don't even get me started on error codes vs. exceptions {I'm pro-exceptions}, but this is an old C system) for the three different errors? That simple, one-off 5-minute task would have saved 5 minutes of troubleshooting for probably 200 logins over the course of the last 2 years, which equates to around 1,000 minutes, or almost 17 hours, of time saved.
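Here is a minimal sketch of what separating the three conditions might look like, written with exceptions in Python rather than error codes in C. It is not the real system's code: the function, the exception names and the shape of the `accounts` dictionary are all invented for illustration, and comparing plain-text passwords is a simplification to keep the example short.

```python
from datetime import date


class LoginError(Exception):
    """Base class, so callers can still catch every login failure at once."""


class WrongPasswordError(LoginError):
    pass


class MissingLoginRightError(LoginError):
    pass


class DateMismatchError(LoginError):
    pass


def check_login(user, password, client_date, server_date, accounts):
    """Raise a specific error for each of the three failure conditions.

    `accounts` is a hypothetical dict of the form
    {"karin": {"password": "...", "may_login": True}}.
    """
    account = accounts[user]
    if password != account["password"]:  # plain-text comparison: sketch only
        raise WrongPasswordError(f"Password for user '{user}' is incorrect.")
    if not account["may_login"]:
        raise MissingLoginRightError(
            f"User '{user}' does not have the right to send a login message."
        )
    if client_date != server_date:
        raise DateMismatchError(
            f"Login message is dated {client_date}, "
            f"but the server's date is {server_date}."
        )
```

Each failure now points at exactly one thing to check, so nobody has to open the management system and log into the server just to find out it was a typo in the password.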

I attribute such techniques partly to laziness, but also partly to something I come across all the time - thinking strictly in terms of very short-term goals. "I have to get this login routine finished today" as opposed to "This needs to be as easy to use and troubleshoot as possible, because every single person who logs into the system in its entire lifetime is going to have to pass these checks, and if their login fails, it's going to frustrate every single one of them."

However, a discussion of short-term goals - of thinking too narrowly - is a discussion for another day.
