It's hard to teach someone to swim when they're drowning

When we’re talking with prospective Gibraltar users, we often hear something that goes a bit like this:

Wow, that sounds like it’d be really useful, but we don’t have time to integrate something like that: we’re having all kinds of problems trying to get our latest project working at the client site.

It’s one of the more painful experiences when advocating a solution: they’re experiencing exactly the pain that Gibraltar is designed to help with, and with just a small amount of attention it could quickly help get them out of the hole they’re in. The problem is, when someone’s in the middle of a critical issue, getting them to try out a new tool is a bit like trying to give swimming lessons to someone who’s drowning.

When you’re going under, you don’t want to hear about how to kick effectively with your feet, correctly time your breathing, and coordinate your arms and legs.  You just want to be out of the water right now.  

Take a few swimming lessons

If you want to be prepared for an unexpected emergency, it’s easiest to put in a little practice ahead of time, when the pressure’s off. With a few swimming lessons you’d be able to extricate yourself from the water with a lot less drama. You don’t need to become a world-class athlete, just moderately proficient.

Similarly, when it comes to being prepared to support your application once it leaves the developers’ desks, doing something, anything, is better than nothing. One of my favorite approaches is to push the development team not to use anything the client wouldn’t have when troubleshooting problems that show up during formal testing. If you have to pull out Visual Studio on the test server to figure out why the web site is returning the wrong page, what would you do if this were on the client’s computer? Ask them to install Visual Studio in production?

What you may need to do instead is insert more logging in the area around the problem to see if you can figure out what’s going wrong. This may take a few iterations, and the first rounds often show what the problem isn’t more than what it is. Once the additional logging has illuminated the issue, don’t remove it; leave it in. You may want to tune it slightly (drop some excessively verbose messages or reduce the severity of a few things), but now you have some battle-tested logging in place for at least part of your code.
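As a rough illustration of logging that stays in place and is tuned rather than deleted, here is a minimal sketch using the standard .NET `TraceSource` class instead of Gibraltar’s own API. The `PageResolver` class, its `Resolve` method, and the message text are hypothetical stand-ins for whatever area of your code was misbehaving.

```csharp
using System;
using System.Diagnostics;

public static class PageResolver
{
    // Hypothetical trace source. Its level can be raised or lowered later
    // without deleting any of the log statements below.
    private static readonly TraceSource Log =
        new TraceSource("PageResolver", SourceLevels.Information);

    // Hypothetical routine standing in for "the area around the problem".
    public static string Resolve(string requestedPath)
    {
        // Verbose: only emitted when the source level is turned up.
        Log.TraceEvent(TraceEventType.Verbose, 0,
            "Resolving path '{0}'", requestedPath);

        string page = string.IsNullOrEmpty(requestedPath) || requestedPath == "/"
            ? "home"
            : requestedPath.Trim('/').ToLowerInvariant();

        // Informational: kept at the default level because it proved useful.
        Log.TraceEvent(TraceEventType.Information, 0,
            "Path '{0}' resolved to page '{1}'", requestedPath, page);

        return page;
    }
}
```

With the level set to `Information`, the verbose message is filtered out but remains in the code, ready to be switched back on the next time this area causes trouble.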

Often, problems that can show up in one place can also show up in another, so what other parts of your software have similar vulnerabilities?  How about dropping in a few log statements in those areas too?

Prepare for the Expected

One easy-to-introduce practice that can be surprisingly beneficial in the field is to log handled exceptions.  These are the errors that you’re expecting:  Perhaps you’re using an API that doesn’t let you check the validity of an operation before you try, so you try it, then if it fails you go a different way.  It might be that you try to open the file and if you get an exception then you assume it’s not there and create a new one.  

I’ve been on many projects where unexpected problems cropped up in exactly these places. You’re expecting an exception at that location so it isn’t surprising to find one, and most people don’t bother verifying that the exception they caught is exactly the one they were expecting. One practice we like to use is to put a low-severity (verbose or informational) log statement in the catch block that says an exception was expected and what we’re taking it to mean. That way, if you’re reading the log and see a message saying an exception was expected because the file wasn’t found, but the actual exception was Access Denied, you know you’re on to something.

Here’s an actual example from the Gibraltar Analyst codebase:

    try
    {
        m_Parameter.Value = m_Parameter.ConvertValue(txtParameterValue.Text);
    }
    catch (Exception ex)
    {
        // Even though we expect exceptions here, log it in case it isn't the one we expect.
        Log.TraceInformation(ex, "User provided invalid parameter value\r\n" +
            "Unable to convert to native type.  The new value will be ignored.");
    }
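The file case described above could be handled the same way. This is only a sketch under stated assumptions: it uses the standard `System.Diagnostics.Trace` class rather than Gibraltar’s `Log`, and the `SettingsStore` class and `OpenOrCreate` method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class SettingsStore
{
    // Hypothetical helper: open an existing file, or create it if opening fails.
    public static FileStream OpenOrCreate(string path)
    {
        try
        {
            return new FileStream(path, FileMode.Open, FileAccess.ReadWrite);
        }
        catch (Exception ex)
        {
            // Expected: we take a failure here to mean the file is not there yet.
            // Logging the actual exception lets a different cause (access denied,
            // a sharing violation) stand out when someone reads the log.
            Trace.TraceInformation(
                "Expected exception opening '{0}'; assuming the file does not exist " +
                "and creating it. Actual exception: {1}",
                path, ex);

            return new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
        }
    }
}
```

If the real cause is something other than a missing file, the create call will most likely fail as well, but the log already records what the first exception actually was.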

That said, you should strive to eliminate every expected exception you can. Not only are they problematic from a reliability standpoint, but exceptions are also very slow compared to other routine operations in code. Finally, if your code throws exceptions even in normal use, it isn’t feasible to debug with Visual Studio set to break on thrown managed exceptions. That’s a pity, because this is a great debugging aid for cleanly set up code: it halts the debugger as the exception is being thrown, before execution has moved to where it will be caught, even allowing you to unwind the exception and back up.
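One common way to remove an expected exception in .NET is to use a `Try...` method that reports failure through its return value. The sketch below assumes the value being converted is an integer; `ParameterParser` and `TryConvert` are hypothetical names, not part of Gibraltar.

```csharp
using System;
using System.Globalization;

public static class ParameterParser
{
    // Hypothetical conversion routine: returns false on bad input
    // instead of throwing, so normal use raises no exceptions.
    public static bool TryConvert(string text, out int value)
    {
        return int.TryParse(text, NumberStyles.Integer,
            CultureInfo.InvariantCulture, out value);
    }
}
```

The caller can then test the return value, log the rejected input at a low severity if that is useful, and carry on without ever entering a catch block, which keeps break-on-throw debugging usable.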
