How can I Manage Application Errors?
Once you’ve deployed your application, you want to know that it’s working correctly and performing well for your users. But it’s not possible to anticipate every scenario up front, so errors and exceptions will happen. When they do, you need to know quickly, understand what happened, and be able to prioritize them.
How Many Unique Errors or Exceptions?
If your application has recorded 200 exceptions, you need to know if it’s:
- One Exception 200 Times
- Two Hundred Unique Exceptions
- A Mix of Both
How you address the problem often depends on how many distinct problems there are. It’s common for a single misstep to generate a flood of identical errors, so you can’t just look at how many errors happened or how often a particular error happened. Instead, the first step in managing errors is to group common errors together so you can see the distinct problems. That makes it easier to determine which ones to tackle first:
- By Greatest Impact - The ones that affect the most users or the most computers are usually the best candidates to be addressed first. Fixing these gives the greatest benefit to your user base.
- By First Occurrence - New problems that have just started happening should have a higher priority because they tend to be easier to solve: they were just introduced, so the change that introduced them is smaller. Conversely, errors that have been around a long time without being escalated are likely not as bad for your end users.
By focusing on distinct errors, not all errors, we can avoid losing the forest for the trees - it’s easy for one high volume problem that’s screaming into your logs to hide 10 small problems that are affecting more users.
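The grouping and ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product does it: the `ErrorEvent` shape, the choice of exception type plus code location as the grouping key, and the function names are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ErrorEvent:
    """One recorded exception (hypothetical shape, for illustration only)."""
    exception_type: str
    location: str          # e.g. the top stack frame, "module.function"
    user: str
    occurred_at: datetime


def group_errors(events):
    """Group events into distinct problems keyed by (type, location)."""
    groups = defaultdict(list)
    for event in events:
        groups[(event.exception_type, event.location)].append(event)
    return groups


def summarize(groups):
    """Return one summary per distinct problem, highest impact first."""
    summaries = []
    for (exc_type, location), items in groups.items():
        summaries.append({
            "error": f"{exc_type} at {location}",
            "count": len(items),
            "users": len({e.user for e in items}),
            "first_seen": min(e.occurred_at for e in items),
        })
    # Two stable sorts: newest problems first, then most affected users first.
    # The second sort wins; the first only breaks ties.
    summaries.sort(key=lambda s: s["first_seen"], reverse=True)
    summaries.sort(key=lambda s: s["users"], reverse=True)
    return summaries
```

With this ranking, an error that fired 200 times for one user sorts below an error that fired three times for three different users, which is the "forest for the trees" effect the raw count would hide.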
How To Fix Application Errors
Modern exceptions do a good job of communicating what went wrong at the moment the application could no longer keep processing a request, but they can’t capture the context of execution that led up to the problem. Roughly half the time this is OK - a FileNotFoundException is often clear enough that a developer can work out what went wrong with no additional information.
The rest of the time you need to know how the application got into the state where the exception was thrown - what led up to the exception. For example, a NullReferenceException can often be baffling because it isn’t clear how the code could progress to the point it did without the value being set. You may have to walk back through the current activity a long way before you realize that a special case or scenario got missed.
To solve these problems it’s important to have the log data leading up to the error, not just the exception itself. This doesn’t need to be the detailed value of each variable or operation to be effective, just enough to illuminate how the code was proceeding and what records it was working with. At the same time, on a busy server you need to untangle the activity that’s executing from other activities going on at the same time so you can clearly understand the flow. It’s a bit like following the conversation of a single person talking in a crowded room.
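One way to follow a single conversation in the crowded room is to stamp every log record with an identifier for the activity that produced it, then filter on that identifier when an error shows up. The sketch below uses Python's standard `logging` and `contextvars` modules; the `activity_id` variable, the `ListHandler` class, and the `process_order` function are hypothetical names invented for the example.

```python
import contextvars
import logging

# Hypothetical name: identifies the activity (request) currently executing.
activity_id = contextvars.ContextVar("activity_id", default="-")


class ActivityFilter(logging.Filter):
    """Stamp every log record with the current activity ID."""

    def filter(self, record):
        record.activity_id = activity_id.get()
        return True


class ListHandler(logging.Handler):
    """Keep records in a list so they can be queried later (demo only)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def history_for(records, wanted_id):
    """Return the messages logged by one activity, in order."""
    return [r.getMessage() for r in records if r.activity_id == wanted_id]


log = logging.getLogger("orders")
log.setLevel(logging.INFO)
log.propagate = False
handler = ListHandler()
handler.addFilter(ActivityFilter())
log.addHandler(handler)


def process_order(request_id, order):
    """Hypothetical request handler that logs its progress."""
    activity_id.set(request_id)
    log.info("loading order %s", order["id"])
    customer = order.get("customer")
    if customer is None:
        log.info("order %s has no customer, using guest path", order["id"])
    try:
        return customer["email"]
    except TypeError:
        # customer was None: the earlier log line explains how we got here.
        log.exception("failed to read customer email")
        return None
```

Calling `history_for(handler.records, "req-2")` after a failing request returns only that request's messages, including the line that explains why the customer was missing, without the noise from other requests.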
Tracking Errors to Fix
As your application runs and information about distinct errors accumulates, you’ll want to periodically review the open problems and decide:
- Which Errors are Noise? - Some application errors may represent normal or uncorrectable issues, like a user entering a bad password or disconnecting in the middle of a download. Yes, the software reported an error but no, there’s nothing to do about them. You want to suppress this noise so it’s off your metrics and you don’t keep reviewing it.
- Which Errors are New? - Any new error you’ve never reviewed before is a top candidate to be checked and reviewed. Then you can decide whether it’s noise or a problem that needs to be fixed.
- Which Errors are Bugs? - If an error looks like it’s really a problem with the application code - either a defect in the implementation or just a missing scenario that wasn’t accounted for - you want to open an issue and track the problem until it’s resolved. You don’t need to review it again once you’ve made that decision.
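The three review questions above amount to a small state machine per distinct error. Here is a minimal sketch of that idea, assuming each distinct error is identified by some fingerprint string; the `Triage` and `ReviewQueue` names are hypothetical and not taken from any tool.

```python
from enum import Enum


class Triage(Enum):
    NEW = "new"       # never reviewed
    NOISE = "noise"   # expected or uncorrectable; suppressed
    BUG = "bug"       # tracked as an issue until resolved


class ReviewQueue:
    """Track the review decision for each distinct error (illustrative)."""

    def __init__(self):
        self._status = {}

    def record(self, fingerprint):
        # A fingerprint seen for the first time starts as NEW;
        # an earlier decision is never overwritten by a new occurrence.
        self._status.setdefault(fingerprint, Triage.NEW)

    def mark(self, fingerprint, decision):
        self._status[fingerprint] = decision

    def pending(self):
        """Errors that still need a first review."""
        return [fp for fp, s in self._status.items() if s is Triage.NEW]

    def counts_toward_metrics(self, fingerprint):
        """Noise is kept off the metrics; everything else counts."""
        return self._status.get(fingerprint) is not Triage.NOISE
```

Because `record` never overwrites an existing decision, an error already marked as noise or as a bug does not come back into the review queue when it happens again.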
Once you’ve decided that an error is really an Issue to be addressed by a developer, you may want to track it within your existing work item tracking system (like JIRA or Azure DevOps) or keep a separate list. Many teams decide to track runtime issues separately to have distinct metrics, workflow, and resource allocation compared to new development aimed at adding or changing functionality.
Managing Application Errors with a Team
Most developers work in teams, with many teams possibly contributing to the same application. Depending on the size of your team, you may have one or more people who routinely triage all errors as they come in and then assign issues that have been opened to other team members to resolve or even to other teams for review and resolution.
Loupe Resolve is a Complete Error Management Solution
Loupe Resolve tracks each distinct warning, error, crash message, and exception in your application across all systems and versions where your application runs. It automatically groups common errors together so you don’t miss problems due to a few very frequent errors. Resolve has a built-in error review queue with intelligent prioritization so you are led to the most important problems to investigate automatically.
Once you’ve decided an error requires a code change, Loupe Resolve will create an Issue to track the problem (and any number of other related problems you identify). You can assign these Issues to teammates and use Loupe Resolve’s built-in Issue management capability or link them into your Work Item tracking system.
Loupe Resolve is application version aware so it knows the oldest and newest version an Issue has happened in. This helps you track how your overall runtime quality is trending release over release and lets Loupe automatically re-open Issues if they reappear in a later version than they were reported fixed in.
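To make the version-aware idea concrete, here is a minimal sketch of the general technique. It is not Loupe's implementation or API: the `Issue` class, its fields, and the assumption that versions are dotted numbers such as "1.10.0" are all invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(text):
    """Turn "1.10.0" into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class Issue:
    """Hypothetical issue record that remembers where it has been seen."""
    title: str
    oldest_version: str
    newest_version: str
    fixed_in: Optional[str] = None
    is_open: bool = True

    def resolve(self, fixed_in):
        """Close the issue, noting the version that contains the fix."""
        self.fixed_in = fixed_in
        self.is_open = False

    def record_occurrence(self, version):
        """Update the version range and re-open on a regression."""
        seen = parse_version(version)
        if seen < parse_version(self.oldest_version):
            self.oldest_version = version
        if seen > parse_version(self.newest_version):
            self.newest_version = version
        # Re-open only when the error shows up in a version strictly later
        # than the one it was reported fixed in, as the text describes.
        if (not self.is_open and self.fixed_in is not None
                and seen > parse_version(self.fixed_in)):
            self.is_open = True
```

An occurrence reported late from an older version leaves a resolved issue closed, while an occurrence in a later version re-opens it. Comparing parsed tuples rather than raw strings matters here, because as text "1.10.0" would sort before "1.3.0".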