Improved Event Matching with Automatic Message Redaction
At the core of Loupe Monitor and Resolve is the ability to group error messages that represent the same underlying problem together. As log messages arrive at the server, Loupe analyzes them and groups together ones that share the same signature into an Event. In Loupe Monitor, error events generate Alerts and in Loupe Resolve they go into the review queue so you can decide if they’re defects or not.
One of the biggest challenges Loupe faces is handling log messages that have unique data embedded in them. For example, some common exceptions embed helpful information directly in the exception message, and that information is almost always unique, like this:
The process cannot access the file 'C:\Users\kendall\AppData\Local\Temp\0hktekc3.tmp'
because it is being used by another process.
It’s too much to expect a developer to rewrite every exception message to keep unique data out of the first part of the log message, but Loupe likewise can’t simply treat all exceptions of the same type from the same source location as the same problem - that would hide distinct instances of common exceptions.
The situation has become more urgent in the past few years, as log messages and exceptions emitted by ASP.NET Core and other newer .NET libraries include more and more unique data in the first part of the message. While helpful for debugging, this caused the number of unique events to climb dramatically for our customers that adopted .NET Core / .NET 6 (and later).
Previously: Using Redaction Rules
Loupe has supported redaction rules for years. These let you select a log message that has unique values, mark the parts you want to redact along with the replacement value, and Loupe then applies the rule across all of your data.
While this works, it’s onerous to wait for unique messages to show up and then set up a rule for each one. We use Loupe ourselves and have always wanted a better way.
New: Automatic Redaction of Common Data
Loupe 5 introduces automatic log message redaction to remove common unique values from log messages before it calculates the signature, so related messages group together. This new approach automatically redacts the example above into:
The process cannot access the file '{Path}'
because it is being used by another process.
Loupe records the redacted value on the Log Event, but the full, unredacted value remains in the raw log data, so when you look at specific occurrences of an event you can still see the original, unmodified message.
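The core idea - compute the grouping signature from the redacted text while leaving the raw message untouched - can be sketched in a few lines. This is an illustrative sketch only (shown in Python for brevity); the single path regex and the hash-based signature are assumptions for the example, not Loupe’s actual rules or implementation:

```python
import hashlib
import re

# Illustrative rule: redact a quoted Windows file path to a {Path} token.
# This is an assumption for the sketch, not Loupe's real rule set.
PATH_PATTERN = re.compile(r"'[A-Za-z]:\\[^']+'")

def redact(message: str) -> str:
    """Replace unique file paths with a stable placeholder."""
    return PATH_PATTERN.sub("'{Path}'", message)

def signature(message: str) -> str:
    """Compute the grouping key from the *redacted* text, not the raw message."""
    return hashlib.sha256(redact(message).encode("utf-8")).hexdigest()

a = ("The process cannot access the file "
     r"'C:\Users\kendall\AppData\Local\Temp\0hktekc3.tmp' "
     "because it is being used by another process.")
b = ("The process cannot access the file "
     r"'C:\Temp\report-42.tmp' "
     "because it is being used by another process.")

# Two different raw messages share one signature after redaction,
# so they land in the same Event; the raw text is stored separately.
assert signature(a) == signature(b)
print(redact(a))
# → The process cannot access the file '{Path}' because it is being used by another process.
```

The key design point is that redaction happens only on the copy used for signature calculation - the stored occurrence keeps the original text.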
What will Loupe Redact?
Loupe scans messages looking for:
- GUIDs: The most common unique value early in a log message, often used as tracing IDs or value keys.
- Timestamps, Dates, and Times: In a range of date/time formats.
- Durations: like “0.0004ms” or “00:00:59.9964814” (and a range of other formats)
- IP Addresses
- URLs
- File Paths
- Versions: like “12.0.1” or “v12.0.1”
- Hexadecimal values
- Numbers
For example, here’s a real customer log message we got as a test case:
The TransientConfig state of run {Event=59c4a4fa-fb01-40dd-8db7-2122444bff1b/2023-03-21 10:34:02.350,
Run=c5a10de4-83a7-40a2-a38e-0ac1a9986d46} was modified during loading.
Loupe’s new redaction rules automatically change that to:
The TransientConfig state of run {Event={Guid}/{Timestamp},
Run={Guid}} was modified during loading.
That groups together nicely - and helps your dev team have a single event to resolve.
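The categories listed above can be thought of as ordered substitution rules, where order matters: a GUID or timestamp needs to be matched whole before a generic number rule can nibble at its digits (otherwise “2023-03-21” would become “{Number}-{Number}-{Number}”). A minimal sketch with simplified, assumed patterns - not Loupe’s actual rules:

```python
import re

# Ordered (pattern, placeholder) rules: most specific first, so a GUID or
# timestamp is captured whole before the generic {Number} rule runs.
RULES = [
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
                r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "{Guid}"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?\b"), "{Timestamp}"),
    (re.compile(r"\bv?\d+\.\d+\.\d+\b"), "{Version}"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "{Number}"),
]

def redact(message: str) -> str:
    """Apply each rule in order, replacing matches with placeholders."""
    for pattern, placeholder in RULES:
        message = pattern.sub(placeholder, message)
    return message

msg = ("The TransientConfig state of run "
       "{Event=59c4a4fa-fb01-40dd-8db7-2122444bff1b/2023-03-21 10:34:02.350, "
       "Run=c5a10de4-83a7-40a2-a38e-0ac1a9986d46} was modified during loading.")
print(redact(msg))
# → The TransientConfig state of run {Event={Guid}/{Timestamp}, Run={Guid}} was modified during loading.
```

Running the customer message through these four toy rules reproduces the redacted form shown above.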
Automatic redaction works alongside user-defined redaction rules - if a log message matches a redaction rule you’ve configured, it will not be run through automatic redaction. We’re confident that if you took the time to configure a rule, you redacted everything you wanted to!
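That precedence - a matching user rule short-circuits automatic redaction entirely - might be sketched like this. The `UserRule` type and the single automatic GUID rule are hypothetical names for illustration, mirroring the behavior the post describes rather than Loupe’s implementation:

```python
import re
from dataclasses import dataclass

@dataclass
class UserRule:
    """A user-defined redaction rule: a pattern and its replacement (hypothetical)."""
    pattern: re.Pattern
    placeholder: str

# Stand-in for the automatic rule set: just GUIDs -> {Guid} here.
GUID = re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b")

def redact(message: str, user_rules: list[UserRule]) -> str:
    # User-configured rules take precedence: the first match is applied
    # and automatic redaction is skipped for that message.
    for rule in user_rules:
        if rule.pattern.search(message):
            return rule.pattern.sub(rule.placeholder, message)
    # No user rule matched - fall back to automatic redaction.
    return GUID.sub("{Guid}", message)

rule = UserRule(re.compile(r"user \d+"), "user {Id}")
msg = "Session for user 42 with token 59c4a4fa-fb01-40dd-8db7-2122444bff1b"
print(redact(msg, [rule]))  # user rule wins; GUID left untouched
print(redact(msg, []))      # no user rule; automatic GUID redaction applies
```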
Real World Results
We’ve been testing automatic redaction with Loupe Cloud-Hosted customers to strike the right balance: redacting everything we should without accidentally removing useful data and causing unrelated messages to be grouped together. The initial results are very encouraging - we’ve seen reductions in unique error signatures of up to 99% for customers with a large number of events. One customer saw it reduce the number of signatures from 12,000 down to 950. It then took them only an hour to run through the 950, categorize them, and discover a number of key defects that had been affecting their customers - but hiding in the mass of data.
With our latest update, it’s enabled by default for new customers, and we’re rolling it out incrementally to existing customers (along with communication, so there are no surprises!)