First, Do No Harm - Designing Robust Infrastructure

Hippocrates - 460-377 BC

Hippocrates’ Primum non nocere, “First do no harm”

Several customers have requested a notification mechanism to be alerted when errors are detected in their programs.  Simply raising an event is straightforward, but our promise to our customers is that we’ll do the hard thinking that ensures Gibraltar is safe and robust in production systems.  Our mantra is: first, do no harm.

In this case, we asked ourselves questions like:

  • What if a customer’s error notification logic is slow?  How do we ensure that it doesn’t slow down the application as a whole?
  • What if the program starts screaming thousands of errors?  How do we ensure that we don’t swamp the error notification handler?
  • What if there are errors in the customer’s error notification handler?  What if it throws an exception?  What if it hangs?

Working through these questions led to a design in which the whole logging infrastructure (Gibraltar itself AND the customer logic that interfaces with it) stays robust and safe.

Our central Log object in Gibraltar Agent now has a MessageAlert event that is raised when warning, error, or critical messages are recorded.  This event has a number of safety features such as:

  • Asynchronous: The event is raised on a background thread that is not part of the logging path, ensuring that time spent handling the event will not slow down logging or affect other threads.
  • Batching: When a burst of qualifying messages is recorded, they will typically be raised together in a single event to allow more efficient processing.
  • Throttling: A minimum delay between events can be easily specified to ensure the event isn’t raised too frequently, particularly in error cascade scenarios.  Messages are batched up until the next time the event can be raised.
  • Hang Protection: If the event handler never returns, the Agent will continue to process messages but will stop queuing them for the event, allowing them to be released from memory.
  • Loop Protection: Messages that are recorded by your event handler will not cause additional events to be raised.  This prevents notification loops where an event handler records an error during notification which subsequently causes the message alert notification to be raised again.
  • Low Overhead: We don’t spin up anything (the thread, the queue, etc.) until someone subscribes to the event, so if you don’t use this feature it doesn’t consume any resources.
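To make several of these ideas concrete, here is a minimal sketch of the general pattern. It is not Gibraltar's implementation or API: it is written in Python purely for brevity, and every name in it (`AlertDispatcher`, `record`, `subscribe`, `min_delay`) is hypothetical. It illustrates asynchronous delivery, batching, throttling, loop protection, and lazy startup; hang protection is left out to keep it short.

```python
import queue
import threading
import time

# Severities that qualify for an alert (assumed names, for illustration only).
ALERT_SEVERITIES = {"warning", "error", "critical"}


class AlertDispatcher:
    """Hypothetical sketch of an asynchronous, batched, throttled alert event."""

    def __init__(self, min_delay=0.0):
        self.min_delay = min_delay        # throttling: minimum seconds between events
        self._handlers = []
        self._queue = queue.Queue()
        self._worker = None               # low overhead: no thread until first subscriber
        self._lock = threading.Lock()
        self._state = threading.local()   # loop protection flag, tracked per thread

    def subscribe(self, handler):
        """Register a handler and start the background thread on first use."""
        with self._lock:
            self._handlers.append(handler)
            if self._worker is None:
                self._worker = threading.Thread(target=self._run, daemon=True)
                self._worker.start()

    def record(self, severity, text):
        """Called on the logging path; it never waits for handlers."""
        if severity not in ALERT_SEVERITIES:
            return
        if self._worker is None:
            return  # nobody has subscribed, so nothing is queued
        if getattr(self._state, "in_handler", False):
            return  # loop protection: the message was recorded by a handler
        self._queue.put((severity, text))

    def _run(self):
        while True:
            batch = [self._queue.get()]   # block until at least one message arrives
            while True:                   # batching: drain whatever else is waiting
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            self._state.in_handler = True
            try:
                for handler in list(self._handlers):
                    try:
                        handler(batch)
                    except Exception:
                        pass  # a failing handler must not stop the dispatcher
            finally:
                self._state.in_handler = False
            # Throttling: new messages pile up in the queue while we wait
            # and are delivered together in the next batch.
            time.sleep(self.min_delay)
```

In this sketch a caller would create one dispatcher, pass a function to `subscribe`, and call `record` from the logging path. Because `record` only places an item on a queue, a slow handler delays later alerts but never the code that is doing the logging.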

The MessageAlert event is particularly useful for automatically triggering immediate data transmission when an error occurs and for implementing your own error notification mechanism.  The full detail of each log message is available in the event.
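As a rough illustration of what such a handler might do with a batch, the sketch below decides whether to request immediate transmission and builds a one-line summary for a notification. It is a language-neutral sketch in Python, not the Agent's API: the `AlertMessage` fields, the severity names, and `summarize_alert` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed severity ordering, for illustration only.
SEVERITY_RANK = {"warning": 1, "error": 2, "critical": 3}


@dataclass
class AlertMessage:
    """Hypothetical stand-in for the per-message detail carried by the event."""
    severity: str
    caption: str
    description: str


def summarize_alert(messages):
    """Return (send_now, summary) for one non-empty batch of alert messages."""
    # Pick the most severe message; on a tie the earliest one wins.
    top = max(messages, key=lambda m: SEVERITY_RANK[m.severity])
    # Only errors and above justify sending data immediately in this sketch.
    send_now = SEVERITY_RANK[top.severity] >= SEVERITY_RANK["error"]
    summary = "%d message(s), top severity %s: %s" % (
        len(messages), top.severity, top.caption)
    return send_now, summary
```

A batch containing only warnings would produce a summary without requesting transmission, while a batch containing at least one error would do both.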

Check out our recent post on [Charting Enhancements Coming in Gibraltar](/blog/cool-charting-enhancements-coming-in-gibraltar "Charting Enhancements Coming in Gibraltar") for more examples of how we are incorporating customer feedback to ensure that Gibraltar provides a robust logging infrastructure allowing you to build rock solid .NET software.
