To log or not to log…

The issue

Logging is as old as software development itself. Where thin paper strips once left the teletype, masses of log appenders are configured in today's enterprise application servers. Their extensive use can quickly become overwhelming, given the billions of log entries written to plain log files.

The topic of efficiently processing these output streams has become more important in recent years. Errors in highly parallel threads, perhaps cloned or clustered over many machines doing the same task slightly differently due to user interaction or input data, can stay well hidden in the dark, only showing up in complex situations. Simply scanning log files à la Matrix will usually not reveal the red-dressed blonde. Manually chasing stack traces in plain, often clustered log files ends up as a search for the string ERROR, and after that for the needle in the haystack.

Many tools address this downside. The most popular commercial product seems to be Splunk; an open-source alternative is Graylog2. The first may be called expensive, and both look rather time-consuming at first sight, because they demand extra infrastructure and administration.

In one of our in-house tech labs, we prototyped a lean solution that gets along with just a minimal appender, an external store, and a web frontend, possibly all running inside the source application. The result was a Grails frontend, MongoDB for log storage, and many cool ideas about what can actually be analyzed.

It all starts with this log4j definition:

<appender name="mongo" class="...">
  <param name="Threshold" value="DEBUG"/>
  <param name="databaseName" value="logtopus-db"/>
  <param name="host" value=""/>
  <param name="applicationName" value="logtopus-example"/>
  <param name="version" value="0.1-DEV"/>
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern"
      value="%5p %d [%t] (%F:%L) - %m%n"/>
  </layout>
</appender>

The appender behind it stores every message as a document in MongoDB, including some extra values for later analysis. If your application already runs on MongoDB, you are done.
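As a rough sketch of what such an appender does at its core, the following plain-Java snippet builds the kind of document that could be stored per log event. The field names and extra values are illustrative assumptions, not the prototype's actual schema; a real appender would extend log4j's AppenderSkeleton, build this map inside append(LoggingEvent), and insert it into a collection via the MongoDB Java driver.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Sketch: turn one log event into the document a MongoDB appender
// might store. Field names here are illustrative assumptions.
public class LogDocumentSketch {

    static Map<String, Object> toDocument(String level, String message,
            String threadName, String className, int lineNumber) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("timestamp", new Date());
        doc.put("level", level);
        doc.put("message", message);
        doc.put("thread", threadName);
        doc.put("class", className);
        doc.put("line", lineNumber);
        // Extra values configured on the appender, useful for correlation later:
        doc.put("applicationName", "logtopus-example");
        doc.put("version", "0.1-DEV");
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = toDocument("ERROR", "Something failed",
                Thread.currentThread().getName(), "com.example.Service", 42);
        System.out.println(doc.get("level") + " " + doc.get("message"));
    }
}
```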

example: overview chart of the different log levels of the messages stored.

We created some neat charts based on the stored messages. Google Charts soon became our favorite while we were looking for easy-to-use, good-looking output.

The really interesting part comes up when you think about what can be analyzed. We had been in touch with Splunk before and appreciated the high degree of freedom it offers. But most of the time, log crawling can be reduced to a few simple recurring questions: What does the log level distribution look like? Were there any errors in the last hour, day, or week? Are there uncaught runtime exceptions?

example: log level distribution over the last hour

This view shows the log level distribution over the last hour. There is a lot of information in this rather simple chart. See for yourself!

These questions can be answered with simple queries against the message store. A few quick Google charts combined make a nice dashboard.
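The distribution query itself is trivial. Sketched here in plain Java over an in-memory list of levels instead of a live collection; against MongoDB this would simply be a group/count over the level field:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "log level distribution" query, done over an in-memory
// list; against MongoDB this would be a count per "level" value.
public class LevelDistributionSketch {

    static Map<String, Integer> distribution(List<String> levels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String level : levels) {
            counts.merge(level, 1, Integer::sum); // increment count per level
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> levels = Arrays.asList(
                "DEBUG", "INFO", "ERROR", "INFO", "DEBUG", "DEBUG");
        System.out.println(distribution(levels)); // {DEBUG=3, INFO=2, ERROR=1}
    }
}
```

The resulting map feeds straight into a chart: level names as labels, counts as values.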

The fancy ideas arise when you think about correlations between different fields of a log message. Take the software version or build number, for example. In general, this information is easily available, for instance by reading the manifest file or by setting the version property in the appender. With this small tweak we are now able to track logging behavior and application quality across several versions! And we can prove that the correct version is live, which is sometimes not that easy.
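Reading the version from the manifest can look like the sketch below. Note the fallback: getImplementationVersion() returns null when no manifest Implementation-Version is on the classpath (e.g. when running from a plain classes directory), so a default such as the appender's version property makes sense.

```java
// Sketch: resolve the application version for tagging log documents,
// falling back to a default when no manifest Implementation-Version exists.
public class VersionResolver {

    static String resolveVersion(Class<?> anyAppClass, String fallback) {
        Package pkg = anyAppClass.getPackage();
        // getPackage() may be null (unnamed package); the version may be
        // null too when no manifest is available at runtime.
        String version = (pkg != null) ? pkg.getImplementationVersion() : null;
        return version != null ? version : fallback;
    }

    public static void main(String[] args) {
        System.out.println(resolveVersion(VersionResolver.class, "0.1-DEV"));
    }
}
```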

logtopus-ui view

Our small Grails GUI uses Twitter Bootstrap.

Logging Java code debug details such as the source line and the called method is heavy work for the logger, but it enables another correlation feature: quality hotspot analysis at class or even method level at runtime. Think about the signature a stored log message can have, perhaps consisting of class, package, version, log level, and the occurred exception if present. Counting similar or identical incidents yields a code heat map that is useful for optimization and bug tracing.
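A minimal sketch of such a heat map: count incidents per signature, where the signature format (class, level, optional exception type, joined with a separator) is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a code heat map: count log incidents per signature, where a
// signature combines class, log level and exception type (if any).
public class HeatMapSketch {

    private final Map<String, Integer> incidents = new HashMap<>();

    void record(String className, String level, String exceptionType) {
        String signature = className + "|" + level
                + (exceptionType != null ? "|" + exceptionType : "");
        incidents.merge(signature, 1, Integer::sum);
    }

    int countFor(String signature) {
        return incidents.getOrDefault(signature, 0);
    }

    public static void main(String[] args) {
        HeatMapSketch heatMap = new HeatMapSketch();
        heatMap.record("com.example.OrderService", "ERROR", "NullPointerException");
        heatMap.record("com.example.OrderService", "ERROR", "NullPointerException");
        heatMap.record("com.example.UserService", "WARN", null);
        // The hottest spot: two identical incidents in OrderService
        System.out.println(heatMap.countFor(
                "com.example.OrderService|ERROR|NullPointerException")); // 2
    }
}
```

Rendered as a tree map or colored grid, the hot signatures point directly at the classes worth a closer look.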

To put all that in a nutshell: building your own custom logging toolchain is not vital, but it leads to new ideas for runtime analysis. And in the end we found it easier to extend our logging appender to prepare the data for specific queries than to insert keywords into our log messages and build complex views and reports on top of them afterwards. If you are interested in more details, feel free to check out our small prototype on GitHub, including the appender, a test data creator, and the Grails frontend.

Cheers Ben


6 thoughts on “To log or not to log…”

  1. jbrandstettercs

    I like the idea of code heat maps. You could for example use a tree map to get an idea if you have some fishy code in your application!

  2. Pingback: Questa settimana in Grails (2012-35) - - Il blog di Luca Canducci: notizie, tips e nuove tecnologie dal mondo dell’IT.

  3. Pingback: An Army of Solipsists » Blog Archive » This Week in Grails (2012-35)

  4. Jeff Winkler

    Interesting. I’ve built a Grails analyzer tool to analyze and summarize exceptions – pulling out versions as you have, cleaning up exceptions to remove things like session IDs and generated code OIDs, and fuzzing line numbers. It works well but is not realtime.

    Have you seen LogStash? It’s a very versatile tool that uses ElasticSearch(Lucene) to parse log lines and store structured data. Some people are storing Billions (yes, billions) of events weekly with it. Demo: . Try searching for “twitter”, all time, then clicking on the fields on the top left to toggle display.

  5. Pingback: The MongoDB tool belt «
