Have you ever been frustrated with the tedious process of answering even simple questions using regular expressions over log file data?

In one of the early Serilog demos we wrote an event like:

20:08:43.232 [Information] Processed { Lat: 25, Long: 134.5 } in 34ms 

Given this log data, for example,

Which sensor inputs took more than 100ms to process?

Regular expression-based searching is great for matching textual patterns, but not so much fun when richer data types (if you can call numbers “rich”) are involved.

By combining Serilog with RavenDB, a commercial document database for .NET, the answer can be derived as simply as issuing the query:

MessageTemplate: "Processed {@SensorInput}*" AND Properties.Elapsed_Range: [Ix100 TO NULL] 

This query’s very specific about a) the event type, and b) the numeric range of the Elapsed property. Both of these aspects are tricky to replicate using traditional text-based logs.

Let’s walk through this scenario step-by-step.

Creating a database in RavenDB

If you’ve just downloaded RavenDB you’ll need to start a server. For quick and dirty experiments, the Start.cmd script in the root of the RavenDB zip file will run a local server bound to HTTP port 8080.

The Raven Studio web UI is the first place to go. I’ve created a database called Logs.

logs-database

There aren’t any documents in there yet; we’ll write those through Serilog’s RavenDB sink.

Configuring the RavenDB Sink

The Serilog.Sinks.RavenDB package has everything you need for getting log data into RavenDB.

PM> Install-Package Serilog.Sinks.RavenDB

In your application startup code (Main(), Application_Start() etc.) create a DocumentStore object describing the location of your logs database.

var documentStore = new DocumentStore
{
    Url = "http://localhost:8080",
    DefaultDatabase = "Logs"
};

Configure Serilog using the WriteTo.RavenDB() method.

Log.Logger = new LoggerConfiguration()
    .WriteTo.ColoredConsole()
    .WriteTo.RavenDB(documentStore)
    .CreateLogger();

That’s it! You’re ready to write some log events.

Writing to the log

If you’ve never used Serilog, there are only a couple of things you need to know up-front.

var sensorInput = new Position { Lat = 25, Long = 134.5 };
var elapsedMs = 34;
Log.Information("Processed {@SensorInput} in {Elapsed} ms", sensorInput, elapsedMs); 

First, log methods use named rather than positional arguments in format strings. SensorInput and Elapsed are the property names given to the sensorInput and elapsedMs values, respectively.

Second, to capture the structure of a log event property (instead of just capturing it with ToString()) prepend @ to the property name. This is a safeguard against accidentally serializing large object graphs. An article in the documentation has more detail on writing structured data.

After writing a few events, they can be seen in the Raven Studio interface.

raven-showing-logs

If we peek into one of the log events, the preserved structure of the event data can be seen.

{
  "Timestamp": "2013-06-18T17:49:47.7947070+10:00",
  "MessageTemplate": "Processed {@SensorInput} in {Elapsed:000} ms",
  "Level": "Information",
  "Exception": null,
  "RenderedMessage": "Processed Position { Lat: 24.7, Long: 132.2 } in 037 ms",
  "Properties": {
    "SensorInput": {
      "Lat": 24.7,
      "Long": 132.2,
      "$typeTag": "Position"
    },
    "Elapsed": 37,
    "ThreadId": 9,
    "MachineName": "NBLUMHARDT-RMBP"
  }
} 

RavenDB Dynamic Queries

The commonly-used query API on RavenDB is Linq, but the lesser-known Lucene syntax is apparently used as RavenDB’s internal representation.

Documentation is a little sparse but the syntax is quite intuitive after a few attempts. Jumping into the Indexes > Dynamic Query screen, our query finds the events we’re after.

raven-query

(I’ve had to drop the time threshold to 10ms to pick anything up in my data set.)

Concluding thoughts

Among the ‘NoSQL’ data stores I’ve tried so far as back-ends for Serilog, RavenDB seems to be a reasonably good fit for the job. What excites me most about the potential of Serilog is the ability to go ‘beyond grep’ and deal with log data in more useful types like numbers, dates and so-on. Through RavenDB we’ve been able to home in on a specific event type and a numeric range, and hardly broke a sweat.

Lucene itself is also very text-oriented, great for classic log searching scenarios. I’m not sure how much of that is surfaced by RavenDB, definitely something to explore.

Caveats

One thing to watch – the RavenDB sink writes events periodically, in batches on a background thread. In normal circumstances, Serilog will flush the current batch on shut-down of the AppDomain, so events will rarely be lost, if ever. Terminating the app “hard” by stopping the debugger may prevent some events from being written back to the database. Using remote sinks like RavenDB in conjunction with a local log file is recommended, anyway.

Also worth pointing out is that RavenDB is a sophisticated piece of software with specific capabilities and requirements. Dumping in and querying large volumes of log data will require some up-front planning and performance analysis that I haven’t done. If you’re using RavenDB in this way today I’d love to know more about the experience.

Acknowledgements

Thanks to Nick Van Eeckhout for figuring out and contributing the RavenDB sink!