Querying collection properties in Seq 1.5

Seq implements a filtering syntax that’s loosely based on C#. The following expression matches the events you’d expect it to, if you’re used to working with C# or a similar language:

Tag == "seq"

Which is to say: “Match events where the Tag property has the value "seq"”. Sub-properties of objects can be accessed using the familiar dotted syntax (Property.Subproperty.Subproperty) and so-on.
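
For example, with a hypothetical event carrying a structured Customer property, a filter like this would match on a nested value:

Customer.Address.City == "Brisbane"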

Seq 1.5, currently a preview, steps into some new territory.

What about a collection of tags?

It’s not uncommon for Serilog events to carry properties that are collections. For example, on a Q&A site we might log events like:

Log.Information("User {Username} posted a question tagged {Tags}", username, tags);

Resulting in events like:

[Image: an event carrying a collection of tags]

Instead of being marked with a single tag, each event carries a collection of zero-or-more.

Matt Hamilton asked how a filter could match events like these, where the tag we’re looking for is one of several in a collection.

This is where Seq 1.4 draws a blank. Translating this to the Q&A example, about the best we can manage is a text search for "seq" and requiring the existence of the Tags property using Has(Tags).

This is a bit of a shame – the promise of Seq and structured logging in general is to make querying log events precise – so why aren’t collections covered?

The C# way

Seq isn’t only about precise queries; the whole foundation of the app is the idea that it should also be easy and natural to query log data.

I’d thought about collection queries quite a lot when putting the original filtering syntax together. In C# style, the query would look something like:

Any(Tags, tag => tag == "seq")

Seq doesn’t use extension method-style calls, but this is pretty close to the Enumerable.Any() call you’d write using Linq.
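
For comparison, here’s roughly the LINQ you’d write over an in-memory collection of events (the events variable and its Tags property are assumed here for illustration):

events.Where(evt => evt.Tags.Any(tag => tag == "seq"))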

I don’t like it!

In a full-featured programming language this syntax makes sense, but it’s too noisy, complex, and easy to mistype to make a good filtering syntax for Seq. And so, that’s where things have sat for the last twelve months.

Pattern matching

With a bit more thought, Matt and I both ended up thinking the problem might be better suited to a pattern-matching syntax like:

Tags[?] == "seq"

So, this is what Seq 1.5 provides. The question mark wildcard ? matches any element in the collection, while an asterisk * wildcard only matches if all elements satisfy the condition.

[Image: Seq matching questions tagged "seq"]

Wildcards work in any comparison where one side is a property path (including indexers and dotted sub-properties). Multiple wildcards along the path are supported.

You can write the following expression to find questions where all answers are "Yes!":

Answers[*].Content == "Yes!"
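
Wildcards can also be combined along a path. With a hypothetical schema where answers carry comments, this would match questions where any answer has any comment reading "Thanks!":

Answers[?].Comments[?].Text == "Thanks!"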

(Under the hood, Seq rewrites these expressions into the equivalent lambda-style syntax – much the same way as the C# compiler transforms Linq query comprehensions. Working through implementing this reminded me just how fantastic the C# compiler really is!)
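For instance, the ?-wildcard expression is rewritten to the Any() form shown earlier, and it’s reasonable to picture the *-wildcard going to a matching all-elements form (the All() name here is my guess at the internals, not something Seq exposes):

Tags[?] == "seq"              =>  Any(Tags, tag => tag == "seq")
Answers[*].Content == "Yes!"  =>  All(Answers, answer => answer.Content == "Yes!")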

Try it out!

You can download a preview build of Seq 1.5 here. It’s not ready for production use yet, but should be just fine for dev/non-mission-critical servers. We’d love to hear what you think!

How (not) to parameterize Serilog events

Writing an event with Serilog is just like formatting a string:

Log.Information("The time is {Time}", DateTime.Now);

There’s one subtle and important difference though. The first parameter to Information(), here given the value "The time is {Time}", isn’t just an arbitrary string – it’s a Serilog message template.

You might be tempted occasionally to write events like:

Log.Information("The time is " + DateTime.Now);

Don’t do this. While never particularly good logging practice (the string concatenation occurs whether or not logging is enabled, wasting RAM and CPU cycles), with Serilog it’s considered a serious anti-pattern.

Why an anti-pattern?

The first example, using Serilog correctly, creates events that are logically like:

{
  "MessageTemplate": "The time is {Time}",
  "Properties": { "Time": "2014-09-11T09:35.55.000" }
}

The second, incorrect example will yield:

{
  "MessageTemplate": "The time is 2014-09-11T09:35.55.000",
  "Properties": {}
}

While you might not notice the difference initially when logging to the console or a text file, the first event is much more useful than the second:

  • Time is stored as a queryable property
  • All events of this type share the same message template, making the events queryable by type as well

Serilog is also optimised for creating well-formed events like the first one. The second example won’t blow up, but because each event carries a unique “template” it can’t take advantage of message template caching, so the message is parsed again for every event. The extra junk in the message template cache will eventually cause the cache to flush, reducing logging performance throughout the app.
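
To picture the caching effect, here’s a minimal sketch of a bounded template cache – illustrative only, not Serilog’s actual implementation. Well-formed events hit the cache on their template text every time; one-off concatenated “templates” always miss:

using System.Collections.Generic;
using Serilog.Events;
using Serilog.Parsing;

// Illustrative sketch only – not Serilog's actual cache.
class TemplateCache
{
    const int MaxEntries = 1000;
    readonly Dictionary<string, MessageTemplate> _entries =
        new Dictionary<string, MessageTemplate>();

    public MessageTemplate GetOrParse(string templateText)
    {
        MessageTemplate template;
        if (_entries.TryGetValue(templateText, out template))
            return template; // "The time is {Time}" lands here on every call

        // "The time is 09:35:55" is unique per event, so it's parsed
        // every time – and the junk eventually flushes the whole cache
        if (_entries.Count >= MaxEntries)
            _entries.Clear();

        template = new MessageTemplateParser().Parse(templateText);
        _entries[templateText] = template;
        return template;
    }
}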

Be on the lookout…

There are a few ways your code can fall into this trap. Here’s (a simplification of) one that got me recently:

catch (Exception ex)
{
  Log.Error(ex.Message);
}

It’s a bit harder to spot the logging of an arbitrary string here, but like the broken example above, this will generate a potentially-unique template for every message.

(In case you’re wondering, the recommended way to log an exception with Serilog is:

catch (Exception ex)
{
  Log.Error(ex, "Failed while trying to open the database");
}

The exception will be attached to the event as a first-class property, and fully-rendered in both text and JSON… Easy!)

Event = Template + Properties

Just remember, a Serilog event isn’t a string – it’s the combination of a message template and zero-or-more properties.

Thinking this way will help you get more value out of Serilog as your application grows and you start to consider more advanced storage/querying options. Happy logging!

Seq 1.5 preview

TL;DR: The next point release of Seq is a bit broader in scope than what we’ve shipped in previous point releases. Storage and caching changes in Seq 1.5 bring some noticeable improvements to performance, responsiveness and manageability.

Since Seq first arrived a year ago, its storage has been a simple b-tree based data file managed by ESENT. This had pros as well as cons: on the pro-side, we’ve been able to keep our efforts focused on making Seq quick to set up and easy to use while ESENT does its thing keeping data safe and accessible on disk.

This approach has taken us a long way, but coupled with the particular access patterns generated by log data, it does lead us to some cons, the biggest of which we’re addressing in Seq 1.5.

Query performance and cache efficiency

In Seq 1.4, we left caching decisions to ESENT – what to cache, when to expire it, and how much RAM to use in the process. For the most part this did a fine job, but it left a couple of important opportunities on the table.

Segmented caching

Log data has a very specific access pattern: recent events are always more interesting than historical ones. A couple of queries deep into history shouldn’t cause more recent events to be purged from cache, and recently-written events should reach the cache without needing queries to “warm them up”, and so-on.

In 1.5, Seq keeps a time-ordered list of cached event segments. Each segment is a time slice – the preview uses 1-day slices as the unit of granularity. As RAM fills up, segments are dropped from the end of the list, preserving more recent events.
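
In outline – a simplified sketch, not Seq’s actual code – the eviction policy is just “drop the oldest day first”:

using System;
using System.Collections.Generic;
using System.Linq;

// Simplified sketch of segment eviction – not Seq's actual implementation.
class SegmentCache
{
    // One cached segment per day of events, kept in time order
    readonly SortedDictionary<DateTime, EventSegment> _segments =
        new SortedDictionary<DateTime, EventSegment>();

    public void Add(DateTime day, EventSegment segment)
    {
        _segments[day.Date] = segment;
        while (OverMemoryBudget() && _segments.Count > 0)
        {
            // Keys ascend, so First() is the oldest slice; recent
            // events stay cached while history is dropped
            _segments.Remove(_segments.Keys.First());
        }
    }

    bool OverMemoryBudget()
    {
        return false; // in reality: consult available system RAM
    }
}

class EventSegment { /* the cached events for one day */ }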

[Image: time-ordered list of cached event segments]

It’s a very simple strategy, but an effective one – response times on queries improve dramatically when all data is in cache, and most queries of interest are filled by recent events.

Naturally, response times grow as a query progresses into un-cached regions of storage. Our current “usability benchmark” is to consider a million 1 KB events generated per day – roughly 1 GB of raw event data daily. On a 16 GB server, around a week’s data will be cacheable, and response time on this will be within a few seconds. Most Seq installations carry a lot fewer events and so will run completely in RAM.

Past the cache bounds, “archival” data is available at a response time a couple of orders of magnitude slower. Optimally, a retention policy will thin out the events past this point, e.g. by deleting “debug” or tracing events, bringing response times back within a similar range.

Optimal in-memory representation

Using the storage layer’s cache meant Seq still had quite a bit of work to do when scanning events to fulfil query results. This work not only added to response time – it also generated garbage that pushed the GC harder, reducing overall performance.

Since we’re now in control of our own caching destiny, in Seq 1.5 we’re free to choose an in-memory representation that’s much closer to our needs when querying. This eliminates all JSON processing and allocation on a per-event basis, and makes it possible to run practically all queries in compiled code.

Schema-sharing

The most exciting outcome of choosing our own in-memory representation is the opportunity we then have to extract common features from events.

When caching raw event data at the storage layer, each “blob” is unique – property names and so-on get stored for every event. If two events are created by the same line of code, thus having the same property names and message template, the storage layer will record two copies of this information.

In RAM, we’re free to share schema information and duplicated values between many events. While the following two events share nothing on disk:

{"Latitude": 25, "Longitude": 134}
{"Latitude": 12, "Longitude": 54}

In the Seq caching layer they’ll share a single schema object:

[Image: two events sharing a single schema object]

Over many events the savings brought by this approach are significant.
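
As a rough sketch of the idea (the real representation will differ), each cached event can carry just its values plus a reference to a schema object shared with every other event of the same shape:

// Rough sketch of schema sharing – not Seq's actual representation.
class EventSchema
{
    public readonly string[] PropertyNames; // stored once per shape
    public EventSchema(params string[] names) { PropertyNames = names; }
}

class CachedEvent
{
    public readonly EventSchema Schema; // shared by many events
    public readonly object[] Values;    // one slot per schema property

    public CachedEvent(EventSchema schema, params object[] values)
    {
        Schema = schema;
        Values = values;
    }
}

static class SchemaSharingExample
{
    static void Demo()
    {
        // The two location events above share a single schema object:
        var location = new EventSchema("Latitude", "Longitude");
        var first = new CachedEvent(location, 25, 134);
        var second = new CachedEvent(location, 12, 54);
        // first.Schema and second.Schema reference the same object
    }
}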

Manageability

Using a single, large data file brought with it a couple of major drawbacks.

Sheer size

First, storing everything in a single large file makes it hard to work with large Seq installations. One user ended up with a single 350 GB default.seq that was impossible to work with in any way. We don’t want to see that again!

In Seq 1.5, files on disk are split into 7-day extents:

[Image: storage split into self-contained 7-day extents]

The extents are completely self-contained, meaning they can be deleted, moved or backed-up individually.

Compaction time

ESENT data files can only be compacted (shrunk to remove free space) off-line. Coupled with creating a very large data file, this could mean considerable downtime while disk space was reclaimed, and having to use command-line tools to do it was inconvenient.

Compaction would also require enough free disk space to hold a complete copy of the data – a challenge and a waste when a lot of event data is stored.

Using a series of extents on disk means each extent can be compacted individually, ideally in the background on a running server, and only the storage for a single extent is required during the compaction process.

The 1.5.5-pre build still relies on a command-line seq compact facility, but storage requirements are lessened and the process is much more responsive. We’ll be working on background compaction for the next preview.

Installing Seq 1.5.5-pre

There are a few things to know before you upgrade to Seq 1.5.5-pre.

The migration process requires approximately the same disk space as is taken by the existing C:\ProgramData\Seq\Data folder. Seq will refuse to migrate or run without this available. After migration is run, the space will be freed and won’t be needed again except to store events.

Running the following commands prior to migration will free as much space as possible for the process:

seq stop
seq compact

Migration from a large Seq 1.4 database can also take considerable time – allowing 10 minutes per million events would give a safe margin; better hardware will move data much faster. If this makes migration prohibitive, and you can’t use retention policies to cut down on event volume beforehand, please email support for help.

After migration completes, it will take a little while for the cache to warm up. If you see an asterisk (*) after the Loading… indicator text, this means events are being scanned in un-cached storage regions. Give it a few more minutes :).

Finally, the new cache is designed for server use and thus consumes as much system memory as it can. If the system comes under memory pressure, it will release memory to compensate. This mightn’t be convenient on all systems – if you’re in this position we’d like to know a bit more about your setup: email us at the address above and we’ll let you know when configuration options are available to control this.

What else is to come from 1.5?

Already included in the current preview are some major improvements to start-up/shut-down time, and some long overdue features such as manual deletion of events. There are several other tickets planned for the full release, and I’ll post some details on those as they land.

How does 1.5.5-pre work for you?

We haven’t published any figures with this post; when Seq 1.5 is ready to ship I’ll make sure we share some 1.4 vs. 1.5 benchmarks, but for the time being we’re looking for real-world usage and feedback to validate the direction we’re taking and provide a qualitative measure of how the changes stack up.

If you’re able to try Seq 1.5, we’d love to hear your impressions. It’s not a production-ready release, but we’ll do our very best to support migration from the preview to RTW when that happens (we’ve migrated our own servers!) so with some small risk it should be fine to use on dev/non-mission-critical systems.

Download it now!

You can grab the installer here.

Seq and Serilog around the web – August

Things have been quiet on my blog for the last few months, but certainly lively elsewhere around the web. I thought I’d post a few pointers to what’s new in the worlds of Seq and Serilog.

Hey – I’ve started cross-posting anything related to Seq on the Seq blog. This site (nblumhardt.com) will continue as my personal blog, with a bit more varied material, and will still carry most Seq content for now – so stay tuned in!

Modern Structured Logging with Serilog and Seq

If you or your company has a Pluralsight subscription, make sure you check out the great Modern Structured Logging with Serilog and Seq course written and presented by Jason Roberts.

The first two major sections of the course give a deep introduction to structured logging and Serilog, while the third section introduces Seq; the course is worth watching for either or both.

If you’re championing structured logging in your organisation, Jason’s course would make a great way to bring others up to speed.

Improve and Repeat

Johnny Graber’s Improve and Repeat blog is featuring Serilog in a series on modern structured logging. The first two instalments are up:

  1. The Missed Opportunities of Log Files
  2. Structured Logging with Serilog

It’s shaping up to be another good resource for teams starting to introduce structured logging.

If your team still needs convincing that learning about Serilog is worth their time, Kristoffer Jansson’s Why You Should Try Out Serilog should help.

Event-processing angles

I thought it was cool to see this sample from the XSockets.NET team showing how to connect to a running (Serilog-enabled) application to collect logs.

Unlike other sinks we’ve seen to date, in this case the sink is a server, and clients connecting via XSockets request the level of log events they wish to be sent. It’s a really exciting angle that I think deserves serious exploration (think on-demand real-time diagnostics…)

Tangentially related (since we’re talking event processing here :)) Daniel Little posted a nice little primer on Debugging Rx with Seq, which is a pretty imaginative use for it.

Site redesigns!

Both Serilog and Seq themselves got new websites in the last couple of months.

Serilog now sports an eye-catching red design:

[Image: the new serilog.net design]

I’m really happy with the way this one turned out – it’s nice and simple, and keeps all of the important info (how to use Serilog; where to find documentation and so-on) front-and-center.

Seq now has a new layout that makes it easier to organize content:

[Image: the new getseq.net layout]

Currently the biggest improvement, other than the livelier design, is better access to older versions and release notes via the improved Downloads area. It’s also nice to have complete setup instructions back on the front page to show just how easy it is to get started.

What have I missed?

Let me know :)

Seq update – monthly subscriptions, updated tiers, full-featured single-user license

TL;DR: Seq 1.4 is out, and with it we’re fine tuning the licensing options.

Building a software product is a continual exercise in listening, learning and adapting. Since releasing v1.0 of Seq nearly five months ago, we’ve watched keenly as customers put it to work and to be frank, we now have a much better understanding of how Seq fits in to the tooling landscape.

Part of this learning has shown us how customers fit Seq into their own environments, including how those environments dictate licensing requirements. Feedback has highlighted some discrepancies between how we license Seq and how it’s used, so for the past month we’ve been working on a revision to our model.

First, two important things:

  • If you’re using Seq today, you don’t need to change anything – the license you use Seq under is unchanged unless you upgrade to a newer version (outside of the one-year included upgrades period)
  • If you’re worse off under the new model, or have questions/suggestions please email support@getseq.net – we will help you across

Changes to our free tier

The challenge with licensing is to make a fair cost/benefit proposal: Seq should be less expensive to use when the benefit is less and when the effort in supporting it is less. When the benefit is greater, or when the support requirements are greater, asking more helps us to provide the level of service required.

Unfortunately, the unlimited-usage “Developer Edition” we launched with thwarts this requirement – how much Seq costs to use ended up coming down to whether the production environment was fully-secured or not. In a production environment secured by other means (e.g. a VPN or other network-level security) the price to use Seq might be zero (via the “Developer Edition”), while a small difference in network topology might require the top-level “Enterprise Edition” for the exact same use of Seq – that is, the exact same benefit.

To fix this, we’ve bitten the bullet and decided to license Seq purely on a per-user basis. The old Developer Edition has been replaced with a new, fully-securable Single-User license. Many, if not most, customers using the old Developer Edition will be able to switch to the new Single-User license and take advantage of authentication and SSL to put Seq in more places.

Register to receive your single-user license key

No model is perfect, but we think setting the price on the basis of the number of users is a much better measure of both how much value Seq will provide, and how much support/development we will need to sustain to make the best experience possible. Basing this on whether or not authentication is required was a mistake on our part.

Some teams will be using the old Developer Edition with more than one user, so won’t be able to take advantage of the new Single-User license. If you’re in this position, and can’t switch to one of our subscriptions or per-user tiers, get in touch – supporting you is important to us and we’ll help you across so that you can continue to benefit from new Seq releases.

New per-user tiers

Previously we supplied Seq in five, fifteen and twenty-five user tiers. There turns out to be a big difference between a six-user team and a fourteen-user one, so the new model is more fine-grained, providing five-user increments from 5 all the way to 25 users.

See the new pricing tiers

Three server installations with all tiers

Perhaps more importantly, Seq previously included a single server installation with each of the five- and fifteen-user tiers. It turns out that this penalised users with segregated environments: if “Dev” and “Test” are on-premises but “Prod” is in the cloud, requiring two licenses seems fairly arbitrary – where an environment is situated shouldn’t make such a big impact on pricing.

In the new model, each tier includes three Seq servers. This means a single license should cover most teams regardless of how their environments are organised.

New unlimited-user Enterprise license

In Enterprise environments, counting seats can be annoying and lead to more time spent in the purchasing process as teams change size and shift focus. For Enterprise customers we’re now offering an unlimited-user license, installable on up to ten servers, to make life simpler.

We’re also looking at ways to make an Enterprise SLA available alongside this option – please get in touch via sales@getseq.net if this is needed in your organization.

Subscription pricing

Along with our up-front pricing, we’re now offering Seq on a pay-as-you-go basis for those who’d rather pay by the month. We’re making this available primarily as a convenience – the product you get is exactly the same, i.e. you will still host Seq on your own servers.

Subscriptions come with full email support and access to all upgrades during the subscription period.

Subscribe now!

We’re counting on your feedback

We count on your feedback to make Seq a better product and to make it accessible no matter your needs; if you have suggestions, queries or concerns we’d love it if you can email us.

Using attributes to control destructuring in Serilog

Serilog’s often used to log out objects as structured data. For example:

log.Information("Executing {@Command}...", cmd);

will attach a structured Command property to the event that can later be queried or processed.

Sometimes we don’t want all of the properties of an object included. Where cmd is, say, a LoginCommand:

public class LoginCommand
{
    public string Username { get; set; }

    public string Password { get; set; }

    public bool RememberMe { get; set; }
}

We’d like to log the username attached to the command, but not the password. The same situation arises with credit card numbers and other sensitive information. Without some care we’ll end up logging events like:

{
  "Level": "Information",
  "Timestamp": "2014-07-18T12:30+10:00",
  "RenderedMessage": "Executing LoginCommand {Username: \"nick\", Password: \"Pa$$w0rd\", RememberMe: true}"
  "MessageTemplate": "Executing {@Command}...",
  "Properties":
  {
    "Command":
    {
      "Username": "nick",
      "Password": "Pa$$w0rd",
      "RememberMe": true
    }
  }
}

No one likes to see their password in a log file.

Classically, the way to approach this in Serilog has been to customize the destructuring policy for the command type:

var log = new LoggerConfiguration()
    .WriteTo.ColoredConsole()
    .Destructure.ByTransforming<LoginCommand>(
        cmd => new { cmd.Username, cmd.RememberMe })
    .CreateLogger();

The problem with this approach is pretty obvious — it centralizes destructuring information in a way that might not always co-operate with assembly dependencies and so-on, and gets cluttered as more types need customization.

Why ‘destructuring’ anyway?

Here’s a chance to explain one of the more unusual pieces of terminology in Serilog. The name Serilog is derived from serializing logger, and destructuring pretty much equates with serialization, so why not call it that?

The original design for Serilog was going to tackle the problem we’re observing here using a syntax like this:

log.Information("Executing {@Command(Username,RememberMe)}...", cmd);

This would have the same effect as the configuration we just looked at, selecting just the Username and RememberMe properties from the command object, but doesn’t suffer the same “centralization” issues.

This style of describing “what you want” is actually implemented in some programming languages with a feature called destructuring assignment. Hence, the name! It’s not likely that we’ll add this feature to Serilog now; a more familiar way to achieve the same result is to use attributes, as we’ll see next.

Destructuring with attributes

Fresh off the press, here’s the new solution enabled using the Serilog.Extras.Attributed package:

public class LoginCommand
{
    public string Username { get; set; }

    [NotLogged]
    public string Password { get; set; }

    public bool RememberMe { get; set; }
}

Nice and simple, huh? The NotLogged attribute simply marks properties to ignore.
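
With the attribute in place, the same log call would produce an event shaped like this (a sketch of the Command property only):

{
  "Command":
  {
    "Username": "nick",
    "RememberMe": true
  }
}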

Enabling this feature, after installing the NuGet package mentioned above, is as simple as calling the UsingAttributes() extension method at configuration time:

var log = new LoggerConfiguration()
    .WriteTo.ColoredConsole()
    .Destructure.UsingAttributes()
    .CreateLogger();

Other goodies in Serilog.Extras.Attributed

There’s one other little thing that you might find handy in this package. Some types, while they’re “complex data” in some sense, don’t serialize well for storage in events. These types are better treated as scalar values (like int and string) or converted with ToString() before logging.

You can force this behavior using the LogAsScalar attribute; here’s an example that appears in Seq:

public class Filter
{
    public string SourceText { get; set; }

    [LogAsScalar]
    public Expression<Func<Event,bool>> FilterExpression { get; set; }
}

Without LogAsScalar, the FilterExpression would be serialized as a (potentially) large tree-shaped object that, while interesting, isn’t valuable in a log. Using the attribute lets us keep the property in the log, but use its more friendly ToString() representation.

Go get it! :)

Detecting never-before-seen event types

Perhaps the most useful, but seldom noticed, benefit of a structured log is having a first-class notion of an event type.

When analyzing a traditional text-based log, there’s no concrete relationship between the messages:

Pre-discount tax total calculated at $0.14
Pre-discount tax total calculated at $5.20
Customer paid using CreditCard

From a tooling perspective, each might as well be a unique block of arbitrary text. A lot of work’s required using tools like logstash to go further than that.

In a structured log from Serilog, the message template passed to the logging function is preserved along with the event. The first two messages above come from the template:

Pre-discount tax total calculated at {TaxAmount}

while the third comes from:

Customer paid using {PaymentMethod}

We can use this information to unambiguously find or exclude either kind of event. It’s so simple and straightforward it is mind-boggling not to see this technique used everywhere!

Working with message templates is verbose though, so Seq produces a 32-bit (Murmur) hash of the message template that can be referred to using hex literals, for example Pre-discount tax total calculated at {TaxAmount} → $A26D9943, while Customer paid using {PaymentMethod} → $4A310040.
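
The mechanics are just a stable 32-bit hash of the raw template text. Here’s a compact sketch of the idea, substituting a simple FNV-1a hash for brevity – so the values it produces won’t match Seq’s Murmur-based ones:

using System;

static class EventTypes
{
    // Sketch only: Seq uses a Murmur hash; FNV-1a stands in here,
    // so these values won't match the ones Seq displays.
    public static uint EventTypeOf(string messageTemplate)
    {
        var hash = 2166136261u;
        foreach (var ch in messageTemplate)
        {
            hash ^= ch;
            hash *= 16777619u;
        }
        return hash;
    }
}

// Every event logged through the same template gets the same type:
// Console.WriteLine("${0:X8}", EventTypes.EventTypeOf(
//     "Pre-discount tax total calculated at {TaxAmount}"));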

Seq calls this hash the “event type”. Here’s Seq selecting all events of the first type on my test machine:

[Image: Seq selecting all events of one type]

For me, this was actually one of the core reasons to justify spending a bunch of time on Serilog and Seq – application logs no longer seemed like a hack or a crutch: working with “real” events that have an actual type elevates them to something that can be usefully manipulated and programmed against.

So, what else can you do with a log imbued with first-class event types?

Event types and “interestingness”

When I log into any of the Seq servers I maintain, they look much like this:

[Image: a Seq server where all seems well]

All seems pretty good – nothing out of the ordinary. Well, at least in the last minute or two…?! It’s really hard, looking at a small window on a large event stream, to make any proactive decisions, because, by definition, the interesting events are a minority.

A compromise I use often is to create a view just for “Warnings and Errors” – unfortunately, on a large stream, this view’s not so useful either: the same kinds of things show up again and again. Diligently excluding uninteresting events by type can help here, but that’s a very manual process and isn’t much fun in practice.

Seq 1.4 introduces some simple local storage for hosted apps, so as a test case for that I’ve spiked a different approach to finding “interesting” events. Using some app-local storage, why not keep track of which event types we’ve seen before, and emit an event whenever something “new” happens?

For example, in the case:

for (var i = 0; i < 10; ++i)
{
    Log.Information("The value is {Counter}", i);
}

We’d detect the event “The value is 0” and then ignore further events with this event type (for values 1, 2, …). By keeping tabs on new events I might spot something unusual, or I might just decide to incorporate the new event type into a counter or chart.

There’s a broader notion of “interestingness” than this of course, but for a simple example it’s still quite powerful.

First of Type

Here’s a new view on the same server shown earlier:

[Image: a view showing the first occurrence of each event type]

Rather than list the events themselves, we see the first occurrence of each event type and a hyperlink to view the event itself (shown in the expanded property view).

This view is generated by a new Seq app “First of Type”:

[Image: the Seq.App.FirstOfType package listing on NuGet]

You can see the complete implementation with a few small improvements on GitHub, and install it into the Seq 1.4.4-pre build or later using the ID Seq.App.FirstOfType.

Once you’ve installed the app into your Seq server, “start a new instance of the app” that automatically accepts events, and click through to see the stream generated:

[Image: clicking through to the app's event stream]

(You may need to wait 30 seconds or so before the new app starts getting events: Seq uses a short buffer to sort events on arrival so that they are processed in timestamp order.)

So how’s it done?

The reactor implementation:

public void On(Event<LogEventData> evt)
{
    var stateFile = Path.Combine(StoragePath, StateFilename);
    if (_filter == null)
    {
        if (File.Exists(stateFile))
            _filter = new UInt32BloomFilter(File.ReadAllBytes(stateFile));
        else
            _filter = new UInt32BloomFilter();
    }

    if (!_filter.MayContain(evt.EventType))
    {
        Log.Information("First of {DetectedEventType}: {DetectedEventMessage} ({DetectedEventId})",
            "$" + evt.EventType.ToString("X8"),
            evt.Data.RenderedMessage,
            evt.Id);

        _filter.Add(evt.EventType);

        File.WriteAllBytes(stateFile, _filter.Bytes);
    }
}

The new feature that enables this is the StoragePath property, which simply provides a unique filesystem folder for the running app instance:

    var stateFile = Path.Combine(StoragePath, StateFilename);

To keep runtime performance of the app and the size of the state file constant, we use a Bloom filter to track which event types we’ve seen:

            _filter = new UInt32BloomFilter(File.ReadAllBytes(stateFile));

The spike of the app as currently written uses a slightly smaller filter, and fewer hashes than desirable – if anyone’s interested in “doing the math” and setting up a better filter, PRs are accepted for this!

The language of the Bloom filter’s API – “may contain” – stresses that an approximation is being used:

    if (!_filter.MayContain(evt.EventType))

This does mean that new types can be missed. Tuning the filter can bring this down to a negligible percentage.
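
For the curious, a filter in this style is nothing more than a bit array plus a few hash probes per value. The following is a minimal sketch of the shape of such a filter – assumed from the API used in the app above, not the actual UInt32BloomFilter source:

using System.Collections.Generic;

// Minimal Bloom filter sketch – assumed from the API used above,
// not the actual UInt32BloomFilter implementation.
class UInt32BloomFilter
{
    public byte[] Bytes { get; private set; }

    // 64 KB of bits; size and hash count should be tuned ("do the math")
    // for the expected number of event types and false-positive rate.
    public UInt32BloomFilter() : this(new byte[64 * 1024]) { }
    public UInt32BloomFilter(byte[] bytes) { Bytes = bytes; }

    public void Add(uint value)
    {
        foreach (var bit in BitsFor(value))
            Bytes[bit / 8] |= (byte)(1 << (int)(bit % 8));
    }

    public bool MayContain(uint value)
    {
        foreach (var bit in BitsFor(value))
            if ((Bytes[bit / 8] & (1 << (int)(bit % 8))) == 0)
                return false; // definitely never seen
        return true; // probably seen; false positives are possible
    }

    IEnumerable<uint> BitsFor(uint value)
    {
        var bitCount = (uint)Bytes.Length * 8;
        // Three cheap derived hashes; more and better hashes
        // reduce the false-positive rate.
        yield return value % bitCount;
        yield return (value * 2654435761u) % bitCount;
        yield return ((value ^ 0x5bd1e995u) * 2246822519u) % bitCount;
    }
}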

The app emits an event using Log:

        Log.Information("First of {DetectedEventType}: {DetectedEventMessage} ({DetectedEventId})",
            "$" + evt.EventType.ToString("X8"),
            evt.Data.RenderedMessage,
            evt.Id);

We don’t have to do anything special to get evt.Id hyperlinked in the Seq UI – the UI’s smart enough to detect this.

The rest of the app simply maintains the state file as event types are seen.

What’s Next?

I’ve often written log messages thinking “well, if I ever see this message I’ll know there’s a bug” – using this app there’s actually the possibility that I might notice a needle in the haystack!

It’s not a big leap from here to imagine a similar app that could track property values on an event type, either testing for uniqueness or observing variance from an historical range. Having local storage within Seq apps opens up a lot of possibilities I hadn’t given much thought to in the past. Happy hacking! :)

Filtering events on arrival at Seq

Collecting too much log data makes the “needle in the haystack” all the more difficult to find, but for practical reasons it is often easier to collect as much data as possible anyway and “sort it out later” via Seq retention policies.

These have served us well since the early versions of Seq, and selective retention is a feature that’s been widely appreciated by users. Retention policies do come at a cost however – as events are removed from the stream, space in the storage engine needs to be compacted and this can use significant resources on the server (CPU, RAM and I/O).

In many cases, it is preferable to drop uninteresting events as they arrive. This facility is now available in Seq 1.4.2-pre as “input filters”.

Each Seq API key can now specify a filter expression – just like the event stream supports – that identifies events that will be accepted when logged with the key:

[Image: an input filter set on a Seq API key]

A typical use case may be to drop events below a certain level, but any criterion can be included.
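
For example, an API key used by a particularly chatty service might carry a filter along these lines (a sketch – the exact property name for the event level is assumed here):

@Level == "Warning" || @Level == "Error" || @Level == "Fatal"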

If for any reason it becomes necessary to change the filter during operation, updating it takes immediate effect so that more or fewer events can be recorded.

It’s a simple feature right now that we’re looking for feedback on – if you can grab the new preview from the download page (preview link is at the bottom) and try it out it would be great to hear your thoughts!

Not currently in the preview, but also coming in 1.4 is a MinLevel() query function that will tidy up filtering by level like the example shows.

Durable log shipping, from Serilog to Seq

A few days ago I took an impromptu snapshot of one of the Azure machines I have running Seq. Cloud storage being what it is, an I/O problem during the snapshot process crashed the Seq service, but I didn’t notice until the following day. The machine was running horrifically slowly at this point, so I restarted it and all came good.

When I logged into Seq to check some queries, I expected to see a big gap during the outage. It took me a few seconds to realize why there wasn’t one – all of the events raised during the outage period were there! – and that made me think this feature of the Seq client for Serilog might not be as well-known as I’d assumed.

When you set up Serilog to write events to Seq, you use code typically like:

Log.Logger = new LoggerConfiguration()
  .WriteTo.Seq("http://my-seq")
  .CreateLogger();

This will send events in batches directly using HTTP; network problems and hard app crashes can result in event loss.

Some time ago, another parameter became available:

Log.Logger = new LoggerConfiguration()
  .WriteTo.Seq("http://my-seq", bufferBaseFilename: @"C:\Logs\myapp")
  .CreateLogger();

If a buffer location is specified, the Seq client will write events to a set of rolling files like "myapp-2014-06-27.jsnl" (newline-separated JSON) and then ship them to the server when it’s possible. Almost nothing to configure, and voila! Reliable log collection.

There’s still no substitute for a redundant, transactional data store for truly critical events: machines can be lost, disks fill up and so-on. For general application monitoring and diagnostics however, I’m really pleased with how much “bang for the buck” this feature provides.

Can we change the face of JavaScript logging, too?

If you use Serilog in .NET and write code for the web, chances are you’ve wondered whether similar hybrid “structured/text” loggers exist for the browser and Node.js.

[Image: serilog-js]

If you haven’t tried Serilog, here’s a one-minute primer (using JavaScript syntax) so you can follow the rest of this post. Using Serilog is similar to every logging framework, ever:

var name = "Nick";
var wait = 45;
log("Hi {name}, your flight leaves in {wait} minutes", name, wait);

Look carefully at the string formatting though – particularly the way we give names to each of the parameters. The “message” part isn’t destructively rendered into text, but preserved as an object with property values. It can be formatted to the console etc. as usual:

[inf] Hi Nick, your flight leaves in 45 minutes

But more interestingly, it can be serialized in a format like JSON for processing and querying:

{
  "message": "Hi Nick, your flight leaves in 45 minutes",
  "messageTemplate": "Hi {name}, your flight leaves in {wait} minutes",
  "properties":  {
    "name": "Nick",
    "wait": 45
  }
}

Preserving the structure of log data, and making it trivially easy to enrich that data with more contextual information, makes it possible to do some really clever things with the resulting logs.

If you’re skeptical, try finding all messages where the user’s name starts with “N”, or where the wait time is greater than one hour, in a traditional log file. In some trivial cases a regular expression might handle this cleanly, but it quickly becomes complex and cumbersome when more data is involved. Using Serilog it can be as simple as startswith(name, "N") or wait > 60.

Preserving the message template itself ("Hi, {name}...") also has some interesting and very useful results: the message template becomes a kind of “message type” that can be used to find – or exclude – messages of a specific type. Again very hard, often impossible with traditional loggers.

What’s going on in JavaScript?

In JavaScript as elsewhere, there are already some sophisticated logging frameworks, with varied levels of structured data support. A fairly new project called node-bunyan interests me the most, with its clever DTrace integration and support for serializing objects instead of messages, but even here the new hybrid-style API and DSL provided by Serilog is missing. With node-bunyan it’s possible to log an object and/or a message, but messages are still formatted out in a lossy manner, as earlier .NET loggers did.

Whether it grows into a standalone project in its own right, or encourages more established frameworks to take a look at this new style of logging, bringing Serilog’s API to JavaScript seems like a fun and worthwhile thing to try.

serilog.js

Tinkering away over a month or so, I’ve sketched out what I think serilog.js should look like.

[Image: serilog.js terminal output with type-aware coloring]

Yes, serilog.js terminal output is type-aware, just like syntax highlighting in an IDE! It’s useful when dealing with structured events to see the difference between a number 5 (magenta for numbers) and a string "5" (cyan for strings). In itself this feature is a nice trimming, but it’s not just a party-trick; below the surface, the event is carrying all of this information.

Getting started – installing from NPM

All the code is currently published to NPM. At the command line:

npm install serilog

Then in your node application:

var serilog = require('serilog');
var terminal = require('serilog/src/serilog-terminal-sink');
var file = require('serilog/src/serilog-file-sink');

Serilog “sinks” are destinations for log events. This example uses the terminal sink, which writes to the system terminal/console. At the moment, the sinks are all zipped up directly in the serilog package, but they need to be split out.

Tour of the API

At present, with very few of the back-end ‘sinks’ built, serilog.js is all about the API.

Configuring the pipeline

One of the first things to notice is that the logger is a pipeline. Thinking of log events as structured data leads naturally to an API more like what you’d find in event processing frameworks than the typical logging library.

var log = serilog.configuration()
  .writeTo(terminal())
  .minimumLevel('WARNING')
  .writeTo(file({path: 'log.txt'}))
  .createLogger();

Each element is applied in turn:

  • serilog.configuration() kicks things off by creating a new configuration object
  • .writeTo(terminal()) emits all events to the terminal sink
  • .minimumLevel('WARNING') filters out any that are below the WARNING level (we’ll cover levels below)
  • .writeTo(file({path: 'log.txt'})) emits the events that reach it to a log file, and
  • .createLogger() takes all of these pipeline stages and bundles them up into an efficient logger object

The result is log, which we’ll use below. (As in Serilog for .NET, there’s no “global” logger – multiple loggers can be created and are completely independent, unless you link them together with something like .writeTo(log2)).

You might be wondering – why not Node.js streams or RxJS etc.? Simple really – generic APIs are great for gluing together solutions, but for something that’s used extensively in a codebase, maintaining complete control over dependencies, performance and semantics is a good idea long-term, and that’s the path taken here.

Logging levels

serilog.js has four simple logging levels, listed here in order of importance:

log.trace('This is a lot of detail');
log.information('Something of note happened');
log.warning('Look out, wheels are coming off!');
log.error('Something\'s broken');

If log.information() looks like a bit of a mouthful, don’t worry: the log object itself is a function that can be called as a synonym for information:

log('This is an information-level event');

The default level for a new logger is INFORMATION; using .minimumLevel('TRACE') at the start of the pipeline will open things up to include trace events as well.

Message templates

The syntax of message templates is carried over directly from Serilog. Event properties are named by including them in braces, and the values are provided as positional arguments:

log('{a} + {b} = {c}', 1, 2, 3);
// -> {properties: {a: 1, b: 2, c: 3}}

I’ve left the message/template/timestamp out of the example output (// -> ...) to keep the focus on the structured part of the event, but these other parts of the event exist too.

If you pass an object to the log, e.g.:

var mail = {to: 'nblumhardt', subject: 'Is the post finished?'};
log('Sending {mail}', mail);
// -> {properties: {mail: '[object Object]'}}

You might be surprised that it’s stored using toString(). We haven’t missed a great opportunity to serialize here – the problem is that it is very easy to serialize too much data by mistake this way.

If you know that what you’re passing into a log method is a nice little chunk of data to work with later, prefix the property name with @ to say “I know what I’m doing – serialize this!”:

var mail = {to: 'nblumhardt', subject: 'Is the post finished?'};
log('Sending {@mail}', mail);
// -> {properties: {mail: {to: 'nblumhardt', subject: 'Is the post finished?'}}}

serilog.js can handle structured storage of objects and arrays nicely, and these can support some really cool processing scenarios – just don’t forget the @ (if you watch the events roll by on the terminal, you’ll notice the difference too – objects are colored green, while strings render in cyan).

Context and enrichment

The strength of structured logs is in correlation; this is one of the great challenges when instrumenting distributed apps, and probably the #1 reason that interest in structured logs has picked up again in recent times.

When searching a large number of events produced across many machines, it’s vital to have properties to filter on so that a single location or thread of interaction can be followed.

In a simple case, you might want to instrument all events with the name of the machine producing them. This can be done when configuring the logger:

var log = serilog.configuration()
  .enrich({machine: env.HOSTNAME})
  ...

Events created by the logger will all carry the specified properties:

log('Hello, {name}', 'Nick');
// -> {properties: {name: 'Nick', machine: 'NBLUMHARDT-RMBP'}}

More often though, it’s desirable to tag events produced within a part of the application, or in a transaction. For this the using() method is provided on the logger itself:

var smtpLog = log.using({subsystem: 'SMTP'});
smtpLog('About to send mail to {name}', 'Nick');
// -> { properties: {subsystem: 'SMTP', name: 'Nick'}}

Filtering

A final part of the API to check out is filtering. While most filtering will happen server-side wherever you collect your logs, it can be useful to apply some filters in the logging pipeline if you have a component producing a lot of noise, or if you need to avoid passing events with sensitive data to logs.

This fits in as you’d expect:

var log = serilog.configuration()
  .filter(function(evt){
    return evt.messageTemplate.raw !== 'Making {noise}' &&
      evt.properties.subsystem !== 'SMTP';
  })
  .writeTo(terminal())
  .createLogger();

Building serilog.js

If you want to tinker with the project, you can grab the source from GitHub. The build process uses gulp to drive some mocha tests. Just type:

npm install
gulp

…and you’re away!

The browser

serilog.js also runs in the browser (though browser support will be patchy and is completely untested!). There are some notes on the GitHub site linked above on how to do this, and a test.html page in the repository that sets some of it up.

The catch

Serilog in .NET is out there in production use today. It’s being extended and refined by a core group of contributors and takes what time I have available to coordinate.

I’ll be using serilog.js in a few small JavaScript projects and extending it for my own needs, and I’ll be building a sink that writes to Seq at some point. But, I don’t have the capacity to really drive this project beyond the prototype that it is right now.

If you’re interested in JavaScript, event processing, application monitoring and/or API design, serilog.js needs you. Kick the tyres, check out the code, send a pull request! It’s a project “up for grabs” in the largest sense. Can you help change the face of JavaScript logging?