Give your instrumentation some love in 2015!

This is the year for better instrumentation.

On premises, applications are more integrated, distributed and asynchronous than ever
The move to the cloud is driving architectures with fine-grained modularity and scale-out across many remote machines

If you still treat logging as an afterthought, operating and debugging apps in 2015 is looking like a nightmare.

Thankfully, tools and practices are catching up. And, just as it was with requirements, builds, testing and deployment, many improvements are coming from an integrated approach to what was once “somebody else’s problem” or a siloed responsibility.

If your instrumentation isn’t all it could be, here are three suggestions to get you off to a new start this year.

1. Take another look at Structured Logs

First, a new generation of tooling has come online. Text logs from distributed apps are an ergonomic disaster, requiring complex, cumbersome tools to manipulate in all but the most trivial scenarios, so alternatives have been baking away for a long time and are finally hitting the mainstream.

In .NET, we’ve got SLAB from Microsoft Patterns and Practices, and more recently the open-source Serilog, that make structured logging as quick, easy and painless to implement as text logging (think log4net, NLog) but with miraculous advantages when it comes time to consume events for diagnostic purposes.

Unlike text logs, when you write a structured log all of the data associated with an event are kept in a structured format (think “JSON document”) that can be rendered out to a more traditional text message, but also sorted, searched and filtered just like documents in a NoSQL database might be.

What’s the difference between this (log4net):

log.Warn("Cache reached {0} items", cache.Count);

And this? (Serilog):

log.Warning ("Cache reached {CacheSize} items", cache.Count);

To write, very little – and when viewed at the console, not much either. But with appropriate storage, the structured version supports filtering with queries like:

CacheSize > 1000

… and not a regex in sight.

With structured logs you can use clean data types and familiar operators to quickly answer questions that would be time-consuming and awkward using plain text.

If you’re not using structured logs already, give this technique a try this year – it’ll change the way you look at logs for the better.

2. Use Logs to validate new Behavior

Automated tests are great – they verify that a chunk of code in isolation does what you expect. If you’re like me though, you spend a good amount of time smoke testing your work by hand, too – putting features through their paces the old fashioned way and trying to break them.

While you do this, logs are a great way to peer into the internals of an application. Sometimes, below the surface, all is not what it seems – that warning you added: "This code should only be called on even numbered years"? Well, how do you know it isn’t triggering right now? Watching each step in a process logged out clearly is a great way to understand your system better, and it really does improve the quality of the code you check in.

Think about it this way – unit tests validate your expectations, but logs will show you things you didn’t expect.

For this strategy to be effective, you need to have logs written to a log viewer on your development machine. Console terminals and text files are under-powered and quickly become a blur when things heat up. There are lots of great tools for this; our log server Seq is free for development use, or check some of the other options you can find with a quick web search. What tooling you use is a secondary consideration though – the important thing is to make feedback from logs a first-class part of your basic development process.

And, here’s the trick: while you’re using logs in this way, you’re implicitly shaping them to be useful for diagnosing issues in production. If your logs show you clearly what’s happening during development, there’s a good chance they’ll be useful later down the track too.

3. Match Issues with relevant Logs

Here’s another great strategy I’ve seen used to brilliant effect.

When features are formally tested, pipe logs from the test environment to a log server. Whenever someone raises an issue, ask them to include a link to relevant events from the log. (This will quickly validate the quality of your error messages!)

For example, if the user is raising purchase order #12345, they can search the log from the test environment for “purchase order 12345” and should get relevant results that they can link to from an issue.

Again, here’s the trick: how will the tester always find relevant events, you ask? Well – that’s a good question! When a real customer hits an issue raising a PO, you’ll need to do the same thing. This makes test-time issues a drill for the real thing – one that validates the quality of your instrumentation and improves your readiness for dealing with issues in production.

As a bonus, you’ll soon be able to link to logs for issues in the production environment, too, which is absolutely golden if you’re the developer in the hot seat.

Conclusion

That’s it for this post. Three things you can try to get your instrumentation kicked off in the right direction this year:

Write structured logs
Use them to validate behavior during development
Link relevant logs from your test team’s issue reports

Once we take instrumentation back into the realm of our responsibility as developers and stop just throwing it over the wall, there are lots more interesting things to consider about writing smarter, more effective logs in the first place. I’d love to hear what works for you.