Aggregate Queries in Seq Part 1: Goals

To add a bit of variety to the format of this blog, I’ve decided to try diarising a month of programming – November 2015 to be exact, if you’re reading this in the future!

This month I’ve got some steep goals to face: I want to ship a preview of Seq’s next major feature – aggregate queries – by the end of the month. I’m not starting from scratch, but pulling together the current progress into a complete feature is still a lot of work and there are many design decisions yet to make. I don’t intend to post an update every day (I’d have no time for actually writing the code ;-)) but hopefully every few days I can get an installment up here.

So, the first of these diary entries: why am I even working on aggregate queries, and what are they, anyway?

Constraint is a wonderful aid to creation: without the month-end deadline breathing down my neck I’d no doubt have more to say about this here. In the interest of making progress, though, it’s quicker and easier to explain by example: the aggregates we’re talking about are count(), distinct(), sum(), min(), max(), mean(), percentile() and some of their lesser-known friends.

Log data is great for answering ad hoc questions about how an app behaves and is used. A big enhancement to Seq’s analytical capabilities, which today fall back on exporting tabular data to Excel, would be the ability to ask it questions like:

Which exception types have occurred today, and how many of each type?
Are average transaction processing times improving or degrading?
How many items on average do customers check out?

Aggregate queries enable this, and open up all kinds of ways to learn more from the data that’s already collected.
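To make those questions a little more concrete, here is a minimal sketch of the underlying computations in plain Python. It is not Seq's query syntax or implementation; the event shape and the property names (ExceptionType, Elapsed, ItemCount) are assumptions made up purely for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical structured log events; property names are invented for this sketch.
events = [
    {"ExceptionType": "TimeoutException", "Elapsed": 120},
    {"ExceptionType": "TimeoutException", "Elapsed": 80},
    {"ExceptionType": "ArgumentException", "Elapsed": 15},
    {"ItemCount": 3},
    {"ItemCount": 5},
]


def count_by(events, prop):
    """count() grouped by one property, skipping events that lack it."""
    return Counter(e[prop] for e in events if prop in e)


def mean_of(events, prop):
    """mean() of one numeric property, skipping events that lack it."""
    return mean(e[prop] for e in events if prop in e)


# "Which exception types have occurred, and how many of each type?"
exceptions_by_type = count_by(events, "ExceptionType")

# "How many items on average do customers check out?"
average_items = mean_of(events, "ItemCount")
```

Each question boils down to filtering the events, optionally grouping them, and folding each group into a single number.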

Some of these capabilities overlap with what dedicated metrics can also provide. I am a huge believer in the benefits of measuring and dashboarding anything that moves. Metrics and logs aren’t the same thing, though, and the scenarios and usage patterns for each can be startlingly different, from collection right through to storage and processing. Seq can be (and already is) used for very light metrics duties, but in the interest of doing one thing well, the immediate goal for aggregation in Seq is to answer ad hoc questions from log data rather than to perform heavy-duty time-series crunching.

Implementing aggregates in Seq means implementing from the ground up. There’s no SQL database behind the scenes to do the heavy lifting – everything from parsing to planning and executing the queries needs to be done by hand in C#. I’m expecting to learn a lot along the way. It should make for an interesting month. Wish me luck! :-)

Read Part 2: Defining a Syntax