It turns out the parser wasn’t a huge departure from Seq’s existing filter parser. Seq already uses Sprache to parse filter expressions, and Sprache parsers compose very nicely.

After making the current FilterExpressionParser “root” expression public, and defining some new AST nodes like Projection and so on, things just get bolted together:

static readonly Parser<ExpressionValue> ExpressionValue =
    FilterExpressionParser.Expr.Token().Select(e => new ExpressionValue(e));

static readonly Parser<Projection> Projection =
    from v in AggregateValue.Or(ExpressionValue)
    from l in Label.Optional()
    select new Projection(v, l.GetOrDefault());

Here you can see how a projection like the count(*) as Total column is constructed from a parser for values and a parser for optional ‘as’ labels. I had to define a separate parser for some aggregations, like count(*), that aren’t otherwise valid Seq filter syntax, but any existing expression that FilterExpressionParser supports can be used as the value of a projected column.

Heading further up towards the root of the grammar, we get something like:

static readonly Parser<Query> Query =
    from @select in Select
    from @where in Where.XOptional()
    from groupBy in GroupBy.XOptional()
    select new Query(@select, @where.GetOrDefault(), groupBy.GetOrDefault());

The resulting Query parser can take some text input and give back a tree of objects representing the parts of the query. Success!

There was one subtle problem here that you can spot by way of the oddly-named XOptional combinator. Sprache on GitHub provides Optional, which works as advertised, but when the inner parser fails it backtracks and returns success, regardless of whether part of the input had already been matched.

This leads to error messages without a lot of information, for example:

select distinct(ExceptionType) group ApplicationName

is missing the ‘by’ required next to ‘group’. Using Optional the parser reports:

Syntax error (col 31): unexpected 'g'.

Hmmm. Not so good – there’s nothing at all wrong with that ‘g’! The problem is that upon failing to parse the ‘by’, Sprache’s Optional returned a zero-length successful parse, so parsing picks back up at the start of ‘group’ and fails because nothing left in the grammar can match it.

The ‘X’ in XOptional is for eXclusive, meaning that the optional parser may succeed with nothing only if it consumed no input whatsoever. As soon as ‘group’ is parsed, the optional branch is considered “taken”, and failures will propagate up. (Sprache ships ‘X’ versions of several parsers already, such as an exclusive Many called XMany.)

Here it is:

static class ParserExtensions
{
    class Some<T> : IOption<T>
    {
        readonly T _value;

        public Some(T value)
        {
            _value = value;
        }

        public T GetOrDefault() => _value;
        public T Get() => _value;
        public bool IsEmpty => false;
        public bool IsDefined => true;
    }

    class None<T> : IOption<T>
    {
        public T GetOrDefault() => default(T);
        public T Get() =>
            throw new InvalidOperationException("The option has no value.");
        public bool IsEmpty => true;
        public bool IsDefined => false;
    }

    public static Parser<IOption<T>> XOptional<T>(this Parser<T> parser)
    {
        if (parser == null) throw new ArgumentNullException(nameof(parser));
        return i =>
        {
            var result = parser(i);
            if (result.WasSuccessful)
                return Result.Success(new Some<T>(result.Value), result.Remainder);

            if (result.Remainder.Equals(i))
                return Result.Success(new None<T>(), i);

            return Result.Failure<IOption<T>>(
                result.Remainder, result.Message, result.Expectations);
        };
    }
}

The only divergence from the built-in Optional is that a zero-length parse succeeds only if result.Remainder.Equals(i), that is, only if the inner parser failed without consuming any input.

Using XOptional:

Syntax error (col 38): unexpected 'A', expected keyword 'by'.

Better!
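
To see the difference in isolation, here’s a toy model of the two combinators that doesn’t use Sprache at all. Everything in it (Result, Parser, Toy and its methods) is made up for illustration and is much cruder than the real thing. It assumes C# 9 or later for the record type.

```csharp
using System;

// Toy stand-ins, not Sprache types: a result records success or failure,
// the position reached, and (on failure) what was expected there.
record Result(bool Ok, int Pos, string Expected = "");

delegate Result Parser(string text, int pos);

static class Toy
{
    static int SkipSpaces(string text, int pos)
    {
        while (pos < text.Length && text[pos] == ' ') pos++;
        return pos;
    }

    // Matches a literal word, then any trailing spaces.
    public static Parser Keyword(string word) => (text, pos) =>
        string.CompareOrdinal(text, pos, word, 0, word.Length) == 0
            ? new Result(true, SkipSpaces(text, pos + word.Length))
            : new Result(false, pos, $"keyword '{word}'");

    // Matches one or more letters, then any trailing spaces.
    public static readonly Parser Identifier = (text, pos) =>
    {
        var end = pos;
        while (end < text.Length && char.IsLetter(text[end])) end++;
        return end > pos
            ? new Result(true, SkipSpaces(text, end))
            : new Result(false, pos, "identifier");
    };

    // Runs first, then second from wherever first stopped.
    public static Parser Then(Parser first, Parser second) => (text, pos) =>
    {
        var r = first(text, pos);
        return r.Ok ? second(text, r.Pos) : r;
    };

    // Like Optional: any failure becomes an empty success at the start.
    public static Parser Optional(Parser p) => (text, pos) =>
    {
        var r = p(text, pos);
        return r.Ok ? r : new Result(true, pos);
    };

    // Like XOptional: only a failure that consumed nothing becomes an empty success.
    public static Parser XOptional(Parser p) => (text, pos) =>
    {
        var r = p(text, pos);
        if (r.Ok || r.Pos != pos) return r;
        return new Result(true, pos);
    };

    // Parses the whole text and formats the outcome.
    public static string Run(Parser p, string text)
    {
        var r = p(text, 0);
        if (r.Ok && r.Pos == text.Length) return "ok";
        var where = r.Pos < text.Length
            ? $"unexpected '{text[r.Pos]}'"
            : "unexpected end of input";
        var expected = r.Ok ? "" : $", expected {r.Expected}";
        return $"Syntax error (col {r.Pos + 1}): {where}{expected}.";
    }
}

static class OptionalDemo
{
    static void Main()
    {
        var groupBy = Toy.Then(Toy.Keyword("group"),
                      Toy.Then(Toy.Keyword("by"), Toy.Identifier));
        const string input = "group ApplicationName"; // 'by' is missing

        Console.WriteLine(Toy.Run(Toy.Optional(groupBy), input));
        // Syntax error (col 1): unexpected 'g'.
        Console.WriteLine(Toy.Run(Toy.XOptional(groupBy), input));
        // Syntax error (col 7): unexpected 'A', expected keyword 'by'.
    }
}
```

The only difference between the two toy combinators is the r.Pos != pos check, which mirrors the Remainder.Equals(i) test in the real XOptional.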

If you haven’t used parser combinators before this whole thing might be a bit surprising – where’s the EBNF? The esoteric command line tools with animal names? It turns out that combinators make parsing into a regular (somewhat declarative) programming task without a lot of mystery surrounding it.

There are some limitations in Sprache’s implementation I’d like to address someday – for example, the error reported on ‘A’ above rather than ‘ApplicationName’ is the result of parsing the raw character stream instead of a tokenised one – but these are minor inconveniences that can be worked around if need be.
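
As a rough sketch of what a tokenised stream buys you, here’s a hand-rolled tokenizer that has nothing to do with Sprache’s internals. Token, Tokens.Tokenize and Tokens.Unexpected are hypothetical names, and the splitting rules are deliberately simplistic. Because each token carries its text and starting column, an error can name the whole word rather than its first character.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token type: the matched text plus its 1-based starting column.
record Token(string Text, int Column);

static class Tokens
{
    // Splits text into runs of letters/digits and single punctuation characters.
    public static List<Token> Tokenize(string text)
    {
        var tokens = new List<Token>();
        var pos = 0;
        while (pos < text.Length)
        {
            if (char.IsWhiteSpace(text[pos])) { pos++; continue; }

            var start = pos;
            if (char.IsLetterOrDigit(text[pos]))
            {
                while (pos < text.Length && char.IsLetterOrDigit(text[pos])) pos++;
            }
            else
            {
                pos++; // anything else is a one-character token
            }

            tokens.Add(new Token(text.Substring(start, pos - start), start + 1));
        }
        return tokens;
    }

    // Formats an error that reports the whole offending token.
    public static string Unexpected(Token token, string expected) =>
        $"Syntax error (col {token.Column}): unexpected '{token.Text}', expected {expected}.";
}

static class TokenDemo
{
    static void Main()
    {
        var tokens = Tokens.Tokenize(
            "select distinct(ExceptionType) group ApplicationName");

        // Find 'group' and check that the next token is 'by'.
        var i = tokens.FindIndex(t => t.Text == "group");
        if (i >= 0 && i + 1 < tokens.Count && tokens[i + 1].Text != "by")
            Console.WriteLine(Tokens.Unexpected(tokens[i + 1], "keyword 'by'"));
        // Syntax error (col 38): unexpected 'ApplicationName', expected keyword 'by'.
    }
}
```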

If you haven’t looked into combinator-based parsing, there are some great tutorials and examples linked from Sprache’s README. It’s a technique worth adding to your tool belt regardless of the kind of programming you usually do. Little languages are everywhere, waiting to be cracked open!

The most enjoyable and challenging part of any language processing task for me is not so much the parsing though, but taking a tree of syntactic nodes like we have here, and turning it into something executable. That’s coming up next :-)

Read Part 4: Planning