End-user-quality error reporting in Superpower v2

Superpower version 2.0 is nearly finished! The new version adds some polish, makes getting started much easier, and includes a lot more helpers to deal with common text parsing tasks. This release continues to build on Superpower’s core goal of reliably delivering high-quality syntax error messages.

What’s Superpower?

I think it’s fair to say that Sprache democratized text parsing in C#. The straightforward approach and broad range of freely available examples make it hard to beat Sprache on ease of learning and use. But Sprache’s simplicity leaves some gaps that other libraries have now stepped in to fill. Pidgin, for example, is a performance-oriented alternative that can also handle byte-level parsing (think network protocols and other bulk data processing tasks, where simplicity is second in priority to throughput).

Sprache does a passable job of reporting syntax errors, but its design bakes in some fundamental limits on how far this can go. Superpower is an alternative that puts end-user-quality error reporting first. We use it to parse SQL queries over log events in Seq, where the difference between “unexpected `m`, expected `o`” and “unexpected identifier `frm`, expected keyword `from`” is the kind of detail we care deeply about.
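To make that difference concrete, here is a minimal sketch in Python rather than C#, so that it stays independent of Superpower’s actual API. The function names `char_level_error` and `token_level_error` and the sample query are assumptions made for illustration. The first function compares one character at a time; the second reads a whole word before comparing it with the expected keyword.

```python
import re

WORD = re.compile(r"[A-Za-z_]\w*")


def char_level_error(text, pos, literal):
    """Match `literal` one character at a time and report the first mismatch."""
    for offset, expected in enumerate(literal):
        index = pos + offset
        if index >= len(text):
            return f"unexpected end of input, expected `{expected}`"
        if text[index] != expected:
            return f"unexpected `{text[index]}`, expected `{expected}`"
    return None  # the literal matched


def token_level_error(text, pos, keyword):
    """Read a whole word first, then compare it with the expected keyword."""
    match = WORD.match(text, pos)
    if match is None:
        found = f"`{text[pos]}`" if pos < len(text) else "end of input"
        return f"unexpected {found}, expected keyword `{keyword}`"
    word = match.group()
    if word == keyword:
        return None  # the keyword matched
    return f"unexpected identifier `{word}`, expected keyword `{keyword}`"


query = "select * frm log"   # hypothetical query with a misspelled keyword
start = query.index("frm")

print(char_level_error(query, start, "from"))
print(token_level_error(query, start, "from"))
```

Both functions look at the same input position; only the unit of comparison changes, and with it the usefulness of the message.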

Recursive descent parsers, like Sprache, Pidgin, and Superpower, aren’t the most efficient class of parsers in big-O terms, but what they lack in theoretical shine, they more than make up for in practicality and ergonomics. Because recursive descent parsers are a straightforward mapping of the grammar onto executable code, they’re easy to debug and test in a modular fashion. And because they’re ordinary executable programs, there’s plenty of latitude for programming in features like smarter error reporting.
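As a rough illustration of that grammar-to-code mapping, the sketch below implements a two-rule grammar for nested lists of numbers in Python. It is not how Sprache, Pidgin, or Superpower are written; the names `tokenize_simple`, `parse_value`, `parse_list`, `expect`, and `ParseError` are all hypothetical. Each grammar rule becomes one function, which is what makes the rules testable on their own.

```python
import re


class ParseError(Exception):
    pass


def tokenize_simple(text):
    """Split into runs of digits or single non-space characters."""
    return re.findall(r"\d+|\S", text)


def found_at(tokens, pos):
    return f"`{tokens[pos]}`" if pos < len(tokens) else "end of input"


def expect(tokens, pos, symbol):
    """Consume one specific symbol or fail with a descriptive message."""
    if pos < len(tokens) and tokens[pos] == symbol:
        return pos + 1
    raise ParseError(f"unexpected {found_at(tokens, pos)}, expected `{symbol}`")


def parse_value(tokens, pos):
    """value := NUMBER | list"""
    if pos < len(tokens) and tokens[pos].isdecimal():
        return int(tokens[pos]), pos + 1
    if pos < len(tokens) and tokens[pos] == "[":
        return parse_list(tokens, pos)
    raise ParseError(f"unexpected {found_at(tokens, pos)}, expected value")


def parse_list(tokens, pos):
    """list := '[' [ value (',' value)* ] ']'"""
    pos = expect(tokens, pos, "[")
    items = []
    if pos < len(tokens) and tokens[pos] == "]":
        return items, pos + 1
    while True:
        item, pos = parse_value(tokens, pos)
        items.append(item)
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            continue
        return items, expect(tokens, pos, "]")


result, _ = parse_value(tokenize_simple("[1, [2, 3], []]"), 0)
print(result)
```

Because `parse_list` is an ordinary function, it can be called directly in a unit test with a hand-written token list, without going through `parse_value`.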

Putting v2 through its paces

The new TokenizerBuilder in v2 eliminates the need to hand-roll tokenizers for many grammars, except as an optimization. This brings the learning curve a little closer to Sprache’s.
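The general idea of declaring a tokenizer as an ordered list of recognizers, instead of writing the scanning loop by hand, can be sketched as follows. This is a Python stand-in built on regular expressions, not Superpower’s TokenizerBuilder; the class `SimpleTokenizerBuilder`, its methods, and the token kind names are assumptions for illustration only.

```python
import re


class SimpleTokenizerBuilder:
    """Hypothetical declarative tokenizer builder (not Superpower's API)."""

    def __init__(self):
        self._rules = []  # (compiled pattern, kind); kind None means "skip"

    def ignore(self, pattern):
        self._rules.append((re.compile(pattern), None))
        return self

    def match(self, pattern, kind):
        self._rules.append((re.compile(pattern), kind))
        return self

    def build(self):
        rules = list(self._rules)

        def tokenize(text):
            tokens, pos = [], 0
            while pos < len(text):
                # Rules are tried in declaration order; the first match wins.
                for pattern, kind in rules:
                    found = pattern.match(text, pos)
                    if found and found.end() > pos:
                        if kind is not None:
                            tokens.append((kind, found.group(), pos))
                        pos = found.end()
                        break
                else:
                    raise ValueError(f"position {pos}: unexpected `{text[pos]}`")
            return tokens

        return tokenize


json_tokenizer = (
    SimpleTokenizerBuilder()
    .ignore(r"\s+")
    .match(r"[{}\[\]:,]", "punctuation")
    .match(r'"[^"]*"', "string")
    .match(r"-?\d+", "number")
    .match(r"[A-Za-z_]\w*", "identifier")
    .build()
)

for token in json_tokenizer('{"thing": [1, null]}'):
    print(token)
```

Note that the error raised by this sketch is deliberately crude: it only names the offending character. Producing something as helpful as “incomplete string” from a declarative tokenizer is the harder problem that the builder in v2 has to solve.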

Because tokenizers can fail, TokenizerBuilder itself needs to generate good errors. To really test this out, I built a new example for Superpower: a simple interactive REPL that you can feed fragments of JSON, either well-formed or broken in whatever tricky ways you can invent, and then examine the resulting syntax errors from the tokenizer or parser. I’ve been using it as an exploratory way of probing for new cases that aren’t handled well.

Here is the REPL in action:

json> {"thing": [1, 2, null]}
Object:
  thing
    Array:
      Number: 1
      Number: 2
      Null

json> {"thing: [1, 2, null]}
       ^
Syntax error (line 1, column 2): incomplete string, unexpected end of input, expected `"`.

json> {"thing" [1, 2, null]}
               ^
Syntax error (line 1, column 10): unexpected `[`, expected `:`.

json> flase
      ^
Syntax error (line 1, column 1): unexpected identifier `flase`, expected JSON value.

What’s special, of course, is not these particular cases but the fact that informative error messages are presented for just about all invalid inputs, without the parser itself being extensively annotated or reorganized to achieve this.
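The caret line and the “line, column” prefix in the transcript above are simple to derive once a parser reports the absolute position of a failure. The Python sketch below shows one way to do it; the helper names `position_of` and `render_error` are hypothetical, positions are assumed to be zero-based character offsets, and the caret placement assumes single-line input typed after the prompt.

```python
def position_of(text, index):
    """Convert a zero-based character offset into one-based (line, column)."""
    line = text.count("\n", 0, index) + 1
    line_start = text.rfind("\n", 0, index) + 1  # 0 when there is no newline
    return line, index - line_start + 1


def render_error(prompt, text, index, message):
    """Build the caret line and the message shown under a single-line input."""
    line, column = position_of(text, index)
    caret = " " * (len(prompt) + column - 1) + "^"
    return f"{caret}\nSyntax error (line {line}, column {column}): {message}."


source = '{"thing" [1, 2, null]}'
print("json> " + source)
print(render_error("json> ", source, source.index("["), "unexpected `[`, expected `:`"))
```

The prompt width is added to the caret offset so that the marker lines up with the character the user actually typed.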

The new JSON example is written in a literate programming style, so the code and comments are a better introduction to Superpower v2 than a blog post can provide. Since a release is just around the corner, now is a great time to clone the repository and run the JsonExample program to try some other cases.

You can also try out v2 by installing 2.0.0-* from NuGet.