Advances in Parsing

January 21, 2019

Michael D. Adams


Advances in Parsing

Time:   10:45am
Location:   Meeting room 302 (Mountain View), level 3

Parsing is sometimes thought of as a solved problem. However, recent advances show that there is still much to be discovered in this area. This talk reviews two such advances: indentation-sensitive parsing and disambiguation via tree automata.

Several popular languages, such as Haskell, Python, and F#, use the indentation and layout of code as part of their syntax. Parsers for these languages currently use ad hoc techniques to handle layout. This talk presents a simple extension to context-free grammars that can express these layout rules, and derives GLR and LR(k) algorithms for parsing these grammars.

Dealing with precedence and associativity rules in context free grammars (CFGs) has long posed a challenge. While they can be encoded in the structure of the CFG, the result can be difficult to work with and often obfuscates the language designer’s intent. This talk shows how tree automata can specify all of these while still allowing the CFG to be written in a natural manner. This unified theory subsumes and generalizes these other techniques and provides an elegant means of specifying grammatical restrictions.

These advances indicate that there is still much to be discovered about parsing. We have only uncovered the tip of the iceberg