Syntax Error Recovery in Parsing Expression Grammars
Parsing Expression Grammars (PEGs) are a formalism used to describe top-down parsers with backtracking. As PEGs do not provide a good error recovery mechanism, PEG-based parsers usually do not recover from syntax errors in the input, or recover from syntax errors using ad-hoc, implementation-specific features. The lack of proper error recovery makes PEG parsers unsuitable for using with Integrated Development Environments (IDEs), which need to build syntactic trees even for incomplete, syntactically invalid programs. We propose a conservative extension, based on PEGs with labeled failures, that adds a syntax error recovery mechanism for PEGs. This extension associates recovery expressions to labels, where a label now not only reports a syntax error but also uses this recovery expression to reach a synchronization point in the input and resume parsing. We give an operational semantics of PEGs with this recovery mechanism, and use an implementation based on such semantics to build a robust parser for the Lua language. We evaluate the effectiveness of this parser, alone and in comparison with a Lua parser with automatic error recovery generated by ANTLR, a popular parser generator.
READ FULL TEXT