Parsing Expression Grammar as a Primitive Recursive-Descent ...
Parsing Expression Grammar as a Primitive Recursive-Descent Parser with Backtracking
An article by Roman Redziejowski
Presented by: Jørgen Ulrik B. Krag

PEG sequence
● Syntax
  ● A B C D
● Semantics
  ● Apply each rule in order, consuming input for each
  ● Return success if all rules succeed
  ● Else reset input position and return failure

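The sequence semantics above can be sketched as a small combinator. This is a minimal illustration, not code from the article: the names `Parser`, `lit` and `seq` are made up here, and a parsing expression is assumed to be a function from `(text, pos)` to the new position, or `None` on failure.

```python
from typing import Callable, Optional

# Assumed convention: a parsing expression maps (text, pos) to the
# position after the match, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parts: Parser) -> Parser:
    """A B C D: apply each part in order; fail as a whole if any part fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        cur = pos
        for part in parts:
            nxt = part(text, cur)
            if nxt is None:
                # The caller still holds the original pos, so nothing is consumed.
                return None
            cur = nxt
        return cur
    return parse

abcd = seq(lit("A"), lit("B"), lit("C"), lit("D"))
```

Because the position is passed by value, "reset input position" happens implicitly: a failed `abcd("ABCX", 0)` returns `None` and the caller continues from position 0.
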
PEG &Expr
● Syntax
  ● &Expr
● Semantics
  ● Match against Expr, but do not consume any input
  ● If match succeeds return success, otherwise return failure

PEG !Expr
● Syntax
  ● !Expr
● Semantics
  ● Match against Expr, but do not consume any input
  ● If match succeeds return failure, otherwise return success

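The two syntactic predicates, &Expr and !Expr, differ only in which outcome they invert. A hedged sketch follows, assuming a parsing expression is a function from `(text, pos)` to a new position or `None`; the names `and_pred`, `not_pred` and `lit` are illustrative, not from the article.

```python
from typing import Callable, Optional

# Assumed convention: (text, pos) -> new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def and_pred(expr: Parser) -> Parser:
    """&Expr: succeed where Expr would match, without consuming input."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if expr(text, pos) is not None else None
    return parse

def not_pred(expr: Parser) -> Parser:
    """!Expr: succeed where Expr would fail, without consuming input."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if expr(text, pos) is None else None
    return parse
```

On success both predicates return the position they were given, so whatever follows them in a sequence starts at the same place.
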
PEG repetition
● Syntax
  ● Expr+, Expr*
● Semantics
  ● Consume input as long as Expr matches
  ● + returns failure if less than one match was made
  ● * always returns success

PEG zero-or-one
● Syntax
  ● Expr?
● Semantics
  ● If Expr matches, consume input and return success
  ● Else return success without consuming input

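Repetition and zero-or-one can be sketched as three small combinators. This is an illustration under stated assumptions: a parsing expression is a function from `(text, pos)` to a new position or `None`, and the names `one_of`, `star`, `plus` and `opt` are invented for the example. The guard against empty matches in `star` is an addition that the slides do not mention; without it a sub-expression that succeeds without consuming input would loop forever.

```python
from typing import Callable, Optional

# Assumed convention: (text, pos) -> new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def one_of(chars: str) -> Parser:
    """Match a single character taken from chars."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse

def star(expr: Parser) -> Parser:
    """Expr*: consume as long as Expr matches; always succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            nxt = expr(text, pos)
            if nxt is None or nxt == pos:  # failure, or empty match (loop guard)
                return pos
            pos = nxt
    return parse

def plus(expr: Parser) -> Parser:
    """Expr+: like Expr*, but fails if not even one match was made."""
    rest = star(expr)
    def parse(text: str, pos: int) -> Optional[int]:
        first = expr(text, pos)
        return None if first is None else rest(text, first)
    return parse

def opt(expr: Parser) -> Parser:
    """Expr?: consume one match if there is one; succeeds either way."""
    def parse(text: str, pos: int) -> Optional[int]:
        nxt = expr(text, pos)
        return pos if nxt is None else nxt
    return parse

digit = one_of("0123456789")
```

Note that `star` and `plus` are greedy and never give back input: once `star(digit)` has consumed "123", a following expression cannot make it retry with "12".
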
PEG literal matching
● Syntax
  ● [s], [c1-c2], 'literal string', .
● Semantics
  ● [abcd]: match a, b, c or d and consume, or return failure
  ● [0-3]: match 0, 1, 2 or 3 and consume, or return failure
  ● 'literal string': match the string and consume, or return failure
  ● .: match any single character, or return failure (at the end of input)

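The four terminal forms above map onto four small matchers. A minimal sketch, assuming a parsing expression is a function from `(text, pos)` to a new position or `None`; the names `char_class`, `char_range`, `literal` and `any_char` are chosen for this example only.

```python
from typing import Callable, Optional

# Assumed convention: (text, pos) -> new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def char_class(chars: str) -> Parser:
    """[abcd]: match one character that appears in chars."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse

def char_range(lo: str, hi: str) -> Parser:
    """[c1-c2]: match one character between lo and hi, inclusive."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and lo <= text[pos] <= hi else None
    return parse

def literal(s: str) -> Parser:
    """'literal string': match s exactly."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def any_char(text: str, pos: int) -> Optional[int]:
    """.: match any single character; fails only at the end of input."""
    return pos + 1 if pos < len(text) else None
```

A common idiom built from these is `!.`, which succeeds only at the end of input.
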
PEG example
● Value = [0-9]+ / '(' Expr ')'
● Product = Value ( ( '*' / '/' ) Value )*
● Sum = Product ( ( '+' / '-' ) Product )*
● Expr = Sum

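Read as a recursive-descent parser, each rule of the example grammar becomes one procedure that returns success or failure and restores the input position when it fails. The following is a hand-written sketch of that reading, not generated code and not taken from the article; the class name `ArithParser` and the helper `matches` are hypothetical. It only recognizes input and builds no parse tree.

```python
class ArithParser:
    """One method per grammar rule; self.pos is the current input position."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def _char(self, chars: str) -> bool:
        """Consume one character if it is in chars."""
        if self.pos < len(self.text) and self.text[self.pos] in chars:
            self.pos += 1
            return True
        return False

    def value(self) -> bool:
        # Value = [0-9]+ / '(' Expr ')'
        start = self.pos
        if self._char("0123456789"):
            while self._char("0123456789"):
                pass
            return True
        if self._char("(") and self.expr() and self._char(")"):
            return True
        self.pos = start  # backtrack: reset the input position
        return False

    def product(self) -> bool:
        # Product = Value ( ( '*' / '/' ) Value )*
        if not self.value():
            return False
        while True:
            start = self.pos
            if not (self._char("*/") and self.value()):
                self.pos = start  # undo a dangling operator
                return True

    def sum(self) -> bool:
        # Sum = Product ( ( '+' / '-' ) Product )*
        if not self.product():
            return False
        while True:
            start = self.pos
            if not (self._char("+-") and self.product()):
                self.pos = start
                return True

    def expr(self) -> bool:
        # Expr = Sum
        return self.sum()


def matches(text: str) -> bool:
    """True if Expr matches and the whole input was consumed."""
    parser = ArithParser(text)
    return parser.expr() and parser.pos == len(text)
```

The choice between two single-character literals such as ( '*' / '/' ) is written here as one character-set test, which accepts the same inputs.
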
PEG pitfalls
● No left recursion
● Hidden prefix capture
  ● ( '+' / '++' ) [a-z] does not match "++n"
● Spacing
  ● No lexer to remove ignored input
  ● The spacing rule must be applied in the grammar wherever whitespace can occur
  ● Easy way to do it: spacing before the first rule and spacing after every "token"

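Hidden prefix capture follows directly from ordered choice: once the first alternative succeeds, the later ones are never tried, even if the rest of the sequence then fails. A small sketch reproduces the "++n" case; the combinator names `lit`, `choice`, `seq` and `lower` are made up for the example, and a parsing expression is assumed to be a function from `(text, pos)` to a new position or `None`.

```python
from typing import Callable, Optional

# Assumed convention: (text, pos) -> new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def lower(text: str, pos: int) -> Optional[int]:
    """[a-z]"""
    return pos + 1 if pos < len(text) and "a" <= text[pos] <= "z" else None

def choice(*alts: Parser) -> Parser:
    """A / B: return the first alternative that succeeds; never revisit it."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parse

def seq(*parts: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        for part in parts:
            nxt = part(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos
    return parse

# ( '+' / '++' ) [a-z]: '+' always wins, so '++' is unreachable.
hidden = seq(choice(lit("+"), lit("++")), lower)
# ( '++' / '+' ) [a-z]: longest alternative first.
fixed = seq(choice(lit("++"), lit("+")), lower)
```

With these definitions `hidden("++n", 0)` fails while `fixed("++n", 0)` consumes all three characters; putting the longer alternative first is one common way to avoid the problem.
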
PEG example with spacing
● Value = [0-9]+ S / '(' S Expr ')' S
● Product = Value ( ( '*' S / '/' S ) Value )*
● Sum = Product ( ( '+' S / '-' S ) Product )*
● Expr = S Sum
● S = [ \n\t]*

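The spacing convention (S once before the first rule, then after every token) can be sketched by folding S into a token helper. This is an illustrative recognizer for the grammar above, not the article's code; the function names `spacing`, `token`, `chain` and `matches` are assumptions, and every function maps `(text, pos)` to a new position or `None`.

```python
from typing import Callable, Optional

def spacing(text: str, pos: int) -> int:
    """S = [ \\n\\t]*  (always succeeds, possibly consuming nothing)."""
    while pos < len(text) and text[pos] in " \n\t":
        pos += 1
    return pos

def token(text: str, pos: int, chars: str) -> Optional[int]:
    """Match one character from chars, then S."""
    if pos < len(text) and text[pos] in chars:
        return spacing(text, pos + 1)
    return None

def value(text: str, pos: int) -> Optional[int]:
    # Value = [0-9]+ S / '(' S Expr ')' S
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end > pos:
        return spacing(text, end)
    inner = token(text, pos, "(")
    if inner is None:
        return None
    after = expr(text, inner)
    if after is None:
        return None
    return token(text, after, ")")

def chain(text: str, pos: int,
          operand: Callable[[str, int], Optional[int]], ops: str) -> Optional[int]:
    """operand ( op S operand )*, where op is one character from ops."""
    cur = operand(text, pos)
    if cur is None:
        return None
    while True:
        after_op = token(text, cur, ops)
        nxt = None if after_op is None else operand(text, after_op)
        if nxt is None:
            return cur  # stop before a dangling operator
        cur = nxt

def product(text: str, pos: int) -> Optional[int]:
    return chain(text, pos, value, "*/")

def sum_(text: str, pos: int) -> Optional[int]:
    return chain(text, pos, product, "+-")

def expr(text: str, pos: int) -> Optional[int]:
    # Expr = S Sum
    return sum_(text, spacing(text, pos))

def matches(text: str) -> bool:
    return expr(text, 0) == len(text)
```

Since every token consumes its own trailing whitespace, no rule ever starts on a blank, and the only leading whitespace to handle is the one in front of the whole input.
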
Testing PEG/Packrat parsing
● Is packrat parsing necessary?
  ● Most programming languages are mainly LL(1)
  ● Backtracking is exponential in the length of a statement, times the number of statements
● Experiment
  ● Write a Java 1.5 PEG parser
  ● Apply it to 10522 source files
  ● Experiment with saving the last result of each procedure

Test results
● Uses about 20 calls per byte of input, regardless of input size
● 16.1% of calls were repeated calls: calls to the same rule at the same input position
● Saving and reusing the last call of each procedure reduces repeated calls to 3.3%
● Storing the last two calls yields 1.1% repeated calls

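The idea of saving only the last one or two results of each procedure, instead of a full packrat table, can be sketched with a small wrapper that also counts repeated calls. Everything here is an assumption made for illustration: the `Memo` class, the toy rule `Stmt = Number '+' Number / Number '-' Number / Number`, and the way a "repeated call" is counted (a rule body executed again at a position where it already ran). It does not reproduce the Java grammar or the percentages above.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Rule = Callable[[str, int], Optional[int]]

class Memo:
    """Keeps the `keep` most recent results per rule and counts executed calls."""

    def __init__(self, keep: int) -> None:
        self.keep = keep
        self.recent: Dict[str, List[Tuple[int, Optional[int]]]] = {}
        self.seen: Set[Tuple[str, int]] = set()
        self.calls = 0      # rule bodies actually executed
        self.repeated = 0   # executions at a (rule, position) already executed

    def rule(self, fn: Rule) -> Rule:
        name = fn.__name__
        def wrapper(text: str, pos: int) -> Optional[int]:
            saved = self.recent.setdefault(name, [])
            for saved_pos, saved_result in saved:
                if saved_pos == pos:
                    return saved_result  # reuse a saved result
            self.calls += 1
            if (name, pos) in self.seen:
                self.repeated += 1
            self.seen.add((name, pos))
            result = fn(text, pos)
            saved.append((pos, result))
            if len(saved) > self.keep:
                del saved[: len(saved) - self.keep]  # drop the oldest entries
            return result
        return wrapper

def run(text: str, keep: int) -> Tuple[Optional[int], int, int]:
    """Parse text with the toy grammar; return (result, calls, repeated)."""
    memo = Memo(keep)

    @memo.rule
    def number(text: str, pos: int) -> Optional[int]:
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        return end if end > pos else None

    def binary(op: str) -> Rule:
        def parse(text: str, pos: int) -> Optional[int]:
            mid = number(text, pos)
            if mid is None or not text.startswith(op, mid):
                return None
            return number(text, mid + 1)
        return parse

    @memo.rule
    def stmt(text: str, pos: int) -> Optional[int]:
        for alt in (binary("+"), binary("-"), number):
            end = alt(text, pos)
            if end is not None:
                return end
        return None

    return stmt(text, 0), memo.calls, memo.repeated
```

On the input "7" all three alternatives start by calling `number` at position 0. With `keep=0` that body runs three times (two repeats); with `keep=1` it runs once and the other two calls reuse the saved result.
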
Optimizations
● Identifiers used the following rule:
  ● Identifier = !Keyword Letter LetterOrDigit*
  ● Had to test 53 keywords before checking for an identifier
● Using a hashtable instead gave 10.3%, 1.6% and 0.6% repeated calls while remembering the result of the last 0, 1 and 2 calls respectively

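The keyword optimization can be sketched by comparing the two strategies side by side: trying every keyword before scanning an identifier, versus scanning the word first and looking it up in a hash set. This is an illustration only. The keyword list is a small made-up subset, not the 53 Java keywords, and it assumes each keyword alternative is followed by !LetterOrDigit so that "iffy" is not mistaken for "if".

```python
import string
from typing import Optional

KEYWORDS = ("if", "int", "for", "while", "return")  # illustrative subset only
KEYWORD_SET = frozenset(KEYWORDS)

LETTERS = string.ascii_letters + "_$"
LETTER_OR_DIGIT = LETTERS + string.digits

def scan_word(text: str, pos: int) -> Optional[int]:
    """Letter LetterOrDigit*: return the end position, or None."""
    if pos >= len(text) or text[pos] not in LETTERS:
        return None
    end = pos + 1
    while end < len(text) and text[end] in LETTER_OR_DIGIT:
        end += 1
    return end

def identifier_naive(text: str, pos: int) -> Optional[int]:
    """Identifier = !Keyword Letter LetterOrDigit*, trying keywords one by one."""
    for kw in KEYWORDS:
        end = pos + len(kw)
        at_word_end = end >= len(text) or text[end] not in LETTER_OR_DIGIT
        if text.startswith(kw, pos) and at_word_end:
            return None  # !Keyword fails
    return scan_word(text, pos)

def identifier_hashed(text: str, pos: int) -> Optional[int]:
    """Same result: scan the word once, then do a single set lookup."""
    end = scan_word(text, pos)
    if end is None or text[pos:end] in KEYWORD_SET:
        return None
    return end
```

Both functions accept the same inputs as long as every keyword is itself a word of letters; the hashed version replaces one test per keyword with one scan and one lookup.
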
My opinion
● Good overview and introduction to PEG
● Identifies some problems with using PEGs for specifying languages
● Gives some idea of the effectiveness of a PEG parser for Java
● Does not go into details about some of the true benefits of PEGs
● Asks a lot of questions in the conclusion

Other aspects of PEGs
● Unambiguous
● Two PEGs can be combined to form a new PEG
● Many packrat parsing tools allow you to specify semantics along with the syntax
  ● Makes it possible to make language extensions like in Fortress
● Error recovery can be very hard

PEG implementations
● Libraries and parser generators for a lot of languages: C, C#, Java, Python, Ruby, JavaScript, Lisp...
● Rats!
  ● The parser generator used in Fortress
● Perl6: native PEG functionality as an extension to RegExps

Features of other PEG implementations
● Warning about hidden prefix capture
● Optimized literal choice matching
● Full packrat behavior

Examples