You are hereBlogs / meck's blog / Parser Generator in Python - Quick PLY Introduction

Parser Generator in Python - Quick PLY Introduction


By meck - Posted on 04 January 2009

I have been using regular expressions within my Python projects for a while now, but I've never felt the need to use a full-featured parser generator. This changed yesterday and I had a look at PLY, a pythonic implementation for Lex/Yacc. Being used to parser generation in Java using ANTLR or Cup I came across some remarkable differences which I want to point out for you after the stop. You will also see a small calculator code snippet.

PLY allows you to put everything you need to construct a parser in one single module. PLY will then analyse the module contents to read token and rule definitions. Writing the lexer part is done by simply declaring the list of tokens using a tuple called 'tokens'. Each token has then to be specified by either using a function or variable following the name convention t_TOKEN_NAME. Variables will be sufficient, if you want no special handling of recognised tokens like converting an integer directly into a Python integer object. The lexer part may be finished by specifying further rules including an error handler and then building the constructing the lexer instance as in the following snippet:

tokens = ('NUM', 'LPAREN', 'RPAREN', 'PLUS', 'STAR')
 
def t_NUM(t):
    r'\d+'
    t.value = int(t.value)
    return t
 
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_PLUS   = r'\+'
t_STAR   = r'\*'
 
t_ignore = ' '
 
def t_error(t):
    pass
 
import ply.lex as lex
lexer = lex.lex()

With the lexer part finished, the parser part is done accordingly by defining rules which are functions following the name convention p_RULE_NAME. Each rule uses Python documentation strings to declare the syntax. Within the function body, the tokens or sub rules can be referenced through the rule function parameter. The rest of the parser implementation is similar to the lexer part:

def p_exp(p):
    '''exp : plus_exp
           | term'''
    p[0] = p[1]
 
def p_plus_exp(p):
    '''plus_exp : exp PLUS term'''
    p[0] = p[1] + p[3]
 
def p_term(p):
    '''term : star_term
            | factor'''
    p[0] = p[1]
 
def p_star_term(p):
    '''star_term : term STAR factor'''
    p[0] = p[1] * p[3]
 
def p_factor(p):
    '''factor : paren_exp
              | NUM'''
    p[0] = p[1]
 
def p_paren_exp(p):
    '''paren_exp : LPAREN exp RPAREN'''
    p[0] = p[2]
 
def p_error(p):
	pass
 
import ply.yacc as yacc
parser = yacc.yacc()

The constructed parser object actually recognises the lexer object within the same module and could be used as follows to parse the first program argument:

import sys
print parser.parse(sys.argv[1])

Like this, we have constructed a simple calculator which can evaluate arithmetic expressions with integers, +, * and parentheses. You can see the full python source here. If you want to learn more on PLY, I'd suggest you visit their website at http://www.dabeaz.com/ply/. Finally, I want to summarise the things, I found noteworthy:

PLY's error handling and debugging features are very strong, you can even see a fancy output of the generated parsing table.
The parser construction procedure is a no-brainer since PLY guesses most things and generates the parser at runtime for you.
PLY has little features compared to ANTLR though. You cannot use labels within rule syntax which makes it more difficult to build a syntax structure during the parsing process.

I like the simplicity of PLY, and how the doc-strings are used for specifying the grammar. On the down-side this results in a static grammar, and it sounds to me like it is difficult to modify at run-time (correct me if I'm wrong). Though most people won't need this, a more dynamic solution is pyparsing. It overloads many python operators so that you can specify a grammar using regular python objects. Then you can specify a grammar as follows:

addition = ZeroOrMore(leftside + Literal("=")) + expr + Literal("+") + expr

The addition object can then be modified on the fly.

Dude, that's really cool, I will keep this in my mind for future projects...

With ubooff! Merry Christmas! )))

With ghtfus! Merry Christmas! )))

With larduc! Merry Christmas! )))

With abhuxu! Merry Christmas! )))

With dewine! Merry Christmas! )))