
Chapter 5
Grammar structure

5.1 TPG grammar structure

TPG grammars may contain three parts:

Options
are defined at the beginning of the grammar (see 5.3).
Parsers
are described in sections starting with the parser keyword (see 5.5).
Python code
can appear in sections starting with the main keyword or before the first parser (see 5.4).

See figure 5.1 for a generic TPG grammar.

Figure 5.1: TPG grammar structure
# Options
set magic = "/usr/bin/env python"
# Python code
{{
    class MyClass:
        pass
}}
# Parser Foo
parser Foo:
    START -> X Y Z ;
# More Python code
main:
{{
    def myfunction():
        pass
}}


5.2 Comments

Comments in TPG start with # and run until the end of the line.
    # This is a comment

5.3 Options

Some options can be set at the beginning of TPG grammars. The syntax for options is:

set name
sets the boolean name option to true.
set name = "value"
sets the name option to value.
set noname
disables the name option.
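
The three forms above can be sketched in Python as follows. This is only an illustration, not TPG's own option handling: parse_option and its regular expression are hypothetical names, and the sketch assumes that any name beginning with "no" is a negated option.

```python
import re

# Hypothetical helper, not part of TPG: interprets one "set" line
# according to the three forms listed above.
_OPTION_RE = re.compile(r'^set\s+(\w+)(?:\s*=\s*"([^"]*)")?\s*$')

def parse_option(line):
    """Return (name, value) for a single option line."""
    match = _OPTION_RE.match(line.strip())
    if match is None:
        raise ValueError("not an option line: %r" % line)
    name, value = match.group(1), match.group(2)
    if value is not None:
        return name, value          # set name = "value"
    if name.startswith("no") and len(name) > 2:
        return name[2:], False      # set noname (assumed negation prefix)
    return name, True               # set name

print(parse_option('set magic = "/usr/bin/env python"'))
print(parse_option("set CSL"))
print(parse_option("set noCSL"))
```

A real implementation would check the name against the list of known options rather than rely on the "no" prefix alone.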

5.3.1 Magic option

The magic option tells TPG which interpreter runs the generated script. The first line of the generated code then starts with #! followed by the command line that launches the interpreter (/usr/bin/env python for example). This has no effect on Microsoft Windows.

set magic = "/usr/bin/env python"
adds #!/usr/bin/env python to the first line.
set nomagic
generates no magic line. This is the default behaviour.
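
As a rough illustration of the effect on the generated file, the following sketch prepends a #! line to some generated code. The function add_magic_line is a hypothetical helper written for this example and is not part of TPG.

```python
def add_magic_line(generated_code, magic=None):
    """Prepend a #! line when a magic command is given (hypothetical helper)."""
    if magic is None:
        # Corresponds to "set nomagic", the default: nothing is added.
        return generated_code
    return "#!" + magic + "\n" + generated_code

print(add_magic_line("print('parser code')\n", "/usr/bin/env python"))
```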

5.3.2 CSL option

By default, TPG lexers are context-free. The CSL option tells TPG to generate a context-sensitive lexer (see 8).

set CSL
generates context sensitive lexers.
set noCSL
generates context free lexers. This is the default behaviour.

5.4 Python code

Python code sections are not processed by TPG; they are copied verbatim into the generated Python parser. TPG does not report syntax errors in these sections; that is left to Python.

5.4.1 Syntax

Python code is enclosed in double curly brackets ({{ and }}). The code must therefore not contain two consecutive closing brackets. You can avoid this by writing } } (with a space) instead of }} (without a space).
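
The following sketch shows why two consecutive closing brackets are a problem. It assumes a very simple scanner that ends a code section at the first closing pair it finds; extract_code is a hypothetical function for illustration and does not reproduce TPG's actual lexer.

```python
def extract_code(text, start=0):
    """Return the text between the first '{{' and the next '}}' (hypothetical scanner)."""
    opening = text.index("{{", start)
    closing = text.index("}}", opening + 2)
    return text[opening + 2:closing]

# A nested dictionary literal ends with '}}', which closes the section too early.
truncated = extract_code("{{ d = {'a': {'b': 1}} }}")

# Writing '} }' keeps the section intact; Python ignores the extra space.
intact = extract_code("{{ d = {'a': {'b': 1} } }}")

print(repr(truncated))
print(repr(intact))
```

In the first case the extracted code stops in the middle of the dictionary literal, so Python would reject it; in the second case the whole statement is extracted.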

5.4.2 Indentation

Python code can appear in several parts of a grammar. Since indentation has a special meaning in Python it is important to know how TPG handles spaces and tabulations at the beginning of the lines. In TPG indentation is important only in Python code sections (in main parts, in parser parts and in rules).

When TPG encounters Python code, it removes from every non-blank line the leading spaces and tabs that are common to all of those lines. TPG treats a space and a tab as the same character, so it is important to use one indentation style consistently and not to mix spaces and tabs. The code is then reindented according to where it is generated (in a class, in a method or at global scope).

Figure 5.2 shows how TPG handles indentation.

Figure 5.2: Code indentation examples (code in grammars compared with generated code)

Correct: these lines have four spaces in common. These spaces are removed.

WRONG: it’s a bad idea to start a multiline code section on the first line since the common indentation may be different from what you expect. No error will be raised by TPG but Python won’t compile this code.

Correct: indentation does not matter in a one line Python code.
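
The indentation rule described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not TPG's implementation: common_indent and reindent are hypothetical names, and each space or tab is counted as one character, as the text describes.

```python
def common_indent(lines):
    """Number of leading space/tab characters shared by all non-blank lines."""
    widths = [len(l) - len(l.lstrip(" \t")) for l in lines if l.strip()]
    return min(widths) if widths else 0

def reindent(code, prefix=""):
    """Strip the common indentation, then indent every non-blank line with prefix."""
    lines = code.split("\n")
    n = common_indent(lines)
    return "\n".join(prefix + l[n:] if l.strip() else "" for l in lines)

# Code starting on its own line: four characters are common and are removed.
good = "\n    if x:\n        y = 1\n"

# Code starting on the first line: that line has no indentation, so nothing
# is removed and the remaining lines keep an inconsistent layout.
bad = "if x:\n        y = 1\n    z = 2"

print(reindent(good))
print(reindent(good, prefix="    "))  # e.g. when placed inside a class
print(reindent(bad))
```

The last result is returned unchanged, and Python refuses to compile it because the third line does not match any enclosing indentation level.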

5.5 TPG parsers

A grammar can contain as many parsers as needed. A parser declaration starts with the parser keyword and contains rules and Python code sections (local to the parser).

5.5.1 Initialisation

Python objects are initialised by the __init__ method. This method is generated by TPG and cannot be overridden. To work around this, an init method (i.e. without the double underscores) is called at initialisation time with the arguments given to __init__. See 5.5.3 to add methods to a parser.
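
The call chain can be pictured with the following sketch. The class GeneratedParser only stands in for what TPG generates and is heavily simplified; the use of a subclass and the verbose argument are assumptions made for this example.

```python
class GeneratedParser:
    """Stand-in for a class generated by TPG (hypothetical, heavily simplified)."""

    def __init__(self, *args, **kwargs):
        # Generated set-up would happen here; then the user hook is called
        # with the same arguments.
        self.init(*args, **kwargs)

    def init(self, *args, **kwargs):
        # Default hook: nothing to do.
        pass


class Foo(GeneratedParser):
    # In a grammar, this method would be written in a Python code section
    # inside "parser Foo:".
    def init(self, verbose=False):
        self.verbose = verbose


parser = Foo(verbose=True)
print(parser.verbose)  # True
```

The user never touches __init__; all custom set-up goes into init.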

5.5.2 Rules

Each rule will be translated into a method of the parser.

5.5.3 Python code

Python code that is local to a parser is copied into the generated class. This is usually used to add methods or attributes to the parser.