Next: , Previous: Syntax parsing, Up: Internals



12.4 Tree Nodes

If you want really to understand how the GNU Pascal language front-end works internally and perhaps want to improve the compiler, it is important that you understand GPC's internal data structures.

The data structure used by the language front-end to hold all information about your Pascal program are the so-called “tree nodes”. (Well, it needn't be Pascal source – tree nodes are language independent.) The tree nodes are kind of objects, connected to each other via pointers. Since the GNU compiler is written in C and was created at a time where nobody really thought about object-oriented programming languages yet, a lot of effort has been taken to create these “objects” in C.

Here is an extract from the “object hierarchy”. Omissions are marked with “...”; nodes in parentheses are “abstract”: They are never instantiated and aren't really defined. They only appear here to clarify the structure of the tree node hierarchy. The complete list is in ../tree.def; additional information can be found in ../tree.h.

     (tree_node)
     |
     |--- ERROR_MARK  { enables GPC to continue after an error }
     |
     |--- (identifier)
     |    |
     |    |--- IDENTIFIER_NODE
     |    |
     |    \--- OP_IDENTIFIER
     |
     |--- TREE_LIST  { a list of nodes, also used as a
     |                  general-purpose "container object" }
     |
     |--- TREE_VEC
     |
     |--- BLOCK
     |
     |--- (type)  { information about types }
     |    |
     |    |--- VOID_TYPE
     |    |
     |    |--- INTEGER_TYPE
     |   ...
     |    |
     |    |--- RECORD_TYPE
     |    |
     |    |--- FUNCTION_TYPE
     |    |
     |    \--- LANG_TYPE  { for language-specific extensions }
     |
     |--- INTEGER_CST  { an integer constant }
     |
     |--- REAL_CST
     |
     |--- STRING_CST
     |
     |--- COMPLEX_CST
     |
     |--- (declaration)
     |    |
     |    |--- FUNCTION_DECL
     |   ...
     |    |
     |    |--- TYPE_DECL
     |    |
     |    \--- VAR_DECL
     |
     |--- (reference)
     |    |
     |    |--- COMPONENT_REF
     |   ...
     |    |
     |    \--- ARRAY_REF
     |
     |--- CONSTRUCTOR
     |
     \--- (expression)
          |
          |--- MODIFY_EXPR  { assignment }
          |
          |--- PLUS_EXPR  { addition }
         ...
          |
          |--- CALL_EXPR  { procedure/function call }
          |
          |--- GOTO_EXPR
          |
          \--- LOOP_EXPR  { for all loops }

Most of these tree nodes – struct variables in fact – contain pointers to other tree nodes. A TREE_LIST for instance has a TREE_VALUE and a TREE_PURPOSE slot which can contain arbitrary data; a third pointer TREE_CHAIN points to the next TREE_LIST node and thus allows us to create linked lists of tree nodes.

One example: When GPC reads the list of identifiers in a variable declaration

     var
       Foo, Bar, Baz: Integer;

the parser creates a chain of TREE_LISTs whose TREE_VALUEs hold IDENTIFIER_NODEs for the identifiers Foo, Bar, and Baz. The function declare_variables() (declared in declarations.c) gets this tree list as a parameter, does some magic, and finally passes a chain of VAR_DECL nodes to the back-end.

The VAR_DECL nodes in turn have a pointer TREE_TYPE which holds a _TYPE node – an INTEGER_TYPE node in the example above. Having this, GPC can do type-checking when a variable is referenced.

For another example, let's look at the following statement:

     Baz := Foo + Bar;

Here the parser creates a MODIFY_EXPR tree node. This node has two pointers, TREE_OPERAND[0] which holds a representation of Baz, a VAR_DECL node, and TREE_OPERAND[1] which holds a representation of the sum Foo + Bar. The sum in turn is represented as a PLUS_EXPR tree node whose TREE_OPERAND[0] is the VAR_DECL node Foo, and whose TREE_OPERAND[1] is the VAR_DECL node Bar. Passing this (the MODIFY_EXPR node) to the back-end results in assembler code for the assignment.

If you want to have a closer look at these tree nodes, write a line {$debug-tree FooBar} into your program with FooBar being some identifier in your program. This tells GPC to output the contents of the IDENTIFIER_NODE to the standard error file handle in human-readable form.

While hacking and debugging GPC, you will also wish to have a look at these tree nodes in other cases. Use the debug_tree() function to do so. (In fact {$debug-tree FooBar} does nothing else than to debug_tree() the IDENTIFIER_NODE of the Foobar identifier node – note the capitalization of the first character in the internal representation.)