
12.6 GPI files – GNU Pascal Interfaces

This section documents how GPC transfers information from exporting modules and units to the program, module or unit that imports (uses) it.

A GPI file contains a precompiled GNU Pascal interface. “Precompiled” here means that the interface has already been parsed (i.e. the front-end has done its work), but that no assembler output has been produced yet.

The GPI file format is an implementation-dependent (but not too implementation-dependent ;−) file format for storing GNU Pascal interfaces to be exported – Extended Pascal and PXSC module interfaces as well as interface parts of UCSD/Borland Pascal units compiled with GNU Pascal.

To see what information is stored in or loaded from a GPI file, run GPC with an additional command-line option --debug-gpi. Then, GPC will write a human-readable version of what is being stored/loaded to the standard error file handle. (See also: Tree nodes.) Please note: This will usually produce huge amounts of output!

While parsing an interface, GPC stores the names of exported objects in tree lists – look for handle_autoexport in the GPC source files. At the end of the interface, everything is stored in one or more GPI files. This is done in module.c. There you can find the source of create_gpi_files() which documents the file format:

First, a header of 33 bytes containing the string GNU Pascal unit/module interface plus a newline.

This is followed by an integer containing the “magic” value 0x12345678, which carries information about the endianness. Note that, though a single GPI file is always specific to a particular target architecture, the host architecture (i.e., the system on which GPC runs) can be different (cross-compilers). Currently, GPC is not able to convert endianness in GPI files “on the fly”, but at least it will detect and reject GPI files with the “wrong” endianness. When writing GPI files, the host's endianness is always used (this seems to be a good idea even if on-the-fly conversion is supported in the future, since most often, GPI files created by a cross-compiler will be read again by the same cross-compiler). “Integer” here and in the following paragraphs means a gpi_int (which is currently defined as HOST_WIDE_INT).
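The header and magic value can be checked with a few lines of code. The following is only a sketch, not GPC's own logic: the function name detect_byte_order and the int_size parameter are hypothetical, and the width of a gpi_int is assumed to be supplied by the caller because it depends on the host.

```python
HEADER = b"GNU Pascal unit/module interface\n"  # 32 characters plus newline
MAGIC = 0x12345678


def detect_byte_order(data, int_size=8):
    """Return "little" or "big" according to the magic value after the header.

    int_size is an assumption: a gpi_int is a HOST_WIDE_INT, whose width
    depends on the host, so the caller has to supply it.
    """
    if not data.startswith(HEADER):
        raise ValueError("bad GPI header")
    raw = data[len(HEADER):len(HEADER) + int_size]
    if len(raw) != int_size:
        raise ValueError("truncated GPI file")
    # Try both byte orders; exactly one of them can reproduce the magic value.
    for order in ("little", "big"):
        if int.from_bytes(raw, order) == MAGIC:
            return order
    raise ValueError("unrecognized magic value")
```

Unlike this sketch, GPC itself does not accept both byte orders: it rejects a file whose byte order differs from the host's.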

The rest of the GPI file consists of chunks. Each chunk starts with a one-byte code that describes the type of the chunk. It is followed by an integer that specifies the size of the chunk (excluding this chunk header). The further contents depend on the type, as listed below.

For the numeric values of the chunk type codes, please refer to GPI_CHUNKS in module.c. Chunk types denoted with (*) must occur exactly once in a GPI file. Other types may occur any number of times (including zero times). The order of chunks is arbitrary unless stated otherwise below. “String” here simply means a character sequence whose length is the chunk's length (so no terminator is needed).
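The chunk layout described above (a one-byte type code, an integer size, then the payload) lends itself to a simple loop. This is a minimal sketch under assumptions: iter_chunks and chunk are hypothetical helper names, the integer width and byte order are passed in by the caller, and the type codes 1 and 2 used in the usage note are placeholders rather than real values from GPI_CHUNKS.

```python
def iter_chunks(data, start, int_size=8, byteorder="little"):
    """Yield (type_code, payload) for every chunk from offset `start` on."""
    pos = start
    while pos < len(data):
        header_end = pos + 1 + int_size
        if header_end > len(data):
            raise ValueError("truncated chunk header")
        code = data[pos]  # one-byte chunk type code
        size = int.from_bytes(data[pos + 1:header_end], byteorder)
        if header_end + size > len(data):
            raise ValueError("chunk extends past end of file")
        yield code, data[header_end:header_end + size]
        pos = header_end + size  # the size excludes the chunk header


def chunk(code, payload, int_size=8, byteorder="little"):
    """Build one chunk; only used to make the demonstration self-contained."""
    return bytes([code]) + len(payload).to_bytes(int_size, byteorder) + payload
```

For example, list(iter_chunks(chunk(1, b"demo") + chunk(2, b""), 0)) yields the two payloads in order. For a real GPI file, start would be the offset just past the 33-byte header and the magic integer.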

The version of the GPI file, which is the same as the GPC version. If USE_GPI_DEBUG_KEY is used (it inserts a “magic” value at the beginning of each node in the node table, see below, so that errors in GPI files are detected more reliably), D is appended to this version string. (Currently, USE_GPI_DEBUG_KEY is used by default.) Furthermore, the GCC backend version is appended, since it also influences GPI files.
The target system the GPI file was compiled for.
The name of the unit/module (the GPI_CHUNK_MODULE_NAME chunk).
The name of the primary source file of the unit/module.
The name of an interface imported by the current interface (the GPI_CHUNK_IMPORT chunk). This chunk consists of a string followed by the checksum of the imported interface's nodes, so the chunk length is the length of the string plus the size of an integer. Again, no terminator of the string is needed.

The checksum is currently a simple function of the GPI_CHUNK_NODES chunk's contents (see below). This might be replaced in the future by an MD5 hash or something else more elaborate.
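Because the string has no terminator, the payload of an import chunk can only be split from the end: the last integer is the checksum and everything before it is the name. A sketch, with the hypothetical helper name split_import_chunk, and assuming the caller knows the integer width and byte order and that the checksum can be read as an unsigned value:

```python
def split_import_chunk(payload, int_size=8, byteorder="little"):
    """Split an import chunk payload into (interface name bytes, checksum)."""
    if len(payload) < int_size:
        raise ValueError("import chunk too short")
    name = payload[:-int_size]  # unterminated string
    # Assumption: the checksum is interpreted as an unsigned integer here.
    checksum = int.from_bytes(payload[-int_size:], byteorder)
    return name, checksum
```

The name is left as bytes because the text above does not specify a character encoding.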

The name of a file to link.
The name of a library to link (prefixed with -l).
The name of a module initializer. For technical reasons, any such chunk must come after the GPI_CHUNK_MODULE_NAME chunk.
A gpc-main option given in this interface. (More than one occurrence is pointless.)
The GPI_CHUNK_NODES chunk holds the exported names and the objects (i.e., constants, data types, variables and routines) they refer to. These are internally represented as so-called tree nodes, as defined in the files ../tree.h and ../tree.def from the GNU compiler back-end. (See also: Tree nodes.)

The main problem when storing tree nodes is that they form a complicated structure in memory with a lot of circular references (actually, not a tree, but a directed graph in the usual terminology, so the name “tree nodes” is actually a misnomer), so the storing mechanism must make sure that nothing is stored multiple times.

The functions load_node() and store_node_fields() do the main work of loading/storing the contents of a tree node, with references to all its contained pointers, in a GPI file. Each tree node has a TREE_CODE indicating what kind of information it contains. Each kind of tree node must be stored in a different way, which is not described here. See the source of these functions for details.

As most tree nodes contain pointers to other tree nodes, load_node() is an (indirectly) recursive function. Since this recursion can be circular (think of a record containing a pointer to a record of the same type), we must resolve references to tree nodes which already have been loaded. For this reason, all tree nodes being loaded are kept in a table (rb.nodes). They are entered there before all their fields have been loaded (because loading them is what causes the recursion). So the table contains some incomplete nodes during loading, but at the end of loading a GPI file, they have all been completed.
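The idea of entering a node into the table before its fields are loaded can be illustrated on a toy graph. This is a sketch of the technique only, not of GPC's load_node(): the dictionary-based node representation and the stored mapping are assumptions made for the example.

```python
def load_node(node_id, stored, table):
    """Load node `node_id` from `stored`, resolving circular references.

    `stored` maps ids to (kind, [child ids]); `table` plays the role of the
    node table: a node is entered before its children are loaded.
    """
    if node_id in table:
        return table[node_id]  # already loaded, or still being loaded
    kind, child_ids = stored[node_id]
    node = {"kind": kind, "children": []}
    table[node_id] = node  # enter the incomplete node first
    for child_id in child_ids:
        node["children"].append(load_node(child_id, stored, table))
    return node


# A record type containing a pointer to a record of the same type.
stored = {1: ("RECORD_TYPE", [2]), 2: ("POINTER_TYPE", [1])}
table = {}
record = load_node(1, stored, table)
```

While the pointer type is being loaded, the record is in the table but still incomplete; once load_node(1, ...) returns, both nodes are complete and refer to each other.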

On the other hand, for store_node_fields() the (seeming) recursion must be resolved to an iterative process so that the single tree nodes are stored one after another in the file, and not mixed together. This is the job of store_tree(). It uses a hash table (see get_node_id()) for efficiency.
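Turning the recursion into an iteration can be sketched with a worklist and a dictionary that assigns ids on first sight. The code below only illustrates that technique; it is not the actual store_tree(), the children_of callback is a hypothetical stand-in for the per-node field handling, and the order in which nodes are emitted is not meant to match GPC's.

```python
def store_tree(root, children_of):
    """Return one (id, node, [child ids]) record per reachable node."""
    ids = {}      # hash table: object identity -> assigned id
    pending = []  # nodes that have an id but are not stored yet

    def get_node_id(node):
        key = id(node)
        if key not in ids:
            ids[key] = len(ids)
            pending.append(node)
        return ids[key]

    get_node_id(root)
    records = []
    while pending:
        node = pending.pop()
        # References to other nodes become ids; the nodes themselves are
        # queued and stored later, never nested inside this record.
        child_ids = [get_node_id(child) for child in children_of(node)]
        records.append((ids[id(node)], node, child_ids))
    return records


a = {"name": "a", "kids": []}
b = {"name": "b", "kids": [a]}
a["kids"].append(b)  # circular: a -> b -> a
records = store_tree(a, lambda node: node["kids"])
```

Each node appears in exactly one record even though the graph is circular, because a node that already has an id is never queued again.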

When re-exporting (directly or indirectly) a node that was imported from another interface, and a later compiler run imports both interfaces, it must merge the corresponding nodes loaded from both interfaces. Otherwise it would get only similar, but not identical items. However, we cannot simply omit the re-exported nodes from the new interface in case a later compiler run imports only one of them. The same problem occurs when a module exports several interfaces. In this case, a program that imports more than one of them must recognize their contents as identical where they overlap.

Therefore, each node in a GPI file is prefixed (immediately before its tree code) with information about the interface it was originally imported from or stored in first. This information is represented as a reference to an INTERFACE_NAME_NODE followed by the id (as an integer) of the node in that interface. If the node is imported again and re-re-exported, this information is copied unchanged, so it will always refer to the interface the node was originally contained in. For nodes that appear in an interface for the first time (the normal case), a single 0 integer is stored instead of the INTERFACE_NAME_NODE reference and id (for shortness, since this information is implicit).
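Why the origin information makes merging possible can be shown with a small model. Everything here is an illustrative assumption (the function name import_interface, the registry dictionary, the representation of a node as origin plus payload); it models the effect of the prefix, not its on-disk encoding.

```python
def import_interface(name, nodes, registry):
    """Import `nodes` of interface `name`, merging with earlier imports.

    `nodes` maps a local id to (origin, payload), where origin is None for a
    node first stored in this interface, or (interface name, id there) for a
    re-exported node. `registry` maps (interface name, id) to the one shared
    object for that node.
    """
    result = {}
    for local_id, (origin, payload) in nodes.items():
        key = origin if origin is not None else (name, local_id)
        result[local_id] = registry.setdefault(key, {"payload": payload})
    return result


registry = {}
# Interface A defines a type; interface B re-exports it under its own id.
a = import_interface("A", {5: (None, "type T")}, registry)
b = import_interface("B", {9: (("A", 5), "type T"), 10: (None, "var V")}, registry)
```

After both imports, b[9] and a[5] are the same object, so the importer sees one type rather than two similar ones. Importing B alone still works, because B carries its own copy of the re-exported node.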

This mechanism is not applied to INTERFACE_NAME_NODEs, since there would be a problem when the identifier they represent is the name of the interface they come from. Neither is it applied to IDENTIFIER_NODEs, because the backend handles them somewhat specially: there is only one IDENTIFIER_NODE ever made for any particular name, and they contain fields like IDENTIFIER_VALUE which depend on the currently active declarations, so storing and loading them in GPI files would be wrong. But for the same reason, it is no problem that the mechanism can't be applied to them.

INTERFACE_NAME_NODEs are a special kind of tree node, only used for this purpose. They contain the name of the interface, the name of the module (to detect the unlikely case that different modules have interfaces of the same name, which otherwise might confuse GPC), and the checksum of that interface. The latter may seem redundant with the checksum stored in the GPI_CHUNK_IMPORT chunk, but in fact it is not. On the one hand, GPI_CHUNK_IMPORT chunks occur only for interfaces imported directly, while the INTERFACE_NAME_NODE mechanism might also refer to interfaces imported indirectly. On the other hand, storing the checksum in the GPI_CHUNK_IMPORT chunks allows the import mechanism to detect discrepancies and refuse to load inconsistent interfaces, whereas during the handling of the GPI_CHUNK_NODES chunk, the imported modules must already have been loaded. (It would be possible to scan the GPI_CHUNK_NODES chunk while deciding whether to recompile, but that would be a lot of extra effort, compared to storing the checksum in the GPI_CHUNK_IMPORT chunks.)

Finally, at the end of the GPI_CHUNK_NODES chunk, a checksum of its own contents (excluding the checksum itself, of course) is appended. This is to detect corrupted GPI files and is independent of the other uses of checksums.

An offset table for the tree nodes (the GPI_CHUNK_OFFSETS chunk). Each node in a GPI file is assigned a unique id (which is stored as an integer wherever nodes refer to other nodes). There are some special tree nodes (e.g., integer_type_node or NULL_TREE) which are used very often and have fixed meanings. They have been assigned predefined ids, so they don't have to be stored in the GPI file at all. Their number and values are fixed (but may change between different GPC versions), see SPECIAL_NODES in module.c.

For the remaining nodes, the GPI_CHUNK_OFFSETS table contains the file offsets as integers where they are stored within the (only) GPI_CHUNK_NODES chunk. The offsets are relative to the start of that chunk, i.e. after the chunk header. After the table (but still in this chunk) the id of the main node which contains the list of all exported names is stored as an integer. (Currently, this is always the last node, but for the file format definition, this is not guaranteed.)
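Read as a flat sequence of integers, the payload of this chunk is the offset table followed by one final integer, the id of the main node. A sketch of that split, with the hypothetical helper name parse_offsets_chunk and with the integer width and byte order assumed to be known to the caller:

```python
def parse_offsets_chunk(payload, int_size=8, byteorder="little"):
    """Return (offsets, main_node_id) from an offsets chunk payload."""
    if len(payload) < int_size or len(payload) % int_size:
        raise ValueError("offsets chunk has an invalid length")
    values = [
        int.from_bytes(payload[i:i + int_size], byteorder)
        for i in range(0, len(payload), int_size)
    ]
    # Everything but the last integer is the offset table; the last
    # integer is the id of the main node.
    return values[:-1], values[-1]
```

The sketch deliberately stops short of mapping a node id to a table index: that step depends on how many ids are reserved for the special nodes, which has to be taken from SPECIAL_NODES in module.c.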

This chunk contains no data (i.e., its size must be 0). Its only purpose is to signal that the module implementation or the implementation part of the unit has been compiled. (Stored, but not used currently.)

That's it. Now you should be able to “read” GPI files using GPC's --debug-gpi option. There is also a utility gpidump (built and installed with GPC, source code in the utils directory) to decode and show the contents of GPI files. It also does some integrity checking (a little more than GPC does while loading GPI files), so if you suspect a problem with GPI files, you might want to run gpidump on them, discarding its standard output (it writes all error reports to standard error, of course).

If you encounter a case where the loaded information differs too much from the stored information, you have found a bug – congratulations! What “too much” means depends on the object being stored in or loaded from the GPI file. Remember that things are loaded from a GPI file in the reverse of the order in which they were stored when comparing different recursion levels, but in the same order within a single recursion level. (This is important when using --debug-gpi; with gpidump you can read the file in any order you like.)