12.2.1 Lexer problems
Pascal is a language that's easy to lex and parse. Then came Borland
...
A number of their ad-hoc syntax extensions cause lexing or parsing
problems, and even ambiguities. This lexer tries to solve them as
well as possible, sometimes with clever rules, other times with
gross hacks and with help from the parser. (And, BTW, it handles
regular Pascal as well. ;-)
Some of the problems are (see also Parsing conflicts):
- Real constants with a trailing dot (e.g. 2.). Problem: They make
the character sequence 2.) ambiguous. It could be interpreted as
2.0 followed by ), or as 2 followed by .) (which
is an alternative for ]). This lexer chooses the latter
interpretation, as BP does and as the standard requires. It would be
possible to handle both by keeping a stack of the currently open
parentheses and brackets and choosing the matching closing one, but
since BP does not do this either, it doesn't seem worth the
trouble. (Or maybe later ... ;-)
- They also cause a little problem in the sequence 2.. (the
start of an integer subrange), but this is easily solved by normal
lexer look-ahead since a real constant can't be followed by a
. in any Pascal dialect we know of.
- Missing token separators between integer or real constants and a
following keyword (e.g. 42to). It gets worse with hex numbers
($abcduntil), but it's not really difficult to lex. However,
we don't allow this with Extended Pascal non-decimal integer
constants, e.g. 16#abcduntil where it would be a little more
difficult (because it would depend on the base whether or not
u is a digit). Since BP does not even support EP non-decimal
constants, there's no point in going to such trouble.
- Character constants with #. They conflict with the Extended
Pascal non-decimal integer number notation. #13#10 could mean
Chr (13) + Chr (10) or Chr (13#10). This lexer chooses
the former interpretation, since the latter one would be a mix of BP
and Extended Pascal features.
- Last (but not least – no, certainly worst): Character constants
with ^ (was this “feature” meant as an AFJ or
something???). GPC tries to make the best out of a stupid situation,
see the next section (see BP character constants) for details.
It should be noted that BP itself fails in a number of situations
involving such character constants, probably the clearest sign of a
design bug.
- GPC's own extension ... for variadic external function
declarations also causes a problem in the sequence (...), which
could mean (, ..., ), i.e., a parameter list
with only variadic arguments, or (., ., .).
Since the latter token sequence is meaningless in any Pascal dialect
we know of, this lexer chooses the former one, which is easily
accomplished with normal look-ahead.
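
The simpler rules above (everything except the ^ character constants)
can be illustrated with a small tokenizer. This is only a sketch in
Python, not GPC's actual lexer: the token names, the pattern table and
the tokenize function are all made up for illustration, and the
negative look-aheads are just one possible way to express the
look-ahead described above. Extended Pascal non-decimal constants such
as 16#abcd are deliberately left out.

```python
import re

# Hypothetical token patterns, tried in this order at each position.
TOKEN_PATTERNS = [
    # Real constant. A trailing dot is accepted (BP style) unless the dot
    # is followed by ")" or ".", so "2.)" and "2.." never start a real.
    ("REAL", r"\d+\.(?![.)])\d*(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+"),
    # BP hex constant: consumes hex digits only, so "$abcduntil" splits
    # into "$abcd" and "until" without any separator.
    ("HEX", r"\$[0-9A-Fa-f]+"),
    ("INT", r"\d+"),
    # BP character constant: "#13#10" yields two of these.
    ("CHAR", r"#\d+"),
    ("ELLIPSIS", r"\.\.\."),
    ("RANGE", r"\.\."),
    ("RBRACKET_ALT", r"\.\)"),
    # "(." is only the alternative "[" if no further dot follows, so that
    # "(...)" is read as "(", "...", ")".
    ("LBRACKET_ALT", r"\(\.(?!\.)"),
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r"[()\[\].,;:+\-*/=<>^]"),
]

MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the patterns above."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for sample in ["2.)", "2..5", "42to", "$abcduntil", "#13#10", "(...)"]:
        print(sample, tokenize(sample))
```

Running the script prints the token sequence chosen for each of the
problematic inputs discussed in this section. Because the alternatives
are tried in a fixed order, the order of the table matters: REAL has to
come before INT, and ELLIPSIS before RANGE.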