Symbols in Bison grammars represent the grammatical classifications of the language.
A terminal symbol (also known as a token type) represents a
class of syntactically equivalent tokens. You use the symbol in grammar
rules to mean that a token in that class is allowed. The symbol is
represented in the Bison parser by a numeric code, and the yylex
function returns a token type code to indicate what kind of token has been
read. You don't need to know what the code value is; you can use the
symbol to stand for it.
A nonterminal symbol stands for a class of syntactically equivalent groupings. The symbol name is used in writing grammar rules. By convention, it should be all lower case.
Symbol names can contain letters, digits (not at the beginning), underscores and periods. Periods make sense only in nonterminals.
There are two ways of writing terminal symbols in the grammar:
%token
. See section Token Type Names.
'+'
is a character token type. A character token
type doesn't need to be declared unless you need to specify its
semantic value data type (see section Data Types of Semantic Values), associativity, or
precedence (see section Operator Precedence).
By convention, a character token type is used only to represent a
token that consists of that particular character. Thus, the token
type '+'
is used to represent the character `+' as a
token. Nothing enforces this convention, but if you depart from it,
your program will confuse other readers.
All the usual escape sequences used in character literals in C can be
used in Bison as well, but you must not use the null character as a
character literal because its ASCII code, zero, is the code
yylex
returns for end-of-input (see section Calling Convention for yylex
).
How you choose to write a terminal symbol has no effect on its grammatical meaning. That depends only on where it appears in rules and on when the parser function returns that symbol.
The value returned by yylex
is always one of the terminal symbols
(or 0 for end-of-input). Whichever way you write the token type in the
grammar rules, you write it the same way in the definition of yylex
.
The numeric code for a character token type is simply the ASCII code for
the character, so yylex
can use the identical character constant to
generate the requisite code. Each named token type becomes a C macro in
the parser file, so yylex
can use the name to stand for the code.
(This is why periods don't make sense in terminal symbols.)
See section Calling Convention for yylex
.
If yylex
is defined in a separate file, you need to arrange for the
token-type macro definitions to be available there. Use the `-d'
option when you run Bison, so that it will write these macro definitions
into a separate header file `name.tab.h' which you can include
in the other source files that need it. See section Invoking Bison.
The symbol error
is a terminal symbol reserved for error recovery
(see section Error Recovery); you shouldn't use it for any other purpose.
In particular, yylex
should never return this value.