flex provides a mechanism for conditionally activating rules. Any rule whose pattern is prefixed with `<sc>' will only be active when the scanner is in the start condition named sc. For example,

    <STRING>[^"]*        { /* eat up the string body ... */ ... }

will be active only when the scanner is in the `STRING' start condition, and

    <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */ ... }

will be active only when the current start condition is either `INITIAL', `STRING', or `QUOTE'.
Start conditions are declared in the definitions (first) section of the input using unindented lines beginning with either `%s' or `%x' followed by a list of names. The former declares inclusive start conditions, the latter exclusive start conditions. A start condition is activated using the BEGIN action. Until the next BEGIN action is executed, rules with the given start condition will be active and rules with other start conditions will be inactive. If the start condition is inclusive, then rules with no start conditions at all will also be active. If it is exclusive, then only rules qualified with the start condition will be active. A set of rules contingent on the same exclusive start condition describes a scanner which is independent of any of the other rules in the flex input. Because of this, exclusive start conditions make it easy to specify "miniscanners" which scan portions of the input that are syntactically different from the rest (e.g., comments).
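If the distinction between inclusive and exclusive start conditions is still a little vague, here's a simple example illustrating the connection between the two. The set of rules:

    %s example
    %%

    <example>foo   /* do something */

    bar            /* something else */

is equivalent to

    %x example
    %%

    <example>foo           /* do something */

    <INITIAL,example>bar   /* something else */

Without the `<INITIAL,example>' qualifier, the `bar' rule in the second set would not be active in start condition `example', because `example' is now exclusive. Qualifying it with just `<example>' would instead make it inactive in `INITIAL'. In the first set it is active in both, because `example' is inclusive.
The default rule (to ECHO any unmatched character) remains active in start conditions.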
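The activation rule described above can be modeled in a few lines of C, which also lets us check the claimed equivalence mechanically. This is a sketch under stated assumptions, not flex's implementation. The enum values, the `SC()` bitmask macro, and the functions `rule_active()` and `examples_equivalent()` are hypothetical. `INITIAL` is treated as inclusive (since unprefixed rules are active in it). The default ECHO rule is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical start-condition numbers; INITIAL is 0, as with BEGIN(0). */
enum { INITIAL = 0, EXAMPLE = 1, NUM_SC = 2 };

/* A rule's <...> prefix as a bitmask of start conditions; 0 = no prefix. */
#define SC(s) (1u << (s))

/* inclusive[s] is true for INITIAL and each %s condition, false for %x. */
static bool rule_active(unsigned prefix, int current, const bool inclusive[NUM_SC])
{
    assert(current >= 0 && current < NUM_SC);
    if (prefix == 0)                   /* unqualified rule */
        return inclusive[current];
    return (prefix & SC(current)) != 0;
}

/* Compare the %s and %x rule sets from the text in every start condition. */
static bool examples_equivalent(void)
{
    const bool incl_s[NUM_SC] = { true, true };   /* %s example */
    const bool incl_x[NUM_SC] = { true, false };  /* %x example */

    /* %s version:  <example>foo   and an unqualified bar */
    const unsigned foo_s = SC(EXAMPLE), bar_s = 0;
    /* %x version:  <example>foo   and <INITIAL,example>bar */
    const unsigned foo_x = SC(EXAMPLE), bar_x = SC(INITIAL) | SC(EXAMPLE);

    for (int sc = 0; sc < NUM_SC; sc++) {
        if (rule_active(foo_s, sc, incl_s) != rule_active(foo_x, sc, incl_x))
            return false;
        if (rule_active(bar_s, sc, incl_s) != rule_active(bar_x, sc, incl_x))
            return false;
    }
    return true;
}
```

With this model, `examples_equivalent()` returns true. Dropping `SC(INITIAL)` from `bar_x` makes it return false, which matches the point about the `<INITIAL,example>' qualifier.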
`BEGIN(0)' returns to the original state where only the rules with no start conditions are active. This state can also be referred to as the start condition `INITIAL', so `BEGIN(INITIAL)' is equivalent to `BEGIN(0)'. (The parentheses around the start condition name are not required but are considered good style.)
BEGIN actions can also be given as indented code at the beginning of the rules section. For example, the following will cause the scanner to enter the `SPECIAL' start condition whenever yylex is called and the global variable enter_special is true:

            int enter_special;

    %x SPECIAL
    %%
            if ( enter_special )
                BEGIN(SPECIAL);

    <SPECIAL>blahblahblah
    ... more rules follow ...
To illustrate the uses of start conditions, here is a scanner which provides two different interpretations of a string like `123.456'. By default this scanner will treat the string as three tokens: the integer `123', a dot `.', and the integer `456'. But if the string is preceded earlier in the line by the string `expect-floats', it will treat it as a single token, the floating-point number 123.456:

    %{
    #include <stdlib.h>   /* for atof() and atoi() */
    %}
    %s expect

    %%
    expect-floats        BEGIN(expect);

    <expect>[0-9]+"."[0-9]+      {
                printf( "found a float, = %f\n",
                        atof( yytext ) );
                }
    <expect>\n           {
                /* that's the end of the line, so
                 * we need another "expect-floats"
                 * before we'll recognize any more
                 * numbers
                 */
                BEGIN(INITIAL);
                }

    [0-9]+      {
                printf( "found an integer, = %d\n",
                        atoi( yytext ) );
                }

    "."         printf( "found a dot\n" );
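To make the mode switch concrete, here is a hand-written C sketch that tokenizes the same way as the scanner above. It is an illustrative analogue, not code that flex generates. The function `scan_numbers()` and its one-letter token encoding are hypothetical. Characters that no rule matches are simply skipped, whereas the flex scanner would ECHO them. The `sc` variable plays the role of the current start condition.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of the flex scanner: records one letter per token
 * in `kinds` ('I' integer, 'F' float, '.' dot) and returns the count.
 * Unmatched characters are skipped here; flex would ECHO them. */
static size_t scan_numbers(const char *in, char *kinds, size_t max)
{
    enum { SC_INITIAL, SC_EXPECT } sc = SC_INITIAL;   /* %s expect */
    const char *kw = "expect-floats";
    const size_t kwlen = strlen(kw);
    size_t n = 0;

    assert(in != NULL && kinds != NULL);
    while (*in != '\0' && n < max) {
        if (strncmp(in, kw, kwlen) == 0) {
            sc = SC_EXPECT;                   /* expect-floats  BEGIN(expect); */
            in += kwlen;
        } else if (isdigit((unsigned char)*in)) {
            const char *p = in;
            while (isdigit((unsigned char)*p))
                p++;
            /* In `expect', [0-9]+"."[0-9]+ is the longer match when it applies. */
            if (sc == SC_EXPECT && p[0] == '.' && isdigit((unsigned char)p[1])) {
                p++;
                while (isdigit((unsigned char)*p))
                    p++;
                kinds[n++] = 'F';
            } else {
                kinds[n++] = 'I';             /* [0-9]+ */
            }
            in = p;
        } else if (*in == '.') {
            kinds[n++] = '.';                 /* "." */
            in++;
        } else {
            if (*in == '\n' && sc == SC_EXPECT)
                sc = SC_INITIAL;              /* <expect>\n  BEGIN(INITIAL); */
            in++;                             /* unmatched: flex would ECHO */
        }
    }
    return n;
}
```

On the input "123.456\n" this records `I.I`. On "expect-floats 123.456\n123.456\n" it records `FI.I`, because the newline ends the `expect' mode before the second number.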
Here is a scanner which recognizes (and discards) C comments while maintaining a count of the current input line.

    %x comment
    %%
            int line_num = 1;

    "/*"         BEGIN(comment);
    \n           ++line_num; ECHO;   /* count lines outside comments too */

    <comment>[^*\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);
Note that start condition names are really integer values and can be stored as such. Thus, the above could be extended in the following fashion:

    %x comment foo
    %%
            int line_num = 1;
            int comment_caller;

    "/*"         {
                 comment_caller = INITIAL;
                 BEGIN(comment);
                 }

    ...

    <foo>"/*"    {
                 comment_caller = foo;
                 BEGIN(comment);
                 }

    <comment>[^*\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\n             ++line_num;
    <comment>"*"+"/"        BEGIN(comment_caller);
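The comment-eating miniscanner in these examples can be mirrored by hand in C, which shows how an exclusive start condition amounts to a separate inner loop state. This is a hypothetical sketch: `strip_comments()` and its buffer-size parameter are illustrative names, not part of flex. Text outside comments is copied the way the default ECHO rule would copy it. Newlines are counted both inside and outside comments so that the result tracks the current input line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out` with C comments removed and
 * returns the line number reached (1 plus every newline seen). */
static int strip_comments(const char *in, char *out, size_t outsz)
{
    enum { SC_INITIAL, SC_COMMENT } sc = SC_INITIAL;
    int line_num = 1;

    assert(outsz > strlen(in));          /* output never exceeds input */

    while (*in != '\0') {
        if (sc == SC_INITIAL) {
            if (in[0] == '/' && in[1] == '*') {
                sc = SC_COMMENT;         /* comment opener: BEGIN(comment) */
                in += 2;
            } else {
                if (*in == '\n')
                    ++line_num;
                *out++ = *in++;          /* default rule: ECHO */
            }
        } else if (in[0] == '*' && in[1] == '/') {
            sc = SC_INITIAL;             /* comment closer: BEGIN(INITIAL) */
            in += 2;
        } else {
            if (*in == '\n')             /* <comment>\n  ++line_num; */
                ++line_num;
            in++;                        /* anything else is eaten */
        }
    }
    *out = '\0';
    return line_num;
}
```

For example, stripping "a /* x\ny */ b\nc" yields "a  b\nc" and a line number of 3. Unlike the flex version, an unterminated comment here is silently consumed to the end of the input.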
One can then implement a "stack" of start conditions using an array of integers. (It is likely that such stacks will become a full-fledged flex feature in the future.) Note, though, that start conditions do not have their own namespace; `%s' and `%x' declare names in the same fashion as #define.
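A minimal sketch of such a stack follows, assuming start conditions are plain ints as described above. The names `current_sc`, `push_state()`, `pop_state()`, the enum values, and the `BEGIN` stand-in macro are all hypothetical. In a real scanner, flex supplies `BEGIN`, and later versions expose the current condition through the `YY_START` macro.

```c
#include <assert.h>

/* Hypothetical stand-ins for the scanner's state: start conditions are
 * plain ints, and BEGIN just records the current one. */
enum { INITIAL = 0, COMMENT = 1, FOO = 2 };
static int current_sc = INITIAL;
#define BEGIN(sc) (current_sc = (sc))

#define SC_STACK_MAX 16
static int sc_stack[SC_STACK_MAX];
static int sc_depth = 0;

/* Remember the current condition, then switch to a new one. */
static void push_state(int new_sc)
{
    assert(sc_depth < SC_STACK_MAX);
    sc_stack[sc_depth++] = current_sc;
    BEGIN(new_sc);
}

/* Return to the condition that was active before the matching push. */
static void pop_state(void)
{
    assert(sc_depth > 0);
    BEGIN(sc_stack[--sc_depth]);
}
```

With a stack, the `comment_caller' variable becomes unnecessary: the `"/*"' rules call `push_state(COMMENT)' and the comment-closing rule calls `pop_state()'. Later flex releases build this in. With `%option stack', the functions yy_push_state(), yy_pop_state() and yy_top_state() maintain the stack for you.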