    %%
    "zap me"
(It will copy all other characters in the input to the output since they will be matched by the default rule.)
Here is a program which compresses multiple blanks and tabs down to a single blank, and throws away whitespace found at the end of a line:
    %%
    [ \t]+        putchar( ' ' );
    [ \t]+$       /* ignore this token */
If the action contains a `{', then the action spans till the balancing `}' is found, and the action may cross multiple lines. flex knows about C strings and comments and won't be fooled by braces found within them, but also allows actions to begin with `%{' and will consider the action to be all the text up to the next `%}' (regardless of ordinary braces inside the action).
An action consisting solely of a vertical bar (`|') means "same as the action for the next rule." See below for an illustration.
Actions can include arbitrary C code, including return statements to return a value to whatever routine called yylex. Each time yylex is called it continues processing tokens from where it last left off until it either reaches the end of the file or executes a return. Once it reaches an end-of-file, however, any subsequent call to yylex will simply return immediately, unless yyrestart is first called (see below).
Actions are not allowed to modify `yytext' or `yyleng'.
There are a number of special directives which can be included within an action:
ECHO
copies yytext to the scanner's output.
BEGIN
followed by the name of a start condition places the scanner in the corresponding start condition.
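REJECT
directs the scanner to proceed on to the "second best" rule which matched the input (or a prefix of the input), with yytext and yyleng set up appropriately. It may either be one which matched as much text as the originally chosen rule but came later in the flex input file, or one which matched less text.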
For example, the following will both count the words in the input and call the routine special whenever `frob' is seen:

            int word_count = 0;
    %%
    frob        special(); REJECT;
    [^ \t\n]+   ++word_count;

Without the REJECT, any `frob' in the input would not be counted as a word, since the scanner normally executes only one action per token. Multiple REJECT actions are allowed, each one finding the next best choice to the currently active rule. For example, when the following scanner scans the token `abcd', it will write `abcdabcaba' to the output:
    %%
    a        |
    ab       |
    abc      |
    abcd     ECHO; REJECT;
    .|\n     /* eat up any unmatched character */

(The first three rules share the fourth's action, since they use the special `|' action.)
REJECT is a particularly expensive feature in terms of scanner performance; if it is used in any of the scanner's actions, it will slow down all of the scanner's matching. Furthermore, REJECT cannot be used with the `-f' or `-F' options (see below).

Note also that unlike the other special actions, REJECT is a branch; code immediately following it in the action will not be executed.
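The `abcd' example above can be traced by hand with a small stand-alone C sketch. It is not flex output and does not use any flex API; the function name simulate_reject is hypothetical, and the rule table simply hard-codes the four literal patterns from the example, ordered from best match to worst. Each matching rule "echoes" its text and then falls through to the next candidate, which is what REJECT does; the final catch-all rule consumes one character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of flex: simulates the a/ab/abc/abcd
 * scanner on `input' and collects what ECHO would print into `out'.
 * Output that does not fit in `outsize' bytes is silently dropped. */
void simulate_reject(const char *input, char *out, size_t outsize)
{
    /* The four literal rules, ordered from best match (longest) to worst. */
    static const char *const rules[] = { "abcd", "abc", "ab", "a" };
    size_t used = 0;
    size_t i;

    if (outsize == 0)
        return;
    out[0] = '\0';

    while (*input != '\0') {
        for (i = 0; i < sizeof rules / sizeof rules[0]; ++i) {
            size_t len = strlen(rules[i]);

            if (strncmp(input, rules[i], len) != 0)
                continue;                 /* this rule does not match here */

            if (used + len < outsize) {   /* ECHO */
                memcpy(out + used, input, len);
                used += len;
                out[used] = '\0';
            }
            /* REJECT: fall through to the next-best rule. */
        }
        ++input;   /* the catch-all rule matches one character and discards it */
    }
}
```

Calling simulate_reject("abcd", buf, sizeof buf) leaves `abcdabcaba' in buf: `abcd', `abc', `ab' and `a' are echoed in turn at the first position, and the remaining characters match only the catch-all rule.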
yymore()
tells the scanner that the next time it matches a rule, the corresponding token should be appended onto the current value of yytext rather than replacing it. For example, given the input `mega-kludge' the following will write `mega-mega-kludge' to the output:

    %%
    mega-    ECHO; yymore();
    kludge   ECHO;

First `mega-' is matched and echoed to the output. Then `kludge' is matched, but the previous `mega-' is still hanging around at the beginning of yytext so the ECHO for the `kludge' rule will actually write `mega-kludge'. The presence of yymore in the scanner's action entails a minor performance penalty in the scanner's matching speed.
yyless(n)
returns all but the first n characters of the current token back to the input stream, where they will be rescanned when the scanner looks for the next match. yytext and yyleng are adjusted appropriately (e.g., yyleng will now be equal to n). For example, on the input `foobar' the following will write out `foobarbar':

    %%
    foobar    ECHO; yyless(3);
    [a-z]+    ECHO;

`yyless(0)' will cause the entire current input string to be scanned again. Unless you've changed how the scanner will subsequently process its input (using BEGIN, for example), this will result in an endless loop.
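To see why `foobar' comes out as `foobarbar', the two-rule scanner above can be imitated in plain C. This is only a sketch under stated assumptions: simulate_yyless is a hypothetical name, no flex API is involved, and the matching logic is hard-coded for these two rules plus the default rule that copies unmatched characters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of flex: simulates the foobar/[a-z]+
 * scanner on `input' and collects what would be written into `out'.
 * Output that does not fit in `outsize' bytes is silently dropped. */
void simulate_yyless(const char *input, char *out, size_t outsize)
{
    const char *p = input;
    size_t used = 0;

    if (outsize == 0)
        return;
    out[0] = '\0';

    while (*p != '\0') {
        size_t len = 0;
        int is_foobar;

        /* Rule 2, [a-z]+ : length of the run of lower-case letters. */
        while (islower((unsigned char)p[len]))
            ++len;

        if (len == 0) {
            /* Default rule: copy one unmatched character to the output. */
            if (used + 1 < outsize) {
                out[used++] = *p;
                out[used] = '\0';
            }
            ++p;
            continue;
        }

        /* Rule 1 wins only when it ties rule 2 for the longest match,
         * i.e. when the letter run is exactly "foobar". */
        is_foobar = (len == 6 && strncmp(p, "foobar", 6) == 0);

        if (used + len < outsize) {       /* ECHO */
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }

        /* yyless(3) keeps three characters and returns the rest to the
         * input; every other token is consumed whole. */
        p += is_foobar ? 3 : len;
    }
}
```

Replacing the 3 with 0 in this sketch would never advance p past `foobar', which mirrors the endless loop described for `yyless(0)'.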
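unput(c)
puts the character c back onto the input stream. It will be the next character scanned. The following action will take the current token and cause it to be rescanned enclosed in parentheses.

    {
    int i;
    unput( ')' );
    for ( i = yyleng - 1; i >= 0; --i )
        unput( yytext[i] );
    unput( '(' );
    }

Note that since each unput puts the given character back at the beginning of the input stream, pushing back strings must be done back-to-front.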
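The back-to-front ordering can be checked with a toy pushback stack in plain C. The names toy_unput, toy_input and rescan_parenthesized are hypothetical stand-ins, not flex functions, and the sketch assumes only the last-in, first-out behaviour described above: whatever was pushed back most recently is read first.

```c
#include <stddef.h>
#include <string.h>

#define TOY_PUSHBACK_MAX 128

static char toy_stack[TOY_PUSHBACK_MAX];
static size_t toy_depth = 0;

/* Stand-in for unput(c): the pushed character becomes the next one read.
 * Characters beyond the fixed capacity are silently dropped. */
void toy_unput(int c)
{
    if (toy_depth < TOY_PUSHBACK_MAX)
        toy_stack[toy_depth++] = (char)c;
}

/* Stand-in for input(): returns -1 once nothing is left pushed back.
 * (A real scanner would continue reading from its input instead.) */
int toy_input(void)
{
    return toy_depth > 0 ? (unsigned char)toy_stack[--toy_depth] : -1;
}

/* Pushes `token' back wrapped in parentheses, exactly as the action
 * above does, then drains the pushback into `out'.
 * `out' must have room for strlen(token) + 3 bytes. */
void rescan_parenthesized(const char *token, char *out)
{
    size_t i;
    int c;

    toy_unput(')');                       /* last character to be read */
    for (i = strlen(token); i > 0; --i)
        toy_unput(token[i - 1]);          /* token, back to front */
    toy_unput('(');                       /* first character to be read */

    while ((c = toy_input()) != -1)
        *out++ = (char)c;
    *out = '\0';
}
```

Pushing the token front-to-back instead would make it read back reversed, which is why the loop counts down.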
input()
reads the next character from the input stream. For example, the following is one way to eat up C comments:

    %%
    "/*"        {
                register int c;

                for ( ; ; )
                    {
                    while ( (c = input()) != '*' && c != EOF )
                        ; /* eat up text of comment */

                    if ( c == '*' )
                        {
                        while ( (c = input()) == '*' )
                            ;
                        if ( c == '/' )
                            break; /* found the end */
                        }

                    if ( c == EOF )
                        {
                        error( "EOF in comment" );
                        break;
                        }
                    }
                }

(Note that if the scanner is compiled using C++, then input is instead referred to as yyinput, in order to avoid a name clash with the C++ stream named input.)
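The comment-eating loop can be tried outside a scanner by running the same logic over a string. In this sketch the function name skip_comment_body is hypothetical, a pointer into a NUL-terminated string stands in for calls to input(), and the terminating NUL plays the role of EOF.

```c
#include <stddef.h>

/* Hypothetical helper, not part of flex. `p' points just past the
 * opening slash-star of a comment. Returns a pointer to the first
 * character after the closing star-slash, or NULL if the string ends
 * first (the "EOF in comment" case). */
const char *skip_comment_body(const char *p)
{
    for (;;) {
        /* Eat up text of the comment until a star or the end. */
        while (*p != '\0' && *p != '*')
            ++p;
        if (*p == '\0')
            return NULL;

        /* Skip a run of stars; only a slash right after it closes. */
        while (*p == '*')
            ++p;
        if (*p == '/')
            return p + 1;                 /* found the end */
        if (*p == '\0')
            return NULL;
        /* Otherwise the star was ordinary comment text; keep going. */
    }
}
```

Given the text that follows an opening delimiter, such as " a **/ x", the function returns a pointer to " x"; the inner loop over consecutive stars is what lets a comment ending in several stars close correctly.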
yyterminate()
can be used in lieu of a return statement in an action. It terminates the scanner and returns a 0 to the scanner's caller, indicating `all done'. Subsequent calls to the scanner will immediately return unless preceded by a call to yyrestart (see below). By default, yyterminate is also called when an end-of-file is encountered. It is a macro and may be redefined.