Using flex

Patterns in the Input

The patterns in the input are written using an extended set of regular expressions. These are:

x: match the character `x'
.: any character except newline
[xyz]: a "character class"; in this case, the pattern matches either an `x', a `y', or a `z'
[abj-oZ]: a "character class" with a range in it; matches an `a', a `b', any letter from `j' through `o', or a `Z'
[^A-Z]: a "negated character class", i.e., any character but those in the class. In this case, any character except an uppercase letter.
[^A-Z\n]: any character except an uppercase letter or a newline
r*: zero or more r's, where r is any regular expression
r+: one or more r's
r?: zero or one r's (that is, "an optional r")
r{2,5}: anywhere from two to five r's
r{2,}: two or more r's
r{4}: exactly 4 r's
{name}: the expansion of the name definition (see above)
"[xyz]\"foo": the literal string: `[xyz]"foo'
\X: if X is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI C interpretation of `\X'. Otherwise, a literal `X' (used to escape operators such as `*')
\123: the character with octal value 123
\x2a: the character with hexadecimal value 2a
(r): match an r; parentheses are used to override precedence (see below)
rs: the regular expression r followed by the regular expression s; called "concatenation"
r|s: either an r or an s
r/s: an r but only if it is followed by an s. The s is not part of the matched text. This type of pattern is called trailing context.
^r: an r, but only at the beginning of a line
r$: an r, but only at the end of a line. Equivalent to `r/\n'.
<s>r: an r, but only in start condition s (see below for discussion of start conditions)
<s1,s2,s3>r: same, but in any of start conditions s1, s2, or s3
<<EOF>>: an end-of-file
<s1,s2><<EOF>>: an end-of-file when in start condition s1 or s2
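
The trailing-context entry (`r/s') can be sketched outside flex. The fragment below is an illustration only: it uses the POSIX <regex.h> interface rather than a flex-generated scanner, and the helper name `trailing_context_length' is hypothetical. It compiles `^(r)(s)' and reports only the length of the text matched by r, mirroring the rule that s is not part of the matched text. Flex's own handling of trailing context may differ in corner cases, for example when the end of r can match the beginning of s.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper emulating flex's r/s at the start of `text`.
 * Returns the length of the text matched by r when s follows it,
 * or -1 if there is no such match or the pattern fails to compile.
 * Assumption: POSIX extended regular expressions stand in for flex. */
long trailing_context_length(const char *r, const char *s, const char *text)
{
    assert(r != NULL && s != NULL && text != NULL);

    /* Build ^(r)(s): group 1 is the token, group 2 the trailing context. */
    size_t len = strlen(r) + strlen(s) + 6;   /* "^(" + ")(" + ")" + NUL */
    char *combined = malloc(len);
    if (combined == NULL)
        return -1;
    snprintf(combined, len, "^(%s)(%s)", r, s);

    regex_t re;
    int rc = regcomp(&re, combined, REG_EXTENDED);
    free(combined);
    if (rc != 0)
        return -1;

    /* m[0] is the whole match, m[1] is the part matched by r. */
    regmatch_t m[2];
    rc = regexec(&re, text, 2, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    return (long)(m[1].rm_eo - m[1].rm_so);
}
```

Under these assumptions, calling the helper with r = "foo" and s = "bar" on the text `foobar' returns 3, while the same call on `foobaz' returns -1. Passing "\n" as s approximates `r$'.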

The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom. Those grouped together have equal precedence. For example,

foo|bar*

is the same as

(foo)|(ba(r*))

since the `*' operator has higher precedence than concatenation, and concatenation higher than alternation (`|'). This pattern therefore matches either the string `foo' or the string `ba' followed by zero or more instances of `r'. To match `foo' or zero or more instances of `bar', use:

foo|(bar)*

and to match zero or more instances of either `foo' or `bar':

(foo|bar)*
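
The three patterns above can be checked experimentally. The sketch below does not use flex; it relies on POSIX extended regular expressions from <regex.h>, which give `*', concatenation, and `|' the same relative precedence. The helper name `full_match' is hypothetical, and it wraps each pattern as `^(...)$' so that only whole-string matches count.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `pattern` (a POSIX ERE) matches all
 * of `text`, 0 if it does not, and -1 if the pattern fails to compile.
 * Assumption: POSIX EREs stand in for flex patterns here. */
int full_match(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    /* Wrap the pattern as ^(...)$ so partial matches are rejected. */
    size_t len = strlen(pattern) + 5;   /* "^(" + ")$" + NUL */
    char *anchored = malloc(len);
    if (anchored == NULL)
        return -1;
    snprintf(anchored, len, "^(%s)$", pattern);

    regex_t re;
    int rc = regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB);
    free(anchored);
    if (rc != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, `foo|bar*' accepts `barrr' but rejects `barbar'; `foo|(bar)*' accepts `barbar' but rejects `barrr' and `foobar'; and `(foo|bar)*' accepts `foobar'.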

Some notes on patterns:

A negated character class such as the example `[^A-Z]' above will match a newline unless `\n' (or an equivalent escape sequence) is one of the characters explicitly present in the negated character class (e.g., `[^A-Z\n]'). This is unlike how many other regular expression tools treat negated character classes, but unfortunately the inconsistency is historically entrenched. Matching newlines means that a pattern like `[^"]*' can match an entire input (overflowing the scanner's input buffer) unless there's another quote in the input.
A rule can have at most one instance of trailing context (the `/' operator or the `$' operator). The start condition, `^', and `<<EOF>>' patterns can only occur at the beginning of a pattern and, like `/' and `$', cannot be grouped inside parentheses. A `^' that does not occur at the beginning of a rule, or a `$' that does not occur at the end of a rule, loses its special properties and is treated as a normal character.

The following are illegal:

foo/bar$
<sc1>foo<sc2>bar

You can write the first of these instead as `foo/bar\n'.

In the following examples, `$' and `^' are treated as normal characters:

foo|(bar$)
foo|^bar

If what you want to specify is "either `foo', or `bar' followed by a newline" you can use the following (the special `|' action is explained below):

foo      |
bar$     /* action goes here */

A similar trick will work for matching "either `foo', or `bar' at the beginning of a line."
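
Returning to the first note, on negated character classes: the effect of leaving `\n' out of the class can be measured with a small experiment. This is a sketch using POSIX <regex.h> (compiled without REG_NEWLINE), not flex itself, and the helper name `prefix_match_length' is hypothetical. POSIX negated bracket expressions also match a newline in this mode, so the helper shows how far a pattern such as `[^"]*' runs from the start of the input. Other operators, notably `.', behave differently in POSIX and are not part of the comparison.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the length of the match of `pattern`
 * anchored at the start of `text`, or -1 if there is no match or the
 * pattern fails to compile.
 * Assumption: POSIX EREs without REG_NEWLINE stand in for flex. */
long prefix_match_length(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    /* Anchor at the start only, as a scanner matches from its
     * current position. */
    size_t len = strlen(pattern) + 4;   /* "^(" + ")" + NUL */
    char *anchored = malloc(len);
    if (anchored == NULL)
        return -1;
    snprintf(anchored, len, "^(%s)", pattern);

    regex_t re;
    int rc = regcomp(&re, anchored, REG_EXTENDED);
    free(anchored);
    if (rc != 0)
        return -1;

    regmatch_t m[1];
    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    return (long)(m[0].rm_eo - m[0].rm_so);
}
```

On the input `abc', newline, `def"ghi', the pattern `[^"]*' consumes 7 characters, crossing the newline, while `[^"\n]*' stops after 3. On an input with no quote at all, `[^"]*' consumes everything, which is the buffer-overflow hazard the note describes.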
