
Patterns in the Input

The patterns in the input are written using an extended set of regular expressions. These are:

x
match the character `x'
.
any character except newline
[xyz]
a "character class"; in this case, the pattern matches either an `x', a `y', or a `z'
[abj-oZ]
a "character class" with a range in it; matches an `a', a `b', any letter from `j' through `o', or a `Z'
[^A-Z]
a "negated character class", i.e., any character but those in the class. In this case, any character except an uppercase letter.
[^A-Z\n]
any character except an uppercase letter or a newline
r*
zero or more r's, where r is any regular expression
r+
one or more r's
r?
zero or one r's (that is, "an optional r")
r{2,5}
anywhere from two to five r's
r{2,}
two or more r's
r{4}
exactly 4 r's
{name}
the expansion of the name definition (see above)
"[xyz]\"foo"
the literal string: `[xyz]"foo'
\X
if X is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI C interpretation of `\X'. Otherwise, a literal `X' (used to escape operators such as `*')
\123
the character with octal value 123
\x2a
the character with hexadecimal value 2a
(r)
match an r; parentheses are used to override precedence (see below)
rs
the regular expression r followed by the regular expression s; called "concatenation"
r|s
either an r or an s
r/s
an r but only if it is followed by an s. The s is not part of the matched text. This type of pattern is called trailing context.
^r
an r, but only at the beginning of a line
r$
an r, but only at the end of a line. Equivalent to `r/\n'.
<s>r
an r, but only in start condition s (see below for discussion of start conditions)
<s1,s2,s3>r
same, but in any of start conditions s1, s2, or s3
<<EOF>>
an end-of-file
<s1,s2><<EOF>>
an end-of-file when in start condition s1 or s2
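
As a rough illustration of how several of these forms look inside a complete scanner, here is a minimal sketch. The rule set, the `DIGIT' definition, and the printed messages are invented for this example, and it assumes a flex version that accepts `%option noyywrap'.

```lex
%{
/* Hypothetical demo scanner: rules and messages are illustrative only. */
#include <stdio.h>
%}
%option noyywrap

DIGIT    [0-9]

%%
{DIGIT}+/"px"        printf("pixel count: %s\n", yytext);  /* trailing context: "px" is not part of yytext */
0x[0-9a-fA-F]{1,8}   printf("hex: %s\n", yytext);          /* character class with ranges, bounded repeat */
{DIGIT}+             printf("integer: %s\n", yytext);      /* name expansion plus one-or-more */
^"#"[^\n]*           printf("comment line\n");             /* only at the beginning of a line */
"a*b"                printf("literal a*b\n");              /* quoted string: `*' is not an operator here */
[ \t\n]+             ;                                     /* skip whitespace */
.                    printf("other: %s\n", yytext);        /* any character except newline */
<<EOF>>              { printf("end of input\n"); yyterminate(); }
%%

int main(void)
{
    yylex();    /* scan standard input until end-of-file */
    return 0;
}
```

Assuming the file is saved under a name such as `demo.l', it can typically be built with `flex demo.l' followed by compiling the generated `lex.yy.c' with a C compiler.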

The regular expressions listed above are grouped according to precedence, from highest precedence at the top to lowest at the bottom. Those grouped together have equal precedence. For example,

foo|bar*

is the same as

(foo)|(ba(r*))

since the `*' operator has higher precedence than concatenation, and concatenation higher than alternation (`|'). This pattern therefore matches either the string `foo' or the string `ba' followed by zero or more instances of `r'. To match `foo' or zero or more instances of `bar', use:

foo|(bar)*

and to match zero or more instances of either `foo' or `bar':

(foo|bar)*
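
The binding of `*' described above can be observed with a small scanner built around the first pattern. This is a sketch only: the messages and the catch-all rules are assumptions added for the demonstration, and `%option noyywrap' again assumes a flex version that supports it.

```lex
%{
/* Hypothetical precedence demo: `*' applies to `r' alone, not to `bar'. */
#include <stdio.h>
%}
%option noyywrap

%%
foo|bar*    printf("matched: %s\n", yytext);    /* same as (foo)|(ba(r*)) */
[ \t\n]+    ;                                   /* skip whitespace */
.           printf("unmatched: %s\n", yytext);
%%

int main(void)
{
    yylex();
    return 0;
}
```

Given the input `barrr barbar', this scanner is expected to report `barrr' as a single match but `barbar' as two separate matches of `bar', because the repetition covers only the final `r'.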

Some notes on patterns:

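The following are illegal, because a rule can have at most one instance of trailing context (the `/' operator or the `$' operator), and a start condition can appear only at the beginning of a pattern: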

foo/bar$
<sc1>foo<sc2>bar

You can write the first of these instead as `foo/bar\n'.

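In the following examples, `$' and `^' are treated as normal characters, because `$' is special only at the end of a pattern and `^' only at the beginning, and neither may be grouped inside parentheses: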

foo|(bar$)
foo|^bar

To specify "either `foo', or `bar' followed by a newline", you can use the following (the special `|' action is explained below):

foo      |
bar$     /* action goes here */

A similar trick will work for matching "either `foo', or `bar' at the beginning of a line."
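
One way the beginning-of-line variant might look is sketched below. Only the two-rule arrangement with the `|' action comes from the text above; the message and the rules that discard other input are assumptions made so the example is complete.

```lex
%{
/* Hypothetical sketch of "either `foo', or `bar' at the beginning of a line". */
#include <stdio.h>
%}
%option noyywrap

%%
foo      |
^bar     printf("foo, or bar at line start: %s\n", yytext);  /* shared action for both rules */
\n       ;   /* discard newlines */
.        ;   /* discard everything else */
%%

int main(void)
{
    yylex();
    return 0;
}
```

Because `^' now appears at the start of its own rule, it keeps its special meaning, which it would lose in `foo|^bar'.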

