The patterns in the input are written using an extended set of regular
expressions. These are:
x
- match the character `x'
.
- any character except newline
[xyz]
- a "character class"; in this case, the pattern matches either an
`x', a `y', or a `z'
[abj-oZ]
- a "character class" with a range in it; matches an `a', a
`b', any letter from `j' through `o', or a `Z'
[^A-Z]
- a "negated character class", i.e., any character but those in the
class. In this case, any character except an uppercase letter.
[^A-Z\n]
- any character except an uppercase letter or a newline
r*
- zero or more r's, where r is any regular expression
r+
- one or more r's
r?
- zero or one r's (that is, "an optional r")
r{2,5}
- anywhere from two to five r's
r{2,}
- two or more r's
r{4}
- exactly 4 r's
{name}
- the expansion of the name definition (see above)
"[xyz]\"foo"
- the literal string: `[xyz]"foo'
\X
- if X is an `a', `b', `f', `n', `r', `t',
or `v', then the ANSI C interpretation of `\X'.
Otherwise, a literal `X' (used to escape operators such as
`*')
\123
- the character with octal value 123
\x2a
- the character with hexadecimal value 2a
(r)
- match an r; parentheses are used to override precedence (see
below)
rs
- the regular expression r followed by the regular expression
s; called "concatenation"
r|s
- either an r or an s
r/s
- an r but only if it is followed by an s. The s is not
part of the matched text. This type of pattern is called trailing
context.
^r
- an r, but only at the beginning of a line
r$
- an r, but only at the end of a line. Equivalent to `r/\n'.
<s>r
- an r, but only in start condition s (see below for
discussion of start conditions)
<s1,s2,s3>r
- same, but in any of start conditions s1, s2, or s3
<<EOF>>
- an end-of-file
<s1,s2><<EOF>>
- an end-of-file when in start condition s1 or s2
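As an illustration, the following minimal sketch combines several of the operators above in one scanner: a name definition, character classes, repetition, a quoted string, and a negated character class that excludes newline. The definition name `DIGIT', the token labels, and the choice of rules are assumptions made for this example only.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

DIGIT   [0-9]

%%
{DIGIT}+                { printf("integer: %s\n", yytext); }
{DIGIT}+"."{DIGIT}*     { printf("float: %s\n", yytext); }
[a-z][a-z0-9]*          { printf("identifier: %s\n", yytext); }
\"[^"\n]*\"             { printf("string: %s\n", yytext); }
[ \t\n]+                { /* skip whitespace */ }
.                       { printf("other: %s\n", yytext); }
%%

int main(void)
{
    yylex();    /* scan standard input until end-of-file */
    return 0;
}
```

The string rule uses `[^"\n]' rather than `[^"]' so that an unterminated string cannot run past the end of its line.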
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence. For example,
foo|bar*
is the same as
(foo)|(ba(r*))
since the `*' operator has higher precedence than concatenation,
and concatenation higher than alternation (`|'). This pattern
therefore matches either the string `foo' or the string `ba'
followed by zero or more instances of `r'. To match `foo' or
zero or more instances of `bar', use:
foo|(bar)*
and to match zero or more instances of either `foo' or `bar':
(foo|bar)*
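The grouping can be checked with a small scanner. The sketch below uses the first pattern, `foo|bar*', and prints each match in brackets; the bracketed output format and the catch-all rule are assumptions made for this example only.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

%%
foo|bar*    { printf("[%s]\n", yytext); }   /* same as (foo)|(ba(r*)) */
.|\n        { /* ignore anything else */ }
%%

int main(void)
{
    yylex();    /* scan standard input until end-of-file */
    return 0;
}
```

Given the input line `foo barrr barbar ba', this scanner should print `[foo]', `[barrr]', `[bar]', `[bar]', and `[ba]': the `*' applies only to the `r', so `barbar' is matched as two separate tokens rather than one.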
Some notes on patterns:
-
A negated character class such as the example `[^A-Z]' above will
match a newline unless `\n' (or an equivalent escape sequence) is
one of the characters explicitly present in the negated character class
(e.g., `[^A-Z\n]'). This is unlike how many other regular
expression tools treat negated character classes, but unfortunately the
inconsistency is historically entrenched. Matching newlines means that
a pattern like `[^"]*' can match an entire input (overflowing the
scanner's input buffer) unless there's another quote in the input.
-
A rule can have at most one instance of trailing context (the `/'
operator or the `$' operator). The start condition, `^', and
`<<EOF>>' patterns can only occur at the beginning of a pattern
and, like `/' and `$', cannot be grouped inside
parentheses. A `^' which does not occur at the beginning of a rule
or a `$' which does not occur at the end of a rule loses its
special properties and is treated as a normal character.
The following are illegal:
foo/bar$
<sc1>foo<sc2>bar
You can write the first of these instead as `foo/bar\n'.
In the following examples, `$' and `^' are treated as
normal characters:
foo|(bar$)
foo|^bar
If what you want to specify is "either `foo', or `bar'
followed by a newline" you can use the following (the special `|'
action is explained below):
foo |
bar$ /* action goes here */
A similar trick will work for matching "either `foo', or `bar'
at the beginning of a line."
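The `|' action trick can be sketched as a complete scanner. The example below chains three rules to one action, covering both the end-of-line and the beginning-of-line variants; the action body and the catch-all rule are assumptions made for this example only.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

%%
foo     |
bar$    |
^bar    { printf("matched \"%s\"\n", yytext); }
.|\n    { /* discard everything else */ }
%%

int main(void)
{
    yylex();    /* scan standard input until end-of-file */
    return 0;
}
```

Because each anchored pattern is a rule of its own, `$' and `^' keep their special meaning here, which they would lose inside `foo|(bar$)' or `foo|^bar'. A `bar' that is neither at the start of a line nor followed by a newline, as in `xbarx', falls through to the catch-all rule.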