Using Regular Expressions
Regular expressions are a common way of expressing patterns to match against
text. The most common forms of regular expressions are listed below. Note that the
quotation marks (") in the examples are meant to set off terms from the
rest of the text, and are not part of the examples.
- string
- A regular string of characters will match the same string of characters
in the item being searched. Thus you can search for all occurrences of the
string "test" by using the regular expression "test". This will also match
lines with "testimony", "latest" and "intestine".
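The substring behaviour described above can be checked with a short sketch. It assumes Python's re module (the text does not name a particular tool), and the sample lines are made up for illustration.

```python
import re

# Hypothetical sample lines; "toast" is included as a non-match.
lines = ["test", "testimony", "latest", "intestine", "toast"]

# re.search looks for the pattern anywhere in the string,
# so a plain string pattern behaves like a substring search.
matches = [line for line in lines if re.search("test", line)]

print(matches)  # ['test', 'testimony', 'latest', 'intestine']
```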
- start (^)
- This indicates "beginning of line" in a match. For example, "^test"
matches all lines that begin with "test". Note that the "^" must appear as the
leftmost character of the expression to work in this manner.
- end ($)
- This indicates "end of line". The regular expression "test$" will match
those lines that end with "test", and "^test$" will match those lines that
contain only "test". Note that to work as the end of line, the "$" must be
the last character in the expression.
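As a rough illustration of the two anchors, the sketch below again assumes Python's re module; the sample lines are invented and each one is treated as a single line with no trailing newline.

```python
import re

# Hypothetical sample lines.
lines = ["test", "testing", "a test", "contest results"]

starts = [s for s in lines if re.search(r"^test", s)]   # begins with "test"
ends = [s for s in lines if re.search(r"test$", s)]     # ends with "test"
only = [s for s in lines if re.search(r"^test$", s)]    # contains only "test"

print(starts)  # ['test', 'testing']
print(ends)    # ['test', 'a test']
print(only)    # ['test']
```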
- single character (.)
- The period will match any single character. Matches for "t.st" would include
"test", "tast", "tZst", and "t1st".
- escape (\)
- The backslash can be used to "escape" special characters so that you
can match them literally. For example, if you wanted to find those lines ending
with a period, the expression ".$" would show all lines that had at least
one character in them, so you would need to escape the period: "\.$". Similarly,
to find those places with the two characters period and dollar sign next
to each other, you would need "\.\$". To match a backslash itself, you have to
escape it as well: "\\".
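The difference between the escaped and unescaped forms can be seen in a small sketch. It assumes Python's re module and uses invented sample lines; raw strings (the r"..." prefix) are a Python convenience that keeps the backslashes in the pattern from being interpreted by the language itself.

```python
import re

# Hypothetical sample lines, including one empty line.
lines = ["The end.", "No period", "", "Cost: 5.$ each"]

any_char = [s for s in lines if re.search(r".$", s)]         # any non-empty line
period_end = [s for s in lines if re.search(r"\.$", s)]      # ends with a literal period
period_dollar = [s for s in lines if re.search(r"\.\$", s)]  # contains ".$" literally

# A literal backslash is matched by the escaped form "\\".
has_backslash = re.search(r"\\", "C:\\temp") is not None

print(any_char)       # ['The end.', 'No period', 'Cost: 5.$ each']
print(period_end)     # ['The end.']
print(period_dollar)  # ['Cost: 5.$ each']
print(has_backslash)  # True
```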
- character set ([ ])
- This represents a set of characters to compare against a single character.
Ranges of characters can be represented by the first character in the series,
followed by a hyphen, then the last character in the series.
For example, [abcdefg] is the same as [a-g]. The pattern "t[a-g123]st" would
match "tast", "test", and "t2st", but not "t-st", "taast", or "tAst".
(Note that the special characters "\", "]", "-" and "^" must be escaped to be
matched literally inside a set. See the escape character (\) above. The special
character "]" can appear as the first character in the set without being escaped.)
- complemented character set ([^ ])
- The special character "^" placed at the beginning of a character set
inverts the set: the single character must not be any of those listed. For
example, "t[^a-zA-Z]st" matches "t1st", "t-st", or "t,st", but not "test"
or "tYst".
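The single-character, character-set, and complemented-set forms above can be compared side by side. The sketch assumes Python's re module and uses re.fullmatch so that each candidate must match the pattern as a whole; the candidate list simply collects the strings mentioned in the text.

```python
import re

candidates = ["tast", "test", "t2st", "t-st", "taast", "tAst", "t1st", "t,st", "tYst"]

# "." accepts any single character in the second position.
dot = [s for s in candidates if re.fullmatch(r"t.st", s)]

# The set accepts only a through g, or one of the digits 1, 2, 3.
in_set = [s for s in candidates if re.fullmatch(r"t[a-g123]st", s)]

# The complemented set accepts anything that is not a letter.
not_letter = [s for s in candidates if re.fullmatch(r"t[^a-zA-Z]st", s)]

print(dot)         # every candidate except 'taast'
print(in_set)      # ['tast', 'test', 't2st', 't1st']
print(not_letter)  # ['t2st', 't-st', 't1st', 't,st']
```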
- grouping (( ))
- You can use parentheses to group expressions. This is useful when you
want to treat a string as a single unit within a pattern. For example, "(test)+"
will match one or more consecutive instances of "test".
- alternation (|)
- This is an "or" operator. It allows the extension of a pattern to include
alternative matches. For example, to match those lines beginning with "Page"
or ending with "Foot", you could use the expression "^Page|Foot$".
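Grouping and alternation interact, which a short sketch may make clearer. It assumes Python's re module and invented sample lines. Without parentheses, each anchor belongs to only one alternative; with parentheses, both anchors apply to the whole group.

```python
import re

# Grouping: the "+" applies to the whole group "test".
repeated = re.fullmatch(r"(test)+", "testtesttest") is not None
partial = re.fullmatch(r"(test)+", "testte") is not None

# Alternation: lines beginning with "Page" or ending with "Foot".
lines = ["Page 1", "Left Foot", "Footer Page", "Title Page one"]
either = [s for s in lines if re.search(r"^Page|Foot$", s)]

# With a group, only lines that are exactly "Page" or "Foot" match.
exact = [s for s in ["Page", "Foot", "Page 1"] if re.search(r"^(Page|Foot)$", s)]

print(repeated, partial)  # True False
print(either)             # ['Page 1', 'Left Foot']
print(exact)              # ['Page', 'Foot']
```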
- replication (*)
- When a character pattern is followed by a "*", the pattern will match
zero or more instances of that pattern. For example, the
pattern "te*st" will match "tst", "test", "teeeeeest", and so on.
- replication (+)
- This modifier works the same way as "*", but it matches at least
one instance of the pattern. For example, "t(es)+t" will match
"test", "tesest", etc., but not "tt".
- replication (?)
- The question mark is used in a similar fashion to the two previous
modifiers, but matches zero or one instance of the pattern. Thus,
"te?st" will match either "tst" or "test".
- replication ({m,n})
- For more rigid replicated pattern matches, the braces can be used to
indicate an exact number, a minimum number, or a range of numbers of
replications. To match exactly "m" copies, use the form "{m}", where "m" is a
number between 1 and 255. To match a minimum of "m" copies, use the form
"{m,}". To match between "m" and "n" copies, use the form "{m,n}" ("n" must
be between "m" and 255). For example, "a{2}" matches "aa", and "b{1,4}" matches
"b", "bb", "bbb" and "bbbb".