4.2 re -- Regular expression operations
This module provides regular expression matching operations similar to
those found in Perl. Regular expression pattern strings may not
contain null bytes, but can specify the null byte using the
\number
notation. Both patterns and strings to be
searched can be Unicode strings as well as 8-bit strings. The
re module is always available.
Regular expressions use the backslash character ("\") to
indicate special forms or to allow special characters to be used
without invoking their special meaning. This collides with Python's
usage of the same character for the same purpose in string literals;
for example, to match a literal backslash, one might have to write
'\\\\'
as the pattern string, because the regular expression
must be "\\", and each backslash must be expressed as
"\\" inside a regular Python string literal.
The solution is to use Python's raw string notation for regular
expression patterns; backslashes are not handled in any special way in
a string literal prefixed with "r". So r"\n"
is a
two-character string containing "\" and "n",
while "\n"
is a one-character string containing a newline.
Usually patterns will be expressed in Python code using this raw
string notation.
See Also:
- Book on regular expressions by Jeffrey Friedl, published by O'Reilly. The second edition of the book no longer covers Python at all, but the first edition covered writing good regular expression patterns in great detail.
See About this document... for information on suggesting changes.