^ |
start of line |
$ |
end of line |
\A |
start of text |
\Z |
end of text |
. |
any character in line |
^userid |
matches string 'userid' only if it's at the beginning of line |
userid$ |
matches string 'userid' only if it's at the end of line |
^userid$ |
matches string 'userid' only if it's the only string in line |
user.d |
matches strings like 'userid', 'usercd', 'user6d' and so on |
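The anchors above can be tried out in a short sketch. It uses Python's re module purely as a stand-in; the engine this reference describes may differ in details. In Python, for example, ^ and $ match at every line only when the re.MULTILINE flag is set. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical two-line sample text.
text = "userid first\nlast userid"

# With MULTILINE (a Python-specific flag), ^ and $ anchor to each line.
starts = re.findall(r"^userid", text, flags=re.MULTILINE)
ends = re.findall(r"userid$", text, flags=re.MULTILINE)

# \A anchors to the start of the whole text, whatever the flags.
at_text_start = re.search(r"\Auserid", text) is not None

# ^userid$ needs a line that holds nothing but 'userid'; no such line here.
no_exact_line = re.search(r"^userid$", text, flags=re.MULTILINE) is None

# . stands for exactly one character, so 'userd' is too short.
dot_hits = [s for s in ("userid", "usercd", "user6d", "userd")
            if re.fullmatch(r"user.d", s)]
```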
\w |
an alphanumeric character (including "_") |
\W |
a nonalphanumeric |
\d |
a numeric character |
\D |
a non-numeric |
\s |
any space (same as [ \t\n\r\f]) |
\S |
a non space |
user\dd |
matches strings like 'user1d', 'user6d' etc. but not 'userad', 'userbd' etc. |
user[\w\s]d |
matches strings like 'userid', 'user d', 'user6d' etc. but not 'user=d', 'user-d' etc. |
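A minimal sketch of the predefined classes, again assuming Python's re module as a stand-in engine. The candidate strings are illustrative only.

```python
import re

candidates = ["user1d", "user6d", "userad", "userid", "user d", "user=d"]

# \d accepts only a digit between 'user' and 'd'.
digit_hits = [s for s in candidates if re.fullmatch(r"user\dd", s)]

# [\w\s] accepts a word character (letter, digit or "_") or whitespace,
# so digits pass as well; '=' is neither and is rejected.
class_hits = [s for s in candidates if re.fullmatch(r"user[\w\s]d", s)]
```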
Any item in a regular expression may be followed by an iterator. These metacharacters specify how many times the preceding character, metacharacter or subexpression may occur.
* |
zero or more, similar to {0,} |
+ |
one or more, similar to {1,} |
? |
zero or one, similar to {0,1} |
{n} |
exactly n times |
{n,} |
at least n times |
{n,m} |
at least n but not more than m times |
*? |
zero or more, non-greedy, similar to {0,}? |
+? |
one or more, non-greedy, similar to {1,}? |
?? |
zero or one, non-greedy, similar to {0,1}? |
{n}? |
exactly n times, non-greedy |
{n,}? |
at least n times, non-greedy |
{n,m}? |
at least n but not more than m times, non-greedy |
user.*d |
matches strings like 'userid', 'useralkjdflkj9d' and 'userd' |
user.+d |
matches strings like 'userid', 'useralkjdflkj9d' but not 'userd' |
user.?d |
matches strings like 'userid', 'usercd' and 'userd' but not 'useralkj9d' |
useri{2}d |
matches the string 'useriid' |
useri{2,}d |
matches strings like 'useriid', 'useriiid', 'useriiiid' etc. |
useri{2,3}d |
matches strings like 'useriid', or 'useriiid' but not 'useriiiid' |
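The iterators, and the difference between the greedy and non-greedy forms, can be illustrated with a small sketch. It assumes Python's re module as a stand-in engine; the helper `whole` and the sample strings are hypothetical.

```python
import re

def whole(pattern, s):
    """Return True when the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

samples = ("userid", "userd", "useralkjdflkj9d")

# * allows zero characters between 'user' and 'd'; + requires at least one.
star = [s for s in samples if whole(r"user.*d", s)]
plus = [s for s in samples if whole(r"user.+d", s)]

# {2,3} accepts two or three repetitions of 'i', no more and no fewer.
counted = [s for s in ("userid", "useriid", "useriiid", "useriiiid")
           if whole(r"useri{2,3}d", s)]

# Greedy iterators take as much as possible, non-greedy ones as little.
line = "userid and userd"
greedy = re.search(r"user.*d", line).group()
lazy = re.search(r"user.*?d", line).group()
```

Here `greedy` spans the whole line up to the last 'd', while `lazy` stops at the first 'd' that completes a match.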
You can specify a series of alternatives for a pattern using "|" to separate them, so that fee|fie|foe will match any of "fee", "fie", or "foe" in the target string (as would f(e|i|o)e). It is common practice to enclose alternatives in parentheses to minimize confusion about where they start and end.
user(id|user) |
matches strings 'userid' or 'useruser'. |
Subexpressions are numbered based on the left to right order of their opening parenthesis.
(userid){8,10} |
matches strings which contain 8, 9 or 10 consecutive instances of 'userid' |
user([0-9]|i+)d |
matches 'user0d', 'user2d', 'userid', 'useriid', 'useriiid' etc. |
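A short sketch of alternatives and subexpression numbering, assuming Python's re module as a stand-in engine. The "fum" suffix in the nested example is invented to show how nested groups are numbered.

```python
import re

# The parentheses limit the alternation to 'id' or 'user'.
m = re.fullmatch(r"user(id|user)", "userid")
first_group = m.group(1)

# Both spellings of the alternation accept the same three words.
words = ["fee", "fie", "foe", "fum"]
flat = [w for w in words if re.fullmatch(r"fee|fie|foe", w)]
grouped = [w for w in words if re.fullmatch(r"f(e|i|o)e", w)]

# Groups are numbered by their opening parenthesis, left to right:
# the outer group is 1, the inner group is 2.
nested = re.fullmatch(r"((fee|fie|foe) fum)", "fie fum")
outer, inner = nested.group(1), nested.group(2)

# An iterator after a subexpression repeats the whole subexpression.
nine_ok = re.fullmatch(r"(userid){8,10}", "userid" * 9) is not None
seven_fails = re.fullmatch(r"(userid){8,10}", "userid" * 7) is None
```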
Metacharacters \1 through \9 are interpreted as backreferences. \<n> matches a previously matched subexpression #<n>.
(.)\1+ |
matches 'aaaa' and 'cc' |
(.+)\1+ |
also matches 'abab' and '123123' |
(['"]?)(\d+)\1 |
matches "13" (in double quotes), '4' (in single quotes), 77 (without quotes) etc. |
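The backreference examples above can be checked with a sketch like the following, assuming Python's re module as a stand-in engine. The mismatched-quote input is an extra case added for illustration.

```python
import re

# (.)\1+ : one character followed by one or more copies of itself.
repeat_char = [s for s in ("aaaa", "cc", "ab") if re.fullmatch(r"(.)\1+", s)]

# (.+)\1+ : a run of characters followed by one or more copies of that run.
repeat_run = [s for s in ("abab", "123123", "abc") if re.fullmatch(r"(.+)\1+", s)]

# \1 must repeat whatever the optional opening quote matched, so the
# closing quote has to be the same kind (or absent when none was opened).
quoted = re.compile(r"""(['"]?)(\d+)\1""")
quote_results = {s: quoted.fullmatch(s) is not None
                 for s in ('"13"', "'4'", "77", "\"13'")}
```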
Syntax of Regular Expressions
Regular Expressions are used to specify patterns of text for searches.
Simple Matches
Single characters match themselves unless they are metacharacters with special meaning. Characters that normally function as metacharacters or escape sequences can be interpreted literally by preceding them with a backslash "\".
userid |
matches string 'userid' |
\^UserID |
matches '^UserID' |
Characters can be specified using an escape sequence syntax similar to that used in C and Perl.
Supported escape sequences
\xnn |
char with hex code nn |
\x{nnnn} |
char with hex code nnnn (one byte for plain text and two bytes for Unicode) |
\t |
tab (HT/TAB), same as \x09 |
\n |
newline (NL), same as \x0a |
\r |
carriage return (CR), same as \x0d |
\f |
form feed (FF), same as \x0c |
\a |
alarm (bell) (BEL), same as \x07 |
\e |
escape (ESC), same as \x1b |
user\x20id |
matches 'user id' (note the space in the middle) |
\tuserid |
matches 'userid' preceded by a tab |
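A minimal sketch of literal escaping and escape sequences, assuming Python's re module as a stand-in engine. Python's re does not accept the \e or \x{nnnn} forms listed above, so they are left out here.

```python
import re

# A backslash turns the metacharacter ^ into a literal character.
literal_caret = re.fullmatch(r"\^UserID", "^UserID") is not None

# \x20 is the space character, written by its hex code.
space_hit = re.fullmatch(r"user\x20id", "user id") is not None

# \t in the pattern matches a real tab in the text.
tab_hit = re.fullmatch(r"\tuserid", "\tuserid") is not None

# Each named escape matches the character with the hex code from the table.
pairs = [(r"\t", "\x09"), (r"\n", "\x0a"), (r"\r", "\x0d"),
         (r"\f", "\x0c"), (r"\a", "\x07")]
escapes_agree = all(re.fullmatch(esc, ch) is not None for esc, ch in pairs)
```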
Character Classes
You can specify a character class by enclosing a list of characters in [] which will match any one character from the list.
If the first character after the "[" is "^", the class matches any character not in the list.
user[aeiou]d |
matches strings 'userad', 'usered' etc. but not 'userbd', 'usercd' etc. |
user[^aeiou]d |
matches strings 'userbd', 'usercd' etc. but not 'userad', 'usered' etc. |
The "-" character is used to specify a range within a list. If you want "-" itself to be a member of a class, put it at the start or end of the list or escape it with a backslash.
[-az] |
matches 'a', 'z' and '-' |
[az-] |
matches 'a', 'z' and '-' |
[a\-z] |
matches 'a', 'z' and '-' |
[a-z] |
matches any one of the twenty-six lowercase letters from 'a' to 'z' |
[\n-\x0D] |
matches any of the ASCII characters 10,11,12, or 13 |
[\d-t] |
matches any digit, '-' or 't' |
[]-a] |
matches any character from ']' to 'a'. |
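The class rules can be sketched as follows, assuming Python's re module as a stand-in engine. The helper `members` and the test alphabets are hypothetical. The [\d-t] form is omitted because Python rejects a range whose endpoint is a class such as \d.

```python
import re

def members(char_class, alphabet):
    """Return the characters of alphabet that the class accepts."""
    return [c for c in alphabet if re.fullmatch(char_class, c)]

vowels = members(r"[aeiou]", "abcde")
non_vowels = members(r"[^aeiou]", "abcde")

# A hyphen at the start, at the end, or escaped is a literal member.
hyphen_first = members(r"[-az]", "a-mz")
hyphen_last = members(r"[az-]", "a-mz")
hyphen_escaped = members(r"[a\-z]", "a-mz")

# Between two characters the hyphen forms a range and is not itself a member.
lowercase = members(r"[a-z]", "a-mz")

# Range endpoints may be escape sequences: ASCII codes 10 through 13.
control = members(r"[\n-\x0D]", "\t\n\x0b\x0c\r ")
```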
Metacharacters are special characters which are the essence of Regular Expressions. The different types of metacharacters are described below.
Metacharacters - line separators
Metacharacters - predefined classes
Metacharacters - iterators
Metacharacters - alternatives
Metacharacters - subexpressions
Metacharacters - backreferences
Modifiers
Modifiers are used to change the behavior of regular expressions.
i |
Used for case-insensitive pattern matching. |
m |
Treats a string as multiple lines. |
s |
Treats a string as a single line. |
g |
Non-standard modifier. Switching it off puts all following iterators into non-greedy mode (the modifier is on by default). With /g off, '+' works as '+?', '*' as '*?' etc. |
x |
Tells the regular expression to ignore whitespace that is neither backslashed nor within a character class. You can use this to break a regular expression into more readable parts. |
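As a rough illustration, the sketch below shows the case-insensitive, multiple-line, single-line and whitespace-ignoring behaviors using the corresponding flags of Python's re module (re.IGNORECASE, re.MULTILINE, re.DOTALL, re.VERBOSE). Python is only a stand-in engine here, and it has no counterpart to the non-standard greediness modifier, so that one is not shown.

```python
import re

# Case-insensitive matching.
case_hit = re.fullmatch(r"userid", "UserID", flags=re.IGNORECASE) is not None

# Multiple lines: ^ matches at the start of every line.
line_starts = re.findall(r"^user", "user1\nuser2", flags=re.MULTILINE)

# Single line: . also matches the newline character.
dot_plain = re.fullmatch(r"user.id", "user\nid") is None
dot_single = re.fullmatch(r"user.id", "user\nid", flags=re.DOTALL) is not None

# Ignoring whitespace lets a pattern be split into commented parts.
verbose = re.compile(r"""
    user   # literal prefix
    \d+    # one or more digits
    """, flags=re.VERBOSE)
verbose_hit = verbose.fullmatch("user42") is not None
```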