RegEx
Regular Expressions are the basis for Jump Masks and may also be used in Find + Replace
- in Jump Masks they are not case sensitive
- in Find + Replace it depends on the setting of the checkbox 'Match Case'
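The two modes can be tried outside WriteMonkey. The following is only a sketch, assuming Python's re module as a stand-in engine (WriteMonkey's own engine may differ in details); the sample text and variable names are made up for illustration.

```python
import re

text = "Chapter One\nchapter two\nCHAPTER THREE"

# Case-insensitive search, as described for Jump Masks.
insensitive = re.findall(r"chapter", text, flags=re.IGNORECASE)

# Case-sensitive search, comparable to Find + Replace with 'Match Case' ticked.
sensitive = re.findall(r"chapter", text)

print(insensitive)  # ['Chapter', 'chapter', 'CHAPTER']
print(sensitive)    # ['chapter']
```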
Syntax
Escape char = \ (backslash). It escapes the metacharacters: $ . ^ { [ ( | ) * + ? \
- \. = match a dot
- \* = match a single asterisk
- \[ = match an opening square bracket
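\b | word boundary (a position between two characters, not a character itself) |
\s | whitespace character (blank or tab; in most engines also line breaks) |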
\n | new line |
\r | carriage return |
\t | tab |
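A short sketch of the escapes listed above, again assuming Python's re module as a stand-in for the editor's engine. The sample strings are invented for the example.

```python
import re

# A backslash turns a metacharacter into a literal character.
dot_hits = re.findall(r"\.", "2.4.0.6")        # the three literal dots
star_hits = re.findall(r"\*", "a*b**")         # the three literal asterisks
bracket_hits = re.findall(r"\[", "[x] [y]")    # the two opening brackets

# Without the backslash the dot is a wildcard and matches every character.
unescaped = re.findall(r".", "a.b")

# \s matches whitespace such as a blank or a tab; \t matches only a tab.
whitespace = re.findall(r"\s", "a b\tc")
tabs = re.findall(r"\t", "a b\tc")

print(len(dot_hits), len(star_hits), len(bracket_hits))  # 3 3 2
print(unescaped)                                         # ['a', '.', 'b']
```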
Wildcards
. dot | any single character except the line-break characters \r and \n | ||
c.t | matches cat, cbt, cct, c1t, c2t, cAt, cBt | ||
t...s | matches teens, trees, trams, but not trucks (and not Teens when matching is case sensitive) | ||
.* | matches everything | ||
Bracketing | |||
[0-3] | matches any one digit from 0 through 3 | ||
[a\-z] | matches 'a', 'z' and '-' | ||
b[aiu]t | matches "but", "bat", "bit", but not "bait" | ||
order doesn't matter: b[aiu]t is same as b[uia]t, b[iua]t and b[aui]t | |||
[D-Qc-f1-5] | matches any one character in the range D to Q or any one character in the range c to f or any one digit in the range 1 to 5 | ||
Negation = a caret right after the opening bracket matches any one character that is NOT in the set that follows | |||
[^a-e] | matches "s" in "basketball" | ||
[^a-zA-Z]{4} | matches 1234 and $.25 and #77; etc |
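The wildcard and bracket rows above can be checked with a few lines of code. This is a sketch only, assuming Python's re module in its default case-sensitive mode; whole_matches is a hypothetical helper written for this example, not part of any library.

```python
import re

def whole_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# The dot stands for exactly one character.
wildcard = re.findall(r"c.t", "cat cbt c1t cAt coat")

# A bracket set stands for exactly one character out of the set.
char_set = re.findall(r"b[aiu]t", "but bat bit bait bet")
digit_range = re.findall(r"[0-3]", "0123456")

# An escaped hyphen is a literal hyphen, not a range.
literal_hyphen = re.findall(r"[a\-z]", "a-m-z")

# A leading caret negates the set; the first hit in "basketball" is "s".
first_negated = re.search(r"[^a-e]", "basketball").group()
four_non_letters = whole_matches(r"[^a-zA-Z]{4}", ["1234", "$.25", "#77;", "ab12"])

print(wildcard)          # ['cat', 'cbt', 'c1t', 'cAt']
print(char_set)          # ['but', 'bat', 'bit']
print(four_non_letters)  # ['1234', '$.25', '#77;']
```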
? | Matches the term to its left zero or one time (see also Greedy vs Non-Greedy) | |
[hc]at | matches "hat" and "cat" | |
[hc]?at | matches "hat", "cat", and "at" | |
colou?r | matches "color" and "colour" | |
foob.?r | matches strings like 'foobar', 'foobbr' and 'foobr' but not 'foobalkj9r' | |
A[0-9]?4 | matches A4 and A24, but not A254 | |
a\s?b | matches "ab" or "a b" | |
* | Matches the term to its left zero or more times | |
[hc]*at | matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", "at" etc. | |
6[2-4]* | matches 6, 62, 622, 624, 632, 644424 in full, but not 8 or 22; in 65 and 6135 only the leading 6 matches | |
+ | Matches the term to its left one or more times | |
[hc]+at | matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", but not "at" | |
[A-Za-z]+ | matches any word of any length, provided it contains no digits | |
[0-9]+ | matches 1, 13, 666, 93615 etc | |
foob.+r | matches strings like 'foobar', 'foobalkjdflkj9r' but not 'foobr' |
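The three quantifiers ?, * and + can be compared side by side. This is a sketch, assuming Python's re module as a stand-in engine; whole_matches is a hypothetical helper that keeps only the candidates matched from start to end, which is how the "matches / but not" lists above are meant.

```python
import re

def whole_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# ? = zero or one of the term to its left
optional = whole_matches(r"colou?r", ["color", "colour", "colouur"])
optional_any = whole_matches(r"foob.?r", ["foobar", "foobbr", "foobr", "foobalkj9r"])

# * = zero or more
zero_or_more = whole_matches(r"6[2-4]*", ["6", "62", "644424", "8", "22", "65", "6135"])

# + = one or more
one_or_more = whole_matches(r"[hc]+at", ["hat", "chat", "ccchat", "at"])

# An unanchored search still finds the leading 6 inside "65".
partial = re.search(r"6[2-4]*", "65").group()

print(optional)      # ['color', 'colour']
print(zero_or_more)  # ['6', '62', '644424']
print(one_or_more)   # ['hat', 'chat', 'ccchat']
print(partial)       # 6
```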
Curly Braces: {Min,Max} number of times to match:
- {n} matches exactly n times; equivalent to {n,n}
- {n,} matches n times or more = at least n times
- {n,m} matches at least n times, but not more than m times
- If a curly bracket occurs in any other context, it is treated as a regular character
A{1,2}.{3,5} | matches one or two A, followed by 3-5 characters |
(stef{2}en) | matches 'steffen' |
fooba{2}r | matches 'foobaar' |
fooba{2,}r | matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc. |
fooba{2,3}r | matches strings like 'foobaar', or 'foobaaar' but not 'foobaaaar' |
\d{5} | matches 5 digits |
\s{2,} | matches at least 2 space characters |
\d{2,3} | matches at least 2 but no more than 3 digits |
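The counting forms can be tried in the same way. This is a sketch, assuming Python's re module as a stand-in engine; the helper whole_matches and the sample strings are made up for the example.

```python
import re

def whole_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# {n} = exactly n, {n,} = at least n, {n,m} = between n and m
exact = whole_matches(r"fooba{2}r", ["foobar", "foobaar", "foobaaar"])
at_least = whole_matches(r"fooba{2,}r", ["foobar", "foobaar", "foobaaaar"])
bounded = whole_matches(r"fooba{2,3}r", ["foobaar", "foobaaar", "foobaaaar"])

# Counts work on character classes as well.
five_digits = re.findall(r"\d{5}", "zip 90210, pin 1234")
wide_gaps = re.findall(r"\s{2,}", "a  b   c d")

print(exact)        # ['foobaar']
print(at_least)     # ['foobaar', 'foobaaaar']
print(bounded)      # ['foobaar', 'foobaaar']
print(five_digits)  # ['90210']
```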
Parentheses
(ab)?(c) | matches "abc" or "c" |
(ab)|(cd)|(ef) | matches "ab" or "cd" or "ef" |
Quantifiers that immediately follow the group apply to the whole group: | |
(abc){2,3} | matches abcabc and abcabcabc, but not "abc" or "abccc" |
(\w+)\s+\1 | matches any word that occurs twice in a row, such as "hubba hubba" |
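Groups, alternation and the \1 backreference from the rows above, sketched with Python's re module as an assumed stand-in engine. The sample strings are invented for illustration.

```python
import re

# An optional group is simply empty (None) when it did not take part.
with_prefix = re.fullmatch(r"(ab)?(c)", "abc").groups()
without_prefix = re.fullmatch(r"(ab)?(c)", "c").groups()

# Alternation: any one of the groups may match.
alternatives = [m.group() for m in re.finditer(r"(ab)|(cd)|(ef)", "ab-cd-ef-gh")]

# A quantifier after a group repeats the whole group.
candidates = ["abc", "abcabc", "abcabcabc", "abccc"]
repeated_group = [s for s in candidates if re.fullmatch(r"(abc){2,3}", s)]

# \1 refers back to whatever the first group captured.
doubled = re.search(r"(\w+)\s+\1", "well hubba hubba then")

print(with_prefix)      # ('ab', 'c')
print(without_prefix)   # (None, 'c')
print(alternatives)     # ['ab', 'cd', 'ef']
print(repeated_group)   # ['abcabc', 'abcabcabc']
print(doubled.group())  # hubba hubba
```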
Anchors
Unless anchored, a pattern can match anywhere in the text: start, middle, or end
^([^\n\r]+)$ matches a whole line
^ | Start of the input string. | |
^bat | matches strings beginning with "bat" | |
^[bat] | matches strings that begin with either "b", "a", or "t" | |
^[^BAT] | matches strings that begin with any character other than B, A, or T | |
^[hc]at | matches "hat" and "cat", but only at the beginning of the string/line | |
^ | Caret immediately following the left bracket = excludes the remaining characters within brackets | |
[^0-9] | not a digit. | |
[^abc] | all characters except these | |
[^K-Q] | all characters except K through Q | |
[^a-e] | matches "s" in "basketball" | |
$ | End of the string. | |
abc$ | will match the sub-string “abc” only if it is at the end of the string. | |
[hc]at$ | matches "hat" and "cat", but only at the end of the string/line | |
Keep in mind that most sentences end with a period, exclamation mark, or question mark. Include that character in the pattern as well, e.g. [hc]at\.$ | ||
$$ | End of File - very useful to append stuff at the end of text files |
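The ^ and $ anchors can be sketched as follows, assuming Python's re module as a stand-in engine. In Python the anchors apply to the whole text unless the re.MULTILINE flag is set, in which case they apply to every line; how WriteMonkey sets this is not stated here, so both variants are shown.

```python
import re

lines = "cat hat\nhat cat"

# Default: ^ and $ refer to the start and end of the whole text.
start_of_text = re.findall(r"^[hc]at", lines)
end_of_text = re.findall(r"[hc]at$", lines)

# With MULTILINE they refer to the start and end of every line.
start_of_lines = re.findall(r"^[hc]at", lines, flags=re.MULTILINE)
end_of_lines = re.findall(r"[hc]at$", lines, flags=re.MULTILINE)
whole_lines = re.findall(r"^([^\n\r]+)$", lines, flags=re.MULTILINE)

# Trailing punctuation has to be part of the pattern.
no_punctuation = re.search(r"[hc]at$", "I saw a cat.")
with_punctuation = re.search(r"[hc]at\.$", "I saw a cat.").group()

print(start_of_text, end_of_text)    # ['cat'] ['cat']
print(start_of_lines, end_of_lines)  # ['cat', 'hat'] ['hat', 'cat']
print(whole_lines)                   # ['cat hat', 'hat cat']
print(no_punctuation)                # None
print(with_punctuation)              # cat.
```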
\b = word boundary = a spot between two characters that has
- a \w on one side of it and
- a \W on the other side of it (in either order), where \w = word character and \W = non-word character
- the imaginary positions just before the beginning and just after the end of the string count as \W
- ly\b matches "ly" in "possibly tomorrow"
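The word boundary rule can be tried with a short sketch, assuming Python's re module as a stand-in engine. The sample sentences are made up.

```python
import re

# "ly" only counts when a non-word character (or the end) follows it.
adverb_end = re.findall(r"ly\b", "possibly tomorrow")
several = re.findall(r"ly\b", "only lying slowly")

# \b on both sides finds "cat" as a whole word only.
whole_word_starts = [m.start() for m in re.finditer(r"\bcat\b", "cat concat cats cat.")]

print(adverb_end)         # ['ly']
print(several)            # ['ly', 'ly']
print(whole_word_starts)  # [0, 16]
```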
Greedy vs Non-Greedy
Without getting into too much detail:
- greedy mode is most often NOT the way to go. A greedy quantifier starts at the first possible position and extends to the last possible end, taking everything in between
- non-greedy or lazy mode stops at the first possible end, which often gives a better, more relevant match
GREEDY | NON-GREEDY = with an extra ? | |||
Sample text: <abc>Hello world!<def> | Sample text: <abc>Hello world!<def> | |||
<.*> | returns <abc>Hello world!<def> | <.*?> | returns <abc> | |
Sample text: abbbbc | Sample text: abbbbc | |||
b+ | returns 'bbbb' | b+? | returns 'b' | |
b* | returns 'bbbb' when the match starts at the first b (at the leading a it returns an empty string) | b*? | returns empty string | |
b{2,3} | returns 'bbb' | b{2,3}? | returns 'bb' | |
? | zero or one, similar to {0,1} | ?? | zero or one, similar to {0,1}? | |
* | zero or more, similar to {0,} | *? | zero or more, similar to {0,}? | |
+ | one or more, similar to {1,} | +? | one or more, similar to {1,}? | |
{n} | exactly n times | {n}? | exactly n times | |
{n,} | at least n times | {n,}? | at least n times | |
{n,m} | at least n but not more than m times | {n,m}? | at least n but not more than m times |
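The greedy and non-greedy results in the table can be reproduced with a few lines. This is a sketch, assuming Python's re module as a stand-in engine; the variable names are made up. Note that a plain search for b* in abbbbc succeeds at the very first position with an empty match, so the match is started at the first b to get 'bbbb'.

```python
import re

tagged = "<abc>Hello world!<def>"
greedy_tag = re.search(r"<.*>", tagged).group()   # runs to the last >
lazy_tag = re.search(r"<.*?>", tagged).group()    # stops at the first >
all_tags = re.findall(r"<.*?>", tagged)

letters = "abbbbc"
greedy_plus = re.search(r"b+", letters).group()
lazy_plus = re.search(r"b+?", letters).group()
greedy_range = re.search(r"b{2,3}", letters).group()
lazy_range = re.search(r"b{2,3}?", letters).group()
lazy_star = re.search(r"b*?", letters).group()

# b* can match nothing, so a search already succeeds before the "a".
star_at_start = re.search(r"b*", letters).group()
# Starting the match at index 1 (the first b) gives the greedy result.
star_at_first_b = re.compile(r"b*").match(letters, 1).group()

print(greedy_tag)                  # <abc>Hello world!<def>
print(lazy_tag)                    # <abc>
print(all_tags)                    # ['<abc>', '<def>']
print(greedy_plus, lazy_plus)      # bbbb b
print(greedy_range, lazy_range)    # bbb bb
print(repr(star_at_start))         # ''
print(star_at_first_b)             # bbbb
```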
More
There is much more to RegEx: backreferences, swapping and replacing terms, etc.
Tools to test-drive:
RegEx - Tester online: www.regextester.com/ | gskinner.com/RegExr/
RegEx-Coach to download: www.weitz.de/regex-coach/
Related topics: Find + Replace | Jump Masks
WriteMonkey version 2.4.0.6 | This helpfile last updated on Aug 29, 2012 --- Stefan Müller