RegEx
Home | CategoriesRegular Expressions are the basis for Jump Masks and may also be used in Find + Replace
- in Jump Masks they are not case sensitive
- in Find + Replace it depends on the setting of the checkbox 'Match Case'
Syntax
Escape char = \
it escapes the metacharacters:
$ . ^ { [ ( | ) * + ? \
- \. = match a dot
- \* = match a single asterisk
- \[ = match a bracket
| \b | blank = word boundary |
| \s | space (blank or tab) |
| \n | new line |
| \r | carriage return |
| \t | tab |
Wildcards
| . dot | any single character except newline: `r and `n | ||
| c.t | matches cat, cbt, cct, c1t, c2t, cAt, cBt | ||
| t...s | matches teens, trees, trams, but not Teens, trucks | ||
| .* | matches everything | ||
| Bracketing | |||
| [0-3] | matches any one digit from 0 through 3 | ||
| [a\-z] | matches 'a', 'z' and '-' | ||
| b[aiu]t |
matches "but", "bat", "bit", but not "bait"
|
||
| order doesn't matter: b[aiu]t is same as b[uia]t, b[iua]t and b[aui]t | |||
| [D-Qc-f1-5] | matches any one character in the range D to Q or any one character in the range c to f or any one digit in the range 1 to 5 | ||
| Negation = will match only if it does not contain the expression that follows | |||
| [^a-e] | matches "s" in "basketball" | ||
| [^a-zA-Z]{4} | matches 1234 and $.25 and #77; etc | ||
| ? |
Matches the term to its left zero or 1 times see also non-greedy
| |
| [hc]at | matches "hat" and "cat" | |
| [hc]?at | matches "hat", "cat", and "at" | |
| colou?r | matches "color" and "colour" | |
| foob.?r | matches strings like 'foobar', 'foobbr' and 'foobr' but not 'foobalkj9r' | |
| A[0-9]?4 | matches A4 and A24, but not A254 | |
| a\s?b | matches "ab" or "a b" | |
| * | Matches the term to its left zero or more times | |
| [hc]*at | matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", "at" etc. | |
| 6[2-4]* | matches 6, 62, 622, 624, 632, 644424, but not 8, 22, 65, 6135 | |
| + | Matches the term to its left one or more times | |
| [hc]+at |
matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", but not "at"
| |
| [A-Za-z]+ | matches any word of any length, provided it contains no digits | |
| [0-9]+ | matches 1, 13, 666, 93615 etc | |
| foob.+r | matches strings like 'foobar', 'foobalkjdflkj9r' but not 'foobr' | |
Curly Braces: {Min,Max} number of times to match:
- {n} matches n times exactly
equivalent to {n,n}
- {n,} matches n times or more = at least n times
- {n,m} matches n times at least, but not more than m times
- If a curly bracket occurs in any other context, it is treated as a regular character
| A{2,}{3,5} | matches one or two A, followed by 3-5 characters |
| (stef{2}en) | matches 'steffen' |
| fooba{2}r | matches 'foobaar' |
| fooba{2,}r | matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc. |
| fooba{2,3}r | matches strings like 'foobaar', or 'foobaaar' but not 'foobaaaar' |
| \d{5} | matches 5 digits |
| \s{2,} | matches at least 2 space characters |
| \d{2,3} | matches at least 2 but no more than 3 digits |
Parentheses
| (ab)?(c) | matches "abc" or "c" |
| (ab)|(cd)|(ef) | matches "ab" or "cd" or "ef" |
| Quantifiers that immediately follow the group apply to the whole group: | |
| (abc){2,3} | matches abcabc and abcabcabc , but not "abc" or "abccc" |
| (\w+)\s+\1 | matches any word that occurs twice in a row, such as "hubba hubba." |
Anchors
Unless anchored, wildcards will search anywhere: start, middle, end
^([^\n\r]+)$ matches a whole line
| ^ | Start of the input string. | |
| ^bat | matches strings beginning with "bat" | |
| ^[bat] | matches strings that begin with either "b", "a", or "t" | |
| ^[^BAT] | matches strings that do not begin with either B, A, or T | |
| ^[hc]at | matches "hat" and "cat", but only at the beginning of the string/line | |
| ^ | Caret immediately following the left bracket = excludes the remaining characters within brackets | |
| [^0-9] | not a digit. | |
| [^abc] | all characters except these | |
| [^K-Q] | all except these | |
| [^a-e] | matches "s" in "basketball" | |
| $ | End of the string. | |
| abc$ | will match the sub-string “abc” only if it is at the end of the string. | |
| [hc]at$ | matches "hat" and "cat", but only at the end of the string/line | |
| keep in mind that most strings/files will end on a period or exclamation/question mark. You must include those as well | ||
| $$ | End of File - very useful to append stuff at the end of text files | |
\b = word boundary = a spot between two characters that has
- a \w on one side of it and
- a \W on the other side of it (in either order)
\w = word, \W = non-word
- counting the imaginary characters off the beginning and end of the string as matching a \W
- ly\b matches "ly" in "possibly tomorrow"
Greedy vs Non-Greedy
Without getting into too much detail
- greedy mode is most often NOT the way to go. It starts at the beginning of the RegEx string and picks the end, with everything in between
- non-greedy or lazy mode often provides a better, more relevant search
| GREEDY | NON-GREEDY = with an extra ? | |||
| <abc>Hello world!<def> | <abc>Hello world!<def> | |||
| <.*> | returns <abc>Hello world!<def> | <.*?> | returns <abc> | |
| abbbbc | abbbbc | |||
| b+ | returns 'bbbb' | b+? | returns 'b' | |
| b* | returns 'bbbb' | b*? | returns empty string | |
| b{2,3} | returns 'bbb' | b{2,3}? | returns 'bb' | |
| ? | zero or one, similar to {0,1} | ?? | zero or one, similar to {0,1}? | |
| * | zero or more, similar to {0,} | *? | zero or more, similar to {0,}? | |
| + | one or more, similar to {1,} | +? | one or more, similar to {1,}? | |
| {n} | exactly n times | {n}? | exactly n times | |
| {n,} | at least n times | {n,}? | at least n times | |
| {n,m} | at least n but not more than m times | {n,m}? | at least n but not more than m times | |
More
There is much more to RegEx: Backreference, Swapping and replacing terms etc.
Tools to test-drive:
RegEx - Tester online: www.regextester.com/
| gskinner.com/RegExr/
RegEx-Coach to download: www.weitz.de/regex-coach/
Categories: What else?
Related topics: Find + Replace | Jump Masks
WriteMonkey version 2.4.0.6 | This helpfile last updated on Aug 29, 2012 --- Stefan Müller
but not "bait"