RegEx

WriteMonkey

RegEx

Home | Categories

Regular Expressions are the basis for Jump Masks and may also be used in Find + Replace

Syntax

Escape char = \ image it escapes the metacharacters: $ . ^ { [ ( | ) * + ? \

  • \. = match a dot
  • \* = match a single asterisk
  • \[ = match a bracket
White space
\b blank = word boundary
\s space (blank or tab)
\n new line
\r carriage return
\t tab

Wildcards

. dot any single character except newline: `r and `n
  c.t matches cat, cbt, cct, c1t, c2t, cAt, cBt
  t...s matches teens, trees, trams, but not Teens, trucks
  .* matches everything
Bracketing
  [0-3] matches any one digit from 0 through 3
  [a\-z] matches 'a', 'z' and '-'
  b[aiu]t matches "but", "bat", "bit", image but not "bait"
    order doesn't matter: b[aiu]t is same as b[uia]t, b[iua]t and b[aui]t
  [D-Qc-f1-5] matches any one character in the range D to Q or any one character in the range c to f or any one digit in the range 1 to 5
Negation = will match only if it does not contain the expression that follows
  [^a-e] matches "s" in "basketball"
  [^a-zA-Z]{4} matches 1234 and $.25 and #77; etc
? Matches the term to its left zero or 1 times image see also non-greedy
  [hc]at matches "hat" and "cat"
  [hc]?at matches "hat", "cat", and "at"
  colou?r matches "color" and "colour"
  foob.?r matches strings like 'foobar', 'foobbr' and 'foobr' but not 'foobalkj9r'
  A[0-9]?4 matches A4 and A24, but not A254
  a\s?b matches "ab" or "a b"
* Matches the term to its left zero or more times
  [hc]*at matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", "at" etc.
  6[2-4]* matches 6, 62, 622, 624, 632, 644424, but not 8, 22, 65, 6135
 
+ Matches the term to its left one or more times
  [hc]+at matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", image but not "at"
  [A-Za-z]+ matches any word of any length, provided it contains no digits
  [0-9]+ matches 1, 13, 666, 93615 etc
  foob.+r matches strings like 'foobar', 'foobalkjdflkj9r' but not 'foobr'

Curly Braces: {Min,Max} number of times to match:

  • {n} matches n times exactly image equivalent to {n,n}
  • {n,} matches n times or more = at least n times
  • {n,m} matches n times at least, but not more than m times
  • If a curly bracket occurs in any other context, it is treated as a regular character
A{2,}{3,5} matches one or two A, followed by 3-5 characters
(stef{2}en) matches 'steffen'
fooba{2}r matches 'foobaar'
fooba{2,}r matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.
fooba{2,3}r matches strings like 'foobaar', or 'foobaaar' but not 'foobaaaar'
\d{5} matches 5 digits
\s{2,} matches at least 2 space characters
\d{2,3} matches at least 2 but no more than 3 digits

Parentheses

(ab)?(c) matches "abc" or "c"
(ab)|(cd)|(ef) matches "ab" or "cd" or "ef"
   
Quantifiers that immediately follow the group apply to the whole group:
(abc){2,3} matches abcabc and abcabcabc , but not "abc" or "abccc"
(\w+)\s+\1 matches any word that occurs twice in a row, such as "hubba hubba."

Anchors

Unless anchored, wildcards will search anywhere: start, middle, end

^([^\n\r]+)$ matches a whole line

^ Start of the input string.
  ^bat matches strings beginning with "bat"
  ^[bat] matches strings that begin with either "b", "a", or "t"
  ^[^BAT] matches strings that do not begin with either B, A, or T
  ^[hc]at matches "hat" and "cat", but only at the beginning of the string/line
^ Caret immediately following the left bracket = excludes the remaining characters within brackets
  [^0-9] not a digit.
  [^abc] all characters except these
  [^K-Q] all except these
  [^a-e] matches "s" in "basketball"
$ End of the string.
  abc$ will match the sub-string “abc” only if it is at the end of the string.
  [hc]at$ matches "hat" and "cat", but only at the end of the string/line
  keep in mind that most strings/files will end on a period or exclamation/question mark. You must include those as well
$$ End of File - very useful to append stuff at the end of text files

IMG0000001D.JPG

\b = word boundary = a spot between two characters that has

  • a \w on one side of it and
  • a \W on the other side of it (in either order) image \w = word, \W = non-word
  • counting the imaginary characters off the beginning and end of the string as matching a \W
  • ly\b matches "ly" in "possibly tomorrow"

Greedy vs Non-Greedy

Without getting into too much detail

  1. greedy mode is most often NOT the way to go. It starts at the beginning of the RegEx string and picks the end, with everything in between
  2. non-greedy or lazy mode often provides a better, more relevant search
GREEDY NON-GREEDY = with an extra ?
<abc>Hello world!<def> <abc>Hello world!<def>
<.*> returns <abc>Hello world!<def> <.*?> returns <abc>
         
abbbbc abbbbc
b+ returns 'bbbb' b+? returns 'b'
b* returns 'bbbb' b*? returns empty string
b{2,3} returns 'bbb' b{2,3}? returns 'bb'
         
? zero or one, similar to {0,1} ?? zero or one, similar to {0,1}?
* zero or more, similar to {0,} *? zero or more, similar to {0,}?
+ one or more, similar to {1,} +? one or more, similar to {1,}?
{n} exactly n times {n}? exactly n times
{n,} at least n times {n,}? at least n times
{n,m} at least n but not more than m times {n,m}? at least n but not more than m times

More

There is much more to RegEx: Backreference, Swapping and replacing terms etc.

Tools to test-drive:
RegEx - Tester online: www.regextester.com/ External | gskinner.com/RegExr/ External
RegEx-Coach to download: www.weitz.de/regex-coach/ External


WriteMonkey version 2.4.0.6 | This helpfile last updated on Aug 29, 2012 --- Stefan Müller