RegEx


Regular Expressions are the basis for Jump Masks and may also be used in Find + Replace

Escape char = \ (backslash). It escapes the metacharacters: $ . ^ { [ ( | ) * + ? \
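
A short sketch of escaping, using Python's re module purely as a stand-in (the editor's own engine may differ in details). The sample text and variable names are made up for illustration.

```python
import re

price = "Total: $4.50 (approx.)"  # hypothetical sample text

# Unescaped, "$" is the end-of-string anchor, so "$4" can never match.
unescaped = re.search("$4.50", price)

# Escaped by hand: "\$" and "\." stand for the literal characters.
escaped = re.search(r"\$4\.50", price)

# re.escape() inserts the backslashes automatically.
auto = re.search(re.escape("$4.50"), price)

print(unescaped)        # None
print(escaped.group())  # $4.50
print(auto.group())     # $4.50
```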

. dot	any single character except the newline characters \r and \n
	c.t	matches `cat, cbt, cct, c1t, c2t, cAt, cBt`
	t...s	matches teens, trees, trams, but not Teens, trucks
	.*	matches everything
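
The dot examples can be tried out roughly like this. Python's re module is used only for illustration and the word lists are hypothetical; the editor's engine may treat line breaks slightly differently.

```python
import re

words = ["cat", "cbt", "c1t", "cAt", "ct", "c\nt"]
# fullmatch() requires the whole string to fit the pattern.
dot_hits = [w for w in words if re.fullmatch(r"c.t", w)]
print(dot_hits)  # ['cat', 'cbt', 'c1t', 'cAt']

candidates = ["teens", "trees", "trams", "Teens", "trucks"]
# search() looks anywhere in the string; matching is case-sensitive by default.
five = [w for w in candidates if re.search(r"t...s", w)]
print(five)  # ['teens', 'trees', 'trams']

# ".*" stops at the first newline because the dot does not match it.
first_line = re.match(r".*", "one\ntwo").group()
print(first_line)  # one
```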

Bracketing
	[0-3]	matches any one digit from 0 through 3
	[a\-z]	matches 'a', 'z' and '-'
	b[aiu]t	matches "but", "bat", "bit", but not "bait"
		order doesn't matter: b[aiu]t is same as `b[uia]t, b[iua]t and b[aui]t`
	[D-Qc-f1-5]	matches any one character in the range D to Q or any one character in the range c to f or any one digit in the range 1 to 5
Negation = a caret (^) right after the opening bracket matches any one character that is NOT in the set that follows
	[^a-e]	matches "s" in "basketball"
	[^a-zA-Z]{4}	matches `1234` and `$.25` and `#77;` etc
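
A minimal sketch of the bracket examples above, with Python's re module standing in for the editor's engine. The test strings beyond those in the table are assumptions added for contrast.

```python
import re

bat = [w for w in ["but", "bat", "bit", "bait", "bet"] if re.fullmatch(r"b[aiu]t", w)]
print(bat)  # ['but', 'bat', 'bit']

# An escaped hyphen is a literal "-", not a range.
dash = re.findall(r"[a\-z]", "a-z and b")
print(dash)  # ['a', '-', 'z', 'a']

# A leading caret negates the set: every character outside a-e is a match.
outside = re.findall(r"[^a-e]", "basketball")
print(outside)  # ['s', 'k', 't', 'l', 'l']

four = [s for s in ["1234", "$.25", "#77;", "ab12"] if re.fullmatch(r"[^a-zA-Z]{4}", s)]
print(four)  # ['1234', '$.25', '#77;']
```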

?	Matches the term to its left zero or one time (see also non-greedy)
	[hc]at	matches "hat" and "cat"
	[hc]?at	matches "hat", "cat", and "at"
	colou?r	matches "color" and "colour"
	foob.?r	matches strings like 'foobar', 'foobbr' and 'foobr' but not 'foobalkj9r'
	A[0-9]?4	matches A4 and A24, but not A254
	a\s?b	matches "ab" or "a b"
*	Matches the term to its left zero or more times
	[hc]*at	matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", "at" etc.
	6[2-4]*	matches 6, 62, 622, 624, 632, 644424 as whole strings, but not 8, 22, 65, 6135 (unanchored, it still finds the leading 6 in 65 and 6135)

+	Matches the term to its left one or more times
	[hc]+at	matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", but not "at"
	[A-Za-z]+	matches any run of one or more unaccented letters, i.e. a word of any length, provided it contains no digits
	[0-9]+	matches 1, 13, 666, 93615 etc
	foob.+r	matches strings like 'foobar', 'foobalkjdflkj9r' but not 'foobr'
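
The difference between ?, * and + is easiest to see side by side. This is a sketch in Python's re module (an assumption; the editor's engine may differ), and the helper name whole is hypothetical.

```python
import re

samples = ["at", "hat", "cat", "chat", "ccchat"]

def whole(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

optional = whole(r"[hc]?at")  # zero or one leading h/c
star = whole(r"[hc]*at")      # zero or more
plus = whole(r"[hc]+at")      # one or more

print(optional)  # ['at', 'hat', 'cat']
print(star)      # ['at', 'hat', 'cat', 'chat', 'ccchat']
print(plus)      # ['hat', 'cat', 'chat', 'ccchat']

# An unanchored search still finds the leading 6 inside 6135;
# only a whole-string match rejects it.
partial = re.search(r"6[2-4]*", "6135").group()
entire = re.fullmatch(r"6[2-4]*", "6135")
print(partial)  # 6
print(entire)   # None
```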

Curly Braces: {Min,Max} number of times to match:

{n} matches exactly n times; equivalent to {n,n}
{n,} matches n times or more = at least n times
{n,m} matches n times at least, but not more than m times
If a curly bracket occurs in any other context, it is treated as a regular character

A{1,2}.{3,5}	matches one or two A, followed by 3-5 characters
(stef{2}en)	matches 'steffen'
fooba{2}r	matches 'foobaar'
fooba{2,}r	matches strings like 'foobaar', 'foobaaar', 'foobaaaar' etc.
fooba{2,3}r	matches strings like 'foobaar', or 'foobaaar' but not 'foobaaaar'
\d{5}	matches 5 digits
\s{2,}	matches at least 2 whitespace characters (spaces, tabs, line breaks)
\d{2,3}	matches at least 2 but no more than 3 digits
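
The curly-brace counts above can be checked with a few lines of Python (re module used as a stand-in for the editor's engine). The sample sentences are invented for illustration.

```python
import re

candidates = ["foobar", "foobaar", "foobaaar", "foobaaaar"]

def whole(pattern):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

exactly_two = whole(r"fooba{2}r")
two_or_more = whole(r"fooba{2,}r")
two_to_three = whole(r"fooba{2,3}r")

print(exactly_two)   # ['foobaar']
print(two_or_more)   # ['foobaar', 'foobaaar', 'foobaaaar']
print(two_to_three)  # ['foobaar', 'foobaaar']

# \d{5} finds a run of five digits; \s{2,} collapses repeated whitespace.
five_digits = re.search(r"\d{5}", "order 10115 shipped").group()
squeezed = re.sub(r"\s{2,}", " ", "too   many    spaces")
print(five_digits)  # 10115
print(squeezed)     # too many spaces
```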

Parentheses

(ab)?(c)	matches "abc" or "c"
(ab)|(cd)|(ef)	matches "ab" or "cd" or "ef"

Quantifiers that immediately follow the group apply to the whole group:
(abc){2,3}	matches `abcabc` and `abcabcabc` , but not "abc" or "abccc"
(\w+)\s+\1	matches any word that occurs twice in a row, such as "hubba hubba."
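
Groups, alternation with | and the backreference \1 can be sketched as follows. Python's re module is an assumption here, chosen only because it is easy to run; extra test strings such as "ac" are added for contrast.

```python
import re

# The optional group (ab)? may be present or absent; "ac" fits neither case.
optional_group = [s for s in ["abc", "c", "ac"] if re.fullmatch(r"(ab)?(c)", s)]
print(optional_group)  # ['abc', 'c']

# Alternation: any one of the three groups may match.
alternatives = [m.group() for m in re.finditer(r"(ab)|(cd)|(ef)", "ab cd ef gh")]
print(alternatives)  # ['ab', 'cd', 'ef']

# The quantifier applies to the whole group, not just to the final "c".
repeated = [s for s in ["abc", "abcabc", "abcabcabc", "abccc"]
            if re.fullmatch(r"(abc){2,3}", s)]
print(repeated)  # ['abcabc', 'abcabcabc']

# \1 must repeat exactly what group 1 captured.
doubled = re.search(r"(\w+)\s+\1", "hubba hubba")
print(doubled.group())   # hubba hubba
print(doubled.group(1))  # hubba
```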

Unless anchored, a pattern will match anywhere in the text: start, middle, end

^([^\n\r]+)$ matches a whole line

^	Start of the input string.
	^bat	matches strings beginning with "bat"
	^[bat]	matches strings that begin with either "b", "a", or "t"
	^[^BAT]	matches strings that do not begin with either B, A, or T
	^[hc]at	matches "hat" and "cat", but only at the beginning of the string/line
^	Caret immediately following the left bracket = excludes the remaining characters within brackets
	[^0-9]	not a digit.
	[^abc]	all characters except these
	[^K-Q]	all except these
	[^a-e]	matches "s" in "basketball"
$	End of the string.
	abc$	will match the sub-string “abc” only if it is at the end of the string.
	[hc]at$	matches "hat" and "cat", but only at the end of the string/line
	Keep in mind that most strings/files end with a period, exclamation mark or question mark. You must include that character in the pattern as well, e.g. [hc]at\.$
$$	End of File - very useful to append stuff at the end of text files
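
A hedged sketch of the anchors in Python's re module. Whether ^ and $ apply per line or only to the whole text depends on the engine and its settings; in Python that is the MULTILINE flag, which is an assumption about how the editor behaves. The $$ end-of-file form is not shown because it is specific to the editor.

```python
import re

text = "hat trick\nthe cat\nthat hat"  # hypothetical three-line text

# MULTILINE makes ^ and $ work at every line break, not only at the ends of the text.
line_starts = re.findall(r"^[hc]at", text, flags=re.MULTILINE)
line_ends = re.findall(r"[hc]at$", text, flags=re.MULTILINE)
whole_lines = re.findall(r"^([^\n\r]+)$", text, flags=re.MULTILINE)
print(line_starts)  # ['hat']
print(line_ends)    # ['cat', 'hat']
print(whole_lines)  # ['hat trick', 'the cat', 'that hat']

# Without the flag, $ only matches at the very end of the text.
text_end = re.findall(r"[hc]at$", text)
print(text_end)  # ['hat']

# A trailing period must be part of the pattern.
no_period = re.search(r"cat$", "the cat.")
with_period = re.search(r"cat\.$", "the cat.").group()
print(no_period)    # None
print(with_period)  # cat.
```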

\b = word boundary = a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W
	\w = word character, \W = non-word character
	ly\b	matches "ly" in "possibly tomorrow"
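
Word boundaries in practice, sketched with Python's re module (an assumption; the sample sentences are invented):

```python
import re

sentence = "possibly tomorrow, lyrics only"

# "ly" counts only where a word ends right after it, so "lyrics" is skipped.
ly_words = re.findall(r"\b\w+ly\b", sentence)
print(ly_words)  # ['possibly', 'only']

# \b on both sides matches the standalone word but not "concat" or "cats".
standalone = re.findall(r"\bcat\b", "cat concat cats")
print(standalone)  # ['cat']
```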

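Without getting into too much detail:

greedy mode is most often NOT the way to go. A greedy quantifier grabs as much text as it can: the match starts at the first possible position and runs to the last place where the rest of the pattern still fits, with everything in between
non-greedy or lazy mode stops at the first place where the rest of the pattern fits, which often provides a better, more relevant match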

GREEDY		NON-GREEDY = with an extra ?
<abc>Hello world!<def>		<abc>Hello world!<def>
<.*>	returns <abc>Hello world!<def>	<.*?>	returns <abc>

abbbbc		abbbbc
b+	returns 'bbbb'	b+?	returns 'b'
b*	returns 'bbbb' (when the match starts at the first b)	b*?	returns empty string
b{2,3}	returns 'bbb'	b{2,3}?	returns 'bb'

?	zero or one, similar to {0,1}	??	zero or one, similar to {0,1}?
*	zero or more, similar to {0,}	*?	zero or more, similar to {0,}?
+	one or more, similar to {1,}	+?	one or more, similar to {1,}?
{n}	exactly n times	{n}?	exactly n times
{n,}	at least n times	{n,}?	at least n times
{n,m}	at least n but not more than m times	{n,m}?	at least n but not more than m times
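
The table can be reproduced with a few searches. Python's re module is used here as a stand-in; the greedy and lazy quantifiers shown behave the same way in most engines, but that is an assumption about the editor.

```python
import re

tagged = "<abc>Hello world!<def>"
greedy = re.search(r"<.*>", tagged).group()
lazy = re.search(r"<.*?>", tagged).group()
all_tags = re.findall(r"<.*?>", tagged)
print(greedy)    # <abc>Hello world!<def>
print(lazy)      # <abc>
print(all_tags)  # ['<abc>', '<def>']

letters = "abbbbc"
plus_greedy = re.search(r"b+", letters).group()
plus_lazy = re.search(r"b+?", letters).group()
range_greedy = re.search(r"b{2,3}", letters).group()
range_lazy = re.search(r"b{2,3}?", letters).group()
print(plus_greedy, plus_lazy)    # bbbb b
print(range_greedy, range_lazy)  # bbb bb

# b* and b*? may both match nothing, so an unanchored search succeeds
# with an empty match in front of the "a". Tying the match to the "a"
# shows the actual difference.
star_unanchored = re.search(r"b*", letters).group()
star_greedy = re.search(r"ab*", letters).group()
star_lazy = re.search(r"ab*?", letters).group()
print(repr(star_unanchored))   # ''
print(star_greedy, star_lazy)  # abbbb a
```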

There is much more to RegEx: Backreference, Swapping and replacing terms etc.
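
As a taste of swapping terms with backreferences, here is a minimal sketch in Python's re module. The replacement syntax is an assumption: Python writes \1 and \2 in the replacement text, while other engines often use $1 and $2, so check what the editor's Find + Replace expects. The sample names are hypothetical.

```python
import re

# Group 1 is the surname, group 2 the first name; the replacement swaps them.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Doe, Jane")
print(swapped)  # Jane Doe

# Collapse a word that was typed twice in a row into a single copy.
deduplicated = re.sub(r"\b(\w+)\s+\1\b", r"\1", "hubba hubba is is fine")
print(deduplicated)  # hubba is fine
```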

Tools to test-drive:
RegEx-Tester online: www.regextester.com/ | gskinner.com/RegExr/
RegEx-Coach to download: www.weitz.de/regex-coach/


WriteMonkey version 2.4.0.6 | This helpfile last updated on Aug 29, 2012 --- Stefan Müller
