A regular expression is a powerful way of specifying a pattern for a complex search.
MySQL uses Henry Spencer's implementation of regular expressions,
which is aimed at conformance with POSIX 1003.2. See
Appendix C, Credits. MySQL uses the extended version to
support pattern-matching operations performed with the
REGEXP
operator in SQL statements. See
Section 3.3.4.7, “Pattern Matching”, and
Section 12.3.1, “String Comparison Functions”.
This appendix is a summary, with examples, of the special characters
and constructs that can be used in MySQL for
REGEXP
operations. It does not contain all the
details that can be found in Henry Spencer's
regex(7)
manual page. That manual page is
included in MySQL source distributions, in the
regex.7
file under the
regex
directory.
A regular expression describes a set of strings. The simplest
regular expression is one that has no special characters in it. For
example, the regular expression hello
matches
hello
and nothing else.
Non-trivial regular expressions use certain special constructs so
that they can match more than one string. For example, the regular
expression hello|word
matches either the string
hello
or the string word
.
As a more complex example, the regular expression
B[an]*s
matches any of the strings
Bananas
, Baaaaas
,
Bs
, and any other string starting with a
B
, ending with an s
, and
containing any number of a
or
n
characters in between.
A regular expression for the REGEXP
operator may
use any of the following special characters and constructs:
-
^
Match the beginning of a string.
mysql>
SELECT 'fo\nfo' REGEXP '^fo$';
-> 0 mysql>SELECT 'fofo' REGEXP '^fo';
-> 1 -
$
Match the end of a string.
mysql>
SELECT 'fo\no' REGEXP '^fo\no$';
-> 1 mysql>SELECT 'fo\no' REGEXP '^fo$';
-> 0 -
.
Match any character (including carriage return and newline).
mysql>
SELECT 'fofo' REGEXP '^f.*$';
-> 1 mysql>SELECT 'fo\r\nfo' REGEXP '^f.*$';
-> 1 -
a*
Match any sequence of zero or more
a
characters.mysql>
SELECT 'Ban' REGEXP '^Ba*n';
-> 1 mysql>SELECT 'Baaan' REGEXP '^Ba*n';
-> 1 mysql>SELECT 'Bn' REGEXP '^Ba*n';
-> 1 -
a+
Match any sequence of one or more
a
characters.mysql>
SELECT 'Ban' REGEXP '^Ba+n';
-> 1 mysql>SELECT 'Bn' REGEXP '^Ba+n';
-> 0 -
a?
Match either zero or one
a
character.mysql>
SELECT 'Bn' REGEXP '^Ba?n';
-> 1 mysql>SELECT 'Ban' REGEXP '^Ba?n';
-> 1 mysql>SELECT 'Baan' REGEXP '^Ba?n';
-> 0 -
de|abc
Match either of the sequences
de
orabc
.mysql>
SELECT 'pi' REGEXP 'pi|apa';
-> 1 mysql>SELECT 'axe' REGEXP 'pi|apa';
-> 0 mysql>SELECT 'apa' REGEXP 'pi|apa';
-> 1 mysql>SELECT 'apa' REGEXP '^(pi|apa)$';
-> 1 mysql>SELECT 'pi' REGEXP '^(pi|apa)$';
-> 1 mysql>SELECT 'pix' REGEXP '^(pi|apa)$';
-> 0 -
(abc)*
Match zero or more instances of the sequence
abc
.mysql>
SELECT 'pi' REGEXP '^(pi)*$';
-> 1 mysql>SELECT 'pip' REGEXP '^(pi)*$';
-> 0 mysql>SELECT 'pipi' REGEXP '^(pi)*$';
-> 1 -
{1}
,{2,3}
{n}
or{m,n}
notation provides a more general way of writing regular expressions that match many occurrences of the previous atom (or “piece”) of the pattern.m
andn
are integers.-
a*
Can be written as
a{0,}
. -
a+
Can be written as
a{1,}
. -
a?
Can be written as
a{0,1}
.
To be more precise,
a{n}
matches exactlyn
instances ofa
.a{n,}
matchesn
or more instances ofa
.a{m,n}
matchesm
throughn
instances ofa
, inclusive.m
andn
must be in the range from0
toRE_DUP_MAX
(default 255), inclusive. If bothm
andn
are given,m
must be less than or equal ton
.mysql>
SELECT 'abcde' REGEXP 'a[bcd]{2}e';
-> 0 mysql>SELECT 'abcde' REGEXP 'a[bcd]{3}e';
-> 1 mysql>SELECT 'abcde' REGEXP 'a[bcd]{1,10}e';
-> 1 -
-
[a-dX]
,[^a-dX]
Matches any character that is (or is not, if ^ is used) either
a
,b
,c
,d
orX
. A-
character between two other characters forms a range that matches all characters from the first character to the second. For example,[0-9]
matches any decimal digit. To include a literal]
character, it must immediately follow the opening bracket[
. To include a literal-
character, it must be written first or last. Any character that does not have a defined special meaning inside a[]
pair matches only itself.mysql>
SELECT 'aXbc' REGEXP '[a-dXYZ]';
-> 1 mysql>SELECT 'aXbc' REGEXP '^[a-dXYZ]$';
-> 0 mysql>SELECT 'aXbc' REGEXP '^[a-dXYZ]+$';
-> 1 mysql>SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$';
-> 0 mysql>SELECT 'gheis' REGEXP '^[^a-dXYZ]+$';
-> 1 mysql>SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$';
-> 0 -
[.characters.]
Within a bracket expression (written using
[
and]
), matches the sequence of characters of that collating element.characters
is either a single character or a character name likenewline
. You can find the full list of character names in theregexp/cname.h
file.mysql>
SELECT '~' REGEXP '[[.~.]]';
-> 1 mysql>SELECT '~' REGEXP '[[.tilde.]]';
-> 1 -
[=character_class=]
Within a bracket expression (written using
[
and]
),[=character_class=]
represents an equivalence class. It matches all characters with the same collation value, including itself. For example, ifo
and(+)
are the members of an equivalence class, then[[=o=]]
,[[=(+)=]]
, and[o(+)]
are all synonymous. An equivalence class may not be used as an endpoint of a range. -
[:character_class:]
Within a bracket expression (written using
[
and]
),[:character_class:]
represents a character class that matches all characters belonging to that class. The following table lists the standard class names. These names stand for the character classes defined in thectype(3)
manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range.alnum
Alphanumeric characters alpha
Alphabetic characters blank
Whitespace characters cntrl
Control characters digit
Digit characters graph
Graphic characters lower
Lowercase alphabetic characters print
Graphic or space characters punct
Punctuation characters space
Space, tab, newline, and carriage return upper
Uppercase alphabetic characters xdigit
Hexadecimal digit characters mysql>
SELECT 'justalnums' REGEXP '[[:alnum:]]+';
-> 1 mysql>SELECT '!!' REGEXP '[[:alnum:]]+';
-> 0 -
[[:<:]]
,[[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the
alnum
class or an underscore (_
).mysql>
SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';
-> 1 mysql>SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';
-> 0
To use a literal instance of a special character in a regular
expression, precede it by two backslash (\) characters. The MySQL
parser interprets one of the backslashes, and the regular expression
library interprets the other. For example, to match the string
1+2
that contains the special
+
character, only the last of the following
regular expressions is the correct one:
mysql>SELECT '1+2' REGEXP '1+2';
-> 0 mysql>SELECT '1+2' REGEXP '1\+2';
-> 0 mysql>SELECT '1+2' REGEXP '1\\+2';
-> 1