Apéndice F. Expresiones regulares en MySQL

MySQL 5.0

Apéndice F. Expresiones regulares en MySQL

A regular expression is a powerful way of specifying a pattern for a complex search.

MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. See Apéndice B, Credits. MySQL uses the extended version to support pattern-matching operations performed with the operator in SQL statements. See Sección 3.3.4.7, “Coincidencia de patrones”.

This appendix is a summary, with examples, of the special characters and constructs that can be used in MySQL for operations. It does not contain all the details that can be found in Henry Spencer's manual page. That manual page is included in MySQL source distributions, in the file under the directory.

A regular expression describes a set of strings. The simplest regular expression is one that has no special characters in it. For example, the regular expression matches and nothing else.

Non-trivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression matches either the string or the string .

As a more complex example, the regular expression matches any of the strings , , , and any other string starting with a , ending with an , and containing any number of or characters in between.

A regular expression for the operator may use any of the following special characters and constructs:

  • Match the beginning of a string.

    mysql> SELECT 'fo\nfo' REGEXP '^fo$';                   -> 0
    mysql> SELECT 'fofo' REGEXP '^fo';                      -> 1
    

  • Match the end of a string.

    mysql> SELECT 'fo\no' REGEXP '^fo\no$';                 -> 1
    mysql> SELECT 'fo\no' REGEXP '^fo$';                    -> 0
    

  • Match any character (including carriage return and newline).

    mysql> SELECT 'fofo' REGEXP '^f.*$';                    -> 1
    mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$';                -> 1
    

  • Match any sequence of zero or more characters.

    mysql> SELECT 'Ban' REGEXP '^Ba*n';                     -> 1
    mysql> SELECT 'Baaan' REGEXP '^Ba*n';                   -> 1
    mysql> SELECT 'Bn' REGEXP '^Ba*n';                      -> 1
    

  • Match any sequence of one or more characters.

    mysql> SELECT 'Ban' REGEXP '^Ba+n';                     -> 1
    mysql> SELECT 'Bn' REGEXP '^Ba+n';                      -> 0
    

  • Match either zero or one character.

    mysql> SELECT 'Bn' REGEXP '^Ba?n';                      -> 1
    mysql> SELECT 'Ban' REGEXP '^Ba?n';                     -> 1
    mysql> SELECT 'Baan' REGEXP '^Ba?n';                    -> 0
    

  • Match either of the sequences or .

    mysql> SELECT 'pi' REGEXP 'pi|apa';                     -> 1
    mysql> SELECT 'axe' REGEXP 'pi|apa';                    -> 0
    mysql> SELECT 'apa' REGEXP 'pi|apa';                    -> 1
    mysql> SELECT 'apa' REGEXP '^(pi|apa)$';                -> 1
    mysql> SELECT 'pi' REGEXP '^(pi|apa)$';                 -> 1
    mysql> SELECT 'pix' REGEXP '^(pi|apa)$';                -> 0
    

  • Match zero or more instances of the sequence .

    mysql> SELECT 'pi' REGEXP '^(pi)*$';                    -> 1
    mysql> SELECT 'pip' REGEXP '^(pi)*$';                   -> 0
    mysql> SELECT 'pipi' REGEXP '^(pi)*$';                  -> 1
    

  • ,

    or notation provides a more general way of writing regular expressions that match many occurrences of the previous atom (or “piece”) of the pattern. and are integers.

    • Can be written as .

    • Can be written as .

    • Can be written as .

    To be more precise, matches exactly instances of . matches or more instances of . matches through instances of , inclusive.

    and must be in the range from to (default 255), inclusive. If both and are given, must be less than or equal to .

    mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e';              -> 0
    mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e';              -> 1
    mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e';           -> 1
    

  • ,

    Matches any character that is (or is not, if ^ is used) either , , , or . A character between two other characters forms a range that matches all characters from the first character to the second. For example, matches any decimal digit. To include a literal character, it must immediately follow the opening bracket . To include a literal character, it must be written first or last. Any character that does not have a defined special meaning inside a pair matches only itself.

    mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]';                 -> 1
    mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$';               -> 0
    mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$';              -> 1
    mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$';             -> 0
    mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$';            -> 1
    mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$';           -> 0
    

  • Within a bracket expression (written using and ), matches the sequence of characters of that collating element. is either a single character or a character name like . You can find the full list of character names in the file.

    mysql> SELECT '~' REGEXP '[[.~.]]';                     -> 1
    mysql> SELECT '~' REGEXP '[[.tilde.]]';                 -> 1
    

  • Within a bracket expression (written using and ), represents an equivalence class. It matches all characters with the same collation value, including itself. For example, if and are the members of an equivalence class, then , , and are all synonymous. An equivalence class may not be used as an endpoint of a range.

  • Within a bracket expression (written using and ), represents a character class that matches all characters belonging to that class. The standard class names are:

    Alphanumeric characters
    Alphabetic characters
    Whitespace characters
    Control characters
    Digit characters
    Graphic characters
    Lowercase alphabetic characters
    Graphic or space characters
    Punctuation characters
    Space, tab, newline, and carriage return
    Uppercase alphabetic characters
    Hexadecimal digit characters

    These stand for the character classes defined in the manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range.

    mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+';       -> 1
    mysql> SELECT '!!' REGEXP '[[:alnum:]]+';               -> 0
    

  • ,

    These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the class or an underscore ().

    mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';   -> 1
    mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';  -> 0
    

To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string that contains the special character, only the last of the following regular expressions is the correct one:

mysql> SELECT '1+2' REGEXP '1+2';                       -> 0
mysql> SELECT '1+2' REGEXP '1\+2';                      -> 0
mysql> SELECT '1+2' REGEXP '1\\+2';                     -> 1