Appendix G. Regular Expressions
A regular expression is a powerful way of specifying a pattern for a complex
search.
MySQL uses Henry Spencer's implementation of regular expressions, which is
aimed at conformance with POSIX 1003.2. See
Appendix C, Credits.
MySQL uses the extended version to support pattern-matching operations
performed with the
REGEXP operator in SQL statements. See
Section 3.3.4.7, “Pattern Matching”, and
Section 12.3.1, “String Comparison Functions”.
This appendix is a summary, with examples, of the special characters and
constructs that can be used in MySQL for
REGEXP operations. It does not contain all the
details that can be found in Henry Spencer's
regex(7) manual page. That manual page is
included in MySQL source distributions, in the
regex.7 file under the
regex directory.
A regular expression describes a set of strings. The simplest regular
expression is one that has no special characters in it. For example, the
regular expression hello matches
hello and nothing else.
Non-trivial regular expressions use certain special constructs so that they
can match more than one string. For example, the regular expression
hello|word matches either the string
hello or the string word.
As a more complex example, the regular expression
B[an]*s matches any of the strings
Bananas, Baaaaas,
Bs, and any other string starting with a
B, ending with an s,
and containing any number of a or
n characters in between.
A regular expression for the REGEXP operator
may use any of the following special characters and constructs:
^
Match the beginning of a string.
mysql> SELECT 'fo\nfo' REGEXP '^fo$'; -> 0
mysql> SELECT 'fofo' REGEXP '^fo'; -> 1
$
Match the end of a string.
mysql> SELECT 'fo\no' REGEXP '^fo\no$'; -> 1
mysql> SELECT 'fo\no' REGEXP '^fo$'; -> 0
.
Match any character (including carriage return and newline).
mysql> SELECT 'fofo' REGEXP '^f.*$'; -> 1
mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$'; -> 1
a*
Match any sequence of zero or more a
characters.
mysql> SELECT 'Ban' REGEXP '^Ba*n'; -> 1
mysql> SELECT 'Baaan' REGEXP '^Ba*n'; -> 1
mysql> SELECT 'Bn' REGEXP '^Ba*n'; -> 1
a+
Match any sequence of one or more a
characters.
mysql> SELECT 'Ban' REGEXP '^Ba+n'; -> 1
mysql> SELECT 'Bn' REGEXP '^Ba+n'; -> 0
a?
Match either zero or one a character.
mysql> SELECT 'Bn' REGEXP '^Ba?n'; -> 1
mysql> SELECT 'Ban' REGEXP '^Ba?n'; -> 1
mysql> SELECT 'Baan' REGEXP '^Ba?n'; -> 0
de|abc
Match either of the sequences de or
abc.
mysql> SELECT 'pi' REGEXP 'pi|apa'; -> 1
mysql> SELECT 'axe' REGEXP 'pi|apa'; -> 0
mysql> SELECT 'apa' REGEXP 'pi|apa'; -> 1
mysql> SELECT 'apa' REGEXP '^(pi|apa)$'; -> 1
mysql> SELECT 'pi' REGEXP '^(pi|apa)$'; -> 1
mysql> SELECT 'pix' REGEXP '^(pi|apa)$'; -> 0
(abc)*
Match zero or more instances of the sequence
abc.
mysql> SELECT 'pi' REGEXP '^(pi)*$'; -> 1
mysql> SELECT 'pip' REGEXP '^(pi)*$'; -> 0
mysql> SELECT 'pipi' REGEXP '^(pi)*$'; -> 1
{1}, {2,3}
{n} or {m,n}
notation provides a more general way of writing regular expressions that
match many occurrences of the previous atom (or “piece”)
of the pattern. m and
n are integers.
To be more precise, a{n} matches exactly
n instances of a.
a{n,} matches n
or more instances of a.
a{m,n}
matches m through n
instances of a, inclusive.
m and n must
be in the range from 0 to
RE_DUP_MAX
(default 255), inclusive. If both m and
n are given, m
must be less than or equal to n.
mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; -> 0
mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; -> 1
mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e'; -> 1
[a-dX], [^a-dX]
Matches any character that is (or is not, if ^ is used) either
a, b,
c, d or
X. A -
character between two other characters forms a range that matches all
characters from the first character to the second. For example,
[0-9] matches any decimal digit. To include
a literal ] character, it must immediately
follow the opening bracket [. To include a
literal - character, it must be written
first or last. Any character that does not have a defined special
meaning inside a [] pair matches only
itself.
mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]'; -> 1
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$'; -> 0
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$'; -> 1
mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$'; -> 0
mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$'; -> 1
mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$'; -> 0
[.characters.]
Within a bracket expression (written using [
and ]), matches the sequence of characters
of that collating element. characters is
either a single character or a character name like
newline. You can find the full list of
character names in the regexp/cname.h
file.
mysql> SELECT '~' REGEXP '[[.~.]]'; -> 1
mysql> SELECT '~' REGEXP '[[.tilde.]]'; -> 1
[=character_class=]
Within a bracket expression (written using [
and ]),
[=character_class=] represents an
equivalence class. It matches all characters with the same collation
value, including itself. For example, if o
and
(+) are the members of an equivalence
class, then [[=o=]],
[[=(+)=]], and [o(+)] are all
synonymous. An equivalence class may not be used as an endpoint of a
range.
[:character_class:]
Within a bracket expression (written using [
and ]),
[:character_class:] represents a character
class that matches all characters belonging to that class. The following
table lists the standard class names. These names stand for the
character classes defined in the
ctype(3) manual page. A particular locale
may provide other class names. A character class may not be used as an
endpoint of a range.
mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; -> 1
mysql> SELECT '!!' REGEXP '[[:alnum:]]+'; -> 0
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and
end of words, respectively. A word is a sequence of word characters that
is not preceded by or followed by word characters. A word character is
an alphanumeric character in the
alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
To use a literal instance of a special character in a regular expression,
precede it by two backslash (\) characters. The MySQL parser interprets one
of the backslashes, and the regular expression library interprets the other.
For example, to match the string
1+2 that contains the special
+ character, only the last of the following
regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\\+2'; -> 1