RegExMatch()

Determines whether a string contains a pattern (regular expression).

FoundPos := RegExMatch(Haystack, NeedleRegEx, OutputVar, StartingPosition := 1)

Parameters

Haystack

The string whose content is searched. This may contain binary zero.

NeedleRegEx

The pattern to search for, which is a Perl-compatible regular expression (PCRE). Any options for the pattern must appear at the beginning of the string, followed by a close-parenthesis. For example, the pattern "i)abc.*123" turns on the case-insensitive option and searches for "abc", followed by zero or more occurrences of any character (other than a newline, by default), followed by "123". If there are no options, the ")" is optional; for example, ")abc" is equivalent to "abc".
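The short sketch below illustrates the options prefix. The haystack strings are made up for illustration. It shows that "i)" changes the result of an otherwise case-sensitive search and that a bare leading ")" makes no difference when there are no options.

```autohotkey
; Options go at the very start of NeedleRegEx, followed by ")".
Pos1 := RegExMatch("ABC-xyz-123", "i)abc.*123")  ; i) = case-insensitive: matches at 1
Pos2 := RegExMatch("ABC-xyz-123", "abc.*123")    ; no options: case-sensitive, so 0
; With no options, a leading ")" is optional; these two searches are equivalent.
Pos3 := RegExMatch("xxabc", ")abc")              ; 3
Pos4 := RegExMatch("xxabc", "abc")               ; 3
MsgBox(Pos1 " " Pos2 " " Pos3 " " Pos4)          ; shows "1 0 3 3"
```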

Although NeedleRegEx cannot contain binary zero, the pattern \x00 can be used to match a binary zero within Haystack.
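In the minimal sketch below, the pattern uses the \x00 escape to locate a binary zero inside Haystack. It assumes that Chr(0) produces a one-character string containing a binary zero, as in AutoHotkey v2.

```autohotkey
; Haystack is "ab", a binary zero, then "cd".
Haystack := "ab" Chr(0) "cd"
; The pattern never contains a literal binary zero; it uses the \x00 escape instead.
Pos := RegExMatch(Haystack, "\x00")      ; 3
Pos2 := RegExMatch(Haystack, "b\x00c")   ; 2
MsgBox(Pos " " Pos2)
```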

OutputVar

OutputVar is the unquoted name of a variable in which to store a match object. The match object can be used to retrieve the position, length and value of the overall match and of each captured subpattern, if any are present. OutputVar may be omitted if only the return value is needed.

If the pattern is not found (that is, if the function returns 0), this variable is made blank.
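The sketch below shows both outcomes using an illustrative order-number string. On success, the match object provides the overall match and the first subpattern. On failure, the function returns 0 and the output variable is blank. It follows this page's convention of passing OutputVar as a plain variable name; other AutoHotkey versions may require a different form (such as &Match).

```autohotkey
; Success: Match receives a match object.
if RegExMatch("Order #4521 shipped", "#(\d+)", Match)
{
    Whole := Match[0]       ; "#4521" (N = 0 is the overall match)
    Where := Match.Pos(0)   ; 7
    Number := Match[1]      ; "4521" (first captured subpattern)
    MsgBox("Found " Whole " at position " Where "; order number " Number)
}

; Failure: the return value is 0 and Match is made blank.
FoundPos := RegExMatch("no order here", "#(\d+)", Match)
MsgBox(FoundPos " [" Match "]")  ; shows "0 []"
```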

StartingPosition

If StartingPosition is omitted, it defaults to 1 (the beginning of Haystack). Otherwise, specify 2 to start at the second character, 3 to start at the third, and so on. If StartingPosition is beyond the length of Haystack, the search starts at the empty string that lies at the end of Haystack (which typically results in no match).

Specify a negative StartingPosition to start at that position from the right. For example, -1 starts at the last character and -2 starts at the next-to-last character. If StartingPosition tries to go beyond the left end of Haystack, all of Haystack is searched.

Regardless of the value of StartingPosition, the return value is always relative to the first character of Haystack. For example, the position of "abc" in "123abc789" is always 4.
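The sketch below exercises the StartingPosition rules described above on a 12-character haystack. The positive, negative, and out-of-range cases are all shown, and every result is measured from the first character of Haystack. OutputVar is omitted because only the position is needed.

```autohotkey
Haystack := "abc123abc456"                 ; 12 characters
P1 := RegExMatch(Haystack, "abc",, 2)      ; skips the first "abc", finds the second: 7
P2 := RegExMatch(Haystack, "abc",, -6)     ; -6 starts 6 characters from the right (position 7): 7
P3 := RegExMatch(Haystack, "abc",, -100)   ; goes beyond the left end, so all of Haystack is searched: 1
P4 := RegExMatch(Haystack, "abc",, 100)    ; beyond the end: only the trailing empty string is searched: 0
MsgBox(P1 " " P2 " " P3 " " P4)            ; shows "7 7 1 0"
```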

Return Value

This function returns the position of the leftmost occurrence of NeedleRegEx in the string Haystack. Position 1 is the first character. Zero is returned if the pattern is not found.

Errors

Syntax errors: If the pattern contains a syntax error, an exception is thrown with a message in the following form: Compile error N at offset M: description. In that string, N is the PCRE error number, M is the position of the offending character inside the regular expression, and description is the text describing the error.

Execution errors: If an error occurs during the execution of the regular expression, an exception is thrown. The Extra property of the exception object contains the PCRE error number. Although such errors are rare, the ones most likely to occur are "too many possible empty-string matches" (-22), "recursion too deep" (-21), and "reached match limit" (-8). If these happen, try to redesign the pattern to be more restrictive, such as replacing each * with a ?, +, or a limit like {0,3} wherever feasible.
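A minimal sketch of catching a pattern syntax error follows. The exact error number and offset in the message are not asserted, only its "Compile error" prefix. The try/catch e form matches v2 builds contemporary with this page; later AutoHotkey releases may spell the catch clause differently (for example, catch as e).

```autohotkey
Caught := false
Msg := ""
try
{
    ; "a(b" has an unbalanced "(", which is a syntax error in the pattern.
    RegExMatch("abc", "a(b")
}
catch e
{
    Caught := true
    Msg := e.Message  ; of the form "Compile error N at offset M: description"
    ; For an execution error (not this case), e.Extra would hold the PCRE error number.
    MsgBox(Msg)
}
```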

Options

See Options for modifiers such as "i)abc", which turns off case-sensitivity in the pattern "abc".

Match Object

If a match is found, an object containing information about the match is stored in OutputVar. This object has the following properties:

Match.Pos(N): Returns the position of the overall match or a captured subpattern.

Match.Len(N): Returns the length of the overall match or a captured subpattern.

Match.Value(N): Returns the overall match or a captured subpattern.

Match.Name(N): Returns the name of the given subpattern, if it has one.

Match.Count(): Returns the total number of subpatterns (capturing groups), which is also the highest valid subpattern number for N.

Match.Mark(): Returns the NAME of the last encountered (*MARK:NAME), when applicable.

Match[N]: If N is 0 or a valid subpattern number or name, this is equivalent to Match.Value(N). Otherwise, N can be the name of one of the above properties. For example, Match["Pos"] and Match.Pos are equivalent to Match.Pos() unless a subpattern named "Pos" exists, in which case they are equivalent to Match.Value("Pos").

Match.N: Same as above, except that N is an unquoted name or number.

For all of the above properties, N can be any of the following:

  • 0 for the overall match.
  • The number of a subpattern, even one that also has a name.
  • The name of a subpattern.

Brackets [] may be used in place of parentheses () if N is specified.
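The sketch below applies the forms of N listed above to a made-up date string: 0, a number, and a name. It also uses the bracket alternative to parentheses. It is written against the match-object members described on this page.

```autohotkey
RegExMatch("Date: 2024-05-17", "(?P<Year>\d{4})-(?P<Month>\d{2})-(\d{2})", M)

; The named subpattern Year is also subpattern number 1.
A := M.Value("Year")      ; "2024"
B := M["Year"]            ; "2024"
C := M.Year               ; "2024" (Match.N form: N written unquoted)
D := M[1]                 ; "2024"
Whole := M[0]             ; "2024-05-17"

; Brackets may replace parentheses when N is specified.
YearPos := M.Pos["Year"]  ; 7
DayLen := M.Len[3]        ; 2 (the unnamed third subpattern)
SecondName := M.Name(2)   ; "Month"
```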

The object does not support enumeration; that is, the for-loop is not supported. Instead, use Loop Match.Count().
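Because a for-loop cannot enumerate the match object, the sketch below walks the subpatterns with Loop Match.Count() and A_Index. It lists each subpattern's name and value for an illustrative input string.

```autohotkey
RegExMatch("Smith, John (42)", "(?P<Last>\w+), (?P<First>\w+) \((?P<Age>\d+)\)", M)
Out := ""
Loop M.Count()  ; one iteration per subpattern (3 here)
    Out .= M.Name(A_Index) " = " M[A_Index] "`n"
MsgBox(Out)     ; Last = Smith, First = John, Age = 42, one per line
```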

Performance

To search for a simple substring inside a larger string, use InStr because it is faster than RegExMatch().
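For a literal needle with no special characters, InStr and RegExMatch report the same position, as in the small comparison below. InStr is the cheaper choice.

```autohotkey
Haystack := "The quick brown fox"
P1 := InStr(Haystack, "brown")       ; plain substring search: 11
P2 := RegExMatch(Haystack, "brown")  ; same position, but via the regex engine: 11
MsgBox(P1 " " P2)
```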

To improve performance, the 100 most recently used regular expressions are cached in memory in compiled form.

The study option (S) can sometimes improve the performance of a regular expression that is used many times (such as in a loop).
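The sketch below shows where the S option would go when one pattern is reused many times in a loop. The sample text is made up. Whether S actually speeds up a given pattern is not guaranteed and is worth measuring.

```autohotkey
Text := "id=17`nname=foo`nid=42`n"
Ids := ""
Loop Parse, Text, "`n"
{
    ; S) requests the study option. The same pattern string is used on every
    ; iteration, so its compiled form can come from the cache described above.
    if RegExMatch(A_LoopField, "S)^id=(\d+)$", M)
        Ids .= M[1] " "
}
MsgBox(Ids)  ; shows "17 42 "
```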

Remarks

A subpattern may be given a name such as the word Year in the pattern "(?P<Year>\d{4})". Such names may consist of up to 32 alphanumeric characters and underscores. Note that named subpatterns are also numbered, so if an unnamed subpattern occurs after "Year", it would be stored in OutputVar[2], not OutputVar[1].
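The numbering rule can be seen in the small sketch below, which uses an illustrative input. The named subpattern Year is also number 1, so the unnamed subpattern after it becomes number 2.

```autohotkey
RegExMatch("Build 2019 (beta)", "(?P<Year>\d{4}) \((\w+)\)", M)
Y := M.Year   ; "2019"
N1 := M[1]    ; "2019" - Year is also subpattern 1
N2 := M[2]    ; "beta" - the unnamed subpattern after Year is number 2
MsgBox(Y " " N1 " " N2)
```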

Most characters like abc123 can be used literally inside a regular expression. However, the characters \.*?+[{|()^$ must be preceded by a backslash to be seen as literal. For example, \. is a literal period and \\ is a literal backslash. Escaping can be avoided by using \Q...\E. For example: \QLiteral Text\E.
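The sketch below searches for the literal text "1+1=2?" in three ways: with backslash escapes, with \Q...\E, and unescaped. In the unescaped version, + and ? act as quantifiers, so the search fails.

```autohotkey
H := "Is 1+1=2? Yes."
P1 := RegExMatch(H, "1\+1=2\?")    ; + and ? escaped: 4
P2 := RegExMatch(H, "\Q1+1=2?\E")  ; same search, no escaping inside \Q...\E: 4
P3 := RegExMatch(H, "1+1=2?")      ; unescaped: means "1" one or more times, "1=", optional "2"; no match: 0
MsgBox(P1 " " P2 " " P3)           ; shows "4 4 0"
```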

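Within a regular expression, special characters such as tab and newline can be escaped with either a backtick (`) or a backslash (\). For example, `t is the same as \t except when the x option is used. The reason is that the script converts `t into a literal tab before PCRE sees the pattern, and the x option ignores literal whitespace in the pattern, whereas \t is interpreted by PCRE itself.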
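The sketch below contrasts `t and \t with and without the x option. The haystack is "a", a tab, then "b".

```autohotkey
H := "a`tb"                       ; "a", tab, "b"
P1 := RegExMatch(H, "a`tb")       ; `t becomes a real tab in the pattern: 1
P2 := RegExMatch(H, "a\tb")       ; PCRE interprets \t as a tab: 1
P3 := RegExMatch(H, "x)a`tb")     ; x ignores the literal tab, so it looks for "ab": 0
P4 := RegExMatch(H, "x)a\tb")     ; \t is an escape, not whitespace, so it still matches: 1
MsgBox(P1 " " P2 " " P3 " " P4)   ; shows "1 1 0 1"
```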

To learn the basics of regular expressions (or refresh your memory of pattern syntax), see the RegEx Quick Reference.

AutoHotkey's regular expressions are implemented using Perl-compatible Regular Expressions (PCRE) from www.pcre.org.

Related

RegExReplace, RegEx Quick Reference, Regular Expression Callouts, InStr, SubStr, SetTitleMatchMode RegEx, Global matching and Grep (forum link)

Common sources of text data: FileRead, Download, Clipboard, GUI Edit controls

Examples

FoundPos := RegExMatch("xxxabc123xyz", "abc.*xyz")  ; Returns 4, which is the position where the match was found.
FoundPos := RegExMatch("abc123123", "123$")  ; Returns 7 because the $ requires the match to be at the end.
FoundPos := RegExMatch("abc123", "i)^ABC")  ; Returns 1 because a match was achieved via the case-insensitive option.
FoundPos := RegExMatch("abcXYZ123", "abc(.*)123", SubPat)  ; Returns 1 and stores "XYZ" in SubPat[1].
FoundPos := RegExMatch("abc123abc456", "abc\d+",, 2)  ; Returns 7 instead of 1 due to StartingPosition 2 vs. 1.

; For general RegEx examples, see the RegEx Quick Reference.