RegExMatch() [v1.0.45+]
Determines whether a string contains a pattern (regular expression).
FoundPos := RegExMatch(Haystack, NeedleRegEx [, OutputVar, StartingPosition := 1])
Parameters
- Haystack
The string whose content is searched.
- NeedleRegEx
The pattern to search for, which is a Perl-compatible regular expression (PCRE). The pattern's options (if any) must be included at the beginning of the string followed by a close-parenthesis. For example, the pattern "i)abc.*123" would turn on the case-insensitive option and search for "abc", followed by zero or more occurrences of any character, followed by "123". If there are no options, the ")" is optional; for example, ")abc" is equivalent to "abc".
- OutputVar
Mode 1 (default): Specify a variable in which to store the part of Haystack that matched the entire pattern. If the pattern is not found (that is, if the function returns 0), this variable and all array elements below are made blank.
If any capturing subpatterns are present inside NeedleRegEx, their matches are stored in a pseudo-array whose base name is OutputVar. For example, if the variable's name is Match, the substring that matches the first subpattern would be stored in Match1, the second would be stored in Match2, and so on. The exception to this is named subpatterns: they are stored by name instead of number. For example, the substring that matches the named subpattern "(?P<Year>\d{4})" would be stored in MatchYear. If a particular subpattern does not match anything (or if the function returns zero), the corresponding variable is made blank.
Within a function, to create a pseudo-array that is global instead of local, declare the base name of the pseudo-array (e.g. Match) as a global variable prior to using it. The converse is true for assume-global functions. However, it is often also necessary to declare each element, due to a common source of confusion.
Mode 2 (position-and-length): If a capital P is present in the RegEx's options -- such as "P)abc.*123" -- the length of the entire-pattern match is stored in OutputVar (or 0 if no match). If any capturing subpatterns are present, their positions and lengths are stored in two pseudo-arrays: OutputVarPos and OutputVarLen. For example, if the variable's base name is Match, the one-based position of the first subpattern's match would be stored in MatchPos1, and its length in MatchLen1 (zero is stored in both if the subpattern was not matched or the function returns 0). The exception to this is named subpatterns: they are stored by name instead of number (e.g. MatchPosYear and MatchLenYear).
Mode 3 (match object) [v1.1.05+]: If a capital O is present in the RegEx's options -- such as "O)abc.*123" -- a match object is stored in OutputVar. This object can be used to retrieve the position, length and value of the overall match and of each captured subpattern, if present.
- StartingPosition
If StartingPosition is omitted, it defaults to 1 (the beginning of Haystack). Otherwise, specify 2 to start at the second character, 3 to start at the third, and so on. If StartingPosition is beyond the length of Haystack, the search starts at the empty string that lies at the end of Haystack (which typically results in no match).
If StartingPosition is less than 1, it is considered to be an offset from the end of Haystack. For example, 0 starts at the last character and -1 starts at the next-to-last character. If StartingPosition tries to go beyond the left end of Haystack, all of Haystack is searched.
Regardless of the value of StartingPosition, the return value is always relative to the first character of Haystack. For example, the position of "abc" in "123abc789" is always 4.
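The following is a minimal sketch of how the OutputVar modes and StartingPosition described above behave. The variable names (Match, Info, Digit) and the sample string are illustrative only.

```autohotkey
Haystack := "abc123abc456"

; Mode 1 (default): the overall match is stored in Match, the first subpattern in Match1.
FoundPos := RegExMatch(Haystack, "abc(\d+)", Match)
; FoundPos = 1, Match = "abc123", Match1 = "123"

; Mode 2 (P option): lengths and positions are stored instead of substrings.
FoundPos := RegExMatch(Haystack, "P)abc(\d+)", Info)
; FoundPos = 1, Info = 6 (length of the overall match), InfoPos1 = 4, InfoLen1 = 3

; StartingPosition 2 skips the first "abc"; the return value is still
; relative to the first character of Haystack.
FoundPos := RegExMatch(Haystack, "abc(\d+)", Match, 2)
; FoundPos = 7, Match = "abc456", Match1 = "456"

; A StartingPosition below 1 counts from the end: 0 is the last character,
; -1 the next-to-last, so -2 starts at the third-to-last character ("4").
FoundPos := RegExMatch(Haystack, "\d", Digit, -2)
; FoundPos = 10, Digit = "4"
```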
Return Value
This function returns the position of the leftmost occurrence of NeedleRegEx in the string Haystack. Position 1 is the first character. Zero is returned if the pattern is not found. If an error occurs (such as a syntax error inside NeedleRegEx), an empty string is returned and ErrorLevel is set to one of the values below instead of 0.
ErrorLevel
[v1.1.04+]: This function is able to throw an exception on failure (not to be confused with "no match found"). For more information, see Runtime Errors.
ErrorLevel is set to one of the following:
- 0, which means that no error occurred.
- A string in the following form: Compile error N at offset M: description. In that string, N is the PCRE error number, M is the position of the offending character inside the regular expression, and description is the text describing the error.
- A negative number, which means an error occurred during the execution of the regular expression. Although such errors are rare, the ones most likely to occur are "too many possible empty-string matches" (-22), "recursion too deep" (-21), and "reached match limit" (-8). If these happen, try to redesign the pattern to be more restrictive, such as replacing each * with a ?, +, or a limit like {0,3} wherever feasible.
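As an illustration of the values listed above, the sketch below separates "error" from "no match" by checking ErrorLevel before the return value. The pattern is deliberately invalid (an unclosed parenthesis). The exact error text is not shown here because it depends on PCRE.

```autohotkey
; "(abc" has an unbalanced parenthesis, so the pattern fails to compile.
FoundPos := RegExMatch("abc", "(abc")

if (ErrorLevel != 0)
    ; ErrorLevel holds either a "Compile error N at offset M: description"
    ; string or a negative execution-error number; FoundPos is blank.
    MsgBox % "RegExMatch failed: " . ErrorLevel
else if (FoundPos = 0)
    MsgBox The pattern is valid but was not found.
else
    MsgBox % "Match found at position " . FoundPos
```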
Options
See Options for modifiers such as "i)abc", which turns off case-sensitivity in the pattern "abc".
Match Object [v1.1.05+]
If a capital O is present in the RegEx's options, a match object is stored in OutputVar. This object has the following properties:
Match.Pos(N): Returns the position of the overall match or a captured subpattern.
Match.Len(N): Returns the length of the overall match or a captured subpattern.
Match.Value(N): Returns the overall match or a captured subpattern.
Match.Name(N): Returns the name of the given subpattern, if it has one.
Match.Count(): Returns the overall number of subpatterns.
Match.Mark(): Returns the NAME of the last encountered (*MARK:NAME), when applicable.
Match[N]: If N is 0 or a valid subpattern number or name, this is equivalent to Match.Value(N). Otherwise, N can be the name of one of the above properties. For example, Match["Pos"] and Match.Pos are equivalent to Match.Pos() unless a subpattern named "Pos" exists, in which case they are equivalent to Match.Value("Pos").
Match.N: Same as above, except that N is an unquoted name or number.
For all of the above properties, N can be any of the following:
- 0 for the overall match.
- The number of a subpattern, even one that also has a name.
- The name of a subpattern.
Brackets [] may be used in place of parentheses () if N is specified.
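A short sketch of the match object in use, assuming v1.1.05 or later. The sample string, the subpattern name Year, and the variable Summary are illustrative.

```autohotkey
Haystack := "Released 2021-05-17"

; The O option stores a match object in Match instead of a pseudo-array.
if (RegExMatch(Haystack, "O)(?P<Year>\d{4})-(\d{2})", Match))
{
    Summary := "Overall match: " . Match.Value(0) . "`n"         ; "2021-05"
    Summary .= "Position: " . Match.Pos(0) . "`n"                ; 10
    Summary .= "Length: " . Match.Len(0) . "`n"                  ; 7
    Summary .= "Year (by name): " . Match.Value("Year") . "`n"   ; "2021"
    Summary .= "Year (by number): " . Match.Value(1) . "`n"      ; "2021"
    Summary .= "Month (Match[N] form): " . Match[2] . "`n"       ; "05"
    Summary .= "Name of subpattern 1: " . Match.Name(1) . "`n"   ; "Year"
    Summary .= "Subpattern count: " . Match.Count()              ; 2
    MsgBox % Summary
}
```

Unlike the pseudo-array modes, the named subpattern here is reachable both by name and by number.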
Performance
To search for a simple substring inside a larger string, use InStr() because it is faster than RegExMatch().
To improve performance, the 100 most recently used regular expressions are kept cached in memory (in compiled form).
The study option (S) can sometimes improve the performance of a regular expression that is used many times (such as in a loop).
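To illustrate the first point, the sketch below finds the same literal substring both ways. The sample string is arbitrary. RegExMatch() only becomes necessary once an actual pattern is involved.

```autohotkey
Haystack := "The quick brown fox"

; A plain substring search: InStr() is the faster choice.
PosA := InStr(Haystack, "brown")          ; 11

; The same search as a regular expression gives the same position.
PosB := RegExMatch(Haystack, "brown")     ; 11

; A real pattern (a word starting with "b" and ending with "n") needs RegExMatch().
PosC := RegExMatch(Haystack, "b\w+n")     ; 11
```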
Remarks
A subpattern may be given a name such as the word Year in the pattern "(?P<Year>\d{4})". Such names may consist of up to 32 alphanumeric characters and underscores. The following limitation does not apply to the "O" (match object) mode: Although named subpatterns are also available by their numbers during the RegEx operation itself (e.g. \1 is a backreference to the string that actually matched the first capturing subpattern), they are stored in the output pseudo-array only by name (not by number). For example, if "Year" is the first subpattern, OutputVarYear would be set to the matching substring, but OutputVar1 would not be changed at all (it would retain its previous value, if any). However, if an unnamed subpattern occurs after "Year", it would be stored in OutputVar2, not OutputVar1.
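The numbering rule described above can be sketched as follows for the default (pseudo-array) mode. The initial value given to Match1 is there only to show that the call leaves it alone.

```autohotkey
Match1 := "untouched"   ; pre-existing value, to show it is not overwritten

; "Year" is the first subpattern (named); the month is the second (unnamed).
RegExMatch("2021-05", "(?P<Year>\d{4})-(\d{2})", Match)

; MatchYear = "2021"       (named subpattern, stored by name only)
; Match1    = "untouched"  (not changed, because subpattern 1 has a name)
; Match2    = "05"         (unnamed subpattern keeps its real number)
MsgBox % MatchYear . " / " . Match1 . " / " . Match2
```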
Most characters like abc123 can be used literally inside a regular expression. However, the characters \.*?+[{|()^$ must be preceded by a backslash to be seen as literal. For example, \. is a literal period and \\ is a literal backslash. Escaping can be avoided by using \Q...\E. For example: \QLiteral Text\E.
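As a brief illustration of the two escaping styles, the sketch below matches text containing several special characters, first with backslashes and then with \Q...\E. The sample string is made up.

```autohotkey
Haystack := "Price: $5.00 (USD)"

; Escape each special character individually with a backslash.
PosA := RegExMatch(Haystack, "\$5\.00 \(USD\)")      ; 8

; Or wrap the whole literal in \Q...\E so nothing inside needs escaping.
PosB := RegExMatch(Haystack, "\Q$5.00 (USD)\E")      ; 8
```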
Within a regular expression, special characters such as tab and newline can be escaped with either an accent (`) or a backslash (\). For example, `t is the same as \t except when the x option is used.
To learn the basics of regular expressions (or refresh your memory of pattern syntax), see the RegEx Quick Reference.
AutoHotkey's regular expressions are implemented using Perl-compatible Regular Expressions (PCRE) from www.pcre.org.
[AHK_L 31+]: Within an expression, a ~= b can be used as shorthand for RegExMatch(a, b).
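A minimal sketch of the shorthand, using an arbitrary sample string. Like RegExMatch(a, b), the operator yields the match position, so it also works directly as a condition.

```autohotkey
; Equivalent to RegExMatch("abc123", "\d+$").
FoundPos := "abc123" ~= "\d+$"    ; 4

if ("abc123" ~= "\d+$")
    MsgBox The string ends with one or more digits.
```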
Related
RegExReplace(), RegEx Quick Reference, Regular Expression Callouts, InStr(), IfInString, StringGetPos, SubStr(), SetTitleMatchMode RegEx, Global matching and Grep (forum link)
Common sources of text data: FileRead, UrlDownloadToFile, Clipboard, GUI Edit controls
Examples
FoundPos := RegExMatch("xxxabc123xyz", "abc.*xyz")  ; Returns 4, which is the position where the match was found.
FoundPos := RegExMatch("abc123123", "123$")  ; Returns 7 because the $ requires the match to be at the end.
FoundPos := RegExMatch("abc123", "i)^ABC")  ; Returns 1 because a match was achieved via the case-insensitive option.
FoundPos := RegExMatch("abcXYZ123", "abc(.*)123", SubPat)  ; Returns 1 and stores "XYZ" in SubPat1.
FoundPos := RegExMatch("abc123abc456", "abc\d+", "", 2)  ; Returns 7 instead of 1 due to StartingPosition 2 vs. 1.
; For general RegEx examples, see the RegEx Quick Reference.