Regular Expression Callouts [AHK_L 14+]

Callouts provide a means of temporarily passing control to the script in the middle of regular expression pattern matching. For detailed information about the PCRE-standard callout feature, see pcre.txt.

Callouts are currently supported only by RegExMatch and RegExReplace.

Syntax

The syntax for a callout in AutoHotkey is (?CNumber:Function), where both Number and Function are optional. The colon (:) is allowed only if Function is specified, and may be omitted if Number is omitted. If Function is specified but is not the name of a user-defined function, a compile error occurs and pattern-matching does not begin.

If Function is omitted, the function name must be specified in a variable named pcre_callout. If both a global variable and local variable exist with this name, the local variable takes precedence. If pcre_callout does not contain the name of a user-defined function, callouts which omit Function are ignored.
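
The accepted forms can be compared side by side. The following is a minimal sketch rather than a definitive reference: the function names OnCallout and DefaultCallout are hypothetical and chosen only for illustration. Both functions return nothing, so matching proceeds as normal after each callout.

```autohotkey
; Hypothetical names: OnCallout and DefaultCallout are not built into AutoHotkey.
; pcre_callout is consulted only by callouts which omit Function.
pcre_callout = DefaultCallout

RegExMatch("abc", "a(?COnCallout)bc")    ; Function only
RegExMatch("abc", "a(?C7:OnCallout)bc")  ; Number and Function
RegExMatch("abc", "a(?C:OnCallout)bc")   ; Colon is optional when Number is omitted
RegExMatch("abc", "a(?C)bc")             ; Neither: function name taken from pcre_callout
RegExMatch("abc", "a(?C7)bc")            ; Number only: function name taken from pcre_callout

OnCallout(Match, CalloutNumber) {
    MsgBox OnCallout: number=%CalloutNumber%
}

DefaultCallout(Match, CalloutNumber) {
    MsgBox DefaultCallout: number=%CalloutNumber%
}
```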

Callout Functions

Function(Match, CalloutNumber, FoundPos, Haystack, NeedleRegEx)
{
    ...
}

Callout functions may define up to 5 parameters:

  • Match: Equivalent to the UnquotedOutputVar of RegExMatch, including the creation of array variables if appropriate.
  • CalloutNumber: Receives the Number of the callout.
  • FoundPos: Receives the position of the current potential match.
  • Haystack: Receives the Haystack passed to RegExMatch or RegExReplace.
  • NeedleRegEx: Receives the NeedleRegEx passed to RegExMatch or RegExReplace.

These names are suggestive only. Actual names may vary.

Pattern-matching may proceed or fail depending on the return value of the callout function:

  • If the function returns 0 or does not return a numeric value, matching proceeds as normal.
  • If the function returns 1 or greater, matching fails at the current point, but the testing of other matching possibilities goes ahead.
  • If the function returns -1, matching is abandoned.
  • If the function returns a value less than -1, it is treated as a PCRE error code and matching is abandoned. RegExMatch returns a blank string, while RegExReplace returns the original Haystack. In either case, ErrorLevel contains the error code.

For example:

Haystack = The quick brown fox jumps over the lazy dog.
RegExMatch(Haystack, "i)(The) (\w+)\b(?CCallout)")
Callout(m) {
    MsgBox m=%m%`nm1=%m1%`nm2=%m2%
    return 1
}

In the above example, Callout is called once for each substring which matches the part of the pattern preceding the callout. \b is used to exclude incomplete words in matches such as The quic, The qui, The qu, etc.
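
The return values listed above can also be combined. The sketch below reuses the same haystack and pattern, but rejects each candidate with 1 until a limit is exceeded and then abandons matching with -1. The names LimitCalls, MaxCalls and CallCount are hypothetical, and the limit of 1 is an arbitrary value chosen for illustration.

```autohotkey
Haystack = The quick brown fox jumps over the lazy dog.
MaxCalls := 1   ; Hypothetical limit chosen for this sketch.
CallCount := 0
FoundPos := RegExMatch(Haystack, "i)(The) (\w+)\b(?CLimitCalls)")
MsgBox FoundPos=%FoundPos%`nErrorLevel=%ErrorLevel%`nCallCount=%CallCount%

LimitCalls(m) {
    global MaxCalls, CallCount
    CallCount += 1
    if (CallCount > MaxCalls)
        return -1   ; Abandon matching altogether.
    return 1        ; Reject this candidate; other possibilities are still tested.
}
```

Returning a value less than -1 instead would be treated as a PCRE error code, which is then reported through ErrorLevel as described above.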

EventInfo

Additional information is available by accessing the pcre_callout_block structure via A_EventInfo.

version           := NumGet(A_EventInfo,  0, "Int")
callout_number    := NumGet(A_EventInfo,  4, "Int")
offset_vector     := NumGet(A_EventInfo,  8)
subject           := NumGet(A_EventInfo,  8 + A_PtrSize)
subject_length    := NumGet(A_EventInfo,  8 + A_PtrSize*2, "Int")
start_match       := NumGet(A_EventInfo, 12 + A_PtrSize*2, "Int")
current_position  := NumGet(A_EventInfo, 16 + A_PtrSize*2, "Int")
capture_top       := NumGet(A_EventInfo, 20 + A_PtrSize*2, "Int")
capture_last      := NumGet(A_EventInfo, 24 + A_PtrSize*2, "Int")
pad := A_PtrSize=8 ? 4 : 0  ; Compensate for 64-bit data alignment.
callout_data      := NumGet(A_EventInfo, 28 + pad + A_PtrSize*2)
pattern_position  := NumGet(A_EventInfo, 28 + pad + A_PtrSize*3, "Int")
next_item_length  := NumGet(A_EventInfo, 32 + pad + A_PtrSize*3, "Int")
if version >= 2
    mark   := StrGet(NumGet(A_EventInfo, 36 + pad + A_PtrSize*3), "UTF-8")  ; mark is a pointer, so it is read with the default pointer-sized type.

For more information, see pcre.txt, NumGet and A_PtrSize.

Auto-Callout

Including C in the options of the pattern enables the auto-callout mode. In this mode, callouts equivalent to (?C255) are inserted before each item in the pattern. For example, the following template may be used to debug regular expressions:

; Set the default callout function.
pcre_callout = DebugRegEx

; Call RegExMatch with auto-callout option C.
RegExMatch("xxxabc123xyz", "C)abc.*xyz")

DebugRegEx(Match, CalloutNumber, FoundPos, Haystack, NeedleRegEx)
{
    ; See pcre.txt for descriptions of these fields.
    start_match       := NumGet(A_EventInfo, 12 + A_PtrSize*2, "Int")
    current_position  := NumGet(A_EventInfo, 16 + A_PtrSize*2, "Int")
    pad := A_PtrSize=8 ? 4 : 0
    pattern_position  := NumGet(A_EventInfo, 28 + pad + A_PtrSize*3, "Int")
    next_item_length  := NumGet(A_EventInfo, 32 + pad + A_PtrSize*3, "Int")

    ; Point out >>current match<<.
    _HAYSTACK:=SubStr(Haystack, 1, start_match)
        . ">>" SubStr(Haystack, start_match + 1, current_position - start_match)
        . "<<" SubStr(Haystack, current_position + 1)
    
    ; Point out >>next item to be evaluated<<.
    _NEEDLE:=  SubStr(NeedleRegEx, 1, pattern_position)
        . ">>" SubStr(NeedleRegEx, pattern_position + 1, next_item_length)
        . "<<" SubStr(NeedleRegEx, pattern_position + 1 + next_item_length)
    
    ListVars
    ; Press Pause to continue.
    Pause
}

Remarks

Callouts are executed on the current quasi-thread, but the previous value of A_EventInfo will be restored after the callout function returns. ErrorLevel is not set until immediately before RegExMatch or RegExReplace returns.

PCRE is optimized to abort early in some cases if it can determine that a match is not possible. For all callouts to be called in such cases, it may be necessary to disable these optimizations by specifying (*NO_START_OPT) at the start of the pattern. This requires [v1.1.05] or later.
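
The effect can be observed by counting how often a callout is reached with and without (*NO_START_OPT). This is a rough sketch: the function name CountCallout and the variable names are hypothetical, and the subject and pattern follow the kind of case described in pcre.txt, where the subject lacks a character ("d") that any match would require. The exact counts depend on the PCRE version in use, so none are stated here.

```autohotkey
; Hypothetical callout which only counts how often it is reached.
Count := 0
RegExMatch("abyz", "ab(?CCountCallout)cd")
Optimized := Count

Count := 0
RegExMatch("abyz", "(*NO_START_OPT)ab(?CCountCallout)cd")
Unoptimized := Count

MsgBox With optimizations: %Optimized%`nWith (*NO_START_OPT): %Unoptimized%

CountCallout(Match) {
    global Count
    Count += 1
    ; No return value, so matching proceeds as normal.
}
```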