Tools/Search.../Regular Expression Primer

Explorer++

Explorer++'s Search tool can search for files and folders using regular expressions, a concept that may be unfamiliar to some users.  This primer aims to shed enough light on the subject to let you use regular expressions - at least in a simple form - in the Search tool.  For a more in-depth treatment of this subject, see other resources; the Internet has volumes of information available.  Note that because only file and folder names are searched, regular expression features that involve multiple lines or replacing text are not used.

    Caution:  Be mindful of what you read on the Internet; regular expressions are somewhat platform/application specific.  While common behaviours exist, some platforms/applications introduce syntax which may not work in Explorer++.  For the most part, the Search dialog uses a subset of TR1 Regular Expressions.

 

What are Regular Expressions?

Regular expressions are a concise and flexible notation for finding and replacing patterns of text - the replacing part won't be covered here, since Search only finds.  Regular expressions are written as text (letters, numbers and special symbols) made up of two distinct categories of characters:

  • literals - these are characters (or groups of characters) which are to be found in the target string as-is.  Any characters which are not used as meta-characters can be considered as literals; and
  • meta-characters - special characters which do not represent themselves, but are a coded representation of text sequences to find.

Meta-characters generally consist of the following characters: ^ $ . * \ [ ] { } ( ) + ? |

Some other characters might also be considered meta-characters when used in a special context.  Meta-characters may also be used as literals (with a special syntax) if needed.  Unfortunately, regular expression syntax varies between implementations (depending on the application and operating system), but the basic set of elements should be the same.  For the purposes of Search, we will only consider basic searching.
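The difference between a literal and a meta-character can be tried out in a few lines of Python.  This is only an illustrative sketch: Python's re module is a different engine from the one Explorer++ uses, although it treats these simple patterns the same way, and the helper name describes_whole_name and the sample names are assumptions made up for the example.

```python
import re

def describes_whole_name(pattern: str, name: str) -> bool:
    """Hypothetical helper: True if the pattern matches the entire name."""
    return re.fullmatch(pattern, name) is not None

# Literals match themselves, character for character.
print(describes_whole_name(r"readme", "readme"))   # True

# Unescaped, "." is a meta-character standing for any single character.
print(describes_whole_name(r"a.c", "abc"))         # True
print(describes_whole_name(r"a.c", "a.c"))         # True (a real dot is also "any character")

# Escaped with a backslash, "." is a literal and matches only a dot.
print(describes_whole_name(r"a\.c", "a.c"))        # True
print(describes_whole_name(r"a\.c", "abc"))        # False

# re.escape() adds the backslashes needed to treat a whole name literally.
literal_pattern = re.escape("notes(1).txt")
print(describes_whole_name(literal_pattern, "notes(1).txt"))  # True
```

The backslash is the "special syntax" mentioned above: it turns the meta-character that follows it into a literal.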

 

Do I need to know Regular Expressions?

The short answer is probably not - since with Explorer++ we are only searching for file and folder names, the conventional method of searching (limited wildcards, text) is likely to be sufficient.  But regular expressions can accommodate that weird search that you might need to do, and can also combine what might otherwise have been multiple searches into a single search.  Many other applications provide regular expression capability, particularly for searching text within files and for search-and-replace situations.  Once you learn regular expressions, you may never go back to conventional searching.

 

How do Regular Expressions work?

The regular expression is a set of mini-rules describing what is to be found in the target string.  The mini-rules are applied to the string in sequence - left to right - and if the described text is found, a pattern match is accomplished (success).  Once an element (mini-rule) has matched, the text in the target string that caused the match is consumed and is not used again.  If any of the rules fail - that is, the anticipated pattern of text is not found - then the whole regular expression is considered to have failed (no pattern match).  In Explorer++'s Search tool, if a pattern match is made (subject to a match also with file attributes and case as set in the dialog), the file is listed in the results pane.  Also, in this context (matching file/folder names), the regular expression must describe the entire file/folder name, minus the path, of course.  This is not as difficult as it sounds, since regular expressions have wildcards, repeats, etc. to accommodate unknown characters.  Some other aspects of regular expressions - handling multiple lines, replacing text, etc. - are not applicable and are not covered here.
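The "entire name" requirement is the part that most often surprises people, so here is a minimal sketch of it in Python.  It assumes that re.fullmatch is a fair stand-in for the whole-name matching described above; the helper whole_name_match and the file name setup.log are made up for the illustration, and Python's engine is not the one Explorer++ uses.

```python
import re

def whole_name_match(pattern: str, name: str) -> bool:
    """Hypothetical helper: succeed only if the pattern accounts for every character of the name."""
    return re.fullmatch(pattern, name) is not None

# "setup" describes only the first five characters, so ".log" is left over.
print(whole_name_match(r"setup", "setup.log"))        # False

# Adding ".*" lets the expression account for whatever follows.
print(whole_name_match(r"setup.*", "setup.log"))      # True

# For contrast, a "find it anywhere" search would have accepted the short pattern.
print(re.search(r"setup", "setup.log") is not None)   # True
```

If a search returns nothing, a missing .* at the start or end of the expression is a likely cause.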

 

Elements (or Mini-rules) - short list

The following table contains a few meta-characters and elements to demonstrate the power of Regular Expressions.  Refer to the Regular Expression Reference (Appendix C) for more elements and details.

EXPRESSION      SYNTAX   ORDINARY NAME          DESCRIPTION
                                                EXAMPLES
Any character   .        dot or period          This is the wildcard character - matches any single character (except a newline, which is not used in file/folder names).
                                                s.s matches sys (system) and ses (session), but not sores
Zero or more    *        asterisk               Matches zero or more occurrences of the preceding expression, taking as many as possible.
                                                a*b matches b (bat) and ab (about); .* matches any sequence of characters
One or more     +        plus                   Matches at least one occurrence of the preceding expression; same as {1,} (see Repetition in Appendix C).
                                                rol+ matches rol and rolllll, but not ro
Or              |        pipe, vertical line    Matches either the whole expression before or the whole expression after the OR symbol (|).  Mostly used in a group.
                                                AL|TE matches AL or TE; A(L|T)E matches ALE or ATE
Group           ()       parentheses            Isolates an Or expression.
                                                a(jpg|jpeg) matches ajpg or ajpeg
Escape          \        backslash              When it precedes a meta-character, the combination is taken as a literal.  Some meta-characters, namely ^$.[]{}()+, may be used in file/folder names.
                                                a\.txt matches a.txt
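The examples in the table can be checked mechanically.  The sketch below runs a selection of them through Python's re module, which is assumed here to agree with Explorer++ on these basic elements even though it is a different engine.  The examples show the fragment that is matched, so each pattern is tested against that fragment alone; the names TABLE_EXAMPLES and check_table_examples are invented for the illustration.

```python
import re

# (pattern, text, expected result), taken from the EXAMPLES column of the table.
TABLE_EXAMPLES = [
    (r"s.s", "sys", True),
    (r"s.s", "ses", True),
    (r"s.s", "sores", False),      # "." stands for exactly one character
    (r"a*b", "b", True),           # zero occurrences of "a"
    (r"a*b", "ab", True),
    (r"rol+", "rol", True),
    (r"rol+", "rolllll", True),
    (r"rol+", "ro", False),        # "+" needs at least one "l"
    (r"a(jpg|jpeg)", "ajpg", True),
    (r"a(jpg|jpeg)", "ajpeg", True),
    (r"a\.txt", "a.txt", True),    # escaped dot matches a literal dot
]

def check_table_examples():
    """Return the examples whose actual result differs from the expected one."""
    failures = []
    for pattern, text, expected in TABLE_EXAMPLES:
        actual = re.fullmatch(pattern, text) is not None
        if actual != expected:
            failures.append((pattern, text, expected, actual))
    return failures

print(check_table_examples())  # [] means every example behaved as the table says
```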

 

Demonstrations

  • Navigate to the C:\Windows folder.  Your drive letter may be different, depending on your installation.
  • Open the Search dialog (Ctrl+F)
  • Clear all attribute checkboxes and the Search Subfolders checkbox.  Check the Case Insensitive checkbox.  Check the Use Regular Expressions checkbox.
  1. Use   .*\.(ini|log)   as the Filename:, and do a search.  What did it do?  It should have returned all ini and log files; here's a breakdown of what happened...
     .*          This is the wildcard (any character) followed by "zero or more repeats".  This matches any run of characters (possibly none) until...
     \.          This is the meta-character "." but escaped; the "." is treated as just a dot/period (literal).  This matches the dot between the file name and the file extension.
     (ini|log)   This is a group (parentheses), so that the Or applies only to the extension; without the parentheses, the expression would mean .*\.ini or log.  Inside the group is "ini" or "log"; in other words, match "ini" or "log" as the extension (or at least, the next characters).  If any files have characters after "ini" or "log", they won't match the expression.
  2. Use   .*\d+.*\.\D+   as the Filename:, and do a search.  This one is a little more complex - refer to Appendix C for a bit more detail on each element.  Here's a breakdown first...
     .*     This is the wildcard followed by "zero or more repeats".  This skips/matches everything (if it exists) until the next element.
     \d+    \d is a character class (shorthand notation for digits), repeated one or more times.  This matches 0-9 but nothing else (at least one digit).
     .*     Again, this skips/matches everything (if it exists) until the next element.
     \.     The dot/period character (escaped) is matched, just before the extension.
     \D+    \D is a character class (shorthand for not a digit), repeated at least once.  This matches the extension, but only if it contains no digits.

So what did it find?

  • all files with at least one digit in the filename, but no digits in the extension.  And it must have an extension.  Try doing that with a conventional wildcard search!
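Both demonstrations can be rehearsed away from Explorer++ with a short Python sketch.  The file names in SAMPLE_NAMES are invented, the helper find_names is hypothetical, and re.IGNORECASE is assumed to play the role of the Case Insensitive checkbox; Python's re module is not the engine Explorer++ uses, but it handles these two expressions the same way.

```python
import re

INI_OR_LOG = r".*\.(ini|log)"          # first demonstration
DIGIT_IN_NAME_ONLY = r".*\d+.*\.\D+"   # second demonstration

# Invented names standing in for the contents of a folder.
SAMPLE_NAMES = [
    "win.ini", "setupact.log", "Update.LOG", "system.ini.bak",
    "notepad.exe", "log2023.txt", "backup.7z", "report1",
]

def find_names(pattern, names):
    """Hypothetical helper: keep the names that the pattern describes in full."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if compiled.fullmatch(name)]

print(find_names(INI_OR_LOG, SAMPLE_NAMES))
# ['win.ini', 'setupact.log', 'Update.LOG']

print(find_names(DIGIT_IN_NAME_ONLY, SAMPLE_NAMES))
# ['log2023.txt']
```

In the second search, backup.7z is rejected because its only digit is in the extension, and report1 is rejected because it has no extension at all.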
  3. Here's one more demonstration, for keeners!  This time:
    • Navigate to your Windows folder (usually C:\Windows).
    • Again, no attribute filtering and check Case Insensitive.  Check the Search Subfolders checkbox - we want to scan all of Windows' subfolders (it is relatively quick!).
    • Use   \{[[:xdigit:]]{8}-([[:xdigit:]]{4}-){3}[[:xdigit:]]{12}\}
      as the Filename:, and do a search.  Don't type it, just copy from here (select it, Ctrl+C - careful that you don't pick up any leading spaces) and paste (Ctrl+V).  You will have to break it down yourself, if you want, but what did it find?
      • all folders in the Windows folder (or subfolder thereof) named with a GUID (Globally Unique IDentifier - see Wikipedia).
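For readers who would like a hint with the breakdown, the GUID expression can be taken apart in a short Python sketch.  One assumption needs stating loudly: Python's re module does not support the [[:xdigit:]] class, so the sketch swaps it for the equivalent [0-9A-Fa-f]; everything else is the expression exactly as given above.  The helper is_guid_name and the sample names are invented for the illustration.

```python
import re

# The expression exactly as used in the Search dialog.
SOURCE_PATTERN = r"\{[[:xdigit:]]{8}-([[:xdigit:]]{4}-){3}[[:xdigit:]]{12}\}"

# Assumption: [[:xdigit:]] (any hexadecimal digit) is rewritten as [0-9A-Fa-f],
# because Python's re module has no POSIX character classes.
PYTHON_PATTERN = SOURCE_PATTERN.replace("[[:xdigit:]]", "[0-9A-Fa-f]")

# Reading the expression left to right:
#   \{                    a literal opening brace (escaped)
#   [0-9A-Fa-f]{8}-       exactly 8 hex digits, then a hyphen
#   ([0-9A-Fa-f]{4}-){3}  a group of 4 hex digits and a hyphen, repeated 3 times
#   [0-9A-Fa-f]{12}       exactly 12 hex digits
#   \}                    a literal closing brace (escaped)

def is_guid_name(name: str) -> bool:
    """Hypothetical helper: True if the whole name is a brace-wrapped GUID."""
    return re.fullmatch(PYTHON_PATTERN, name) is not None

print(is_guid_name("{01234567-89ab-cdef-0123-456789abcdef}"))  # True
print(is_guid_name("01234567-89ab-cdef-0123-456789abcdef"))    # False: no braces
print(is_guid_name("{01234567-89ab-cdef-0123-456789abcdeg}"))  # False: "g" is not a hex digit
```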