Logical Search

WinHex & X-Ways

 

A powerful subvariant of the simultaneous search. It allows you to search either all files, all tagged files, or (if invoked from the directory browser context menu) all selected files. The logical search has several advantages over a physical search:

 

+ File slack can be specifically targeted (for all files or, if only half checked, for files that are not omitted) or ignored.

 

+ The search scope can be limited to certain files and folders, through tagging or selecting files. Please note that the amount of data to search that may be displayed in the dialog window is an estimate only. The actual scope of the search may vary because of slack space.

 

+ Searching in files (usually = in the cluster chains allocated to files) will find search hits even if the search term happens to be physically split in a fragmented file (i.e. it starts at the end of one cluster and continues at the beginning of the next, discontiguous cluster).

 

+ A logical search can be successful even in files that are compressed at the NTFS file system level, as they are decompressed for searching. This holds true even for files that were found via a file header signature search, if that was specially adapted for NTFS compression.

 

+ If the contents of archives (files in ZIP, RAR, GZ, TAR, BZ2, 7Z, and ARJ, if not encrypted, forensic license only) and individual e-mail messages and attachments have been included in the volume snapshot, they can be searched as well.

 

+ The text that is contained in files whose format is supported by the viewer component, e.g. PDF (Adobe), WPD (Corel WordPerfect), VSD (Visio), SWF (Shockwave Flash), can automatically be extracted/decoded/decompressed prior to search, resulting in unformatted ASCII or UTF-16 plaintext, which can be reliably searched in addition to the original data itself. Search hits might otherwise be missed because various file types typically or at least sometimes store text in an encoded, encrypted, compressed, fragmented or otherwise garbled way. Important: In particular for HTML, XML and RTF documents as well as e-mail messages, which may employ various methods of encoding (e.g. UTF-8) non-7-bit-ASCII characters (e.g. German umlauts or Chinese characters), decoding may be useful, depending on the language of your search terms/the characters contained in your search terms. When you specify a file mask for decoding, that mask will not only be applied to the names of searched files, but also to their true type if verified by signature (see Refined Volume Snapshots). This feature requires the separate viewer component to be active for the decoding and text extraction part. The decoded text is output in Latin 1 or Unicode, and can optionally be buffered (cf. Options | Viewer Programs) to allow for a convenient context preview for search hits in the decoded text and to accelerate future searches. The default file mask for this option is *.pdf;*.docx;*.pptx;*.xlsx;*.odt;*.odp;*.ods;*.pages;*.key;*.numbers;*.eml;*.wpd;*.vsd. It is recommended to add ;*.html;*.xml;*.rtf depending on the characters searched for, and more depending on your requirements. For example, *.doc might be a good idea if you want to be very thorough, because text can be fragmented or change from one character set to another abruptly in the middle of an MS Word document.
Just keep in mind that the additional decoding and searching require more time and likely result in duplicated search hits (search hits found in both the original format and the result of the text extraction). E-mails will generally not be decoded by X-Ways Forensics when only 7-bit ASCII characters are searched for. The file mask is applied to both the filename and the detected true file type. To see what text is extracted from a document by this function, you can select the document in the directory browser in Preview mode and hold the Shift key when switching to Raw mode.

 

+ If you are not interested in each and every search hit, but merely in which files contain at least one of the specified search terms, a logical search can be greatly accelerated by telling X-Ways Forensics that only one hit per file is needed, so that it can skip the remainder of a file once a hit has been recorded and continue with the next file. The resulting search hit list will be inherently and systematically incomplete, and no assumption must be made that somehow "the most useful" search hit in each file will be collected, or, if multiple search terms are used, that a search hit for a search term that you consider more important will be collected. However, it is guaranteed that the list contains all the files for which there was at least one hit (for one of the search terms used), and each such file once only. Such a list is sufficient (and efficient!) to manually review the affected files, comment on them, copy the files off an image or pass them on to other investigators in an evidence file container etc.

Note that of course it is not possible to combine search terms with a logical AND if only 1 hit per file was recorded. That consequence is typically forgotten by unsuspecting users.

 

+ Files that have been marked as irrelevant by hash computation and hash database matching or files that have been excluded by the user or that are filtered out by an active filter can be omitted from a logical search to save time and reduce the number of irrelevant search hits. The slack of such files is still covered if the option "Open and search files incl. slack" is fully checked, so that this option has a higher priority. If only half checked, the slack of such files is omitted, too.

 

+ The recommendable data reduction specifically omits certain files from the search, so that no time is wasted and no duplicate hits are produced unnecessarily.

E-mail archives of the types MBOX and DBX as well as file archives of the supported types (ZIP, RAR etc.) will not be searched if the e-mails and files that they contain have already been included in the volume snapshot, in order to save time. In that case only those extracted e-mails and files will be searched, in their natural (unencoded and uncompressed) state. This may be reasonable for keyword searches and in particular for indexing (which has a hard time processing e.g. Base64 code), but not necessarily for technical searches for signatures etc. Using this option constitutes a compromise. The slack of archive files is still included if the file slack option is enabled, as that option has a higher priority.

A file that is marked as renamed/moved will not be searched either if data reduction is enabled and if principally all files in the volume are to be searched (as opposed to tagged or selected files only), because the same file will already be searched under its current name/in its current location.

If *.docx;*.pptx;*.xlsx;*.odt;*.odp;*.ods;*.pages;*.key;*.numbers are decoded for the search, the contained .xml files with the main contents (document.xml, content.xml, index.xml, ...) and in case of .pages any existing Preview.pdf are also omitted, to avoid redundant search hits.

Files with a red X icon will not be searched, except if they are specifically targeted via a selection or tagmark.

 

+ In NTFS, all "real" hard links (i.e. hard links other than SFN) except for one can be optionally omitted from logical searches and indexing. Nowadays on Windows installations often between 10,000 and 100,000 hard links of system files exist, for example 27 links to a file like "Ph3xIB64MV.dll" in directories such as

\Windows\System32\DriverStore\FileRepository\ph3xibc9.inf_amd64_neutral_ff3a566e4b6ba035

\Windows\System32\DriverStore\FileRepository\ph3xibc2.inf_amd64_neutral_7621f5d62d77f42e

\Windows\System32\DriverStore\FileRepository\ph3xibc5.inf_amd64_neutral_2270382453de2dbb

\Windows\winsxs\amd64_ph3xibc9.inf_31bf3856ad364e35_6.1.7600.16385_none_a0a14b454657e48e

\Windows\winsxs\amd64_ph3xibc5.inf_31bf3856ad364e35_6.1.7600.16385_none_9e7d0270e1def2ea

\Windows\winsxs\amd64_ph3xibc12.inf_31bf3856ad364e35_6.1.7600.16385_none_64d7af985f2a04e4

etc.

By searching only in one hard link of a file, you can typically exclude several GB of duplicate data and yet don't miss anything if you search all other files. Those additional hard links that are omitted are those whose hard link count is grayed out. Search hits in the only hard link that does get searched are marked with the hint "-> Links!" in the Descr. column to remind you of the other hard links of the same file in case those search hits are relevant.

 

* Option to apply logical simultaneous searches to various metadata of files in addition to the file contents. More precisely, they can be applied to the cells of any selected directory browser column such as Name, Author, Sender, Recipients or Metadata. That can spare you from pasting your keywords in the filter dialogs of various directory browser columns. That methodology is also more thorough because all the text addressed by this feature is searchable in UTF-16, whereas elsewhere the same data may be fragmented (e.g. filenames in particular in FAT), specially encoded (e.g. sender and recipients as quoted printable in e-mails), compressed, or stored in unexpected code pages. It is also convenient because any hits will be presented and listed in the same fashion as ordinary search hits in file contents, just specially marked in the search hit description column with the name of the column that the text that contains the search hits actually belongs to and highlighted in a different color. You can also filter for search hits in metadata.

 

When selecting a search hit in metadata, it is automatically searched for and highlighted in Details mode, just as ordinary search hits in file contents are automatically searched for and highlighted in Preview mode.

 

Note that the simultaneous search in metadata does not search in additional cell text that is displayed in a different color, such as alternative filenames and file counts in the Name column.

 

+ Some blind spots that logical searches have in old-fashioned computer forensics software products in the several thousand dollar price range do not exist in X-Ways Forensics, as such areas on a partition can be addressed specifically, namely any transition from file slack to directly following free space, and in NTFS and exFAT also from known uninitialized (but physically allocated) tails of files to directly following free space.
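The point made above about fragmented files can be illustrated with a toy model. The following Python sketch is not how X-Ways Forensics is implemented; the cluster size, the sample disk layout and all function names are assumptions chosen purely for illustration. It shows why a search in physical cluster order can miss a term that a search along the file's cluster chain finds.

```python
# Toy model (all names and sizes are hypothetical): a "disk" is a list of
# fixed-size clusters, and a file is described by its chain of cluster numbers.
CLUSTER_SIZE = 8


def find_all(data: bytes, term: bytes) -> list:
    """Return the offsets of all occurrences of term in data."""
    hits = []
    pos = data.find(term)
    while pos != -1:
        hits.append(pos)
        pos = data.find(term, pos + 1)
    return hits


def physical_search(disk: list, term: bytes) -> list:
    """Search the clusters in their on-disk order, ignoring file boundaries."""
    return find_all(b"".join(disk), term)


def logical_search(disk: list, chain: list, term: bytes) -> list:
    """Search one file by following its cluster chain."""
    return find_all(b"".join(disk[c] for c in chain), term)


# The file "the secret key" occupies clusters 0 and 2; cluster 1 belongs to
# some other file, so the word "secret" is physically split.
disk = [b"the secr", b"XXXXXXXX", b"et key\x00\x00"]
file_chain = [0, 2]

physical_hits = physical_search(disk, b"secret")
logical_hits = logical_search(disk, file_chain, b"secret")
print("physical:", physical_hits)
print("logical: ", logical_hits)
```

In this model the physical search sees the bytes of the unrelated cluster between the two halves of the word, whereas the logical search sees the file contents as one contiguous stream.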

 

Search Options

 

Should this operation freeze on a certain file, remember that the internal ID and the name of the currently processed file are displayed in the small progress indicator window. If this operation is applied to an evidence object and it crashes, X-Ways Forensics will tell you which file was being processed when you restart the program, and will associate that file with a report table (depending on the Security Options). All that happens so that you can exclude and omit the file when trying again.

 

A parallelization option (currently still considered experimental) allows you to better utilize multiple processor cores by employing multiple threads. It has an effect only when searching in evidence objects that are images or directories, not disks. The faster your mass storage solution performs (in terms of seek times and data transfer speed), the more time you save percentage-wise. In perfect conditions, this can more than double the speed of logical searches. If you select no extra threads for the logical search, it will work as in X-Ways Forensics versions before 18.9. If you select 1 or more extra threads, searching is done in additional worker threads, and the main thread of the process will be idle, which means the GUI will remain highly responsive. In X-Ways Investigator up to 2 worker threads may be used, in X-Ways Forensics up to 8, depending on the number of processor cores detected.
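The general idea of handing files to extra worker threads can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the actual implementation in X-Ways Forensics: the function names, the in-memory "files" dictionary and the extra_threads parameter are all hypothetical, and the sketch only demonstrates the structure (zero extra threads means everything runs in the calling thread), not any real speed gain.

```python
from concurrent.futures import ThreadPoolExecutor


def search_file(name: str, data: bytes, terms: list) -> list:
    """Return (name, term, offset) for every occurrence of every term in one file."""
    hits = []
    for term in terms:
        pos = data.find(term)
        while pos != -1:
            hits.append((name, term, pos))
            pos = data.find(term, pos + 1)
    return hits


def search_all(files: dict, terms: list, extra_threads: int = 0) -> list:
    """Search all files, optionally in worker threads (hypothetical parameter)."""
    if extra_threads == 0:
        # No extra threads: every file is searched in the calling thread.
        per_file = [search_file(n, d, terms) for n, d in files.items()]
    else:
        # One task per file; the worker threads do the searching while the
        # calling thread only waits for the results.
        with ThreadPoolExecutor(max_workers=extra_threads) as pool:
            futures = [pool.submit(search_file, n, d, terms)
                       for n, d in files.items()]
            per_file = [f.result() for f in futures]
    # Sort so that the hit list does not depend on thread scheduling.
    return sorted(hit for file_hits in per_file for hit in file_hits)


files = {"a.txt": b"foo bar foo", "b.txt": b"bar"}
terms = [b"foo", b"bar"]
serial_hits = search_all(files, terms, extra_threads=0)
parallel_hits = search_all(files, terms, extra_threads=2)
print(serial_hits == parallel_hits)
```

Both calls are expected to produce the same hit list; only the thread in which the work is done differs.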