WinHex & X-Ways

File Type Definitions

 

"File Type Signatures *.txt" files are tab-delimited text files that serve as a file type definition database for refining volume snapshots and for the File Recovery by Type command.

 

WinHex comes with various preset file type signatures. You may fully customize the file type definitions and add your own, either in "File Type Signatures Search.txt" or in additional files of the same format named "File Type Signatures *.txt". Such additional files are loaded as well and have the benefit that they will not be overwritten when you install the next update, provided they do not have the same name as one of the default files. The file types are available for file header signature searches only if the filename contains the word "search". Otherwise they are used only for file type verification of files that are already part of the volume snapshot (forensic license only). Up to 4096 entries are supported altogether (1024 for searching).
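The naming rule can be sketched in a few lines of Python. This is only an illustration: the function name classify_definition_file is hypothetical, and the case-insensitive comparison is an assumption, since the text above does not say whether the check for the word "search" is case-sensitive.

```python
from fnmatch import fnmatchcase
from typing import Optional


def classify_definition_file(filename: str) -> Optional[str]:
    """Return how a definitions file would be used, or None if it is not loaded."""
    name = filename.lower()  # assumption: the name check ignores case
    if not fnmatchcase(name, "file type signatures *.txt"):
        return None  # does not follow the naming scheme
    if "search" in name:
        return "search"  # available for file header signature searches
    return "verification only"


for candidate in ("File Type Signatures Search.txt",
                  "File Type Signatures Custom.txt",
                  "notes.txt"):
    print(candidate, "->", classify_definition_file(candidate))
```

A file such as "File Type Signatures Custom.txt" (a made-up name) would therefore survive updates but contribute to file type verification only.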

 

When you click the Customize button to edit the file "File Type Signatures Search.txt", by default WinHex opens the file in MS Excel. This is convenient because the file consists of columns separated by tabs. If you edit the file with a text editor, be sure to retain these tabs, as WinHex relies on their presence to properly interpret the file type definitions. MS Excel retains them automatically. After editing the file type definitions, you need to exit the dialog window and invoke the File Recovery by Type or Refine Volume Snapshot menu command again to see the changes in the file type list.
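For readers who prefer to inspect the file programmatically rather than in MS Excel, the following Python sketch splits each line on tabs into the seven columns described below. The FileTypeDefinition type, the parse_definitions function and the sample line are hypothetical. Whether the shipped files contain heading or comment lines is not covered here, so the sketch does not try to skip them.

```python
from typing import List, NamedTuple


class FileTypeDefinition(NamedTuple):
    file_type: str
    extensions: str
    header: str
    offset: str
    footer: str
    default_size: str
    flags: str


def parse_definitions(text: str) -> List[FileTypeDefinition]:
    """Split each non-empty line on tabs; missing optional columns become ""."""
    definitions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        padded = (fields + [""] * 7)[:7]  # pad or trim to exactly 7 columns
        definitions.append(FileTypeDefinition(*padded))
    return definitions


# Hypothetical sample line, not copied from a shipped definitions file.
sample = "JPEG\tjpg;jpeg;jpe\t\\xFF\\xD8\\xFF\t0\n"
for definition in parse_definitions(sample):
    print(definition.file_type, definition.extensions, definition.header)
```

Splitting on the tab character alone, rather than on arbitrary whitespace, matters because signatures and type names may themselves contain spaces.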

 

1st column: File Type

 

A human-readable designation of the file type, e.g. "JPEG". Everything beyond the first 19 characters is ignored.

 

2nd column: Extensions

 

One or more filename extensions typically used for this file type, e.g. "jpg;jpeg;jpe". Specify the most common extension first because that one will be used by default for naming recovered files. If that first extension is specified in upper-case characters, it will be used by the file type verification to fill the Type column for a file even if the file has one of the alternative plausible filename extensions. More than 255 characters are supported.
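As a small illustration of how this column can be read, the Python sketch below splits the list at semicolons (as in the example above) and records whether the first extension is written in upper case. The function name parse_extensions and the returned dictionary keys are hypothetical.

```python
def parse_extensions(column: str) -> dict:
    """Split the Extensions column and note the role of the first entry."""
    extensions = [ext for ext in column.split(";") if ext]
    first = extensions[0] if extensions else ""
    return {
        "default": first,  # used by default for naming recovered files
        "all": [ext.lower() for ext in extensions],
        # Upper case signals that the first extension should fill the
        # Type column even for files with an alternative extension.
        "first_is_upper": first.isupper(),
    }


print(parse_extensions("JPG;jpeg;jpe"))
```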

 

3rd column: Header

 

A unique header signature by which files of this file type can be recognized. It is specified in GREP syntax (see Search Options for an explanation), so that it's possible to match variable byte values (e.g. [\xE1\xE2] means "the byte value could be 0xE1 or 0xE2") or undefined areas (.). The maximum length of the represented signature is 48 bytes. To find characteristic file header signatures in the first place, open several existing files of a certain type in WinHex and look for common byte values near the beginning of the file at identical offsets.

 

4th column: Offset

 

The relative offset within a file at which the signature occurs. Often simply 0. The signature must be contained in the first 512 bytes.
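How a header signature and its offset work together can be approximated in Python. This is a sketch under stated assumptions: Python's re module is used as a stand-in for the GREP syntax of WinHex, which is similar but not identical, and the function name header_matches and the sample data are hypothetical.

```python
import re

HEADER_WINDOW = 512  # the signature must be contained in the first 512 bytes


def header_matches(data: bytes, pattern: bytes, offset: int = 0) -> bool:
    """Return True if the pattern matches exactly at the given offset."""
    window = data[:HEADER_WINDOW]
    # DOTALL lets "." stand for any byte value, including 0x0A.
    regex = re.compile(pattern, re.DOTALL)
    return regex.match(window, offset) is not None


# A byte class such as [\xE0\xE1] accepts either value at that position.
jpeg_like = b"\xFF\xD8\xFF\xE1" + b"\x00" * 16
print(header_matches(jpeg_like, rb"\xFF\xD8\xFF[\xE0\xE1]"))

# A signature that does not start at the beginning of the file.
mp4_like = b"\x00\x00\x00\x18ftypmp42"
print(header_matches(mp4_like, rb"ftyp", offset=4))
```

Matching at one fixed offset, instead of scanning the whole window, mirrors the idea that the Offset column pins the signature to a known position.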

 

5th column: Footer

 

Optional. A signature (byte sequence) that reliably indicates the end of a file, specified in GREP syntax. GREP expressions that represent variably-sized data may not work as expected. A footer signature may help to achieve a recovery with the correct file size. The recovery algorithm does not search for the footer further than the number of bytes specified as the maximum file size, starting from the header.

 

Even better than a footer is the potential availability of an internally implemented algorithm in X-Ways Forensics that knows the file format well and can usually find out the correct file size if a file is not fragmented, incomplete or corrupt. Such an algorithm is indicated in the Footer column with a tilde (~) and an algorithm ID number.
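For the plain footer signature case (not the internal tilde algorithms), the bounded search can be sketched as follows. The function carve_with_footer and its parameters are hypothetical. The exclusive and require_footer options loosely mirror the "f" and "F" flags described under the 7th column. Returning everything up to the size limit when no footer is found is an assumption of this sketch, not documented behavior, and Python's re module again only approximates GREP syntax.

```python
import re
from typing import Optional


def carve_with_footer(data: bytes, start: int, header_length: int,
                      footer: bytes, max_size: int,
                      exclusive: bool = False,
                      require_footer: bool = False) -> Optional[bytes]:
    """Carve from a header at `start`, looking for the footer within max_size."""
    # The footer is not searched further than max_size bytes from the header.
    limit = min(len(data), start + max_size)
    regex = re.compile(footer, re.DOTALL)
    match = regex.search(data, start + header_length, limit)
    if match is None:
        if require_footer:
            return None  # like "F": discard the hit without a footer
        return data[start:limit]  # assumption: fall back to the size limit
    # Ordinary footers are included; exclusive ones (like "f") are not.
    end = match.start() if exclusive else match.end()
    return data[start:end]


sample = b"HEAD" + b"payload" + b"END" + b"slack"
print(carve_with_footer(sample, 0, 4, rb"END", 1000))
print(carve_with_footer(sample, 0, 4, rb"END", 1000, exclusive=True))
print(carve_with_footer(sample, 0, 4, rb"END", 8, require_footer=True))
```

Starting the search after the header keeps an exclusive footer that equals the header signature from matching the file's own header.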

 

6th column: Default size

 

Optional. One or two values. If two values are given, the second one is a file-type-specific size detection limit, separated from the default size by a forward slash. For an explanation see here.

 

7th column: Flags

 

Optional. Flags can further tailor file carving for certain file types and are yet another indicator of how sophisticated and powerful file carving is in X-Ways Forensics.

 

b (lower case): The signature is searched at the byte level when given the choice. Useful especially for entries/records/micro-formats/memory artifacts (i.e. not complete ordinary files) that are not typically aligned at any sector or cluster boundaries.

 

B (upper case): Prevents a byte-level search for that particular signature, for performance reasons.

 

c (lower case): If taken into account (depends on user interface settings), ignores header signatures that are not aligned at cluster boundaries. Can be useful for some file types to avoid too many false positives.

 

C (upper case): Denotes file type signatures that should not be used to search for NTFS-compressed files if compensation for NTFS compression is active, because they are too weak and would yield too many false positives or would not be actually stored as compressed anyway.

 

d (lower case, for "direct"): The signature will be interpreted literally, not as a GREP expression, character by character, with byte values according to the code page that is active in your Windows system. Useful for example if you are not very familiar with GREP notation or don't need GREP and just want all characters interpreted literally, without thinking much about whether they are considered special characters in GREP. For example, <?xml version="1 is a valid signature for certain XML files, but it works only with the direct flag: the question mark has a special meaning in GREP, so if the entire expression were interpreted as GREP, it would internally result in a different byte value sequence and would not yield any matches.

 

e: Stands for "embedded". If a file type has a tilde (~) algorithm in the Footer column and is marked with this flag, it will be preselected for a search of embedded data in certain other files during volume snapshot refinements, in the "File header signature search in all files not processed above" section. The "e" flag merely helps to initialize the tick marks for this option. Ultimately the user can change the selected file types for that operation in the user interface. Also, the types marked with the "e" flag will be searched embedded in files of types for which no internal extraction algorithm exists.

 

E: Never carved as an embedded file within other files.

f (lower case): Indicates that the specified footer signature is used to find data that is not part of the file any more and should be excluded. Ordinary footers are included in the carved file. Useful for file formats that do not have a well-defined footer, where the end of the file can be detected by the occurrence of data that does not belong to the file any more. That could be the same signature as the header (if files of that type typically occur in groups, back to back) or just \x00 (for file formats such as text files that do not contain zero-value bytes, where however \x00 can be expected with a high likelihood in the RAM slack). Such footer signatures should be marked as exclusive because the data matched by them is not part of the file itself.

 

F (upper case): Makes X-Ways Forensics discard hits of the file header signature search if no corresponding footer can be found, provided that a footer signature is specified in the definition. Can be useful to reduce the number of or totally avoid false positives.

 

G: Stands for "greedy". Greedily allocates all the sectors exclusively. The file type signature search continues its search for further file headers only after the presumed end of such files. Can be useful if an internally implemented algorithm is available that is certain that the carved file contains all valid data, so that it is not necessary to search for other files within the previously carved file's boundaries. The flag has an effect only if the file header signature is found at a sector boundary. If a file in free space is carved around allocated clusters, only the first fragment of the file is skipped when searching for further file header signatures.

 

g (lower case): Weaker version of the same flag. Only if an internal file size detection algorithm exists for a file type and if a file with the same start sector number exists already with the same file size as detected, the "g" flag will cause X-Ways Forensics to skip the affected sectors. This can help to prevent overlapping zip files and thereby avoid potentially many contained duplicate files. Has no effect when combined with b.

 

h: Indicates that the specified header signature is used to find data that is not part of the file itself. That means that the header will be excluded from the carved file. The carved file will start after the header. Additionally, this flag prevents file carving in free space around allocated clusters for files of this type.

 

L: Identifies links that merely link to other definitions. Useful for example to have an entry for OpenOffice files, which was missed by some users and whose absence could lead to the misconception that it is not possible to carve OpenOffice files. If the entry for OpenOffice is selected for carving, this internally automatically selects zip archives for carving, which makes sense because OpenOffice files technically are zip files and can be carved as such. The disadvantage is just that other zip archives that are not OpenOffice files are also carved. However, those files will be distinguishable thanks to the internal file type detection, for example based on the automatically assigned filename extension.

 

S: Marks signatures that are good enough for the file header signature search (probably in conjunction with a carving algorithm), but not for file type verification because of occasional misidentifications. This flag should be very rarely needed.

 

t: Prevents X-Ways Forensics from presenting the type of carved files immediately as confirmed. Useful for example for file format families such as XML, to determine the exact subtype later during file type verification.

 

u (lower case): Stands for "unused". Allows files to be carved only in clusters that are free according to the file system.

 

U (upper case): Allows files to be carved only in clusters that are free according to the file system and also not used by previously existing files as contained in the volume snapshot.

 

W (upper case): Identifies header signatures that are too weak to newly detect the type of a file and are merely used to confirm the type suggested by the name extension of the file.

 

x: Identifies file types for which it is relatively normal that the actual filename extension is not the standard extension for that file type, so that files of these types will not be highlighted as "mismatch detected" after file type verification, but just presented as "newly identified", so as not to draw more attention to these files than they deserve.

 

y: Identifies file types that are known to use encryption internally, which allows carved files of these types to be marked immediately with "e!" in the Attr. column.
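The effect of the "d" flag described above can be illustrated with the XML signature from that description. The sketch below is not how X-Ways Forensics is implemented: compile_signature is a hypothetical helper, Python's re module stands in for GREP (where the question mark is likewise special), and cp1252 is merely an assumed active Windows code page.

```python
import re


def compile_signature(signature: str, flags: str, encoding: str = "cp1252"):
    """Compile a header signature, honoring a "d" (direct) flag if present."""
    raw = signature.encode(encoding)  # byte values per the assumed code page
    if "d" in flags:
        # Direct: every character is taken literally, so special
        # characters such as "?" are escaped before compiling.
        raw = re.escape(raw)
    return re.compile(raw, re.DOTALL)


data = b'<?xml version="1.0" encoding="UTF-8"?>'
signature = '<?xml version="1'

# With the direct flag the signature matches at the start of the data.
print(compile_signature(signature, "d").match(data) is not None)  # True
# Without it, "<?" is read as "an optional <", so nothing matches at offset 0.
print(compile_signature(signature, "").match(data) is not None)   # False
```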