WinHex & X-Ways

File Recovery by Type/File Header Signature Search

 

This is a data recovery function in the Disk Tools menu, and also a strategy for finding previously existing files as part of the Refine Volume Snapshot command. This recovery method is also referred to as "file carving". It searches for files that can be recognized by a characteristic file header signature (a certain sequence of byte values). Because of this approach, file carving does not depend on the existence of functional file system structures.

 

File Recovery by Type: Files found based on a file header signature are carved and stored in the output folder that you specify on one of your own drives. Optionally, recovered files of each type are put into their own subfolder (...\JPEG, ...\HTML, etc.). The presumed contents of the files are actually copied.

File header signature search: Files found based on a file header signature are not stored anywhere, but merely listed in a dedicated virtual directory of the volume snapshot. Only a reference to the file is stored (artificially generated name, presumed size, start offset, ...). The file contents are read from the original disk/image on the fly when needed to view/copy the file. Optionally, you can output files from separate file header signature search operations into separate subdirectories, so that it's easier to distinguish between them if needed.
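
The difference between copying carved data and merely listing a reference to it can be illustrated with a minimal Python sketch. It is not how WinHex is implemented; the signature table, the `CarvedRef` record, the 512-byte sector size and the function names are all assumptions made for this illustration.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # assumed sector size for this illustration

# Hypothetical signature table: extension -> header bytes.
SIGNATURES = {
    "jpg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}

@dataclass
class CarvedRef:
    """Reference to a carved file; no file contents are stored."""
    name: str    # artificially generated name
    offset: int  # start offset within the image
    size: int    # presumed size in bytes

def signature_search(image: bytes, default_size: int = 1024 * 1024) -> list[CarvedRef]:
    """List references to files whose header starts at a sector boundary."""
    refs = []
    for offset in range(0, len(image), SECTOR_SIZE):
        for ext, header in SIGNATURES.items():
            if image.startswith(header, offset):
                # Presumed size: the default, clamped to the end of the image.
                size = min(default_size, len(image) - offset)
                refs.append(CarvedRef(f"{len(refs) + 1:05d}.{ext}", offset, size))
                break
    return refs

def read_contents(image: bytes, ref: CarvedRef) -> bytes:
    """Read the presumed contents from the image only when they are needed."""
    return image[ref.offset:ref.offset + ref.size]
```

Writing the result of `read_contents` to an output folder would correspond to File Recovery by Type, whereas keeping only the `CarvedRef` list corresponds to the file header signature search.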

 

Note that file carving generally assumes contiguous file clusters, so it produces corrupt files if the files were originally stored in a fragmented way. The following exception exists: If the file header signature search in volumes with a supported file system other than Ext2/Ext3 finds the start of a file in free space, at a cluster boundary, the data is by default assumed to flow around any following clusters that are marked by the file system as in use. This will correctly reconstruct files that were created after and stored around other files and then deleted, as long as the released clusters were not re-used and overwritten afterwards. To prevent file carving purely in free space this way, i.e. to assume contiguous clusters instead, you can unselect the option "Carve files in free clusters around used clusters".
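
As a rough sketch of the "flow around used clusters" idea, the following Python function collects the clusters presumed to belong to a carved file while skipping clusters that are marked as in use. The `in_use` list stands in for the file system's allocation bitmap; it and the function name are hypothetical and only serve this illustration.

```python
def carve_around_used(start_cluster: int, size_bytes: int,
                      cluster_size: int, in_use: list[bool]) -> list[int]:
    """Return the clusters presumed to hold the file, skipping used clusters."""
    needed = -(-size_bytes // cluster_size)  # ceiling division
    clusters = []
    cluster = start_cluster
    while len(clusters) < needed and cluster < len(in_use):
        if not in_use[cluster]:
            # Free cluster: assumed to still contain data of the deleted file.
            clusters.append(cluster)
        cluster += 1
    return clusters
```

With all entries of `in_use` set to `False`, the function degenerates to the plain contiguous assumption.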

 

The option "Ext2/Ext3 block logic" causes this recovery method to deviate from the standard assumption of no fragmentation as well, in that it will follow the typical Ext block pattern, where e.g. the 13th block from the header of the file is considered an indirect block that references the following data blocks. This option has no effect when applied to partitions that WinHex knows have a file system other than Ext2 and Ext3 or when a header is found that is not block-aligned.
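
The block pattern can be sketched in Python as follows. The sketch assumes that the first 12 blocks from the header are data blocks, that the 13th block is a single indirect block holding 32-bit little-endian block numbers, and that a zero entry ends the list. Double and triple indirect blocks are ignored, and the function name is hypothetical; this is not the actual WinHex logic.

```python
import struct

DIRECT_BLOCKS = 12  # Ext2/Ext3 inodes hold 12 direct block pointers

def ext_block_list(volume: bytes, header_block: int, block_size: int) -> list[int]:
    """Return the block numbers presumed to hold the file's data."""
    # The first 12 blocks starting at the header are taken as data blocks.
    blocks = [header_block + i for i in range(DIRECT_BLOCKS)]
    # The 13th block from the header is interpreted as an indirect block.
    indirect = header_block + DIRECT_BLOCKS
    raw = volume[indirect * block_size:(indirect + 1) * block_size]
    count = len(raw) // 4
    for (number,) in struct.iter_unpack("<I", raw[:count * 4]):
        if number == 0:  # unused pointer: end of the block list
            break
        blocks.append(number)
    return blocks
```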

 

A log file "File Recovery by Type.log" about the selected parameters and the recovery results is written to the output folder for verification purposes.

 

You can expand or collapse the entire file type tree in this dialog window with a single mouse click on the appropriate button. That is useful because when expanded you only need to type the first few characters of the file type description to automatically jump to the first matching item in the tree.

 

Since no use is made of a possible presence of a (consistent or damaged) file system, the original file sizes are in principle unknown to this recovery method, and so are the original filenames. That is why the resulting files are mostly named generically according to the following pattern: Prefix#####.ext. "Prefix" is an optional prefix you provide. "#####" is an incrementing number per evidence object. "ext" is the filename extension that corresponds to the file header signature according to the file type definition. The output filename prefix may optionally contain a placeholder "%d", which will be replaced by the drive name. This is useful if you apply File Recovery by Type to multiple drives at a time and wish to be able to easily distinguish files from different drives.
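
The naming pattern can be mimicked with a short Python function. Zero-padding the number to five digits and the form of the drive name are assumptions for this sketch; the text above only states the general pattern.

```python
def carved_name(prefix: str, number: int, ext: str, drive: str = "") -> str:
    """Build a generic name of the form Prefix#####.ext."""
    # "%d" in the prefix is a placeholder for the drive name.
    return f"{prefix.replace('%d', drive)}{number:05d}.{ext}"
```

For example, `carved_name("Case1_%d_", 42, "jpg", "E")` yields `Case1_E_00042.jpg`.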

 

With a specialist license or higher, the "intelligent naming" option will cause Exif JPEG files to be automatically named after the digital camera model that created them and their internal time stamp, if available. Many Windows Registry hive files are given their original names, as are some JPEG files in whose metadata Photoshop has embedded a name. JPEG files with no known name and no Exif metadata that were nevertheless created by a known library receive some additional information in their artificial names in parentheses (see generator signature). Thumbs.db files are always named thumbs.db, index.dat always index.dat. The aforementioned prefix is not used in conjunction with original filenames.

 

Various algorithms are at work internally that try to determine the original sizes of files of many different types (among others, JPEG, GIF, PNG, BMP, TIFF, Nikon NEF, Canon CR2 raw, PSD, CDR, AVI, WAV, MOV, MPEG, MP3, MP4, 3GP, M4V, M4A, ASF, WMV, WMA, ZIP, GZIP, RAR, 7Z, TAR, MS Word, MS Excel, MS PowerPoint, RTF, PDF, HTML, XML, XSD, DTD, PST, DBX, AOL PFC, Windows Registry, index.dat, Prefetch, SPL, EVTX, EML) by examining their data structure. This applies to entries in the file type definition database that have a "~" in the Footer column. These entries should not be altered, so that size and type detection continue to work for these file types. Alternatively, a footer signature can help to find the end of a file. Files for which neither an internal algorithm nor a footer signature definition exists, or files whose original size the available internal algorithm cannot determine and for which no footer signature is actually found, are recovered at the default size specified in bytes in the file type definition database. Be generous when specifying such a size: files recovered "too large" can still be opened by their associated applications, whereas prematurely truncated files often cannot because they are incomplete. The attempt to detect the original size of files of certain types by searching for a footer is limited by a size detection limit, which can optionally be specified in the database as well, after the default size and a forward slash. Such a limit is necessary to avoid searching the whole volume for the footer of a given file, which would be very time-consuming on a large volume. Also, the farther from the header, the less likely it is that the right footer is found, and a file whose footer is found very far from its header is likely fragmented or partially overwritten anyway. The standard default size (if not specified) is 1 MB. The standard maximum size (if not specified) is 64 times the default file size.
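
The interplay of default size, size detection limit and footer search can be sketched in Python as follows. The sketch assumes that the size field is a plain string of the form "default/limit" with both values in bytes, that 1 MB means 1,048,576 bytes, and that a file never extends past the end of the data; the function names are hypothetical and the real database format may differ.

```python
STANDARD_DEFAULT = 1024 * 1024  # 1 MB, used when no default size is given
LIMIT_FACTOR = 64               # limit = 64 x default when no limit is given

def parse_size_spec(spec: str) -> tuple[int, int]:
    """Split an assumed 'default/limit' field into two byte counts."""
    default_part, _, limit_part = spec.partition("/")
    default = int(default_part) if default_part.strip() else STANDARD_DEFAULT
    limit = int(limit_part) if limit_part.strip() else LIMIT_FACTOR * default
    return default, limit

def presumed_size(data: bytes, start: int, footer: bytes, spec: str = "") -> int:
    """Presume the size of a file whose header was found at offset start."""
    default, limit = parse_size_spec(spec)
    if footer:
        # Search for the footer only within the size detection limit.
        end = data.find(footer, start, start + limit)
        if end != -1:
            return end + len(footer) - start
    # No footer defined or none found: fall back to the default size,
    # clamped to the end of the data (an assumption of this sketch).
    return min(default, len(data) - start)
```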

 

File headers are usually found at cluster boundaries because that is where file systems mostly put the start of a file. However, it is more thorough (and not slower) to search for sector-aligned file headers because that makes it possible to also find files from previously existing partitions with a different cluster layout, so searching at sector boundaries is the default behavior. If performed on a physical medium or raw file with no cluster layout defined, WinHex has to search at sector boundaries anyway. There is yet another possibility, a thorough byte-level search. This is required when you are trying to find files that are not reliably aligned at any sector boundaries (e.g. files in backup files or tape images or embedded in other files) or when trying to find entries/records/micro-formats/memory artifacts etc., i.e. not complete ordinary files. This comes at the cost of a possibly increased number of false positives, though, i.e. misidentified file signatures that occur randomly on a medium and do not indicate the beginning of a file. Individual flags in the file type definition database can help to decide, on a per file type basis, which files to search for at cluster, sector or byte boundaries.
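
The effect of the three search granularities can be illustrated with a small Python sketch, in which the `step` parameter stands for the cluster size, the sector size or 1 (byte level). The function name and the sizes used in the example are assumptions for this illustration.

```python
def find_headers(data: bytes, header: bytes, step: int) -> list[int]:
    """Return offsets that are multiples of step where the header occurs."""
    return [off for off in range(0, len(data), step)
            if data.startswith(header, off)]

# Example: one header each at a cluster boundary, at a sector boundary
# only, and at an unaligned offset (4096-byte clusters, 512-byte sectors).
data = bytearray(8192)
for pos in (4096, 1536, 5000):
    data[pos:pos + 4] = b"%PDF"
data = bytes(data)

cluster_hits = find_headers(data, b"%PDF", 4096)  # finds offset 4096 only
sector_hits = find_headers(data, b"%PDF", 512)    # also finds offset 1536
byte_hits = find_headers(data, b"%PDF", 1)        # also finds offset 5000
```

The finer the granularity, the more candidate offsets are tested, which is why the byte-level search finds more real headers but also more coincidental matches.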

 

It is optional whether the start sectors of files that are already known to the volume snapshot are always excluded from file carving. If they are not always excluded, X-Ways Forensics generally still tries to prevent duplicates, but if the file header signature definition or the internal file size detection is strong enough to suggest that a known deleted file was overwritten with a new file, then that new file will be carved although it shares the same start sector with the known file.

 

If you intentionally abort the file header signature search or if the file header signature search causes X-Ways Forensics to crash, the next time you start a file header signature search in the same evidence object, you will find an option to resume it right where it was interrupted, or where it was when the volume snapshot was last saved before the crash occurred (depending on the auto-save interval of the case).

 

You may limit the scope of the recovery to a currently selected block if necessary and/or to allocated or unallocated space (option available on a logical drive or volume). E.g. in order to recover files that were deleted, you select to recover from unallocated space only. Files that are not accessible any more because of file system errors may still be stored in clusters that are considered as in use.

 

The effects of NTFS compression on file data can optionally be compensated for in a file header signature search (forensic license only), in many cases successfully. If the signature of an NTFS-compressed file is found, the file will be marked as compressed, and an attempt will be made to decompress the file “on the fly” when needed with a sophisticated algorithm that can even decompress files that consist of multiple compression units.