Duplicate File Detection

WinHex & X-Ways

If you wish to review files with absolutely identical data only once, and if filenames, timestamps, deletion status and other file system level metadata are of secondary importance, you can use the command "Find duplicates in list" in the directory browser context menu to identify duplicate files based on hash values (if computed) or other criteria. All currently listed files are checked (listed, not merely selected!). If desired, the duplicates can be automatically excluded in the volume snapshot, in which case exactly one file in each group of identical files remains not excluded. Each group of identical files can optionally be assigned to its own report table, which makes it easy to use a filter to see all the members of a given group, even if they are contained in different evidence objects.
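
The grouping idea can be illustrated with a minimal Python sketch. It is not X-Ways code and does not use any X-Ways API; the record keys 'path' and 'hash', the function name and the report table names are hypothetical and chosen for illustration only. Which member of a group is kept here is simply the first one encountered.

```python
from collections import defaultdict

def find_duplicates(listed_files):
    """Group listed files by hash value; return (groups, excluded).

    listed_files: iterable of dicts with the hypothetical keys
    'path' and 'hash'. Files without a computed hash are skipped.
    """
    by_hash = defaultdict(list)
    for f in listed_files:
        if f["hash"] is not None:
            by_hash[f["hash"]].append(f)

    groups = {}    # hypothetical report table name -> members of one group
    excluded = []  # every member of a group except the first
    duplicate_groups = (m for m in by_hash.values() if len(m) > 1)
    for n, members in enumerate(duplicate_groups, start=1):
        groups[f"Duplicate group {n}"] = members
        excluded.extend(members[1:])
    return groups, excluded

files = [
    {"path": "E1/docs/a.txt", "hash": "aa11"},
    {"path": "E2/backup/a.txt", "hash": "aa11"},  # same data, other evidence object
    {"path": "E1/docs/b.txt", "hash": "bb22"},    # unique
    {"path": "E1/tmp/x.bin", "hash": None},       # hash not computed
]
groups, excluded = find_duplicates(files)
for name, members in groups.items():
    print(name, [m["path"] for m in members])
print("excluded:", [f["path"] for f in excluded])
```

Because the groups are keyed by hash value only, members from different evidence objects end up in the same group, which mirrors the purpose of the per-group report tables.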

 

When in doubt which duplicate to exclude, this function prefers to keep existing (not deleted) files; among deleted files, it discards carved files and keeps files found via file system data structures. When in doubt, it also prefers to keep the copy of a file whose owner is known. Optional special rules: Identical e-mail messages with different attachments (child objects) will be marked as duplicates, but not excluded. Identical attachments (child objects) will be marked as duplicates, but they will be excluded only indirectly, if they are part of identical e-mail messages and those are excluded, too. This facilitates the examination and also avoids a situation where the parent (e-mail message) of one e-mail+attachment family and the child object (attachment) of another family are excluded.
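
The preferences for which copy to keep can be expressed as a sort key. The following Python sketch is an illustration only: the field names 'deleted', 'carved' and 'owner' are hypothetical, and the relative precedence of the three rules is an assumption, since the text above does not state an exact order. The special rules for e-mail messages and attachments are not modeled.

```python
def keep_priority(f):
    """Sort key: the smallest tuple identifies the copy to keep.

    The order of the three criteria is an assumption for illustration.
    """
    return (
        f["deleted"],        # existing (False) before deleted (True)
        f["carved"],         # found via file system structures before carved
        f["owner"] is None,  # known owner before unknown owner
    )

def choose_keeper(group):
    """Return (keeper, excluded) for one group of identical files."""
    keeper = min(group, key=keep_priority)
    excluded = [f for f in group if f is not keeper]
    return keeper, excluded

group = [
    {"path": "carved/000123.jpg", "deleted": True, "carved": True, "owner": None},
    {"path": "Users/a/pic.jpg", "deleted": True, "carved": False, "owner": None},
    {"path": "Users/b/pic.jpg", "deleted": True, "carved": False, "owner": "b"},
]
keeper, excluded = choose_keeper(group)
print(keeper["path"])  # Users/b/pic.jpg
```

In this example all three copies are deleted, so the carved copy loses first, and of the two remaining copies the one with a known owner is kept.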

 

If you later find relevant files for which there were duplicates and you are interested in the duplicates, too (e.g. wish to see their filenames, paths, or timestamps), you could for example create a hash set of those files to conveniently and automatically identify all the duplicates, by matching the hash values of all files against that particular hash set and using the hash set filter. Alternatively, you could use the Hash column filter directly.
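
Conceptually, matching against such a hash set is a set membership test. The Python sketch below shows only that idea with hypothetical record keys 'path' and 'hash'; it does not reflect how hash sets are stored or matched in X-Ways Forensics.

```python
def matches_hash_set(all_files, relevant_files):
    """Return every file whose hash occurs in the hash set built
    from the relevant files (the relevant files themselves included)."""
    hash_set = {f["hash"] for f in relevant_files if f["hash"] is not None}
    return [f for f in all_files if f["hash"] in hash_set]

all_files = [
    {"path": "E1/docs/contract.pdf", "hash": "c0ffee"},
    {"path": "E2/mail/attach/contract.pdf", "hash": "c0ffee"},  # previously excluded duplicate
    {"path": "E1/docs/other.pdf", "hash": "123abc"},
]
relevant = [all_files[0]]
hits = matches_hash_set(all_files, relevant)
print([f["path"] for f in hits])
```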

 

Pairs of duplicates in the same volume snapshot can be optionally linked as so-called related items, so that it's easy to navigate from one such file to at least one duplicate. However, that does not work across evidence object boundaries. Marking the files as duplicates in the Description column is optional.

 

Alternatively, you may exclude files simply based on identical names instead of identical hash values. This comparison is case-insensitive and should of course be used only if you know what you are doing, as it does not compare the file contents at all. It could be useful, for example, if you wish to get rid of multiple copies of the same files found in backups and do not need to keep different versions of these files. If, for example, you sort by last modification date in descending order prior to the comparison, this ensures that the newest version of each file will be kept and all older versions will be excluded. Files with identical names are not marked as duplicates in the Attr. column.
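
The effect of sorting before a name-based comparison can be seen in a small Python sketch. It assumes that the first file in list order with a given name is the one that is kept, which is what the sorting advice above implies; the record keys 'name' and 'modified' are hypothetical, and str.casefold() merely stands in for whatever case-insensitive comparison the program actually uses.

```python
from datetime import datetime

def exclude_by_name(listed_files):
    """Keep the first file per case-insensitive name, exclude the rest.

    File contents are not compared at all.
    """
    seen = set()
    kept, excluded = [], []
    for f in listed_files:
        key = f["name"].casefold()  # case-insensitive name comparison
        if key in seen:
            excluded.append(f)
        else:
            seen.add(key)
            kept.append(f)
    return kept, excluded

files = [
    {"name": "Report.docx", "modified": datetime(2020, 1, 5)},
    {"name": "report.DOCX", "modified": datetime(2021, 3, 9)},
    {"name": "notes.txt", "modified": datetime(2019, 7, 1)},
]
# Sort by last modification date, newest first, so the newest version is kept.
newest_first = sorted(files, key=lambda f: f["modified"], reverse=True)
kept, excluded = exclude_by_name(newest_first)
print([f["name"] for f in kept])      # ['report.DOCX', 'notes.txt']
print([f["name"] for f in excluded])  # ['Report.docx']
```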

 

If you have access to PhotoDNA in X-Ways Forensics, you may also identify and exclude duplicate pictures using PhotoDNA. All duplicates will be marked as "duplicates found" in the Attr. column, and all except one will be excluded. When in doubt, deleted files or pictures with a lower resolution will be excluded, and existing files and pictures with a higher resolution will be kept. Please note that this hash value comparison is a potentially time-consuming operation if many pictures are listed in the directory browser, much more so than for conventional hash values. However, you can abort the comparison at any time. This operation requires that PhotoDNA hash values have been computed beforehand, using Specialist | Refine Volume Snapshot | Picture processing | Compute PhotoDNA hash values. It is useful, for example, for law enforcement agencies that wish to create PhotoDNA hash sets of unique pictures only and for that purpose maintain a lawful collection of incriminating pictures without duplicates. The strictness of the picture comparison is the same as set in the Specialist | Refine Volume Snapshot | Picture processing dialog window for matching against the PhotoDNA hash database.