PhotoDNA
X-Ways Forensics applies the PhotoDNA hashing algorithm to photos, until further notice. Thanks to the robustness of the hash algorithm and its specialization in photos, it can usually recognize known photos automatically even if they have undergone repeated lossy compression (e.g. JPEG), have been stored in a different file format, or have been resized, partially blurred/pixelated, or adjusted in color or contrast. Unlike hash values computed by conventional general-purpose algorithms, PhotoDNA hashes are resistant to such image alterations: they do not change at all or change only slightly. Optionally, known photos can be recognized even if they were mirrored (flipped horizontally). To avoid wasting time on small irrelevant pictures, PhotoDNA is not applied to pictures that are less than 50 pixels wide or tall.
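The following is a minimal Python sketch of why a robust hash tolerates alterations that break a conventional hash. For illustration only, it assumes that a PhotoDNA hash is a 144-byte vector and that two hashes are compared by Euclidean distance. X-Ways Forensics' internal metric and thresholds are not documented here. The helper names `is_eligible` and `distance` are hypothetical. The sketch also encodes the 50-pixel size rule from the paragraph above.

```python
import math

# Assumption for illustration: a PhotoDNA hash is handled as a fixed-length
# byte vector (144 bytes) and similarity is measured as Euclidean distance.
# X-Ways Forensics' actual matching metric is not documented here.
HASH_LEN = 144
MIN_SIDE_PX = 50  # from the manual: smaller pictures are not hashed


def is_eligible(width: int, height: int) -> bool:
    """Return True if a picture is at least 50 px wide and 50 px tall."""
    return width >= MIN_SIDE_PX and height >= MIN_SIDE_PX


def distance(a: bytes, b: bytes) -> float:
    """Euclidean distance between two equally long hash vectors."""
    if len(a) != len(b):
        raise ValueError("hash lengths differ")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    original = bytes(range(HASH_LEN))  # made-up hash, not derived from a real image
    altered = bytearray(original)
    altered[0] += 3  # simulate the small drift a re-saved copy might cause
    print(is_eligible(640, 480), is_eligible(40, 480))  # True False
    print(distance(original, bytes(altered)))  # 3.0
```

A conventional hash such as SHA-256 of the altered picture would bear no resemblance to the original value. A robust hash only drifts, so a distance threshold can still declare a match.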
For licensing reasons, the PhotoDNA functionality is made available as a separate download and is provided by X-Ways itself only to law enforcement agencies, which may use it to prevent the spread of child sexual abuse content and for investigations aimed at stopping its distribution and possession. For details about PhotoDNA, please see this high-level technical explanation and this press information.
If the PhotoDNA functionality is present, a database with PhotoDNA hash values of photos can be created and maintained within X-Ways Forensics, and photos may be matched against that hash database in X-Ways Forensics and X-Ways Investigator to automatically identify known incriminating content.
Law enforcement agencies may want to create and share their own collections of such hash values, based on pictures from previous cases, or import an extensive existing collection from Project Vic (JSON/ODATA format, layout version 1.0; from X-Ways Forensics v18.1 also version 1.1, and from v18.2 also version 1.2). You can also import PhotoDNA hash databases of other X-Ways users (select the "RHDB" file!), delete hash categories that you no longer need, and merge or rename categories in your database. When you import someone else's hash database, their categories are merged with yours where the names are the same. PhotoDNA hash values may also be imported from text files with "PhotoDNA" in the first line, followed by one hash value per line in hex ASCII or Base64 notation.
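The plain-text import format described above can be parsed with a few lines of Python. This is a sketch, not X-Ways' importer. It assumes a PhotoDNA hash is 144 bytes long, so a hex ASCII line has 288 characters and a Base64 line 192, and it uses that length to tell the two notations apart. The function names are hypothetical.

```python
import base64

HASH_LEN = 144  # assumed PhotoDNA hash size in bytes (288 hex chars, 192 Base64 chars)


def decode_hash(token: str) -> bytes:
    """Decode one hash written in hex ASCII or Base64, told apart by length."""
    token = token.strip()
    if len(token) == 2 * HASH_LEN:
        return bytes.fromhex(token)  # raises ValueError on non-hex characters
    raw = base64.b64decode(token, validate=True)  # raises on invalid characters
    if len(raw) != HASH_LEN:
        raise ValueError(f"unexpected hash length: {len(raw)} bytes")
    return raw


def parse_hash_file(text: str) -> list[bytes]:
    """Parse a text file with 'PhotoDNA' in the first line and one hash per line."""
    lines = text.splitlines()  # handles both LF and CRLF line endings
    if not lines or lines[0].strip() != "PhotoDNA":
        raise ValueError("first line must read 'PhotoDNA'")
    # Blank lines are skipped; every other line must hold exactly one hash.
    return [decode_hash(line) for line in lines[1:] if line.strip()]
```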
Hash values of pictures in the volume snapshot of an evidence object can be added to the PhotoDNA hash database in the same way as conventional hash sets are added to a conventional hash database, using the Include in Hash Database command in the directory browser context menu. The PhotoDNA hash database is one of several databases that can be managed with the Tools | Hash Database command, and it is stored in a directory next to hash database #1.
When importing PhotoDNA hash collections, or when including the PhotoDNA hash values of selected files in the database directly in X-Ways Forensics, the additional entries are checked for redundancies and conflicting categorizations, both among themselves and against existing entries in the database, to keep the database as small, fast and useful as possible. This check is recommended but optional. If you skip it and the data set is very large, you can potentially save hours, at the cost that matching pictures against the database during volume snapshot refinement will take longer and that you may get different classifications returned for variations of the same picture.
You can set the import strictness separately for two purposes: how similar hash values must be to warrant re-classifying an existing value (to keep the database consistent), and how similar they must be for a new value to overwrite (replace) an existing one (to keep the database compact and less redundant). The latter strictness must not be less than the former. In the rules below, a hash value is either an old value already in the database, a new value added to the database by the current import operation, or a pending value that is yet to be added to the database.
1) If a pending hash Y is exactly identical to an old or new hash X, Y is ignored and not added to the database. If Y and X are merely similar, Y is added. If Y and X are almost identical, X is directly replaced (overwritten) with Y.
2) If Y and X are identical or similar but belong to different categories, and X is new, this indicates that the quality of the import file is low, and you will see a warning. If the import is from a Project Vic hash collection and the two categories are the relatively similar categories "child abuse" and "child exploitation", no special action is taken. If the two categories involved are not those two, the conflict is resolved as follows: if either X or Y belongs to the category "non-pertinent" and the picture is largely monochromatic, X is assigned to the category "non-pertinent"; otherwise, X is assigned to the category "uncategorized".
3) If Y and X are identical or similar but belong to different categories, and X is old, X is assigned to the same category as Y, on the assumption that the previous categorization is wrong or outdated and the import file contains correct/new information. This is beneficial, for example, for entries whose original categorization comes from a foreign source (e.g. Project Vic) and needs to be adjusted because of different legislation or jurisdiction in your country, or simply because of categorization errors or different interpretations. What is considered child pornography in one country is not necessarily classified as such in another (examples: computer-generated images, animation). Recategorization requires that you have copies of the same pictures (not necessarily the exact same files) in your collection or know exactly which hash value belongs to which picture.
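Rules 1 to 3 can be summarized as a small decision function. In this hypothetical sketch, strictness is expressed as maximum permitted distances (`overwrite_max`, `reclass_max`). A higher strictness means a smaller distance, which is why `overwrite_max` must not exceed `reclass_max`. The distance values, the `monochromatic` flag and all names are assumptions; how X-Ways Forensics actually measures similarity and detects largely monochromatic pictures is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    IDENTICAL = 1
    ALMOST_IDENTICAL = 2
    SIMILAR = 3
    UNRELATED = 4


def relate(d: float, overwrite_max: float, reclass_max: float) -> Relation:
    """Classify the distance d between X and Y. Higher strictness means a
    smaller permitted distance, so overwrite_max must not exceed reclass_max."""
    if overwrite_max > reclass_max:
        raise ValueError("overwrite strictness must not be less than re-classification strictness")
    if d == 0:
        return Relation.IDENTICAL
    if d <= overwrite_max:
        return Relation.ALMOST_IDENTICAL
    if d <= reclass_max:
        return Relation.SIMILAR
    return Relation.UNRELATED


@dataclass
class Entry:
    category: str
    is_new: bool  # True: added by the current import; False: old entry


HASH_ACTION = {  # rule 1: what happens to the hash values themselves
    Relation.IDENTICAL: "ignore Y",
    Relation.ALMOST_IDENTICAL: "replace X with Y",
    Relation.SIMILAR: "add Y",
    Relation.UNRELATED: "add Y",
}
VIC_PAIR = {"child abuse", "child exploitation"}


def process_pending(x: Entry, y_category: str, relation: Relation,
                    from_project_vic: bool, monochromatic: bool):
    """Apply rules 1 to 3 to one pair (X, pending Y).
    Returns (hash action, resulting category of X, warnings)."""
    action = HASH_ACTION[relation]
    category, warnings = x.category, []
    if relation is Relation.UNRELATED or y_category == x.category:
        return action, category, warnings
    if x.is_new:  # rule 2: conflict inside the import file itself
        warnings.append("conflicting categories in import file")
        if from_project_vic and {x.category, y_category} == VIC_PAIR:
            pass  # no special action
        elif "non-pertinent" in (x.category, y_category) and monochromatic:
            category = "non-pertinent"
        else:
            # Assumption: outside Project Vic imports, the VIC_PAIR case is
            # treated like any other conflict.
            category = "uncategorized"
    else:  # rule 3: trust the import over the old categorization
        category = y_category
    return action, category, warnings
```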
When adding PhotoDNA hash values to the internal PhotoDNA hash database with the Include in Hash Database command, you have the option to store your comments about the selected files in that hash database as descriptions. These descriptions can automatically be adopted as comments again the next time the same pictures are found in another case. They can either replace existing comments in the other case or, if the corresponding check box is half checked, be appended to existing comments. This is very useful, for example, for police investigators who are required by a court to provide a textual description of every single child pornography picture, as it at least spares them from entering descriptions of the same known pictures more than once. It is also useful for storing information such as known identities of the persons in the photo, previous case numbers etc., for future reference if the same photos are found elsewhere. The descriptions in the hash database can be updated with your comments simply by adding the PhotoDNA hash values of the same files to the internal database again through the Include in Hash Database command. When you import a colleague's internal hash database (by selecting their RHDB file) and wish to import their descriptions as well, make sure that not only the corresponding RHCN file (with the category names) is present in the same directory, but also the new subdirectories that contain the descriptions, if any.
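The replace/append behavior of the tri-state check box can be sketched as follows. Mapping the fully checked state to "replace" and the half-checked state to "append" follows the text above. Treating the unchecked state as "do not adopt" and separating appended text with a line break are assumptions, and the names are hypothetical.

```python
from enum import Enum


class CheckState(Enum):
    UNCHECKED = 0     # assumption: descriptions are not adopted at all
    CHECKED = 1       # descriptions replace existing comments
    HALF_CHECKED = 2  # descriptions are appended to existing comments


def adopt_description(existing: str, description: str, state: CheckState,
                      separator: str = "\n") -> str:
    """Return the comment a matched file ends up with in the new case."""
    if not description or state is CheckState.UNCHECKED:
        return existing
    if state is CheckState.CHECKED or not existing:
        return description
    return existing + separator + description  # half checked: append
```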
To delete all internal descriptions, simply delete the D* subdirectories of the PhotoDNA hash database directory. If you wish to share your database with other users without the descriptions, simply do not include the D* subdirectories. You may also manually delete or update individual descriptions in the text files in the D* subdirectories at any time. Descriptions that you already have in your database will not be lost if you import hash values of the same pictures again from other sources, unless that other source is a PhotoDNA hash database of X-Ways Forensics with descriptions of the same pictures, in which case your descriptions will be overwritten.
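Both options, deleting all descriptions and sharing a copy without them, amount to filtering the D* subdirectories of the PhotoDNA hash database directory. The Python sketch below uses `shutil.copytree` with an `ignore` callback. The file names in the demo are made up. Work on a backup, because any other subdirectory whose name starts with "D" would be affected too.

```python
import shutil
import tempfile
from pathlib import Path


def _is_description_dir(path: Path) -> bool:
    """Description subdirectories are the ones whose names start with 'D'."""
    return path.is_dir() and path.name.startswith("D")


def delete_descriptions(db_dir: Path) -> None:
    """Remove every D* subdirectory, i.e. all internal descriptions."""
    for child in Path(db_dir).iterdir():
        if _is_description_dir(child):
            shutil.rmtree(child)


def copy_without_descriptions(db_dir: Path, target: Path) -> None:
    """Copy the database directory for sharing, leaving out D* subdirectories."""
    src = Path(db_dir)

    def ignore(directory, names):
        if Path(directory) != src:  # only filter the top level
            return set()
        return {n for n in names if _is_description_dir(src / n)}

    shutil.copytree(src, target, ignore=ignore)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp, "db")
        (db / "D1").mkdir(parents=True)
        (db / "D1" / "descriptions.txt").write_text("example")
        (db / "hashes.rhdb").write_bytes(b"")  # hypothetical file name
        copy_without_descriptions(db, Path(tmp, "shared"))
        print(sorted(p.name for p in Path(tmp, "shared").iterdir()))  # ['hashes.rhdb']
```

Note that files (as opposed to subdirectories) whose names start with "D" are kept.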
When creating a PhotoDNA hash set of selected pictures, you may choose not to add the hash set to the internal database but to create a separate plain text file with PhotoDNA hash values instead. To do so, check the "Save as..." box. Such files can be passed on to other users who wish to add the specified hash values to their databases or remove them from their databases (import is described above, cleansing below).
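For users who produce such files with their own tools, here is a sketch of the same layout in Python: "PhotoDNA" on the first line, then one hash per line in hex ASCII. Lowercase hex digits and LF line endings are assumptions; the import format described earlier also accepts Base64. The function names are hypothetical.

```python
from pathlib import Path


def format_hash_file(hashes: list[bytes]) -> str:
    """Build the plain-text layout: 'PhotoDNA', then one hex hash per line."""
    return "\n".join(["PhotoDNA", *(h.hex() for h in hashes)]) + "\n"


def save_hash_file(hashes: list[bytes], path: Path) -> None:
    """Write the hash list as pure ASCII text."""
    Path(path).write_text(format_hash_file(hashes), encoding="ascii")
```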
It is possible to cleanse a PhotoDNA hash database of unwanted hash values. The hash values to remove are provided as a plain text file with "PhotoDNA" in the first line, followed by one hash value in hex ASCII notation per line. Each specified hash value removes its exact equivalent in the hash database as well as small variations of it (the same deviation is permitted as set for matching). Cleansing may become necessary if you have imported hash sets from a foreign source whose contents partly do not meet your requirements (which becomes apparent when you get false hits) and you do not wish to remove the entire hash set, or if you have accidentally included a wrong picture in your hash database yourself.
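A minimal sketch of the cleansing step follows. It reads a removal list in the format just described and drops every database entry that lies within a given distance of any listed hash, so exact equivalents (distance 0) and small variations are removed together. Euclidean distance and the `max_distance` value are assumptions standing in for the matching deviation configured in X-Ways Forensics. The `(hash, category)` entry layout is hypothetical.

```python
import math


def parse_removal_list(text: str) -> list[bytes]:
    """Removal lists: 'PhotoDNA' in the first line, one hex hash per line."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "PhotoDNA":
        raise ValueError("first line must read 'PhotoDNA'")
    return [bytes.fromhex(line.strip()) for line in lines[1:] if line.strip()]


def distance(a: bytes, b: bytes) -> float:
    """Assumed similarity metric: Euclidean distance between byte vectors."""
    if len(a) != len(b):
        raise ValueError("hash lengths differ")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cleanse(database, removals, max_distance):
    """Split (hash, category) entries into kept and removed lists. An entry
    is removed if it lies within max_distance of any listed hash."""
    kept, removed = [], []
    for entry in database:
        hit = any(distance(entry[0], r) <= max_distance for r in removals)
        (removed if hit else kept).append(entry)
    return kept, removed
```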
A button allows you to export selected hash collections into text files, to share them with other users or to check which hash values are contained or which ones were deduplicated. Another function (the button with the magnifying glass) lets you check the database for the presence of a specific hash value, specified in hex ASCII or Base64 notation. If there is a hit, you will be shown the name of the hash collection that contains the hash value and, if the matching entry in the database has a textual description, that description as well. Up to 19 matches are returned, and for each you will see how precise the match is (the higher the level, the more precise the match; this is the same basic scale as the user-specified strictness for matching, i.e. level 1 means a very rough match). You can narrow down the result list to more precise matches by enforcing a higher minimum strictness level, which is useful if there are more matches than can be listed.
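The lookup behavior (at most 19 matches, each with a precision level, optionally restricted to a minimum level) can be sketched as below. The `level_limits` list, which maps distances to levels, is a hypothetical stand-in for X-Ways' internal scale, and Euclidean distance is an assumption. Filtering happens before the list is cut to 19 entries, which is why raising the minimum level helps when there are more matches than can be listed.

```python
import math

MAX_RESULTS = 19  # from the manual


def distance(a: bytes, b: bytes) -> float:
    """Assumed similarity metric: Euclidean distance between byte vectors."""
    if len(a) != len(b):
        raise ValueError("hash lengths differ")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def precision_level(d: float, level_limits: list[float]) -> int:
    """level_limits[i] is the maximum distance for level i + 1 (decreasing
    values). Returns 0 if not even level 1 is reached."""
    level = 0
    for i, limit in enumerate(level_limits, start=1):
        if d > limit:
            break
        level = i
    return level


def lookup(query: bytes, database, level_limits, min_level: int = 1):
    """database: iterable of (hash, collection name, description or None).
    Returns up to 19 (level, collection, description) tuples, best first."""
    hits = []
    for h, collection, description in database:
        level = precision_level(distance(query, h), level_limits)
        if level >= max(min_level, 1):
            hits.append((level, collection, description))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:MAX_RESULTS]
```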
You can mark selected PhotoDNA categories as "preferred", indicated by a black star. Preferred categories get priority if matches with hash values in different categories are found for a picture in the volume snapshot: a preferred category will be reported as the match even if alternative matches in non-preferred categories are much closer. This is useful, for example, if your database contains categories that you trust to be accurate and suitable and others that you trust less because they are known to contain errors (e.g. the same picture classified as CP and non-pertinent at the same time) and/or come from a foreign source and are based on different laws and jurisdiction.
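The priority rule for preferred categories can be expressed in a few lines. In this sketch, if any match falls into a preferred category, the closest preferred match is reported, even when a non-preferred match is closer; otherwise the closest match overall wins. Choosing the closest among several preferred categories is an assumption, and `Match` is a hypothetical type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    category: str
    distance: float  # smaller means closer


def reported_category(matches: list[Match], preferred: set[str]) -> Optional[str]:
    """Closest preferred category if any matched; otherwise closest overall."""
    if not matches:
        return None
    pool = [m for m in matches if m.category in preferred] or matches
    return min(pool, key=lambda m: m.distance).category
```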
Matching is part of the "picture analysis and processing" operation in Specialist | Refine Volume Snapshot. If there are matches for the same picture in different categories of the PhotoDNA hash database, you can see that in the directory browser: the name of the category with the closest match is shown, followed by a comma and an ellipsis. In the rare cases where this happens, it can be important to review the picture manually and make the final decision about its relevance for the case. You can also filter for pictures that were found in more than one category. Such pictures may deserve as much attention as files whose hash values appear in both the "irrelevant" and the "notable" category of a conventional hash database. Both situations usually result from an inconsistently populated database, e.g. accidental miscategorizations or correct categorizations made by users in different jurisdictions. If you consider the returned best-matching category for a picture wrong, you can fix this by adding the PhotoDNA hash value of that picture to the PhotoDNA database again, specifying the correct category.
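The directory browser label and the multi-category filter described above can be sketched as follows. The exact spelling of the ellipsis (three dots here, after a comma and a space) is an assumption, and the input structures, lists of `(category, distance)` pairs per picture, are hypothetical.

```python
def browser_label(matches: list[tuple[str, float]]) -> str:
    """Closest category, plus ', ...' when several categories matched."""
    if not matches:
        return ""
    closest = min(matches, key=lambda m: m[1])[0]
    several = len({category for category, _ in matches}) > 1
    return closest + (", ..." if several else "")


def in_multiple_categories(results: dict[str, list[tuple[str, float]]]) -> list[str]:
    """Names of pictures whose matches span more than one category."""
    return [name for name, matches in results.items()
            if len({category for category, _ in matches}) > 1]
```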