Generator Signatures

WinHex & X-Ways


 

The generator signature is a concept that identifies subtypes of common file types like JPEG and PDF. Those subtypes can be associated with devices (scanners, cameras) or applications (e.g. Photoshop). For JPEG, the signature is based on the quantization table and some other invariant features that are shared by all JPEG files produced by the same generator. The generator signature is provided with the metadata as a raw 32-bit hexadecimal number accompanied by a textual description derived from the file “Generator Signatures.txt”.

 

607AE169 (IJG Library 94 / Paint)

 

This example shows the signature that results from a JPEG file generated by Microsoft Paint. The number after the generator name is the image quality in the range 1…100. 94 is the fixed image quality setting specific to Microsoft Paint.

 

JPEG signatures can be subdivided into three groups. The first group is named Standard (identical to IJG Library). Files in this group use the quantization tables as defined by the JPEG standard. There are exactly 99 quality grades. The second group is named Extended. Here each grade is subdivided into roughly 100 additional grades by interpolating the standard quantization tables. Those signatures usually belong to entry-level camera models that use size-priority compression methods.
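For the Standard group, the link between a quality grade and a quantization table can be illustrated with the quality scaling used by the IJG library. The following Python sketch is illustrative only: it covers the luminance table alone, the function names are made up for this example, and it does not claim to reproduce how X-Ways Forensics computes the signature or the quality value.

```python
# Base luminance quantization table from Annex K of the JPEG standard
# (row-major order, not zigzag order).
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def ijg_scale_factor(quality):
    """Map a quality grade (1..100) to the IJG percentage scale factor."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality


def ijg_luma_table(quality):
    """Scale the base table for a quality grade, clamped to 1..255 (baseline)."""
    scale = ijg_scale_factor(quality)
    return [max(1, min(255, (value * scale + 50) // 100)) for value in BASE_LUMA]


def match_standard_quality(table):
    """Return the grade whose scaled table equals `table`, or None if no grade matches."""
    candidate = list(table)
    for quality in range(1, 101):
        if ijg_luma_table(quality) == candidate:
            return quality
    return None
```

With this scaling, quality 50 reproduces the base table unchanged. A luminance table read from a file's DQT marker (converted to row-major order) can be passed to match_standard_quality; a result of None suggests that the table is not a plain scaled standard table.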

 

D3D8AD02 (Extended 95.10 / 10 MP camera)

 

The image quality is shown with two fractional digits in the metadata column as well as with the DQT marker in the details pane. Whether a camera operates with the size-priority scheme can be judged by the Exif field CompressedBitsPerPixel.

 

The third group is called Custom. Files in this group use proprietary quantization tables that are specific to certain devices or applications. Here too, the image quality is shown in the range 0…100 with two fractional digits. Exceptions are Photoshop with 13 grades in the range 0…12, Apple QuickTime with grades in the range 1…1024, and LEAD Technologies with the range 2…255.

 

53631B67 (LEAD Technologies 2 / Scan)

 

The second part of the description, Scan, can also have the values Facebook, WhatsApp or MsPhoto. MsPhoto means that the file has been edited by Microsoft Photo Gallery.
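If signature strings like the three examples above are exported for further processing, they can be split into their parts. The following Python sketch assumes the layout "8 hex digits, then a description in parentheses, with an optional second part after a slash", which is inferred solely from those examples. The type and function names are hypothetical, and real output may contain variants this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional


class GeneratorSignature(NamedTuple):
    signature: int            # the 32-bit value, parsed from the hex digits
    generator: str            # e.g. "IJG Library", "Extended", "LEAD Technologies"
    quality: Optional[float]  # image quality, if the description ends in a number
    origin: Optional[str]     # second part of the description, e.g. "Paint", "Scan"


# Assumed layout: 8 hex digits, whitespace, description in parentheses.
_LINE = re.compile(r"^(?P<hex>[0-9A-Fa-f]{8})\s+\((?P<desc>.*)\)$")
# A trailing integer or decimal number in the first part is taken as the quality.
_QUALITY = re.compile(r"\s(\d+(?:\.\d+)?)$")


def parse_generator_signature(text):
    match = _LINE.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected signature format: {text!r}")
    first, slash, second = match.group("desc").partition(" / ")
    origin = second.strip() if slash else None
    quality_match = _QUALITY.search(first)
    if quality_match:
        generator = first[:quality_match.start()].strip()
        quality = float(quality_match.group(1))
    else:
        generator, quality = first.strip(), None
    return GeneratorSignature(int(match.group("hex"), 16), generator, quality, origin)
```

For example, parse_generator_signature("D3D8AD02 (Extended 95.10 / 10 MP camera)") separates the generator "Extended", the quality 95.1 and the second part "10 MP camera".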

 

Generator signatures form the basis of the calculation of the generic relevance. In addition, generator signatures are used in X-Ways Forensics during the file header signature search to name carved JPEG files if no “better” metadata is available (e.g. camera model and timestamp from the Exif data). Even if metadata extraction cannot find any “better” metadata, the generator signature can still be output, and that signature at least allows you to identify groups of files that likely have the same origin. Checking whether the generator signature and the available Exif metadata are consistent with each other may tell you whether a picture was edited and saved again.

 

In particular, the generator signature makes it possible to identify files that were produced by scanners, as only a handful of generators are commonly used in scanners. This allows scanned images to be identified reliably even if they are not black and white or do not consist exclusively of grayscale colors. PDF files produced by scanners can also be identified by generator signatures. Such files are associated with the report table “Scan”.

 

PDF generator signatures are available even if there is no metadata or no metadata could be extracted. With 4,700 signatures (as of v19.0), more than 99% of all PDF files are covered. One particularly notable PDF generator signature category in the file “Generator Signatures.txt” is “Reporting/Records”, which identifies documents like bank account statements and invoices. This identification also improves the automatic relevance judgment.

 

The file "Generator Signatures.txt" is similar to the other text files that ship with X-Ways Forensics and, like those, can be edited to adjust the relevance estimation that is part of metadata extraction. If, for example, knowing that a JPEG file was generated by a scanner is important for you (because you are a tax fraud or other white collar crime investigator interested in scanned documents), you would make sure that the “JPEG/Scan” group has a high weight (e.g. 9). That's the number after the tab in the line with the *** group definition. If such a file is of less importance to you (e.g. because the pictures that you have to look for are CP photos), then you reduce the weight of that group (setting it e.g. to 1). You can also edit the individual relevance of each generator in a group. The weight of an individual signature has to be in the range 0…9, the default being 5. There is no such range restriction for the weight of a group.
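Such a weight change can also be scripted. The following Python sketch rests on an assumption about the file layout that goes beyond the description above: a group definition line is taken to start with ***, followed by the group name, a tab, and the weight as an integer. The function name is hypothetical. Check the layout against your copy of the file, and keep a backup, before applying anything like this.

```python
import re


def set_group_weight(lines, group, weight):
    """Return a copy of `lines` with the weight of one group definition replaced.

    Assumed layout of a group definition line (not confirmed by the manual):
        *** <group name><TAB><weight>
    All other lines are passed through unchanged.
    """
    result = []
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]          # preserve the original line ending
        head, tab, tail = body.partition("\t")
        is_group = head.startswith("***") and head.strip("* ") == group
        if is_group and tab and re.match(r"\d+", tail):
            # Replace only the leading number after the tab.
            tail = re.sub(r"^\d+", str(weight), tail, count=1)
            line = head + tab + tail + ending
        result.append(line)
    return result
```

To use it, read the file into a list of lines, call set_group_weight(lines, "JPEG/Scan", 9), and write the result back with the same text encoding as the original file.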

 

The model designations of known scanning devices can be manually extended in the section "KnownScanner" of "Generator Signatures.txt". Identification by model name can help to identify scanned images if they contain Exif data or were edited. In general, the detection of scanned images is based on 1) the generator signature, 2) generic properties of the Exif metadata (FileSource, Density, ...) and 3) the KnownScanner list.

 

The prefix "Reporting::" in generator signature definitions allows for easier filtering for the category reporting/records.