ProvisionalAlternateEncoding Property

DotNetZip

Ionic Zip Library v1.9.1.6 ProvisionalAlternateEncoding Property

The text encoding to use when writing new entries to the ZipFile, for those entries that cannot be encoded with the default (IBM437) encoding; or, the text encoding that was used when reading the entries from the ZipFile.

Declaration Syntax

C#

[ObsoleteAttribute("use AlternateEncoding instead.")]
public Encoding ProvisionalAlternateEncoding { get; set; }

Visual Basic

<ObsoleteAttribute("use AlternateEncoding instead.")> _
Public Property ProvisionalAlternateEncoding As Encoding
	Get
	Set

Visual C++

[ObsoleteAttribute(L"use AlternateEncoding instead.")]
public:
property Encoding^ ProvisionalAlternateEncoding {
	Encoding^ get ();
	void set (Encoding^ value);
}

Remarks

In its zip specification, PKWare describes two options for encoding filenames and comments: IBM437 or UTF-8. However, some archiving tools and libraries do not follow the specification, and instead encode characters using the system default code page. For example, WinRAR, when run on a machine configured for Traditional Chinese, may encode filenames with the Big5 Chinese (950) code page. This behavior is contrary to the Zip specification, but it occurs anyway.

When using DotNetZip to write zip archives that will be read by one of these other archivers, set this property to specify the code page to use when encoding the FileName and Comment of each ZipEntry in the zip file, for values that cannot be encoded with IBM437, the default code page for zip files. That fallback role is why this property is called "provisional". In all cases, IBM437 is used where possible, that is, where no loss of data would result. It is therefore possible for a given entry to have a Comment encoded in IBM437 and a FileName encoded with the specified "provisional" code page.
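The "IBM437 where possible" rule amounts to a round-trip test: a value fits IBM437 if encoding it and decoding it again yields the same string. The following is a minimal sketch of such a test, not necessarily how DotNetZip implements it. The names `Ibm437Check` and `FitsIbm437` are hypothetical. The sketch assumes a modern .NET runtime, where code page 437 can be resolved only after `CodePagesEncodingProvider` is registered; on .NET Framework that registration line is not needed and can be removed.

```csharp
using System;
using System.Text;

public static class Ibm437Check
{
    // Hypothetical helper: returns true when s survives an encode/decode
    // round trip through IBM437, meaning no data would be lost by using
    // the default zip code page for it.
    public static bool FitsIbm437(string s)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code
        // pages such as 437 must be made available explicitly.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding ibm437 = Encoding.GetEncoding(437);
        byte[] bytes = ibm437.GetBytes(s);

        // Characters outside IBM437 are replaced during encoding,
        // so the decoded string differs from the input.
        return ibm437.GetString(bytes) == s;
    }
}
```

Under this check, a plain ASCII name such as "Info.txt" fits IBM437, while a name containing "ø" does not, because IBM437 has no such character. The second kind of name is the case the provisional encoding is meant for.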

Be aware that a zip file created after you've explicitly set the ProvisionalAlternateEncoding property to a value other than IBM437 may not be compliant with the PKWare specification, and may not be readable by compliant archivers. On the other hand, many (most?) archivers are non-compliant and can read zip files created in arbitrary code pages. The trick is to use or specify the proper code page when reading the zip.

When creating a zip archive using this library, it is possible to change the value of ProvisionalAlternateEncoding between each entry you add, and between adding entries and the call to Save(). Don't do this. It will likely result in a zipfile that is not readable. For best interoperability, either leave ProvisionalAlternateEncoding alone, or specify it only once, before adding any entries to the ZipFile instance. There is one exception to this recommendation, described later.

When using an arbitrary, non-UTF8 code page for encoding, there is no standard way for the creator application - whether DotNetZip, WinZip, WinRAR, or something else - to formally specify in the zip file which code page has been used for the entries. As a result, readers of zip files cannot inspect the zip file and determine the code page that was used for the entries contained within it. It is left to the application or user to determine the necessary code page when reading zip files encoded this way. In other words, if you explicitly specify the code page when you create the zip file, you must explicitly specify the same code page when reading the zip file.

The way you specify the code page to use when reading a zip file varies depending on the tool or library you use to read the zip. In DotNetZip, you use a ZipFile.Read() method that accepts an encoding parameter. It isn't possible with Windows Explorer, as far as I know, to specify an explicit codepage to use when reading a zip. If you use an incorrect codepage when reading a zipfile, you will get entries with filenames that are incorrect, and the incorrect filenames may even contain characters that are not legal for use within filenames in Windows. Extracting entries with illegal characters in the filenames will lead to exceptions. It's too bad, but this is just the way things are with code pages in zip files. Caveat Emptor.

Example: Suppose you create a zipfile that contains entries with filenames that have Danish characters. If you set ProvisionalAlternateEncoding to "iso-8859-1" (cp 28591), the filenames will be correctly encoded in the zip. But to read that zipfile correctly, you have to specify the same code page at the time you read it. If you try to read that zip file with Windows Explorer or another application that is not flexible with respect to the code page used to decode filenames in zipfiles, you will get a filename like "Inf°.txt".
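The mismatch can be reproduced without any zip file, using only System.Text. The sketch below assumes the original file was named "Infø.txt"; that name is an illustration, since the text above gives only the garbled result. The class and method names are hypothetical, and the provider registration is needed only on .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class CodePageMismatchDemo
{
    // Hypothetical helper: encode a name with the writer's code page,
    // then decode the same bytes with the reader's code page.
    public static string Reinterpret(string name, Encoding writer, Encoding reader)
    {
        return reader.GetString(writer.GetBytes(name));
    }

    // Simulates writing a name as iso-8859-1 and reading it back as IBM437.
    public static string ReadLatin1NameAsIbm437(string name)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 437
        // must be made available explicitly.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding ibm437 = Encoding.GetEncoding(437);
        return Reinterpret(name, latin1, ibm437);
    }
}
```

Calling `CodePageMismatchDemo.ReadLatin1NameAsIbm437("Infø.txt")` returns "Inf°.txt": iso-8859-1 stores "ø" as the byte 0xF8, and IBM437 maps that byte to the degree sign.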

When using DotNetZip to read a zip archive that uses an arbitrary code page, you must specify the encoding before or at the time the ZipFile is read. This means you must use a ZipFile.Read() method that allows you to specify a System.Text.Encoding parameter. Setting the ProvisionalAlternateEncoding property after your application has read in the zip archive will not affect the names of entries that have already been read in.

And now, the exception to the rule described above. One strategy for specifying the code page for a given zip file is to describe the code page in a human-readable form in the zip comment. For example, the comment may read "Entries in this archive are encoded in the Big5 code page". For maximum interoperability, the zip comment in this case should be encoded in the default IBM437 code page, which means the zip comment is encoded using a different code page than the filenames. To do this, set ProvisionalAlternateEncoding to your desired region-specific code page once, before adding any entries, and then reset ProvisionalAlternateEncoding to IBM437 before setting the Comment property and calling Save().
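The sequence just described might look like the following sketch. It assumes the Ionic.Zip assembly is referenced and that the code runs on .NET Framework, where the "big5" and "IBM437" encodings are available without extra registration. The class name, method name, and parameters are hypothetical. Because the property is marked obsolete, the compiler will emit a warning for each use.

```csharp
using System.Text;
using Ionic.Zip;

public static class Big5ArchiveWriter
{
    // Hypothetical helper: the caller supplies the files to add and the output path.
    public static void Write(string[] filesToAdd, string outputPath)
    {
        using (var zip = new ZipFile())
        {
            // Set once, before adding any entries. Used only for names
            // that cannot be encoded with IBM437.
            zip.ProvisionalAlternateEncoding = Encoding.GetEncoding("big5");

            foreach (string file in filesToAdd)
            {
                zip.AddFile(file);
            }

            // Reset to IBM437 so the archive comment is written in the
            // default code page, then set the comment and save.
            zip.ProvisionalAlternateEncoding = Encoding.GetEncoding("IBM437");
            zip.Comment = "Entries in this archive are encoded in the Big5 code page";
            zip.Save(outputPath);
        }
    }
}
```

A reader of such an archive can display the comment with the default code page and then reopen the archive with the code page the comment names.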

Examples

This example shows how to read a zip file using the Big5 Chinese code page (950), and extract an entry from it. For this code to work as desired, the zip file must have been created using the Big5 code page (CP950). This is typical, for example, when using WinRAR on a machine with CP950 set as the default code page. In that case, the names of entries within the zip archive will be stored in that code page, and the zip archive must be read using that code page. If the application does not use the correct code page in ZipFile.Read(), the names of entries within the zip archive will not be correctly retrieved.

C#

using (var zip = ZipFile.Read(zipFileName, System.Text.Encoding.GetEncoding("big5")))
{
    // retrieve and extract an entry using a name encoded with CP950
    zip[MyDesiredEntry].Extract("unpack");
}

VB.NET

Using zip As ZipFile = ZipFile.Read(ZipToExtract, System.Text.Encoding.GetEncoding("big5"))
    ' retrieve and extract an entry using a name encoded with CP950
    zip(MyDesiredEntry).Extract("unpack")
End Using