Character Set Recognition


Microsoft® Internet Explorer uses the character set specified for a document to determine how to translate the bytes in the document into characters on the screen or on paper. By default, Internet Explorer uses the character set specified in the charset parameter of the HTTP content type returned by the server. If the server does not supply this parameter, Internet Explorer uses the character set specified by the META element in the document. If the document has no such META element, Internet Explorer falls back to the user's preferences.
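The lookup order described above can be sketched as a simple fallback chain. This is an illustration only, not Internet Explorer's actual implementation; the function names charset_param and resolve_charset and the sample values are hypothetical.

```python
def charset_param(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    # Skip the media type itself and look at each ";"-separated parameter.
    for part in content_type.split(";")[1:]:
        name, sep, value = part.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip("\"'") or None
    return None


def resolve_charset(http_content_type, meta_content, user_default):
    """Apply the precedence: HTTP header, then META element, then user preference."""
    return (charset_param(http_content_type)
            or charset_param(meta_content)
            or user_default)


# The server sends no charset parameter, so the META value is used.
print(resolve_charset("text/html",
                      "text/html; CHARSET=windows-1251",
                      "iso-8859-1"))  # windows-1251
```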

You can use the META element to explicitly set the character set for a document. In this case, you set the HTTP-EQUIV= attribute to "Content-Type" and specify a character set identifier in the CONTENT= attribute. For example, the following META element identifies windows-1251 as the character set for the document.

<META HTTP-EQUIV="Content-Type"
  CONTENT="text/html; CHARSET=windows-1251">
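
For tools that need to read this declaration, the following is a minimal sketch of extracting the character set from such a META element with Python's standard html.parser module. The class name MetaCharsetFinder is hypothetical, and the sketch does not claim to mirror how Internet Explorer itself scans the document.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset from the first META HTTP-EQUIV="Content-Type" element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # CONTENT looks like "text/html; CHARSET=windows-1251".
        for part in attributes.get("content", "").split(";")[1:]:
            name, sep, value = part.partition("=")
            if sep and name.strip().lower() == "charset":
                self.charset = value.strip()


finder = MetaCharsetFinder()
finder.feed('<META HTTP-EQUIV="Content-Type"\n'
            '  CONTENT="text/html; CHARSET=windows-1251">')
print(finder.charset)  # windows-1251
```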

As long as you place the META element before the BODY element, it affects the whole document, including the TITLE element. For clarity, it should appear as the first element after HEAD so that all readers know the encoding before the first element that can be displayed is parsed. Note that the META element applies to the document containing it. This means, for example, that a compound document (a document consisting of two or more documents in a set of frames) can use different character sets in different frames.

The following table contains information concerning the character sets supported by Internet Explorer 5. The information provided is:

  1. Display Name: the name used to refer to the character set.
  2. Preferred Charset ID: the most common identifier used to set the character set in Internet Explorer. For example, in the previous code sample windows-1251 is the Charset ID.
  3. Additional Aliases: other identifiers that may be used to set the character set.
  4. MLang Code Page: the numeric value of the code page used by the Internet Explorer MLang API.
  5. Supported by Versions: the versions of Internet Explorer that support the listed character set.

    Note: CS indicates that the version of Internet Explorer must support complex scripts such as Arabic, Hebrew, or Thai.
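
As an illustration of the version notation, the sketch below parses a "Supported by Versions" cell such as "3CS, 4CS, 5" into version numbers paired with a flag for the complex-script requirement. The function name parse_versions and the tuple layout are assumptions made for this example.

```python
def parse_versions(cell):
    """Parse a cell such as '3CS, 4CS, 5' into (version, needs_complex_scripts) pairs."""
    result = []
    for item in cell.split(","):
        item = item.strip()
        # A trailing "CS" marks a version that must support complex scripts.
        needs_cs = item.upper().endswith("CS")
        number = item[:-2] if needs_cs else item
        result.append((int(number), needs_cs))
    return result


print(parse_versions("3CS, 4CS, 5"))  # [(3, True), (4, True), (5, False)]
```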

Charsets in Microsoft Internet Explorer 5

Display Name | Preferred Charset ID | Additional Aliases | MLang Code Page | Supported by Versions
Arabic ASMO-708 | ASMO-708 | | 708 | 4CS, 5
Arabic (DOS) | DOS-720 | | 720 | 4CS, 5
Arabic (ISO) | iso-8859-6 | ISO_8859-6:1987, iso-ir-127, ISO_8859-6, ECMA-114, arabic, csISOLatinArabic | 28596 | 4CS, 5
Arabic (Windows) | windows-1256 | | 1256 | 4CS, 5
Baltic (ISO) | iso-8859-4 | csISOLatin4, iso-ir-110, ISO_8859-4, ISO_8859-4:1988, l4, latin4 | 28594 | 4, 5
Baltic (Windows) | Windows-1257 | | 1257 | 4, 5
Central European (DOS) | ibm852 | cp852 | 852 | 4, 5
Central European (ISO) | iso-8859-2 | csISOLatin2, iso-ir-101, iso8859-2, iso_8859-2, iso_8859-2:1987, l2, latin2 | 28592 | 3, 4, 5
Central European (Windows) | windows-1250 | x-cp1250 | 1250 | 3, 4, 5
Chinese Simplified (GB2312) | gb2312 | chinese, csGB2312, csISO58GB23128, GB2312, GBK, GB_2312-80, iso-ir-58 | 936 | 3, 4, 5
Chinese Simplified (HZ) | hz-gb-2312 | | 52936 | 4, 5
Chinese Traditional | big5 | csbig5, x-x-big5 | 950 | 3, 4, 5
Cyrillic (DOS) | cp866 | ibm866 | 866 | 4, 5
Cyrillic (ISO) | iso-8859-5 | csISOLatinCyrillic, cyrillic, iso-ir-144, ISO_8859-5, ISO_8859-5:1988 | 28595 | 4, 5
Cyrillic (KOI8-R) | koi8-r | csKOI8R, koi | 20866 | 3, 4, 5
Cyrillic (Windows) | windows-1251 | x-cp1251 | 1251 | 3, 4, 5
Greek (ISO) | iso-8859-7 | csISOLatinGreek, ECMA-118, ELOT_928, greek, greek8, iso-ir-126, ISO_8859-7, ISO_8859-7:1987 | 28597 | 3, 4, 5
Greek (Windows) | Windows-1253 | windows-1253 | 1253 | 5
Hebrew (DOS) | DOS-862 | | 862 | 4CS, 5
Hebrew (ISO) | iso-8859-8 | csISOLatinHebrew, hebrew, iso-ir-138, ISO_8859-8, visual, ISO-8859-8 Visual | 28598 | 4CS, 5
Hebrew (Windows) | windows-1255 | logical, ISO_8859-8:1988, iso-ir-138 | 1255 | 3CS, 4CS, 5
Japanese (JIS) | iso-2022-jp | csISO2022JP | 50220 | 4, 5
Japanese (JIS-Allow 1-byte Kana) | csISO2022JP | iso-2022-jp | 50221 | 4, 5
Japanese (JIS-Allow 1-byte Kana - SO/SI) | iso-2022-jp | csISO2022JP | 50222 | 3, 4, 5
Japanese (EUC) | euc-jp | csEUCPkdFmtJapanese, Extended_UNIX_Code_Packed_Format_for_Japanese, x-euc, x-euc-jp | 51932 | 3, 4, 5
Japanese (Shift-JIS) | shift_jis | csShiftJIS, csWindows31J, ms_Kanji, shift-jis, x-ms-cp932, x-sjis | 932 | 3, 4, 5
Korean | ks_c_5601-1987 | csKSC56011987, euc-kr, korean, ks_c_5601 | 949 | 3, 4, 5
Korean (ISO) | iso-2022-kr | csISO2022KR | 50225 | 3, 4, 5
Latin 3 (ISO) | iso-8859-3 | | 28593 | 4, 5
Thai (Windows) | iso-8859-11 | windows-874 | 874 | 3, 4, 5
Turkish (Windows) | Windows-1254 | windows-1254 | 1254 | 3, 4, 5
Turkish (ISO) | iso-8859-9 | csISOLatin5, ISO_8859-9, ISO_8859-9:1989, iso-ir-148, l5, latin5 | 28599 | 3, 4, 5
Ukrainian (KOI8-U) | koi8-u | | 21866 | 4, 5
Unicode (UTF-7) | utf-7 | csUnicode11UTF7, unicode-1-1-utf-7, x-unicode-2-0-utf-7 | 65000 | 4, 5
Unicode (UTF-8) | utf-8 | unicode-1-1-utf-8, unicode-2-0-utf-8, x-unicode-2-0-utf-8 | 65001 | 4, 5
Vietnamese (Windows) | windows-1258 | | 1258 | 3, 4, 5
Western European (Windows) | Windows-1252 | | 1252 | 5
Western European (ISO) | iso-8859-1 | ANSI_X3.4-1968, ANSI_X3.4-1986, ascii, cp367, cp819, csASCII, IBM367, ibm819, iso-ir-100, iso-ir-6, ISO646-US, iso8859-1, ISO_646.irv:1991, iso_8859-1, iso_8859-1:1987, latin1, us, us-ascii, x-ansi | 1252 | 3, 4, 5
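
To show how the aliases relate to the preferred identifiers, here is a minimal sketch of an alias lookup built from a small excerpt of the table above. The names CHARSETS, build_index, and lookup are hypothetical, and matching identifiers without regard to case is an assumption of this example rather than something the table states. Some identifiers, such as iso-2022-jp, appear in more than one row, so a full index would need a rule for those collisions.

```python
# Excerpt of the table: preferred charset ID -> (additional aliases, MLang code page).
CHARSETS = {
    "windows-1251": (["x-cp1251"], 1251),
    "koi8-r": (["csKOI8R", "koi"], 20866),
    "shift_jis": (["csShiftJIS", "csWindows31J", "ms_Kanji",
                   "shift-jis", "x-ms-cp932", "x-sjis"], 932),
    "utf-8": (["unicode-1-1-utf-8", "unicode-2-0-utf-8",
               "x-unicode-2-0-utf-8"], 65001),
}


def build_index(charsets):
    """Map every lowercased identifier to its (preferred ID, code page)."""
    index = {}
    for preferred, (aliases, code_page) in charsets.items():
        for name in [preferred, *aliases]:
            index[name.lower()] = (preferred, code_page)
    return index


INDEX = build_index(CHARSETS)


def lookup(name):
    """Return (preferred ID, code page) for an identifier, or None if unknown."""
    return INDEX.get(name.strip().lower())


print(lookup("X-SJIS"))  # ('shift_jis', 932)
print(lookup("latin9"))  # None
```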

Nonstandard Charsets with Special Meaning Inside Internet Explorer and MLang

These character sets are not to be used for labeling documents.

Display Name | Preferred Charset ID | Additional Aliases | MLang Code Page | Supported by Versions
Japanese (Auto Select) | _autodetect | | 50932 | 3, 4, 5
Korean (Auto Select) | _autodetect_kr | | 50949 | 4, 5
Unicode | unicode | | 1200 | 4, 5
Unicode (BigEndian) | unicodeFEFF | | 1201 | 4, 5
User Defined | x-user-defined | | 50000 | 4, 5
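
An authoring or validation tool could guard against these identifiers with a check like the following sketch. The names INTERNAL_ONLY and is_valid_label are hypothetical, and the case-insensitive comparison is an assumption of this example.

```python
# Identifiers from the table above that should not label a document.
INTERNAL_ONLY = {
    "_autodetect",
    "_autodetect_kr",
    "unicode",
    "unicodefeff",
    "x-user-defined",
}


def is_valid_label(charset_id):
    """Return False for the nonstandard, internal-only identifiers."""
    return charset_id.strip().lower() not in INTERNAL_ONLY


print(is_valid_label("windows-1251"))  # True
print(is_valid_label("unicodeFEFF"))   # False
```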


© 1999 Microsoft Corporation. All rights reserved.