Enforcing Character Encoding with DOM

MSXML 5.0 SDK

Microsoft XML Core Services (MSXML) 5.0 for Microsoft Office - DOM Developer's Guide

In some cases, an XML document is passed to and processed by an application, such as an ASP page, that cannot correctly handle the encoding of rare or newly added characters. When this happens, you might be able to work around the problem by letting the DOM handle the character encoding. The document's text then never passes through the component that cannot handle it.

For example, the following XML document contains the numeric character reference &#8364;, which corresponds to the Euro currency symbol (€). The ASP page incapable.asp loads currency.xml but fails when it tries to return the document to the browser.

XML Data (currency.xml)

<?xml version="1.0" encoding="utf-8"?>
<currency>
   <name>Euro</name>
   <symbol>&#8364;</symbol>
   <exchange>
      <base>US$</base>
      <rate>1.106</rate>
   </exchange>
</currency>
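
The reference &#8364; is the decimal form of the Unicode code point U+20AC. The following Node.js sketch is an illustration only and is not part of MSXML. It resolves such a decimal reference and shows the three bytes that a UTF-8 encoder must emit for the resulting character. The helper name resolveDecimalCharRef is hypothetical.

```javascript
// Hypothetical helper: resolve a decimal character reference such as "&#8364;".
// Handles only the decimal form; hex (&#x20AC;) and named entities are out of scope.
function resolveDecimalCharRef(ref) {
  const match = /^&#(\d+);$/.exec(ref);
  if (!match) {
    throw new Error(`Not a decimal character reference: ${ref}`);
  }
  return String.fromCodePoint(Number(match[1]));
}

const euro = resolveDecimalCharRef("&#8364;");
// UTF-8 is the encoding named in the XML declaration of currency.xml.
const utf8Bytes = Buffer.from(euro, "utf8");

console.log(euro);                             // €
console.log(euro.codePointAt(0).toString(16)); // 20ac
console.log(utf8Bytes.toString("hex"));        // e282ac
```

Any output path that writes this character for a document declared as utf-8 must produce exactly these three bytes.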

ASP Page (incapable.asp)

<%@language = "javascript"%>
<%
   var doc = new ActiveXObject("Msxml2.DOMDocument.5.0");
   doc.async = false;
   if (doc.load(Server.MapPath("currency.xml"))==true) {
      Response.ContentType = "text/xml";
      Response.Write(doc.xml);
   }
%>

When incapable.asp is opened from a Web browser, an error such as the following results:

An invalid character was found in text content. Error processing resource 'http://MyWebServer/MyVirtualDirectory/incapable.asp'. Line 4, Position 10

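This error is caused by the Response.Write(doc.xml) instruction in incapable.asp. The xml property returns the document as a Unicode string, and Response.Write converts that string to bytes using the page's code page, not the UTF-8 encoding named in the XML declaration. When that code page is not UTF-8 (for example, Windows-1252), the bytes written for the Euro currency symbol are not valid UTF-8, and the browser's XML parser rejects them. To fix the error, replace the Response.Write(doc.xml) instruction in incapable.asp with the following line: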

doc.save(Response); 

With this change, the error no longer occurs, and the page displays the following output in a Web browser:

  <?xml version="1.0" encoding="utf-8" ?>
  <currency>
    <name>Euro</name>
    <symbol>€</symbol>
    <exchange>
      <base>US$</base>
      <rate>1.106</rate>
    </exchange>
  </currency>

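The change works because the DOM object (doc), rather than the ASP Response object, now handles the character encoding. The save method writes the document directly to Response and encodes it as specified in the XML declaration, which here is UTF-8.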
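
The mismatch can be reproduced outside ASP. The Node.js sketch below is an illustration under stated assumptions and is not MSXML or ASP code. It assumes the page's code page is Windows-1252 and models that code page with a tiny hypothetical encoder (encodeCp1252Subset) that covers only ASCII and the Euro sign. Decoding the code-page bytes strictly as UTF-8 fails, as the browser's parser did. Encoding the same text as UTF-8, which mirrors what saving through the DOM does for a document declared as utf-8, decodes cleanly.

```javascript
// Illustration only: models the two output paths in plain Node.js.
// Assumption: the ASP page's code page is Windows-1252.

// Hypothetical, deliberately partial Windows-1252 encoder: ASCII passes
// through unchanged, and the Euro sign maps to byte 0x80 in that code page.
function encodeCp1252Subset(text) {
  const bytes = [];
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code < 0x80) {
      bytes.push(code);
    } else if (ch === "\u20AC") {
      bytes.push(0x80);
    } else {
      throw new Error(`Not covered by this sketch: U+${code.toString(16)}`);
    }
  }
  return Uint8Array.from(bytes);
}

// Decode strictly as UTF-8, the encoding named in the XML declaration.
// With fatal: true, invalid byte sequences throw instead of becoming U+FFFD.
function decodeAsUtf8(bytes) {
  try {
    return { ok: true, text: new TextDecoder("utf-8", { fatal: true }).decode(bytes) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

const payload = "<symbol>\u20AC</symbol>";

// Path 1: text converted with the page code page (analogous to Response.Write).
const codePageResult = decodeAsUtf8(encodeCp1252Subset(payload));

// Path 2: text encoded as UTF-8 to match the declaration (analogous to doc.save).
const utf8Result = decodeAsUtf8(Buffer.from(payload, "utf8"));

console.log(codePageResult.ok);              // false
console.log(utf8Result.ok, utf8Result.text); // true <symbol>€</symbol>
```

The lone byte 0x80 is a UTF-8 continuation byte with no lead byte, so a strict UTF-8 decoder must reject it. That is the same kind of failure the browser reported at the position of the Euro symbol.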

See Also

Character Encoding, XML, and MSXML