ABCpdf fully supports HTML and CSS. You can render individual
pages of HTML using the AddImageUrl
method. You can page HTML over multiple PDF pages using the AddImageUrl
method in combination with the AddImageToChain
method. ABCpdf allows you to treat HTML like any other media so you can
even page your HTML across multiple columns of multiple pages of your PDF.
HTML was designed to specify the meaning of document content and
leave the precise rendering and layout up to the browser. PDF was designed to
specify the appearance of a document and ignore the meaning of the document content.
HTML and PDF are fundamentally different.
HTML is being changed to allow
greater control over the appearance of a document and PDF is being changed to
allow the meaning of a document to be better represented. However, the fact that
the two specifications are based on diametrically opposed concepts does mean that
it can be difficult to convert between the two.
ABCpdf
can use the MSHTML engine (used in Microsoft Internet Explorer) or
the Gecko engine (used in Mozilla Firefox) to parse and
preprocess the HTML for insertion into your PDF. This provides an
extremely accurate rendition of the HTML. Because the underlying rendering
engines differ in behavior and capabilities, you should expect
differences in the rendered output when you switch HTML
engines. Please refer to Engine,
ForMSHtml, and
ForGecko for more details on each engine's characteristics.
ABCpdf holds a cache of recently requested URLs, and pages expire from the
cache only after five minutes or so.
This considerably speeds up many common operations. However, if
you wish to bypass the cache, you can do so by setting the DisableCache parameter
to true when you call AddImageUrl or AddImageHtml. Occasionally, you may
find that your page is being cached elsewhere. This can happen in many places:
for example, Windows sometimes caches individual page resources, and proxy
servers may cache entire pages.
The
usual reason that content gets cached is that the server sends
HTTP headers indicating that it is acceptable to cache
the content. If you are using the Internet Explorer (MSHTML) HTML engine,
it may sometimes insist on caching certain Web pages. In that case, your
first step should be to use a tool like IEWatch to view the content
expiration headers. You may find that simply adjusting the
content expiration settings in the IIS Management Console
resolves the issue. If
you want to be completely sure that your URLs are rendered afresh each time, you need
to vary the URL. For example:
http://www.microsoft.com/?dummy=1
http://www.microsoft.com/?dummy=2
http://www.microsoft.com/?dummy=3
These all render the same page (www.microsoft.com),
but because the URL varies, you can be sure that each one is rendered afresh.
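If you build such URLs in code, a counter makes a convenient varying value. The following Python sketch only illustrates the idea and is not part of ABCpdf: the helper name `cache_busting_url` and the `dummy` parameter name are our own choices, and it assumes the target page ignores the extra query parameter.

```python
from itertools import count
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Module-level counter: every call gets a value never used before in this process.
_counter = count(1)

def cache_busting_url(url: str, param: str = "dummy") -> str:
    """Return url with a query parameter whose value changes on every call.

    Hypothetical helper: existing query parameters are kept, and any previous
    value of `param` is replaced so the URL does not keep growing.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query = [(k, v) for k, v in query if k != param]
    query.append((param, str(next(_counter))))
    return urlunsplit(parts._replace(query=urlencode(query)))

for _ in range(3):
    print(cache_busting_url("http://www.microsoft.com/"))
# http://www.microsoft.com/?dummy=1
# http://www.microsoft.com/?dummy=2
# http://www.microsoft.com/?dummy=3
```

A timestamp or random token would work equally well; the only requirement is that the URL differs from any URL still held in a cache.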
Some factors that affect the speed of HTML conversion are obvious. If you want to optimize the process, look at the retrieval times for your HTTP requests, the size of your HTML and any related resources, the complexity of the HTML, and the speed of your computer. Tweaking these can make a big difference.
However, there are also some small, simple changes you may be able to make without getting into the complexity of system-wide optimization.
The MSHTML rendering engine is, by default, set up for accuracy and quality. To ensure that the output is always good, we have to enable every setting that might ever affect output quality, even in situations where you are not using the corresponding features.
So if your HTML does not contain features that require these settings, you can disable them. Doing so can result in significant speed improvements.
The setting that typically makes the biggest difference is HostWebBrowser, but DoMarkup and AdjustLayout are also worth looking at. The actual speed increase depends very much on the input HTML, but in our tests with simple HTML, disabling these features increased processing speed by about 30% for HostWebBrowser, another 10% for the DoMarkup property and another 7% for the AdjustLayout property.
Another property worth examining is UseScript. It is false by default, but many people enable it in their ABCpdf code. If your JavaScript is well written, this is not a problem. However, JavaScript is often poorly coded, so it may have an unpredictable effect on speed. Consider disabling this feature if you do not actively need it.
Setting BrowserWidth to a predefined value means that ABCpdf does not have to compute one. This can increase speed by perhaps 10% or so.
You can render any page you can supply a URL for. When
you render a page, ABCpdf has to reload it. This is because you,
as a client, are viewing the page on your own machine, whereas ABCpdf runs
on the server and so exists in a different session. So you cannot generally
rely on cookies, session state or form submission in your page: the page must
depend only on the URL you supply. If you have to rely on session state,
you could use cookie-less sessions (which give you a URL for your session),
or you could save the session information under a specific unique ID, pass
the ID via the URL and pick up the information in your server-side code. Problems
that appear to be related to SSL or HTTPS connections are often authentication
issues, simply solved by providing a user name and password. See the LogonName
property for details.
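The second option above (save the state under a unique ID and pass the ID in the URL) can be sketched in Python as follows. This is a framework-neutral illustration with hypothetical names (`stash_state`, `load_state` and the `id` query parameter); the in-memory dictionary stands in for whatever shared store, such as a database, your server-side code would really use, because the separate request made by ABCpdf must be able to see the saved data.

```python
import secrets
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical store; a real site would use a database or shared cache.
_render_state: Dict[str, dict] = {}

def stash_state(state: dict, render_url: str) -> str:
    """Save a copy of state under a fresh, unguessable ID and return a URL carrying it.

    Assumes render_url has no query string of its own.
    """
    token = secrets.token_urlsafe(16)
    _render_state[token] = dict(state)
    return f"{render_url}?{urlencode({'id': token})}"

def load_state(url: str) -> Optional[dict]:
    """Server side: recover the saved state from the ID in the requested URL."""
    ids = parse_qs(urlsplit(url).query).get("id")
    return _render_state.get(ids[0]) if ids else None

url = stash_state({"user": "alice"}, "https://example.com/report.aspx")
print(load_state(url))  # {'user': 'alice'}
```

Using a random token rather than a sequential number makes it harder for someone to guess another user's ID.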
Screen resolution is typically 96 DPI, so when you view
an HTML page on your monitor, Windows displays it at 96 DPI. PDF, by contrast,
uses 72 units per inch. Because of this disparity, HTML appears
larger in print documents than it does on screen. If you want both to appear
the same size, you will need to apply a scale of 72/96 (0.75) to compensate.
For example, if you render a web page with a value
of 800 for the Width parameter, you will need to set the width of your Rect to
600.
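The scaling is simple arithmetic. This small Python sketch (the function name is our own) converts a browser width in screen pixels into the matching Rect width in PDF units, assuming the typical 96 DPI screen described above.

```python
PDF_UNITS_PER_INCH = 72
SCREEN_DPI = 96  # typical Windows screen resolution

def rect_width_for_browser_width(browser_width_px: float,
                                 screen_dpi: float = SCREEN_DPI) -> float:
    """Rect width (in PDF units) at which HTML appears at its on-screen size."""
    return browser_width_px * PDF_UNITS_PER_INCH / screen_dpi

print(rect_width_for_browser_width(800))  # 600.0
```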
PDF documents are predominantly vector based. As such, they
do not really have a DPI, because they are resolution independent. The only portions
of PDFs that are raster based are images. Most elements of HTML, such as text and
lines, are vector based, so they too are resolution independent. The resolution
at which the images in your web pages are rendered is more complicated. Suppose you have
a 300 by 300 pixel image referenced by an img tag. If the width of your Doc.Rect is
the same as the width you pass to AddImageUrl, the image will be rendered at 72
DPI. However, if you change the ratio between these two values, the image will be
scaled and hence its resolution will change. Similarly, if your 300 pixel square image
is in an img tag with a width and height of 150, the resolution will be doubled
from the default, to 144 DPI.
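The rules above can be combined into one formula. This Python sketch is our own reading of the text, not an ABCpdf function: it assumes uniform scaling, and that when the Doc.Rect width equals the width passed to AddImageUrl, one HTML pixel maps to one PDF unit (1/72 inch).

```python
def effective_image_dpi(image_px: int, tag_px: int,
                        browser_width: float, rect_width: float) -> float:
    """Approximate DPI at which an HTML image ends up in the PDF.

    image_px      -- natural width of the image file, in pixels
    tag_px        -- width given in the img tag, in HTML pixels
    browser_width -- width passed to AddImageUrl, in HTML pixels
    rect_width    -- width of Doc.Rect, in PDF units (72 per inch)
    """
    # Multiply before dividing so that whole-number inputs give exact results.
    return 72 * image_px * browser_width / (tag_px * rect_width)

print(effective_image_dpi(300, 300, 800, 800))  # 72.0  (widths match)
print(effective_image_dpi(300, 150, 800, 800))  # 144.0 (img tag halves the size)
print(effective_image_dpi(300, 300, 800, 600))  # 96.0  (Rect narrower than page)
```

Shrinking the Rect relative to the browser width packs the same pixels into less space, which is why the resolution rises.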
ABCpdf uses a sophisticated set of heuristics to determine
where to break pages. For greater control over page breaking, you can use the page-break-before,
page-break-after and page-break-inside CSS styles. You must ensure that
the element to which you apply your page-breaking style is visible. For example:
<div style="page-break-before:always">&nbsp;</div>
... will break, but ...
<div style="page-break-before:always"></div>
... will not.
Useful tip: debugging page break styles. Sometimes your
page breaks don't work the way you think they should. Because these kinds
of styles are invisible, it's very difficult to know whether you've applied
them correctly. One simple solution is to debug your HTML using a visible
style. For example, when you apply your "page-break-inside: avoid"
style, apply a right border style at the same time. That way, you can see exactly
where your elements are. If the borders don't appear in the right places,
you know there's something wrong with your HTML.
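One way to apply this tip without editing every element is to inject a temporary debug stylesheet before handing the HTML to AddImageHtml. This Python sketch is a hypothetical helper, not an ABCpdf feature, and it assumes your page-break styles are attached through CSS selectors such as an illustrative `.keep-together` class that carries `page-break-inside: avoid`.

```python
import re
from typing import Sequence

def with_debug_borders(html: str, selectors: Sequence[str]) -> str:
    """Return html with a <style> block giving each selector a red right border.

    Intended only for debugging page-break styles; omit it for production output.
    """
    rules = "\n".join(f"{sel} {{ border-right: 2px solid red; }}" for sel in selectors)
    style = f"<style>\n{rules}\n</style>\n"
    # Insert just before </head> if there is one; otherwise prepend to the document.
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match:
        return html[:match.start()] + style + html[match.start():]
    return style + html

page = ("<html><head><title>Report</title></head>"
        "<body><div class='keep-together'>...</div></body></html>")
print(with_debug_borders(page, [".keep-together"]))
```

Because the border is added by a separate rule, the same selectors keep their page-break declarations unchanged, and removing the debug block restores the original HTML exactly.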
The page break styles in the Gecko engine are not always applied as intuitively as they are in MSHTML. The root cause is the CSS specification, which says that break styles must apply to block-level elements in the "normal flow of the root element". It allows these styles to be applied to other elements but does not require it.
In practice, this means that the Gecko engine cannot apply page break styles within tables, for example to table rows. If you are unsure whether something is likely to work, try Print Preview in Firefox 38.0 as a simple sanity check.
You may wish to take a snapshot of the current URL. In
many circumstances, you should be able to derive a URL for the current page from
the values of the SERVER_NAME, URL and QUERY_STRING Server Variables. You should
be able to derive a URL for the previous page using the HTTP_REFERER (sic) Server
Variable. Alternatively, you can obtain the HTML of the current page using
the HttpResponse.Filter property or by overriding the Render method of the page.
You can then present this HTML to ABCpdf using AddImageHtml. If your HTML references
resources using relative references, you may wish to insert a <BASE> tag
into the HTML before presenting it to ABCpdf. When you perform this
kind of operation, be careful not to call ABCpdf recursively. If you do,
you will get into a hall-of-mirrors situation and the software will not be
able to return a sensible image.
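Both routes can be sketched in Python. These helpers are illustrative and framework-neutral: `server_vars` is assumed to be a plain dictionary copy of the IIS-style server variables named above, the `HTTPS` server variable is used to choose the scheme (an addition not mentioned in the text), and any non-default port is ignored.

```python
import re
from html import escape
from typing import Mapping

def current_page_url(server_vars: Mapping[str, str]) -> str:
    """Rebuild the current page's URL from SERVER_NAME, URL and QUERY_STRING."""
    scheme = "https" if server_vars.get("HTTPS", "off").lower() == "on" else "http"
    url = f"{scheme}://{server_vars['SERVER_NAME']}{server_vars['URL']}"
    query = server_vars.get("QUERY_STRING", "")
    return f"{url}?{query}" if query else url

def add_base_tag(page_html: str, base_url: str) -> str:
    """Insert <base href=...> at the start of <head> so relative references resolve."""
    tag = f'<base href="{escape(base_url, quote=True)}">'
    match = re.search(r"<head\b[^>]*>", page_html, flags=re.IGNORECASE)
    if match:
        return page_html[:match.end()] + tag + page_html[match.end():]
    return tag + page_html  # no <head>: put the tag first

vars_ = {"SERVER_NAME": "www.example.com", "URL": "/report.aspx", "QUERY_STRING": "id=7"}
print(current_page_url(vars_))  # http://www.example.com/report.aspx?id=7
```

The resulting URL can be passed to AddImageUrl, or used as the `base_url` when you capture the HTML yourself and pass it to AddImageHtml.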