XML is a very readable data format ... if you're a bithead like me! But wouldn't it be nice to be able to format this help text just as you like, and have it readable (and searchable!) in your favourite program, such as your web browser? Well, the XML format has a real advantage over just about any other data file: it's highly structured, and its content can be manipulated using XSLT (Extensible Stylesheet Language Transformations), as defined by the World Wide Web Consortium (W3C). An XSLT stylesheet can sort and manipulate XML input, insert all kinds of extra data (such as, oh, a copyright notice at the top of every output), and write the result to one or more output files. What has that to do with these 750 pages of HTML? I didn't create a single one of them by hand. (This page was created 'All By Hand'™, but it's not part of Adobe's XML files.)
So I wrote an XSLT stylesheet to read, parse, split, and sort the huge input file, and to output it as formatted and hyperlinked HTML pages. This was no mean task: the input file is over 113,000 lines long, and the latest version of the stylesheet contains just under a thousand lines. All in all, it took me more than 50 hours of hard work to create the output you see before you. But wait! Couldn't I just have search-and-replaced the XML tags into HTML? Sure; with some manual sorting and such, that would still have been doable. But if I ever want to change something (say, I want 4 columns of index instead of 3, or I want real CSS3 columns instead of tables), all I have to do is rewrite a (small) part of the XSLT stylesheet and let the Saxon XSLT processor go over the XML again. If my installation of InDesign is upgraded with some scriptable plugin, all I have to do is run Saxon again. Since the XML help for CS4 has the same formatting, I can run Saxon on its help file as well. And there you have it! Another entirely new set of more than 750 files, all properly indexed and hyperlinked, without me doing any more than changing how it should look!
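The stylesheet itself isn't reproduced here, and XSLT needs an external processor such as Saxon, so what follows is only a rough Python analogue (standard library, not XSLT) of the read, sort, split, and link pipeline described above. The input layout (a `<help>` root with `<entry name="...">` children), the file naming in `page_name`, and the `NOTICE` text are all hypothetical; Adobe's actual help XML is structured differently.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical input layout; not the structure of Adobe's real help files.
SAMPLE = """<help>
  <entry name="TextFrame"><description>A frame that holds text.</description></entry>
  <entry name="Document"><description>An open InDesign document.</description></entry>
  <entry name="Paragraph"><description>A paragraph of text.</description></entry>
</help>"""

# Extra data inserted at the top of every output page (placeholder text).
NOTICE = "<p class='notice'>Hypothetical copyright notice</p>"


def page_name(name: str) -> str:
    """Hypothetical naming scheme: one lowercase .html file per entry."""
    return name.lower() + ".html"


def build_pages(xml_text: str, columns: int = 3) -> dict[str, str]:
    """Split the input into one HTML page per entry, plus a sorted index page."""
    root = ET.fromstring(xml_text)
    entries = sorted(root.iter("entry"), key=lambda e: e.get("name", "").casefold())
    pages: dict[str, str] = {}
    for entry in entries:
        name = entry.get("name", "")
        desc = entry.findtext("description", default="")
        pages[page_name(name)] = (
            f"<html><body>{NOTICE}<h1>{html.escape(name)}</h1>"
            f"<p>{html.escape(desc)}</p>"
            "<p><a href='index.html'>Index</a></p></body></html>"
        )
    # Index table, filled row by row; the column count is a single parameter.
    links = [
        f"<a href='{page_name(e.get('name', ''))}'>{html.escape(e.get('name', ''))}</a>"
        for e in entries
    ]
    rows = [
        "<tr>" + "".join(f"<td>{link}</td>" for link in links[i:i + columns]) + "</tr>"
        for i in range(0, len(links), columns)
    ]
    pages["index.html"] = f"<html><body>{NOTICE}<table>{''.join(rows)}</table></body></html>"
    return pages


if __name__ == "__main__":
    print(build_pages(SAMPLE, columns=4)["index.html"])
```

Going from 3 to 4 index columns is then just `build_pages(SAMPLE, columns=4)`; as with the stylesheet, the layout decision lives in one place instead of in hundreds of hand-edited files.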
Understanding XML is easy, especially if you are well-versed in HTML. In that case you should really know about XHTML, the well-formed younger brother of that plain old tacked-together-with-sticky-tape HTML that grew so bloated over the years. XML is even more structured, so the step over shouldn't be difficult. XSLT is something else entirely. First off, it is itself written in XML (meaning you can run an XSLT stylesheet over an XSLT stylesheet, for example to format it for printing or viewing). It's a programming language, but it doesn't work in do-this-then-that sequential steps; instead, it's a declarative, rule-based language: you describe what to do with each XML element the processor encounters in the input document. Writing your first XSLT stylesheet can be frustrating (usually because your first attempts don't appear to do anything at all) until you get into the proper mindset. Soon you will be scanning every folder on your computer for XML files to examine! (Mac OS X users: many .plist configuration files are also XML, and so are lots of other files on that system.)
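To get a feel for the rule-based mindset without installing an XSLT processor, here is a small Python imitation of template matching: each rule says what to produce for one element name, and `process` picks a rule for every element it meets while walking the document. This is a sketch of the idea only, not how Saxon works internally; the `template` decorator, the rule names, and the sample markup are made up for illustration. With no matching rule, an element is simply recursed into and its text copied through, which loosely mirrors XSLT's built-in template rules.

```python
import html
import xml.etree.ElementTree as ET
from typing import Callable

Rule = Callable[[ET.Element], str]
RULES: dict[str, Rule] = {}


def template(tag: str) -> Callable[[Rule], Rule]:
    """Register a rule for elements named `tag` (loosely like <xsl:template match="tag">)."""
    def register(fn: Rule) -> Rule:
        RULES[tag] = fn
        return fn
    return register


def process(elem: ET.Element) -> str:
    """Use the matching rule if there is one; otherwise fall back to the default."""
    rule = RULES.get(elem.tag)
    return rule(elem) if rule else apply_templates(elem)


def apply_templates(elem: ET.Element) -> str:
    """Process children in document order (loosely like <xsl:apply-templates/>),
    copying the text between them through, escaped for HTML."""
    out = [html.escape(elem.text or "")]
    for child in elem:
        out.append(process(child))
        out.append(html.escape(child.tail or ""))
    return "".join(out)


@template("title")
def title_rule(elem: ET.Element) -> str:
    return "<h1>" + apply_templates(elem) + "</h1>"


@template("code")
def code_rule(elem: ET.Element) -> str:
    return "<code>" + apply_templates(elem) + "</code>"


doc = ET.fromstring("<page><title>Hello</title><para>Use <code>app</code> here.</para></page>")
print(process(doc))
```

There is no main loop deciding what happens when: adding a rule for `para` changes the output without touching anything else, which is the shift in thinking that makes the first stylesheet so puzzling.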
So, why did I work so hard on writing the sheet for this? Well, it was fun to create something new, which is a good reason in itself. Besides, I needed the training. I'm a typesetter, and one of my upcoming jobs is a dictionary that is being assembled as Excel data. I experimented with a small data set, exporting it from Excel as XML, and then converting it to plain text, sorting entries and concatenating duplicates all in one go. After that I reckoned, "INX (the InDesign Interchange format, used for backward compatibility) is also an XML file", and went on to include all necessary formatting, from pages and paragraphs right down to the style and formatting of individual words. Saxon still didn't break a sweat, and now, when the complete data comes in, all I have to do is export it to XML, run my sheet over it to create an INX output file, and open that file in InDesign. The publisher will be amazed to receive a complete set of proofs within a couple of hours (let's be generous), whether it's a hundred pages of output or a thousand. How does that sound for page throughput!?
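The sort-and-merge step for the dictionary data could look something like the following Python sketch. This is a swapped-in technique for illustration; the actual job was done with an XSLT sheet in Saxon. It assumes Excel's "XML Spreadsheet 2003" export, with the headword in the first cell of each row and one sense in the second; that column layout and the sample rows are hypothetical, and cells that Excel skips with an `ss:Index` attribute are not handled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of Excel's "XML Spreadsheet 2003" export format.
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}

# Hypothetical sample: column A = headword, column B = one sense per row.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1"><Table>
  <Row><Cell><Data ss:Type="String">pear</Data></Cell><Cell><Data ss:Type="String">a fruit</Data></Cell></Row>
  <Row><Cell><Data ss:Type="String">Apple</Data></Cell><Cell><Data ss:Type="String">a fruit</Data></Cell></Row>
  <Row><Cell><Data ss:Type="String">pear</Data></Cell><Cell><Data ss:Type="String">a fruit</Data></Cell></Row>
  <Row><Cell><Data ss:Type="String">pear</Data></Cell><Cell><Data ss:Type="String">a shape</Data></Cell></Row>
 </Table></Worksheet>
</Workbook>"""


def read_rows(xml_text: str) -> list[list[str]]:
    """Return the text of every cell, row by row (ss:Index gaps are ignored)."""
    root = ET.fromstring(xml_text)
    return [
        [(data.text or "").strip() for data in row.iterfind("ss:Cell/ss:Data", NS)]
        for row in root.iterfind(".//ss:Row", NS)
    ]


def merge_entries(xml_text: str) -> str:
    """Sort headwords case-insensitively and join duplicate rows into one entry."""
    senses: dict[str, list[str]] = defaultdict(list)
    for cells in read_rows(xml_text):
        if len(cells) < 2:
            continue  # skip rows without both a headword and a sense
        word, sense = cells[0], cells[1]
        if sense not in senses[word]:  # drop exact repeats of a sense
            senses[word].append(sense)
    return "\n".join(
        f"{word}: {'; '.join(senses[word])}" for word in sorted(senses, key=str.casefold)
    )


print(merge_entries(SAMPLE))
```

The same grouping idea carries over to any output format; only the final string-building step would change if the target were, say, tagged text for layout instead of plain lines.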
Acknowledgements
The first CHM conversion of the files for CS3 was done by fellow scripting enthusiast ABC GREEN. Additionally, he helped me set up my system to compile other versions as well. Thanks, mate!
Jongware, 25-Oct-2008
(this version 19-Jun-2010)
Jongware 2009 v2.1.3