Issue #17 March 2006
Introduction to DocBook XML, part 2: XSLT
by Paul W. Frields
Last month I introduced you to DocBook XML, a wonderful way to write documentation for software projects or just about any purpose. These articles, for instance, are written in DocBook XML. On the Fedora™ Documentation Project, we are using DocBook XML to produce release notes, tutorials and guides for Fedora users and administrators. Because DocBook has everything to do with content and very little to do with presentation, the author can use any of a number of tools, and concentrate on writing rather than formatting. Because DocBook XML uses standard XML technologies, there are plenty of ways to stylize and present the information contained in a document.
This month we're going to look at the Extensible Stylesheet Language (XSL), and in particular XSL Transformations (XSLT), which can be used to format or transform information in XML files. The Fedora Documentation Project, for example, uses XSLT to capture, process and convert XML information contained in our documentation files. I'll present several working examples of XSLT, so to follow along with this article and run them, you should have the "Authoring and Publishing" package group installed on your Red Hat Enterprise Linux or Fedora Core system. Use the appropriate software management utility for your platform. For Fedora Core use the following command:
su -c 'yum groupinstall "Authoring and Publishing"'
I will assume you've read last month's article, but nothing beyond that. I will steer clear of an exhaustive examination of XMLish jargon so as not to frighten anyone away. Just keep in mind that many of the concepts in this article have complex underpinnings, which you'll want to investigate if you want to peek "under the hood."
Of course there is a plethora of books available, a few published in electronic form on the Internet, that discuss XML and XSL. Although this article can't possibly cover all the details of this powerful and flexible technology, it can at least present some of the most rudimentary concepts. If you're just getting started with XSL, I highly recommend that you download and keep a copy of a good "cheat sheet." One of the best compact ones I've found for XSL is a tutorial called XSL Concepts and Practical Use written by Paul Grosso and Norman Walsh, which you can find at http://nwalsh.com/docs/tutorials/xsl/. You'll find information in that tutorial not just about XSLT, but also XSL Formatting Objects (XSL-FO), which is not covered in this article.
One way to use XSL is to simply write a stylesheet, which is itself an XML document. XSL is used frequently in DocBook XML processing tasks such as converting the XML source into another format. The xmlto command, for example, which we looked at briefly in Part One of this series, uses XSL stylesheets to create HTML pages. The easiest way to start learning a little about XSL is to see a stylesheet in action.
Example 1. authors.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"
version="1.0" encoding="UTF-8" />
<xsl:template name="people" match="/">
<xsl:for-each select="//author|//editor">
<xsl:element name="person">
<xsl:attribute name="fullname">
<xsl:value-of select="firstname"/>
<xsl:text> </xsl:text>
<xsl:if test="othername != ''">
<xsl:value-of select="othername"/>
<xsl:text> </xsl:text>
</xsl:if>
<xsl:value-of select="surname"/>
</xsl:attribute>
</xsl:element>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Let's walk through the stylesheet to see what function it performs. Keep in mind that stylesheets are usually written against particular document types, but they make no hard and fast requirements for the input XML document. In fact, they will sometimes even work against an invalid document, although they may have problems with documents that are not well-formed.
The topmost line is the common XML declaration, and following
that is the xsl:stylesheet element. This
element is interesting because it declares a
namespace that is used in this document to
identify elements. Essentially, you could interpret this element
as stating "This is an XSL stylesheet, by which 'XSL' means it
adheres to the version 1.0 standards set out in the document at
[this URL]." Namespaces are often used in XML to provide
functions not present in the definition for the container
document. Using namespaces will be a suitable topic for a
separate article, so we'll save that for another time.
The next nested level in this document includes the following elements. They are not the only elements possible at this level, but they are all we need for our purposes:
- The xsl:output element, as you might expect, sets out rules for the output of this stylesheet. In this case, the output will be a UTF-8 encoded XML document, fully indented, with the standard XML declaration as its header.
- The xsl:template element has the name people, and indicates which nodes in the original XML document are to be processed, and how.
A node is an atomic unit of valid XML.
It might be an element, or an attribute, or a text string. Nodes
are arranged in a tree in XML documents. The xsl:template matches and processes
nodes based on its attributes and content. The match rule identifies specific nodes
or groups of nodes, whether they are related in the tree or
completely disparate. The matching syntax used is known as
XPath, and incorporates a very flexible
pattern-based system for locating elements. You can find a fuller
explanation of XPath at the aforementioned XSL tutorial, but here
are a few simple examples:
- / matches the root (top-level) node
- //name matches an element of type name anywhere in the document
- foo/*/bar matches any bar element that has a grandparent foo element in the current node (context)
- book[@title="Infinite Jest"] matches a book element in the current node (context) that has a title attribute with a value of Infinite Jest
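If you'd like to try these patterns yourself, a small stylesheet is enough. The following is only a sketch: the library, shelf and book element names, and the sample input shown in the comment, are invented for illustration and do not come from DocBook. It counts the nodes each expression selects and writes the counts as plain text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumes a hypothetical input document such as:
     <library><shelf><book title="Infinite Jest"/><book title="Ulysses"/></shelf></library> -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="text" encoding="UTF-8"/>
  <!-- "/" matches the root node, so the context below is the root -->
  <xsl:template match="/">
    <!-- "//book" selects book elements anywhere in the document -->
    <xsl:text>books anywhere: </xsl:text>
    <xsl:value-of select="count(//book)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "library/*/book" selects book elements whose grandparent is a
         library element found directly under the context node -->
    <xsl:text>grandchildren of library: </xsl:text>
    <xsl:value-of select="count(library/*/book)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- A predicate in square brackets filters on an attribute value -->
    <xsl:text>titled Infinite Jest: </xsl:text>
    <xsl:value-of select="count(//book[@title='Infinite Jest'])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input from the comment, the three counts should come out as 2, 2 and 1.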
Our template will match only the root of the document, which
you should note is not the same as matching
every element in the document. (That rule would be match="*".) The current node or
context when the template is invoked, therefore, is the root of
the document. For any author
or editor element found,
regardless of its location in the content tree (or
infoset) the stylesheet will write a
person element. That element
will have a fullname
attribute consisting of the following:
- the value of the source element's firstname child element, followed by a space
- if the source element has a non-empty othername child element, the value of that element, followed by a space
- the value of the source element's surname child element
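The same three steps can also be written more compactly. The sketch below is an alternative of my own, not the form used in Example 1: it replaces xsl:element and xsl:attribute with a literal result element and an attribute value template, and lets the standard XPath functions concat() and normalize-space() deal with a missing othername.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:for-each select="//author|//editor">
      <!-- Literal result element: the braces hold an XPath expression.
           concat() joins the three names with spaces, and
           normalize-space() collapses the doubled space left behind
           when there is no othername. -->
      <person fullname="{normalize-space(concat(firstname, ' ', othername, ' ', surname))}"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

One difference to keep in mind: normalize-space() also trims stray whitespace inside the names themselves, which the longer form does not do.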
If our input XML document contains a node like the following example:
<author>
<surname>Public</surname>
<firstname>John</firstname>
<othername role="mi">Q.</othername>
</author>
then the document resulting from a transformation using the above stylesheet will contain the following element:
<person fullname="John Q. Public"/>
Notice that the output element has no content, only attributes; such an element is called empty. Rather than using both an opening and a closing tag, it uses a single self-closing tag, in which the final angle bracket is preceded by a slash. The fact that this output element has no text content does not change its intrinsic value as a node in the infoset.
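If you want something small to experiment with, a minimal input file will do. The document below is a hypothetical stand-in that I've invented for illustration; it is not the file from last month's article, and every name and value in it is made up. It leaves out the DOCTYPE declaration so that no DTD has to be fetched.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample input; all names and values are invented -->
<article>
  <articleinfo>
    <title>Sample Article</title>
    <author>
      <firstname>John</firstname>
      <othername role="mi">Q.</othername>
      <surname>Public</surname>
    </author>
    <editor>
      <firstname>Jane</firstname>
      <surname>Doe</surname>
    </editor>
    <revhistory>
      <revision>
        <revnumber>0.1</revnumber>
        <date>2006-03-01</date>
        <revremark>First draft</revremark>
      </revision>
    </revhistory>
  </articleinfo>
  <para>Body text.</para>
</article>
```

Processed with the stylesheet in Example 1, this input should yield two person elements, one with the full name John Q. Public and one with Jane Doe. The editor has no othername, so the xsl:if test fails and no extra space is written.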
The libxslt library
contains the priceless xsltproc utility, which,
among other functions, allows you to process XML documents with
your XSL stylesheets. If you want to see
xsltproc in action, copy and paste the XML file
from last
month's article into your favorite editor, and save it
as original.xml. Similarly copy and paste
the XSL stylesheet above to authors.xsl. Then
run the following command:
xsltproc authors.xsl original.xml
The results should be XML output with new elements. This
tiny, simple example shows you the power of XML for data
interchange. XSLT provides instructions that allow you to move
data easily between different XML document types, as this example
demonstrates. Of course, this output could have been designed for
a specific DTD, and notated accordingly. Alter the declaration of
the xsl:output element slightly:
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"
version="1.0" encoding="UTF-8"
doctype-public="-//Bogus//DTD RHM Example XML V0.01//EN"
doctype-system="people.dtd" />
Now regenerate the output using the same
xsltproc command, and note the difference in
the output XML. If you had a DTD matching the public identifier
and located at the URL specified
(people.dtd), you could validate the
resulting file for consistency with the DTD. Notice that the root
element noted in the DOCTYPE is derived
from the top-level element of the output. This feature allows you
to easily extract part of an infoset into a separate document that
uses the same DTD. Say, for instance, we wanted to extract our
document's revision history to a separate file for auditing.
We could write a very short stylesheet to do so:
Example 2. revhist.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"
version="1.0" encoding="UTF-8"
doctype-public="-//OASIS//DTD DocBook XML V4.2//EN"
doctype-system="http://www.docbook.org/xml/4.2/docbookx.dtd"/>
<xsl:template match="/">
<xsl:for-each select="//revhistory">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
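As a side note, the loop in Example 2 is not strictly necessary. The variant below is a sketch of an equivalent stylesheet, not the form we actually use: because xsl:copy-of accepts a whole node-set, it can select the revhistory elements directly.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="no"
              version="1.0" encoding="UTF-8"
              doctype-public="-//OASIS//DTD DocBook XML V4.2//EN"
              doctype-system="http://www.docbook.org/xml/4.2/docbookx.dtd"/>
  <xsl:template match="/">
    <!-- copy-of copies every selected node, in document order,
         so no explicit xsl:for-each is needed -->
    <xsl:copy-of select="//revhistory"/>
  </xsl:template>
</xsl:stylesheet>
```

The xsl:for-each form is still the better starting point if you later need to do something with each match beyond copying it.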
XSLT is not limited to outputting XML; with the text output method it can produce nearly any kind of text file. Stylesheets that emit elaborate non-XML formats are usually harder to read and understand, so we'll steer clear of them for now. You can easily imagine, however, using XML to populate a simpler kind of text file, such as a configuration file, which uses some sort of regular formatting.
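To make the configuration file idea concrete, here is a minimal sketch. The settings and option element names, and the name=value output format, are assumptions I've made up for this illustration; they don't correspond to any real application.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumes a hypothetical input document such as:
     <settings><option name="color" value="blue"/></settings> -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- method="text" writes plain text, with no tags and no escaping -->
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:for-each select="/settings/option">
      <!-- One line per option, in the form name=value -->
      <xsl:value-of select="@name"/>
      <xsl:text>=</xsl:text>
      <xsl:value-of select="@value"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Putting every literal string inside xsl:text keeps the stylesheet's own indentation out of the output, which matters a great deal when the result is plain text.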
The Fedora Documentation Project (FDP) is in the final stages of preparing a packaging process for official documentation that draws content directly from some of our XML source. Let's look at a small portion of XSLT from one of our source files:
Example 3. spec.xsl
<!-- Transform rpm-info.xml into a SPEC File -->
<xsl:stylesheet version="1.0" xml:space="preserve"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="UTF-8" indent="no" method="text"
omit-xml-declaration="no" standalone="no" version="1.0"/>
<xsl:param name="lang" select="'en'" />
<xsl:param name="docbase" select="'example-tutorial'" />
<xsl:template match="/"># Fedora Documentation Specfile
%define docbase <xsl:value-of select="$docbase"/>
%define doclang <xsl:value-of select="$lang"/>
%{!?fdpdir:%define localbuild 1}
%{!?fdpdir:%define fdpdir %{_datadir}/fedora/doc}
Summary: Fedora Documentation: %{docbase}-%{doclang}
Name: fedora-doc-%{docbase}
Version: <xsl:value-of select="/rpm-info/changelog/revision[@role = 'doc'][1]/@number"/>
Release: <xsl:value-of select="/rpm-info/changelog/revision[@role = 'rpm'][1]/@number"/>
...
You can find the current version of the entire XSLT in our CVS
store at http://cvs.fedora.redhat.com/viewcvs/docs-common/packaging/spec.xsl?root=docs.
Like much other XSLT, this stylesheet expects a certain kind of
XML document for input. In this case it's an
rpm-info document, whose DTD you can also
find in CVS, at http://cvs.fedora.redhat.com/viewcvs/docs-common/packaging/rpm-info.dtd?root=docs.
Our packaging passes a couple of parameters to this
stylesheet, including docbase and
lang. These parameter values are used to
populate the resulting specfile, which for the English
(en_US) version of
example-tutorial, might have a preamble that
looks like this:
Example 4. Results of transformation via spec.xsl
# Fedora Documentation Specfile
%define docbase example-tutorial
%define doclang en_US
%{!?fdpdir:%define localbuild 1}
%{!?fdpdir:%define fdpdir %{_datadir}/fedora/doc}
Summary: Fedora Documentation: example-tutorial-en_US
Name: fedora-doc-example-tutorial
Version: 0.14.1
Release: 1
...
You can feed xsltproc parameter values at
the command line using the --param or
--stringparam options. It's tempting to think
the parameter declarations set explicitly in the stylesheet above
would simply override any lang and
docbase parameters received at invocation
time. This is not the case, fortunately; a value passed in by the
caller takes priority, and the value declared in the stylesheet is
used only when the caller supplies none. In other words, these
declarations in the stylesheet set default values.
This functionality is helpful when you need consistently
acceptable output, but it can also be useful for producing a
visible indication in a file that no value was received from the
calling procedure. For example, you could set the default
fallback value for the parameter to
FIXME.
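Here is a small sketch of that fallback idea. It is a cut-down illustration rather than part of our actual spec.xsl, and the file names params.xsl and input.xml used below are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="text" encoding="UTF-8"/>
  <!-- Default values, used only when the caller passes nothing -->
  <xsl:param name="lang" select="'FIXME'"/>
  <xsl:param name="docbase" select="'FIXME'"/>
  <xsl:template match="/">
    <xsl:text>%define docbase </xsl:text>
    <xsl:value-of select="$docbase"/>
    <xsl:text>&#10;%define doclang </xsl:text>
    <xsl:value-of select="$lang"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Saved as params.xsl and run as xsltproc params.xsl input.xml, both lines should end in FIXME. Run as xsltproc --stringparam lang en_US --stringparam docbase example-tutorial params.xsl input.xml, the supplied values should appear instead. Note that the options come before the stylesheet name, and that any well-formed XML file will serve as input.xml here, since the template never reads from it.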
Hopefully you've seen the potential for XSLT to transform not just your documents, but the way you use the information they provide. Although we've focused specifically on applying XSLT to DocBook XML source, it is equally powerful when used with any XML data store. This is why XML has become ubiquitous throughout business information enterprises: it ensures that your data is always accessible and never arbitrarily confined, regardless of the applications using it. Your data (or in the case of DocBook, documentation) can be leveraged to work in conjunction with your software, and vice versa.




