Chapter 4. Presentation

Templating Developers' Guide

If you are already familiar with this section, you can skip ahead to any of the following topics:

Templating Introduction

This document is under construction.

XAlphabetSoup: Making sense of the standards

There are a number of different industry standards for textual representation of structured data, and a set of standard APIs for parsing and manipulating these documents in Java.

  • XML (eXtensible Markup Language) is a standard format for representing arbitrary tree-structured data in textual format. XML documents look a lot like HTML (HyperText Markup Language) because HTML and XML are both roughly based on SGML (Standard Generalized Markup Language). An XML document usually looks something like this:
    <document>
     <foo id="3">this is a foo element
        <bar>this is a bar nested inside a foo</bar>
     </foo>
     <foo id="4"> this is another foo with <bar/> 
        an empty bar element</foo>
    </document>
    XML tags are much like HTML tags, with attributes and nested tags, but there are a few important differences:

    • You can define your own tags. You can also define a document type with a DTD (Document Type Definition). A DTD is written in its own declaration syntax rather than as an XML document. It declares the valid tags and the valid attributes for those tags, and determines which tags can be nested in which other tags. Validating XML parsers can check the input XML against the DTD.

    • Documents must have a single root element ("document" in the above example), which contains all other tags in the document. In a well-formed HTML document, the <html> tag is the root element. However, most web browsers will tolerate HTML that is not enclosed in a single <html> tag.

    • Every tag must have a closing tag for an XML document to parse, whereas unclosed tags such as <p> and <img> abound in HTML. If necessary, an "empty" tag can be opened and closed with no contents using the shorthand <tag/>.

  • DOM (Document Object Model) is a language-neutral API (with bindings for Java, C++, and other languages, though this tutorial focuses on Java) for manipulating a parsed XML document as a tree structure in memory. The heart of the DOM is the Node interface, which represents any single node in the tree, such as an element, an attribute, or a block of text; a Document is the top-level node that represents an entire XML document.

  • SAX (Simple API for XML) is an event-driven API for parsing XML. A SAX parser makes callbacks to a programmer-specified handler (DocumentHandler in SAX 1, superseded by ContentHandler in SAX 2) whenever it encounters the start or end of an element or a block of character data.

  • XHTML is a reformulation of HTML 4.0 as an XML standard. It is essentially HTML with stricter syntax rules (for example, mandatory closing tags) to ensure that XHTML documents also parse as XML documents.

  • XSLT (eXtensible Stylesheet Language Transformations) is a language for specifying rules to transform an XML document into some other kind of output. It is most often used for rendering XML-formatted data into some human-readable format (XHTML, WML, etc.). However, in conjunction with XPath (a language for addressing parts of an XML document), XSLT is a Turing-complete language that can transform any XML input into any XML output.

  • XSL (eXtensible Stylesheet Language) is a combination of XSLT and a set of formatting objects, so XSL is a superset of XSLT. One application of XSL is the set of DocBook stylesheets, which use XSLT to transform a document written in DocBook XML markup into either HTML or printer-friendly PDF. The formatting objects come into play with PDF generation: the XML document is first converted into an intermediate XML document containing formatting-object elements, and this intermediate document is then processed into PDF.

  • JAXP (Java API for XML Parsing, later renamed Java API for XML Processing) is an abstraction layer that provides a vendor-neutral way to obtain and configure DOM and SAX parsers. Despite its name, JAXP does not itself parse XML. Instead, it lets an application locate whichever parser implementation is installed and work with it through the standard DOM and SAX interfaces.

  • TrAX (Transformation API for XML) is a Sun-supported Java standard API for transforming XML documents into other XML documents. The heart of TrAX is the Transformer class, with the method transform(Source, Result). Xalan-J 2.0.1 supports TrAX by creating a Transformer object for an XSL stylesheet. Anything that transforms XML input into XML output may support TrAX, even if it is unrelated to XSLT. TrAX is now part of JAXP 1.1.
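To make the DOM and JAXP entries above more concrete, here is a minimal sketch that parses the sample document from the XML entry into a DOM tree and reads the id attribute of each foo element. The class name DomSketch and the helper names parse and fooIds are made up for illustration; the JAXP and DOM calls are the standard ones in javax.xml.parsers and org.w3c.dom. The sketch assumes whatever parser the JAXP factory finds on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomSketch {
    // The sample document from the XML entry, collapsed onto fewer lines.
    static final String SAMPLE =
        "<document>"
      + "<foo id=\"3\">this is a foo element"
      + "<bar>this is a bar nested inside a foo</bar></foo>"
      + "<foo id=\"4\">this is another foo with <bar/> an empty bar element</foo>"
      + "</document>";

    // Parse a string into a DOM tree. JAXP chooses the parser implementation,
    // so no vendor-specific class is named here.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Collect the id attribute of every <foo> element, in document order.
    static List<String> fooIds(Document doc) {
        List<String> ids = new ArrayList<>();
        NodeList foos = doc.getElementsByTagName("foo");
        for (int i = 0; i < foos.getLength(); i++) {
            Element foo = (Element) foos.item(i);
            ids.add(foo.getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("root element: " + doc.getDocumentElement().getTagName());
        System.out.println("foo ids: " + fooIds(doc));
    }
}
```

Passing a document with an unclosed tag (for example "<p>unclosed") to parse makes the underlying parser raise a SAXException, which this sketch rethrows as an IllegalArgumentException. That is the strictness described in the list of differences from HTML.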
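The event-driven style of SAX is easier to see in code than in prose. The following sketch records the order in which a SAX parser reports element boundaries. It assumes the SAX 2 DefaultHandler convenience class and a parser obtained through JAXP's SAXParserFactory; the names SaxSketch, EventLogger, and events are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {
    // Records each start/end callback as a short string so that the
    // order of events is visible after parsing.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    // Parse the given XML and return the element events in callback order.
    static List<String> events(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EventLogger logger = new EventLogger();
            parser.parse(new InputSource(new StringReader(xml)), logger);
            return logger.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints [start:foo, start:bar, end:bar, end:foo]
        System.out.println(events("<foo id=\"4\">another foo with <bar/> an empty bar</foo>"));
    }
}
```

Unlike the DOM, nothing is kept in memory unless the handler stores it, which is why SAX suits large documents that only need a single pass.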
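The XSLT and TrAX entries above can be tied together with a short sketch that compiles a stylesheet into a Transformer and calls transform(Source, Result). The stylesheet is a made-up example that turns each foo element into a list item holding its id; the names TraxSketch, STYLESHEET, and transform (the static helper) are hypothetical. The sketch assumes the XSLT processor that the JAXP TransformerFactory locates at run time, which need not be Xalan-J.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraxSketch {
    // Hypothetical stylesheet: wraps the ids of all <foo> children of
    // <document> in an HTML-like list.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/document\">"
      + "<ul><xsl:apply-templates select=\"foo\"/></ul>"
      + "</xsl:template>"
      + "<xsl:template match=\"foo\">"
      + "<li><xsl:value-of select=\"@id\"/></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to the XML input and return the serialized result.
    static String transform(String stylesheet, String xml) {
        try {
            // The factory compiles the stylesheet into a Transformer.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<document><foo id=\"3\"/><foo id=\"4\"/></document>";
        System.out.println(transform(STYLESHEET, input));
    }
}
```

Because the code names only javax.xml.transform types, swapping in a different XSLT processor should not require source changes, which is the vendor neutrality that JAXP is meant to provide.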