Working, Elegant DocBook to PDF Solution

Tue Jun 12 22:01:12 UTC 2007

Hey guys,

I applied for a position in the GSoC project for Fedora. The title of
the project is "Working, Elegant DocBook to PDF Solution." I was in
correspondence with Karsten Wade throughout the application process.
However, the last student slot was cut and this project was not
chosen.

A few days ago, I received an email from Karsten asking me if I was
still interested in the project. I was very much interested and thus I
am posting my original proposal (abridged - removed some items such as
schedules) for the GSoC project here. If the solution sounds viable
and applicable, we could definitely get the project going.

Thank you for your time,
Amit

*******************

Application for Summer of Code 2007: Amit Uttamchandani

********
Synopsis
********

I propose a solution for converting DocBook XML files into PDF. The
approach is a three-pronged solution that focuses on simplicity and
extensibility.

*******
Project
*******

The solution involves creating a command line utility to accomplish
the task described above. In its simplest form, the utility takes a
DocBook XML source and converts it into a PDF file. To guide this
work, I set three criteria for the finished product.

********
Criteria
********

1. Simple
2. Extensible
3. Standard

**************
Implementation
**************

The three-pronged approach involves the following: a front end, a
parser and validator, and a PDF toolkit. First, the front end will be
written in Python. Python provides a simple yet powerful development
environment for implementing our utility. The command line tool would
have the following interface:

docbook2PDF <input>.xml <output>.pdf
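
As a rough illustration of that interface, here is a minimal sketch
using Python's standard argparse module. The function name parse_args
and the argument names "input" and "output" are my own placeholders,
not part of the proposal.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments of the proposed interface.

    Hypothetical helper: the proposal only fixes the command line shape
    (docbook2PDF <input>.xml <output>.pdf), not how it is parsed.
    """
    parser = argparse.ArgumentParser(
        prog="docbook2PDF",
        description="Convert a DocBook XML file to PDF (illustrative sketch).",
    )
    parser.add_argument("input", help="path to the DocBook XML source")
    parser.add_argument("output", help="path of the PDF file to write")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Calling parse_args(["guide.xml", "guide.pdf"]) returns an object whose
input and output attributes hold the two paths. Additional options
could later be added with further add_argument calls.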

Second, an XML parser is needed to parse the DocBook XML source. After
studying various implementations, I consider Expat, the standard
Python XML parser, to be the best choice. Advantages of Expat include
its parsing speed, its simple Python bindings, and its availability as
a standard Python module. Expat, however, does not validate XML files.
To validate the XML source, xmllint will be used.
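
The parsing step could look roughly like the sketch below, which uses
the standard xml.parsers.expat module. The function name parse_docbook
and the flat list of (element name, text) pairs are assumptions made
for illustration; a real tool would need a richer structure. Expat
only checks well-formedness here, so validation is out of scope.

```python
import xml.parsers.expat


def parse_docbook(xml_text):
    """Return (element_name, text) pairs in document order.

    Simplification: text is attributed to the innermost element only,
    so mixed content (text interleaved with child elements) is not
    handled by this sketch.
    """
    items = []
    chunks = []  # character data collected for the current element

    def start(name, attrs):
        chunks.clear()

    def char_data(data):
        chunks.append(data)

    def end(name):
        text = "".join(chunks).strip()
        if text:
            items.append((name, text))
        chunks.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = char_data
    # True marks this as the final chunk of input; malformed XML
    # raises xml.parsers.expat.ExpatError.
    parser.Parse(xml_text, True)
    return items
```

Feeding it a small document such as
"<article><title>Demo</title><para>Hello, PDF.</para></article>"
yields one pair for the title and one for the paragraph.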

Third, the reportlab open-source toolkit will be used to write the
parsed XML data structure to a PDF file. The reportlab toolkit makes
it easy to render Python data structures as PDF. Thus, once a DocBook
XML source is parsed and validated, the resulting Python object can be
formatted and written to a PDF file using reportlab.
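
To make the output step concrete, here is a minimal sketch built on
reportlab's platypus layer. The helper names to_story_spec and
write_pdf, the (element name, text) input format, and the mapping from
DocBook elements to reportlab sample styles are all assumptions of
mine; the proposal does not specify them.

```python
from xml.sax.saxutils import escape


def to_story_spec(items):
    """Map (element_name, text) pairs to (style_name, text) pairs.

    The style names come from reportlab's sample style sheet; the
    mapping itself is a hypothetical starting point.
    """
    style_for = {"title": "Title", "para": "BodyText"}
    return [(style_for.get(name, "BodyText"), text) for name, text in items]


def write_pdf(path, spec):
    """Render (style_name, text) pairs to a PDF file at path."""
    # Imported lazily so the mapping step above has no third-party
    # dependency; reportlab must be installed to call this function.
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer

    styles = getSampleStyleSheet()
    story = []
    for style_name, text in spec:
        # Paragraph interprets its text as markup, so escape &, < and >.
        story.append(Paragraph(escape(text), styles[style_name]))
        story.append(Spacer(1, 12))  # 12 points of vertical space
    SimpleDocTemplate(path).build(story)
```

With reportlab installed, write_pdf("demo.pdf",
to_story_spec([("title", "Demo"), ("para", "Hello, PDF.")])) should
produce a one-page document with a title and a paragraph.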

The above implementation provides a simple, extensible, and standard
design for the Python utility. The solution is based on standard
components and does not try to overcomplicate the process. It is
extensible because the command-line utility can be easily expanded to
include additional options and features.

********
Road map
********

1. Publish a more detailed description of the implementation and
specification, including initial flowcharts and function descriptions,
to the Fedora developers mailing list. Obtain feedback and incorporate
suggestions into the design.

2. Complete an initial version of the docbook2PDF utility that
successfully validates and parses existing DocBook XML source.

3. Integrate the reportlab toolkit into docbook2PDF and successfully
output the parsed XML object to PDF.

4. Thoroughly test the implementation and make sure it meets the
requirements and specifications. Write up documentation on usage of
the docbook2PDF utility.

***************
Future Road map
***************

1. The utility can be extended to batch process DocBook XML files.
Given a directory, it would convert all DocBook XML sources it finds
into PDF files.

2. A GUI can be added using PyGTK to further extend the functionality
of the utility.
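
The batch-processing idea in item 1 could start from a sketch like the
one below, which only plans the work by pairing each XML file with an
output path. The function name plan_batch and the choice to mirror the
source directory layout are assumptions of mine, and the sketch treats
every .xml file as DocBook without checking.

```python
from pathlib import Path


def plan_batch(source_dir, output_dir):
    """Return (xml_path, pdf_path) pairs for every .xml file found.

    Files are discovered recursively under source_dir, and each PDF
    path mirrors the source's relative location under output_dir.
    """
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    return [
        (xml_path,
         output_dir / xml_path.relative_to(source_dir).with_suffix(".pdf"))
        for xml_path in sorted(source_dir.rglob("*.xml"))
    ]
```

Each pair could then be handed to the single-file conversion routine,
so the batch feature stays a thin layer over the core utility.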

*********
Biography
*********

My name is Amit Uttamchandani and I will be completing my Bachelor's
degree in Computer Engineering this summer at California State
University, Northridge. Before my current internship, I worked for the
Information Systems department at the university. During this time,
our group was given the task of performing an inventory of all the
computers and peripherals, such as printers and scanners, in the
Engineering department. The tool in use at the time was an Excel
sheet. I found this quite disturbing. The data that we were collecting
would be put to much better use if it were stored in a database, where
the entire Engineering department could benefit from it. Thus, I
suggested implementing a web-based solution involving a PHP front end
to a MySQL database back end.

Now, everyone could input the data from virtually anywhere through a
simple, easy-to-use web front end. Predefined queries were also
available to output the data into a PDF file, complete with charts and
graphs. The hidden gem was Python and reportlab. As soon as a query
was made, a Python script was called to retrieve the data from the
MySQL database, format it using reportlab, and provide a link to the
resulting PDF file. This whole process worked seamlessly and allowed
our department to analyze, for example, how many computers were still
using Windows NT or how many computers had less than 256MB of RAM.

The above project took 3 months during the summer to complete. The
integration of Python and the reportlab toolkit was the part that
truly shone and impressed. I have been working with reportlab and
Python ever since to generate on-demand PDF files and reports from
databases.

I have also worked with Python and XML. I successfully created a
Python script 'prop' to parse and propagate XML test case data from
one project to another. The implementation used Python and Expat to
accomplish the task. Implementing this solution took around 3 weeks
and the result was a stable utility. By using standard Python
libraries, I was able to develop on an OpenBSD system and still use
the script on a Windows machine. That is the beauty of Python.

I have been involved with open source software ever since my exposure
to Mac OS X. From that point on, I have strived to use open source
software wherever possible. After some time I felt the need to return
the favor to the community, and I believe this is an opportunity for
me to give something back and be part of the open source ecosystem.

*******************