[publican-list] WeasyPrint: another possible alternative to wkhtmltopdf?

Wed Aug 1 08:14:42 UTC 2012

I just mentioned WeasyPrint (http://weasyprint.org/) in a different thread
here, so I thought I should also share what I know about its suitability
for Publican's purposes.

Advantages over wkhtmltopdf:

  - Uses CSS, with more complete support for the CSS3 Paged Media
    ("css3-page") features than wkhtmltopdf has.  For example, page size,
    borders and page numbering can all be set from a CSS stylesheet.

    Supports 'page-break-after: avoid', 'page-break-inside: avoid',
    'widows', 'orphans' and so on.

    The only css3-page feature it doesn't yet support is named page styles.
    (The stylesheet I use with Morp makes heavy use of named page styles to
    reproduce FOP's page headings and page-numbering styles.  However,
    wkhtmltopdf doesn't support them either, so this is no disadvantage
    relative to wkhtmltopdf.)

Disadvantages compared to wkhtmltopdf:

  - Six to ten times slower than wkhtmltopdf.  That certainly matters for
    Publican use, but it may still be worthwhile if the output looks
    better, and one could always add a command-line switch to use a faster
    renderer (wkhtmltopdf, or even a no-op) for drafts.

  - Doesn't reuse a web browser's layout engine.  However, its CSS support
    is coming along well, and I'd expect it to handle most of what
    Publican output currently uses.

  - Doesn't support RTL languages (Hebrew, Arabic etc.).

Compared to FOP (and in common with all the alternatives to FOP), it
doesn't yet add "on page N" to cross references in printed output, doesn't
support footnotes or multiple columns (for the index), and lacks niceties
such as repeating a table's header/footer when the table is split across
pages.  On the other hand, it uses CSS, supports font substitution and
Indic languages, doesn't fall apart when page-break-inside: avoid content
is longer than a page, and isn't limited to platforms where Java runs.

Compared to Morp, it is already intended for use by others and already
supports in-text clickable links.  On the other hand, it wouldn't do page
headings and page numbering as well as Morp or FOP, it is slower, and it
uses a simplistic approach to line-breaking and pagination (like
wkhtmltopdf, though it at least supports CSS directives such as
page-break-before, widows and orphans).

A quick update on Morp while I'm on the subject: Morp output now includes
the "document outline" (i.e. the clickable list of section headings shown
on the left in most PDF viewers), but there are still no in-text clickable
links or "on page N" appended to cross references for printed output, and
there has still been no work on making Morp easy for others to install and
use.

I still maintain a directory of Morp output for Indic-language
docs.redhat.com documents at
http://bowman.infotech.monash.edu.au/~pmoulder/redhat-docs/
(along with the English version of each), and I've also added the JBoss
HTTP load-balancing guide, since the FOP PDF of it on docs.redhat.com is
still scrambled.

I allow web search engines to index them (except the English versions of
the Indic-language documents, since docs.redhat.com already has good PDFs
for those), which brings in about one visitor a day (i.e. one source IP
address, excluding bots).  I think that access is useful, but given that
Red Hat aren't doing QA on the output, let me know if you'd prefer me to
restrict access further, or to add a disclaimer at the top of the first
page of each PDF or something similar.

pjrm.