[publican-list] [Bug 475684] Find solution for using Glossaries with publican

Thu May 6 06:54:51 UTC 2010

https://bugzilla.redhat.com/show_bug.cgi?id=475684

--- Comment #19 from Ruediger Landmann <rlandman at redhat.com> 2010-05-06 02:54:40 EDT ---
I had time to experiment a bit with this a few weeks ago; here's what I found,
with help from translators:

<glossentry>s inside a <glossary> get sorted correctly (at least
superficially[0]) for languages that use the Latin and Cyrillic alphabets.
Languages with different writing systems present different problems:

Chinese:
<glossentry>s appear in no discernible pattern. They're probably being sorted
by Unicode code point.
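Python's default string comparison is by Unicode code point, so it can illustrate the suspected behaviour. This sketch only shows what code-point order does to three common characters. That the DocBook stylesheets do exactly the same thing is an assumption, not something verified here. The pinyin readings cí, hàn and zhōng put these characters in alphabetical order, and code-point order reverses them:

```python
# Python compares str values code point by code point, which is the
# ordering suspected above for the Chinese glossary entries.
terms = ["词", "汉", "中"]  # pinyin: cí, hàn, zhōng, i.e. already in pinyin order

for term in terms:
    print(term, f"U+{ord(term):04X}")

# Sorting by code point reverses the pinyin order for these three characters.
by_code_point = sorted(terms)
print(by_code_point)  # ['中', '汉', '词']
```

A pronunciation-based order needs either a reading table or a collation library with Chinese rules, such as ICU's Chinese collation.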

Japanese:
A glossary in a Japanese technical publication could include terms in up to four
different writing systems: Latin, Katakana, Hiragana, and Kanji. Terms
presented in Latin script should be separated from those presented in the three
Japanese writing systems (this already happens correctly), but terms in Katakana,
Hiragana, and Kanji should be interspersed according to their pronunciation. At
present, we're getting all the Katakana first, then all the Hiragana, then all
the Kanji. Katakana and Hiragana are syllabic scripts that represent the same
50 syllables; sorting them shouldn't be difficult and can probably be achieved
easily in an update to the DocBook locale. The harder problem is that a single Kanji
character can represent one, two, or more syllables, and its pronunciation (and
therefore its sort order) can change when it is combined with other Kanji.
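Here is a minimal Python sketch of the Katakana/Hiragana part. It assumes the fix takes the form of a pronunciation sort key rather than an edit to the locale files themselves. Katakana U+30A1 to U+30F6 map onto Hiragana U+3041 to U+3096 at a fixed offset of 0x60, so folding Katakana to Hiragana lets the two scripts interleave. Kanji cannot be folded this way, so the sketch accepts a hand-supplied kana reading for Kanji terms. That `reading` field is hypothetical and is not a DocBook or Publican feature. Code-point order within Hiragana only approximates dictionary order (for example, voiced が sorts directly after か), so treat this as a first approximation.

```python
from typing import Optional

# Katakana U+30A1..U+30F6 line up with Hiragana U+3041..U+3096, 0x60 apart.
KATAKANA_FIRST = 0x30A1
KATAKANA_LAST = 0x30F6
KANA_OFFSET = 0x60


def fold_katakana(text: str) -> str:
    """Replace each Katakana character with its Hiragana counterpart."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in text
    )


def sort_key(term: str, reading: Optional[str] = None) -> str:
    """Sort by the kana reading if one is given (needed for Kanji), else by the term."""
    return fold_katakana(reading if reading is not None else term)


# Hypothetical (term, reading) pairs; only the Kanji term needs a reading.
entries = [("さくら", None), ("カメラ", None), ("辞書", "じしょ"), ("あめ", None)]
ordered = [term for term, reading in sorted(entries, key=lambda e: sort_key(*e))]
print(ordered)  # ['あめ', 'カメラ', 'さくら', '辞書']
```

The Kanji reading has to come from somewhere outside the character itself, such as an author, a translator, or a dictionary. This is the difficulty described above.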

Still untested:
Korean
all Indic languages

Korean and the various Indic languages that we support use syllabic scripts; if
they aren't already working correctly, I think that should be easily fixed in
the locale. 
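The Korean case can be checked roughly as follows. Unicode lays out precomposed Hangul syllables (U+AC00 to U+D7A3) algorithmically as 0xAC00 + (L × 21 + V) × 28 + T, where L, V and T index the initial consonant, the vowel and the optional final consonant. This Python sketch decomposes syllables with that formula. Because the formula is monotonic in (L, V, T), plain code-point order already sorts syllables by initial, then vowel, then final. Whether that matches the order Korean translators expect is not verified here.

```python
# Hangul syllable layout as defined in the Unicode standard.
S_BASE = 0xAC00
L_COUNT = 19  # initial consonants
V_COUNT = 21  # medial vowels
T_COUNT = 28  # final consonants, including "no final"
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per initial consonant


def decompose(syllable: str) -> tuple:
    """Return (initial, vowel, final) indices for one precomposed Hangul syllable."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * N_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    return (index // N_COUNT, (index % N_COUNT) // T_COUNT, index % T_COUNT)


words = ["한국", "가다", "나라", "간"]
by_code_point = sorted(words)
# Sorting by the decomposed (initial, vowel, final) tuples gives the same order.
by_jamo = sorted(words, key=lambda w: [decompose(ch) for ch in w])
print(by_code_point)  # ['가다', '간', '나라', '한국']
```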

I note that these sorting issues affect not only glossaries but also the
indexes of any books that have them.

[0] Not all languages sort the Latin alphabet the same way, particularly when
it comes to handling accented characters or characters outside the "basic
Latin" group. I didn't explore what happens at these edges.
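This Python sketch illustrates the footnote. A plain code-point sort puts "éclair" after "zebra" because U+00E9 lies above every basic Latin letter. A common crude workaround is to strip combining marks after NFD decomposition. The DocBook locales are not claimed to do this. The workaround gives the English order, but it would be wrong for Swedish, where å, ä and ö are separate letters that sort after z.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD and drop combining marks, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["zebra", "éclair", "apple"]
by_code_point = sorted(words)               # 'é' (U+00E9) sorts after 'z'
by_base_letter = sorted(words, key=strip_accents)
print(by_code_point)   # ['apple', 'zebra', 'éclair']
print(by_base_letter)  # ['apple', 'éclair', 'zebra']
```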