[publican-list] sortable lists, esp. glossaries

Fri Feb 10 15:58:12 UTC 2012

Hi Peter -- 

If I'm reading your post correctly, you're saying that simply invoking a locale-specific collating sequence during publishing (i.e., with no additional collating clues provided) may produce results acceptable to each locale?

Unless anyone knows otherwise, it sounds like that hypothesis is worth testing: select or create a source document with enough collation edge cases, build locale-specific versions, and have the results reviewed by readers fluent in each locale.

If it works, obviously it's a lot easier than building a new solution... 

Thanks -- 

Fred 

----- Original Message -----

> On Tue, Jan 31, 2012 at 06:45:12PM +1100, I wrote:

> > Apparently, the mapping from a string of Kanji to its pronunciation
> > (ordering) isn't even a deterministic operation, at least for proper
> > names.

> (Of course I meant "proper nouns". Actual non-determinism might even be
> limited to proper nouns, though I'm not sure that that changes anything
> from a coding point of view.)

> > Thus, the solution would have to involve supplying pronunciations
> > somehow for at least some glossary entries.

> More precisely, it follows that sorting Kanji entries by pronunciation
> would in general require supplying pronunciations for some entries.
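
A minimal sketch of what "supplying pronunciations" could look like, assuming each glossary entry can carry an author-supplied kana reading. The `reading` field, the sample entries, and the `sort_by_reading` helper are all hypothetical; nothing here reflects how Publican actually stores glossary data.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate::Locale;

# Hypothetical glossary entries: 'reading' is a kana pronunciation
# supplied by the author, since it cannot be derived reliably from
# the Kanji alone.
my @entries = (
    { term => '東京', reading => 'とうきょう' },
    { term => '大阪', reading => 'おおさか' },
    { term => '京都', reading => 'きょうと' },
);

my $collator = Unicode::Collate::Locale->new(locale => 'ja');

# Compare on the supplied reading; fall back to the term itself when
# an entry has no reading.
sub sort_by_reading {
    my ($coll, @items) = @_;
    return sort {
        $coll->cmp($a->{reading} // $a->{term},
                   $b->{reading} // $b->{term})
    } @items;
}

my @sorted = sort_by_reading($collator, @entries);

binmode STDOUT, ':encoding(UTF-8)';
print "$_->{term} ($_->{reading})\n" for @sorted;
```

Entries without a reading would still sort by whatever order the collator gives their Kanji, so this only helps for the entries an author chooses to annotate.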

> However, I don't want my unclear wording to contribute to wrong
> conclusions about what Publican actually requires: I'm not in a position
> to say whether Publican requires index or glossary entries involving
> Kanji to be sorted by contextually-correct pronunciation. All I've
> learnt over the past couple of days is that *outside of* a book index or
> glossary, Kanji are sorted sometimes by contextually-correct
> pronunciation and sometimes by some other order (and I think there's
> more than one alternative, even).

> If anyone wants a concrete sample for an "is this output acceptable"
> question (and if not using software just for Japanese sorting, like
> Lingua::JA::Sort::JIS), then I suggest making sure that the collation
> function is tailored for a Japanese locale (e.g. using
> Unicode::Collate::Locale->new(locale => 'ja-JP')): without that,
> collation software is unlikely to try to use a specifically Japanese
> ordering of Kanji characters or intersperse Katakana with Hiragana.

> In particular, the documentation for plain Unicode::Collate is explicit
> that it doesn't intersperse Katakana with Hiragana, and that its Kanji
> ordering is simply by Unicode block & code point rather than by a JIS
> ordering.

> So I think the easiest thing to do that has a good chance of getting a
> "yes, this is acceptable" answer would be to switch from
> Unicode::Collate to Unicode::Collate::Locale and pass locale => $LANG to
> the constructor (where $LANG is the Publican language like en-US or
> ja-JP).
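
The proposed switch could be tried out along these lines. This is only a sketch: `collator_for_lang` is a hypothetical helper, `$lang` stands in for the book's configured language, and the sample terms are made up. The `getlocale` call reports which tailoring was actually selected (it returns 'default' when none is available), which is worth checking before asking anyone to review the output.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;
use Unicode::Collate::Locale;

# Hypothetical helper: build a collator from a Publican-style language
# code such as 'en-US' or 'ja-JP'.
sub collator_for_lang {
    my ($lang) = @_;
    return Unicode::Collate::Locale->new(locale => $lang);
}

my $lang  = 'ja-JP';    # stand-in for the book's configured language
my @terms = ('東京', '大阪', '京都', 'かな', 'カナ');

my $plain    = Unicode::Collate->new;
my $tailored = collator_for_lang($lang);

# Sort the same terms both ways so the two orders can be compared.
my @plain_sorted    = $plain->sort(@terms);
my @tailored_sorted = $tailored->sort(@terms);

binmode STDOUT, ':encoding(UTF-8)';
print "tailoring used: ", $tailored->getlocale, "\n";
print "plain:    @plain_sorted\n";
print "tailored: @tailored_sorted\n";
```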

> Effect on other languages:

> Switching to a locale-sensitive collator might also make for a better
> collation of Indic languages (handling of virama, and some related
> reordering rules).

> Whereas if applied to Spanish for indexes, note that it might move
> entries like chkconfig from near the beginning of the C entries to just
> before D; it's not clear to me whether that's a good or bad thing for a
> word like chkconfig that isn't even Spanish and thus arguably isn't
> using the Spanish ch digraph.
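
One way to check the Spanish case before deciding, as a sketch with a made-up entry list. Going by the Unicode::Collate::Locale documentation, the ch digraph appears to be treated as a separate letter only under the es__traditional tailoring, not under plain es, so whether chkconfig moves may depend on exactly which locale string is passed in.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

# Hypothetical index entries, including two that start with "ch".
my @entries = qw(date cron chkconfig chmod cat);

my $modern      = Unicode::Collate::Locale->new(locale => 'es');
my $traditional = Unicode::Collate::Locale->new(locale => 'es__traditional');

# Sort the same entries under both Spanish tailorings.
my @modern_order      = $modern->sort(@entries);
my @traditional_order = $traditional->sort(@entries);

print $modern->getlocale,      ": @modern_order\n";
print $traditional->getlocale, ": @traditional_order\n";
```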

> (In both cases, I haven't actually tested the behaviour, nor have I
> asked a native speaker for their preferences for index/glossary sorting
> in technical documentation.)

> pjrm.

> _______________________________________________
> publican-list mailing list
> publican-list at redhat.com
> https://www.redhat.com/mailman/listinfo/publican-list
> Wiki: https://fedorahosted.org/publican