[Freeipa-devel] i18n infrastructure improvements

Fri Jan 11 15:04:29 UTC 2013

Hello list,
This discussion was started in private; I'll continue it here.

On 01/10/2013 05:41 PM, John Dennis wrote:
> On 01/10/2013 04:27 AM, Petr Viktorin wrote:
>> On 01/09/2013 03:55 PM, John Dennis wrote:
>
>>>> And I could work on improving the i18n/translations infrastructure,
>>>> starting by writing up an RFE+design.
>
>>> Could you elaborate as to what you perceive as the current problems and
>>> what this work would address.
>
>> Here are my notes:
>
>> - Use fake translations for tests
>
> We already do (but perhaps not sufficiently).

I mean use it in *all* tests, to ensure all the right things are 
translated and weird characters are handled well.
See https://www.redhat.com/archives/freeipa-devel/2012-October/msg00278.html
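A rough sketch of the idea in Python, using only the standard gettext module. The marker characters and the names FakeTranslations, fake_translate and is_translated are made up for illustration; this is not what our current test code does.

```python
import gettext

# Illustrative markers (hypothetical); any non-ASCII characters would do.
PREFIX = "\u0142"  # 'ł'
SUFFIX = "\u20ac"  # '€'


def fake_translate(message):
    """Wrap a message in non-ASCII markers so translated text is recognizable."""
    if not message:
        return message  # the empty msgid is reserved for the PO header
    return PREFIX + message + SUFFIX


class FakeTranslations(gettext.NullTranslations):
    """Translation object that pseudo-translates every message it is given."""

    def gettext(self, message):
        return fake_translate(message)

    def ngettext(self, msgid1, msgid2, n):
        # Pick singular/plural the way English would, then pseudo-translate.
        return fake_translate(msgid1 if n == 1 else msgid2)


def is_translated(text):
    """True if text went through FakeTranslations (markers at both ends)."""
    return text.startswith(PREFIX) and text.endswith(SUFFIX)
```

A test run could install this (e.g. with FakeTranslations().install()) and assert is_translated() on every user-visible string: a string that fails was never passed through gettext, and the non-ASCII markers exercise the encoding paths for free.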

>> - Split up huge strings so the entire text doesn't have to be
>> retranslated each time something changes/is added
>
> Good idea. But one question I have is: should we be optimizing for our
> programmers' time or the translators' time? The Transifex tool should offer
> translators similar existing translations (in fact it might; I seem to
> recall some functionality in this area). Wouldn't it be better to address
> this issue in Transifex, where all projects would benefit?
>
> Also the exact same functionality is needed to support release versions.
> The strings between releases are often close but not identical. The
> Transifex tool should make available a close match from a previous
> version to the translator working on a new version (or vice versa). See
> your issue below concerning versions.
>
> IMHO this is a Transifex issue which needs to be solved there, not
> something we should be investing precious IPA programmers' time on. Plus
> if it's solved in Transifex it's a *huge* win for *everyone*, not just IPA.

Huh? Splitting the strings provides additional information 
(paragraph/context boundaries) that Transifex can't get otherwise. From 
what I hear it's a pretty standard technique when working with gettext.

For typos, gettext has the "fuzzy" functionality that we explicitly turn 
off. I think we're on our own here.
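To illustrate the splitting, a sketch where each paragraph of a long help text is its own msgid (the help text, user_help and the DictTranslations helper are hypothetical; DictTranslations just stands in for a compiled .mo catalog):

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Tiny in-memory catalog (msgid -> msgstr) standing in for a .mo file."""

    def __init__(self, messages):
        super().__init__()
        self.messages = messages

    def gettext(self, message):
        # Untranslated messages fall back to the English source text.
        return self.messages.get(message, message)


def user_help(translations):
    """Build a multi-paragraph help text, one msgid per paragraph."""
    _ = translations.gettext
    # xgettext extracts each _() call as a separate msgid, so editing one
    # paragraph only invalidates that paragraph's translation.
    paragraphs = [
        _("Manage user entries."),
        _("Users can be added, modified, disabled and deleted."),
    ]
    return "\n\n".join(paragraphs)
```

For example, with a catalog where only the first paragraph is translated (say the second was reworded since), user_help() returns the German first paragraph followed by the English second one. With a single big msgid, the whole text would fall back to English until someone retranslated all of it.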

>> - Keep a history/repo of the translations, since Transifex only stores
>> the latest version
>
> We already do keep a history, it's in git.

It's not updated often enough. If I mess something up before a release 
and Transifex gets wiped, or if a rogue translator deletes some 
translations, the work is gone.

>> - Update the source strings on Transifex more often (ideally as soon as
>> patches are pushed)
>
> Yes, great idea, this would be really useful and is necessary.
>
>> - Break Git dependencies: make it possible to generate the POT in an
>> unpacked tarball
>
> Are you talking about the fact our scripts invoke git to determine what
> files to process? If so then yes, this would be a good dependency to get
> rid of. However it does mean we somehow have to maintain a manifest list
> of some sort somewhere.

A directory listing is fine IMO. We use it for more critical things, 
like loading plugins, without any trouble.
Also, when run in a Git repo the Makefile can compare the file list with 
what Git says and warn accordingly.
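Something along these lines, in Python. The extension list and function names are assumptions for illustration, and POSIX paths are assumed so the directory listing and `git ls-files` output compare directly:

```python
import os
import subprocess

# Assumption: only these extensions contain translatable messages.
SOURCE_EXTENSIONS = (".py", ".js")


def list_source_files(root):
    """Return source files under root, as relative paths, from a directory walk."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata so it is never scanned.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name.endswith(SOURCE_EXTENSIONS):
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def git_source_files(root):
    """Return tracked source files, or None when not in a usable Git checkout."""
    try:
        result = subprocess.run(
            ["git", "ls-files", "-z"],  # NUL-separated, so no path quoting
            cwd=root, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # no git binary, or an unpacked tarball
    return {p for p in result.stdout.split("\0") if p.endswith(SOURCE_EXTENSIONS)}


def mismatch_warnings(listed, tracked):
    """Describe differences between the directory listing and Git's view."""
    warnings = ["not tracked by Git: " + p for p in sorted(listed - tracked)]
    warnings += ["tracked but missing on disk: " + p for p in sorted(tracked - listed)]
    return warnings
```

In a tarball git_source_files() returns None and the directory listing alone drives POT generation; in a checkout, mismatch_warnings(list_source_files(root), git_source_files(root)) gives the list the Makefile would warn about.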

>> - Figure out how to best share messages across versions (2.x vs. 3.x) so
>> they only have to be translated once
>
> There is a crying need for this, but isn't this a Transifex issue? Why
> would we be solving this in IPA? What about SSSD and every other project,
> they all have identical issues. As far as I can tell Transifex has never
> addressed this issue sufficiently (see above) and the onus is on them to
> do so.

I don't think waiting for Transifex will solve the problem.
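One thing we could do on our side is prefill a new branch's catalog from the older branches' translations. A sketch, with catalogs as plain msgid -> msgstr dicts (an assumption; real code would read the PO files with a proper parser). It only reuses exact matches, similar in spirit to msgmerge's --compendium option but without fuzzy matching:

```python
def reuse_translations(new_msgids, old_catalogs):
    """Prefill a catalog for new_msgids from older catalogs.

    old_catalogs is a list of dicts (msgid -> msgstr), highest priority
    first, e.g. [catalog_3x, catalog_2x]. Only exact msgid matches are
    reused. Returns (catalog, msgids that still need translating).
    """
    catalog = {}
    missing = []
    for msgid in new_msgids:
        for old in old_catalogs:
            msgstr = old.get(msgid)
            if msgstr:  # skip absent and untranslated (empty) entries
                catalog[msgid] = msgstr
                break
        else:
            missing.append(msgid)
    return catalog, missing
```

Listing the catalogs in priority order means a newer branch's wording wins when both branches translated the same msgid.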

>> - Clean up checked-in PO files even more, for nicer diffs
>
> A nice feature, but I'm wondering to what extent we're currently suffering
> because of this. It's rare that we have to compare PO files. Plus diff
> is not well suited for comparing PO's because PO files with equivalent
> data can be formatted differently. That's why I wrote some tools to read
> PO files, normalize the contents and then do a comparison. Anyway, my
> top-level question is: is this something we really need at this point?

You're right that files have to be normalized to diff well. That's 
actually the point here :)
Anyway, I'm just thinking of sorting the PO entries alphabetically; an extra 
option to msgattrib should do it.

>> - Automate & document the process so any dev can do it
>
> Excellent goal, we're not too far from it now, but of all the things on
> the list this is the most important.

-- 
Petr³