[Fedora-packaging] UTF-8 package names

Today the Packaging Committee began discussing whether package names should be allowed to contain the full range of Unicode characters (encoded as UTF-8) or be restricted to the ASCII subset.

This is a somewhat contentious issue. The Packaging Committee members are split, although nobody's position on how to proceed is set in stone yet.

The main arguments seem to be:
Pro Unicode:
* Upstream knows best what name is most appropriate for their users. Changing it locally in Fedora doesn't make much sense.
* We allow Unicode in every other part of the spec, so why not in the package name?
* We should be shaking out bugs in our software's handling of Unicode rather than hiding them.

Against Unicode:
* Unicode package names are hard to type, which makes them a usability problem.
* Is there a limit? Even if European letters are fine, what about Kanji or Sanskrit?
* Some pieces of software won't handle Unicode package names and will need to be fixed.

One package with a Unicode package name has already been submitted for review, and the following log has some applicable comments:

= Section of Packaging Committee Logs about Unicode Package Names =

(09:32:18 AM) racor: banning non-ascii chars from package names
(09:32:29 AM) spot: racor: eww. is someone actually doing that?
(09:32:39 AM) f13: somebody did a + in a version I thought
(09:32:40 AM) abadger1999: ivazquez sent something to the list. No actual draft but it was very simple.
(09:32:47 AM) tibbs|h: spot: There's a review submitted.
(09:32:51 AM) f13: but that doesn't count.
(09:32:51 AM) rdieter: f13: inkscape, yeah, but fixed now.
(09:32:59 AM) abadger1999: racor: What is the rationale?
(09:33:01 AM) f13: tibbs|h: I bet it's from nim-nim, isn't it?
(09:33:06 AM) racor: yes, there is a packaging under review ... <digging>
(09:33:07 AM) tibbs|h: Yes, some fonts thing.
(09:33:13 AM) spot: that seems like a no-brainer to me.
(09:33:19 AM) abadger1999: spot: Why?
(09:33:22 AM) tibbs|h: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=261881
(09:33:24 AM) buggbot: Bug 261881: medium, medium, ---, Nobody's working on this, feel free to take it, NEW , Review Request: écolier-court-fonts - Écolier court fonts
(09:33:37 AM) spot: because xchat rendered that two different ways for me?
(09:33:45 AM) spot: i don't even want to think about yum.
(09:34:02 AM) abadger1999: spot: yum does need a fix.  There's an open bug.
(09:34:05 AM) f13: well, it would be good to test our infrastructure.
(09:34:08 AM) tibbs|h: I don't really have an issue with non-ascii package names; we can't keep falling back on "our infrastructure sucks" forever.
(09:34:11 AM) abadger1999: But I'm wondering why it's a bad thing.
(09:34:21 AM) abadger1999: Shouldn't we consider places where utf-8 fails to be bugs?
(09:34:44 AM) f13: non-ascii packages may have to be renamed if they ever show up in RHEL
(09:34:53 AM) f13: I'm not sure how the RHN beast will handle it.
(09:35:07 AM) spot: well, that is RHEL's problem.
(09:35:10 AM) f13: yep
(09:35:17 AM) f13: I'm not saying it should have much bearing on our decision
(09:35:27 AM) spot: racor: are you against it for aesthetic reasons or technical ones?
(09:35:31 AM) racor: IMO, the technical issues are minor, the real issue is usability
(09:35:44 AM) racor: spot: neither usability
(09:35:51 AM) tibbs|h: I always said that I won't review what I can't type.
(09:36:05 AM) tibbs|h: But my inability to type that is my dysfunction.
(09:36:11 AM) spot: well, i suppose that it would make it more difficult for english typists to install/use a package
(09:36:23 AM) spot: but not impossible
(09:36:25 AM) racor: consider: 90% of US users are not even able to type accented chars
(09:36:27 AM) tibbs|h: But is that a reason to ban something?
(09:36:47 AM) tibbs|h: I can't read German either, so let's kick out German translations.
(09:36:53 AM) racor: 99.89% of Western folks are not able to type east asian chars
(09:37:07 AM) f13: and 40% of all statistics are made up on the spot
(09:37:20 AM) spot: my concern is when people start naming packages in kanji.
(09:37:40 AM) racor: tibbs: right type the char ß (this is not a beta it's a sharp ss)
(09:37:46 AM) spot: we already mandate that the spec file must be written in american english
(09:38:11 AM) tibbs|h: racor: I already said I can't type those characters; I don't see what point your question serves.
(09:38:28 AM) rdieter: spot: +1, I think that covers the case here then, no?
(09:38:48 AM) tibbs|h: But I don't see the practical difference between ideograms and accented vowels, either.
(09:38:57 AM) racor: tibbs: my point: anything outside of ascii not universal enough
(09:38:59 AM) abadger1999: Yum bug: https://bugzilla.redhat.com/show_bug.cgi?id=261961
(09:39:01 AM) buggbot: Bug 261961: low, medium, ---, Jeremy Katz, NEW , Yum does not like non-ascii package names
(09:39:14 AM) spot: well, technically we only say summary and description
(09:39:18 AM) spot: "Please put personal preferences aside and use American English spelling in the summary and description."
(09:39:18 AM) tibbs|h: Either we say "ASCII" or "UTF-8"; there's no point in anything in the middle.
(09:39:39 AM) tibbs|h: We do not mandate that the entire spec be in English.
(09:39:42 AM) f13: we already support non-ascii file names
(09:39:56 AM) f13: so long as they are UTF-8
(09:40:00 AM) tibbs|h: We permit translated descriptions.
(09:40:28 AM) rdieter: Can we at least agree that ASCII *SHOULD* be used? Not sure if it warrants a MUST.
(09:40:40 AM) spot: yeah, i can support that ASCII should be used
(09:40:43 AM) abadger1999: rdieter: I'm not sure I would agree with that.
(09:40:51 AM) tibbs|h: I don't know if there's really a point.
(09:41:05 AM) racor: package names!!! descr, etc. are legacy, convenience,
(09:41:21 AM) tibbs|h: But we have rules about naming packages after things like the upstream tarball.
(09:41:27 AM) abadger1999: I mean, nim-nim's point is also valid -- if the upstream name is non-ascii who are we to differ from upstream?
(09:41:40 AM) rdieter: shrug, I'm ok with pushing the envelope here too.
(09:41:58 AM) spot: if only to encourage people not to be stupid and spell things like this: ƁƎƗȂ
(09:42:11 AM) f13: can we try this package as a trial run and see what all it breaks?
(09:42:18 AM) racor: abadger1999: would you say the same if the package name was in cyrillian or turk?
(09:42:28 AM) ***spot is getting pulled away
(09:42:40 AM) spot: we'll have to pick this up next meeting
(09:42:41 AM) spot: sorry. :(
(09:42:52 AM) rdieter: f13: +1 :)
(09:42:53 AM) abadger1999: racor: What does the package do? Is it a cyrillic font package? Is it something specific to Russian language speakers?
(09:43:26 AM) abadger1999: spot: I think that needs to be addressed upstream.
(09:43:35 AM) racor: abadger1999: not necessarily. Just author preference.
(09:44:09 AM) abadger1999: racor: So that's the gray line for me. I think I'd say that we can try to influence upstream but it is an upstream decision.
(09:44:09 AM) tibbs|h: We have a practical issue in any case, because our infrastructure doesn't properly support non-ASCII package names currently.
(09:44:27 AM) racor: I could call a package: bärensößchen ...
(09:44:33 AM) tibbs|h: Why not?
(09:45:00 AM) tibbs|h: Looks like as good a name as anything else.
(09:45:54 AM) racor: tibbs: except that most people would not be able to type it .... yum install <package-name>
(09:46:09 AM) abadger1999: Copy and paste....
(09:46:19 AM) tibbs|h: Graphical interface.
(09:46:25 AM) tibbs|h: Tab completion.
(09:46:34 AM) tibbs|h: Learn to type German.
(09:47:06 AM) racor: abadger1999, tibbs: that's laughable.
(09:47:16 AM) tibbs|h: It's weird to see this argument backwards; usually the Americans are arguing for "nothing I can't understand" while the Europeans and Asians just laugh at the idiot luddites.
(09:47:46 AM) tibbs|h: "Nothing in Fedora should use the metric system."
(09:48:26 AM) racor: tibbs: learn to type Thai
(09:49:15 AM) tibbs|h: I personally have no interest in doing so.
(09:49:24 AM) tibbs|h: I don't see how that has any bearing on anything, though.


