Re: RFC: Description text in packages

On Tue, Dec 16, 2008 at 05:40:36PM -0500, Matthias Clasen wrote:
> > Unicode is character encoding
> > HTML tags or similar are semantic markup
> Thanks Alan, I know that quite well. 
> > Trying to extrapolate semantic markup from random ascii symbols is not
> > a reliable or robust path, particularly when you come to internationalise
> > things.
> One hopes the ascii symbols in most package descriptions are not
> entirely random... and extrapolating something from them can be quite

There is no reason to assume that *, for example, is a bullet point; it could
be a footnote indicator, maths or ascii art. The Unicode bullet, on the other
hand, is unequivocally a bullet point.

So extracting from UTF-8 is safer, but extracting at all is dangerous.
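To make the difference concrete, here is a minimal sketch of a line classifier
a description renderer might use. The helper name classify_line and its three
labels are hypothetical, made up for illustration and not part of RPM or any
packaging tool. It treats a leading U+2022 as a bullet, but refuses to guess
when a line starts with *:

```python
# Hypothetical helper (not from RPM or any real tool): classify the leading
# marker of one line of a package description.
BULLET = "\u2022"  # U+2022 BULLET: only one sensible reading

def classify_line(line: str) -> str:
    """Return 'bullet', 'ambiguous' or 'text' for a single description line."""
    stripped = line.lstrip()
    if stripped.startswith(BULLET):
        return "bullet"      # Unicode bullet: unequivocally a list item
    if stripped.startswith("*"):
        return "ambiguous"   # bullet? footnote? maths? ascii art? can't tell
    return "text"

for sample in ["\u2022 fast startup", "* see note below", "2 * 3 = 6", "plain text"]:
    print(repr(sample), classify_line(sample))
```

Even the "bullet" case is still a heuristic over plain text, which is why a
separate, explicitly semantic field is the more robust route.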

> The specification for RPM doesn't imply anything about the description
> field. And this thread is about how to possibly improve the situation by
> agreeing on some form of interpretation.

Right - the field is plain UTF-8 textual data and has been for years. You
want to add a semantic version of it. That is fine, but use a new header for
the field, the way RPM intends things to be added.

