[rhelv6-beta-list] My first experiences with RHEL6 beta

Thu Jun 17 15:50:59 UTC 2010

On Thursday 17 June 2010 17:16, James Antill wrote:
> On Thu, 2010-06-17 at 08:53 +0100, John McNulty wrote:
> > But sometimes it's not the easiest thing to trust upstream to do
> > what's right when bugs that affect all distros go unfixed for
> > years. e.g.
> >
> > [john@dsv03-pv1 ~]$ echo $LANG
> > en_GB.UTF-8
> > [john@dsv03-pv1 ~]$ time grep '^....' /usr/share/dict/words >/dev/null
> >
> > real 9m29.275s
> >
> > [john@dsv03-pv1 ~]$ export LANG=C
> > [john@dsv03-pv1 ~]$ time grep '^....' /usr/share/dict/words >/dev/null
> >
> > real 0m0.116s
> >
> > This little gem has been hanging around since 2005
> > (http://savannah.gnu.org/bugs/?14472)
>
>  That looks like a different bug to me: "grepping for a needle of ASCII
> text" vs. "grepping for what could be UTF-8 within an ASCII text
> haystack". And, as far as I know, the former _has_ been fixed.
>
>  With the latter the "problem" is that you are asking for different
> answers if the haystack contains UTF-8:
>
> % echo ¼¼ | LANG=C           grep '^..$'
> % echo ¼¼ | LANG=en_US.UTF-8 grep '^..$'
> ¼¼
> %
>
> ...and getting the correct answer is much harder to provide. But, of
> course, feel free to open a bugzilla against RHEL-6 grep/egrep.

What version of grep are you running, John? I remember a rebase a couple
of weeks ago that introduced this performance regression, but it has
since been fixed. Both "dots" and ranges should be handled gracefully
now, even though [a-z] in a UTF-8 locale and in the C locale are
different things.

# echo $LANG
en_US.UTF-8
# time grep '^....' /usr/share/dict/words >/dev/null

real    0m0.113s
user    0m0.096s
sys     0m0.016s
# rpm -q grep
grep-2.6.3-2.el6.x86_64
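
For anyone who wants to repeat James's byte-versus-character comparison
on their own box, here is a rough sketch. The variable names are mine,
and it assumes a grep with -c and -m plus a working `locale -a`; the
UTF-8 part only runs if some UTF-8 locale happens to be installed.

```shell
# Two copies of U+00BC as UTF-8 octal escapes, so the script stays pure ASCII.
sample='\302\274\302\274\n'

# C locale: every byte counts as one character, so the line is 4 characters long.
# "|| true" keeps grep's exit status 1 (no match) from aborting under "set -e".
c_two=$(printf "$sample" | LC_ALL=C grep -c '^..$' || true)
c_four=$(printf "$sample" | LC_ALL=C grep -c '^....$' || true)
echo "C locale: two-dot pattern matched $c_two line(s), four-dot pattern matched $c_four line(s)"

# Assumption: pick the first UTF-8 locale that locale -a reports, if there is one.
utf8=$(locale -a 2>/dev/null | grep -i -m1 -E 'utf-?8' || true)
if [ -n "$utf8" ]; then
    # Here the same four bytes should decode to two characters.
    u_two=$(printf "$sample" | LC_ALL="$utf8" grep -c '^..$' || true)
    echo "$utf8: two-dot pattern matched $u_two line(s)"
else
    echo "no UTF-8 locale found, skipping the second half"
fi
```

The counts are only a sanity check that the two locales really do give
different answers for the same input; they say nothing about speed, for
which the `time` runs above are the thing to compare.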

Regards,
Radek