P4s, Athlons and bandwidth

Wed Aug 13 20:50:19 UTC 2003

On Wed, Aug 13, 2003 at 10:23:01PM +0200, Jean Francois Martinez wrote:
> Given that most/all of the recent boxes (ie the ones doing the real
> work) are P4s and Athlons it is time RedHat stopped compiling
> with -mcpu=i686 and started optimizing for the P4: -mcpu=p4

RHL glibc is actually compiled with -march=i686, and apart from those
enabled by -mfpmath=sse there are not many instructions that the compiler
would generate for normal code with -march=pentium3 but not with
-march=i686 (the only other difference is scheduling, and to my knowledge
that difference is not very big between i686 and PIII).
-mfpmath=sse is not usable for libm, because glibc on IA-32 relies
on extended precision in several places.
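
To illustrate the precision gap behind this point, here is a minimal
sketch. It is not taken from glibc; the function names are made up for
illustration, and it assumes a C99 compiler (hex float literals). The
volatile stores keep the compiler from carrying extra precision in
registers. On IA-32 with the x87 FPU, long double is the 80-bit extended
format with a 64-bit mantissa; on other platforms long double may be
narrower or wider.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if adding 2^-60 to 1.0 changes the
 * value when the result is stored as a double (53-bit mantissa). */
int tiny_term_survives_double(void)
{
    volatile double one = 1.0;
    volatile double tiny = 0x1p-60;        /* 2^-60, far below DBL_EPSILON */
    volatile double sum = one + tiny;      /* rounds back to 1.0 */
    return sum != one;
}

/* Hypothetical helper: same question for long double. With the x87
 * extended format (LDBL_MANT_DIG == 64) the small term is kept. */
int tiny_term_survives_long_double(void)
{
    volatile long double one = 1.0L;
    volatile long double tiny = 0x1p-60L;  /* 2^-60 as long double */
    volatile long double sum = one + tiny;
    return sum != one;
}
```

Calling both helpers and printing the results shows whether the extra
mantissa bits are available on a given target. Code that depends on
intermediates of the second kind would lose that precision if it were
computed in SSE double registers instead.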
The scheduling difference between P4 and i686 is bigger, but I don't think
P4-scheduled code runs that well on Athlons.

> Another point is that there is no such thing as low-level glibc
> functions for the P4 and the Athlon.  The highest targeted
> processor is the PIII.  However, documents on AMD's web site show
> that moving data (i.e. memcpy and friends) can be made several times
> faster by using 3DNow instructions and data prefetching.  I gave only
> a cursory glance to the assembler parts of glibc, but it didn't look
> like those parts (targeting the PIII) would be even remotely ideal
> for the Athlon.  Same thing about the P4.

Where have you seen PIII optimized assembly in glibc? AFAIK there is none.
P4/Athlon/PIII optimized stringops are certainly welcome (patches to
libc-alpha at sources.redhat.com), but bear in mind that any use of floating
point regs (SSE/SSE2/whatever) has quite a big price in a lazy FPU saving
environment. Another thing to keep in mind is what the typical arguments
to these functions are.
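
One way to find out what the typical arguments are is to measure them.
The following is a minimal sketch, not anything that exists in glibc:
counted_memcpy, bucket_for and size_hist are hypothetical names, and the
bucket boundaries are an arbitrary choice. It wraps memcpy and records a
coarse histogram of the requested lengths.

```c
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8

/* Hypothetical histogram of copy lengths:
 * bucket 0 holds 0..15 bytes, bucket 1 holds 16..31, bucket 2 holds
 * 32..63, and so on; the last bucket holds everything from 1024 up. */
unsigned long size_hist[NBUCKETS];

/* Map a length to its bucket index. */
int bucket_for(size_t n)
{
    int b = 0;

    n >>= 4;                        /* lengths below 16 land in bucket 0 */
    while (n != 0 && b < NBUCKETS - 1) {
        n >>= 1;                    /* each further bucket doubles the range */
        b++;
    }
    return b;
}

/* Drop-in wrapper: count the request, then defer to the real memcpy. */
void *counted_memcpy(void *dst, const void *src, size_t n)
{
    size_hist[bucket_for(n)]++;
    return memcpy(dst, src, n);
}
```

Routing a program's copies through such a wrapper and dumping size_hist
at exit gives a rough picture of which lengths dominate, which is the
information needed to judge whether the setup cost of an SSE or 3DNow
path would pay off.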

	Jakub