Features/ArchitectureSupport - changing what we build for

Gregory Maxwell gmaxwell at gmail.com
Tue Feb 3 16:44:32 UTC 2009


On Tue, Feb 3, 2009 at 8:32 AM, Dominik 'Rathann' Mierzejewski
<dominik at greysector.net> wrote:
>> Do you have benchmarks that show, given a constant -mtune, that
>> -march=i686 makes a material difference for any significant userspace
>> apps vs. -march=i586? If not, why are you being so insistent and
>> damning of the current compatible behavior?
>
> Which applications do you suggest for testing this hypothesis?

Most easily measured things (rsvg rendering, freetype) probably aren't
ever performance-critical for typical users. I think the codec idea
is good both from the 'likely to benefit from cmov' perspective (cmov,
the conditional move instruction, can be used with -march=i686 but not
with -march=i586) and from the 'performance actually matters'
perspective, but the test subjects ought to be things Fedora ships.
That said, for video the video driver (and hardware YUV->RGB
conversion) is probably more important than codec speed.


Ideally the test ought to be done on something modern that can't do
x86_64 (a low-end Atom, perhaps), since if you're on x86_64-capable
hardware you ought to be running the x86_64 distro, or else accept
whatever performance you give up... but I don't have any 32-bit-only
x86 hardware handy.


So, here is libtheora decoding HD video (average user CPU time in seconds; lower is better):

libtheora   with asm   --disable-asm
i586        5.872      9.396
i686        5.86       9.142
x86_64      5.643      8.04


So, the lesson: if you want codec performance, use hand-coded
assembly, or at least x86_64. :)

Going from -march=i586 to -march=i686, the improvement is 0.2% with assembly and 2.7% without it.
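To make the arithmetic behind those percentages explicit: the improvement is the i586 time minus the i686 time, divided by the i586 time. A minimal sketch in shell, using awk as the method disclosure does; the helper name `improvement` is illustrative only and not part of libtheora or any tool:

```shell
#!/bin/sh
# Illustrative helper: percent by which time $2 improves on time $1.
# Formula: (before - after) / before * 100, printed with one decimal place.
improvement() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Average user times in seconds from the libtheora results (i586 vs i686).
echo "with asm:    $(improvement 5.872 5.86)%"    # prints: with asm:    0.2%
echo "without asm: $(improvement 9.396 9.142)%"   # prints: without asm: 2.7%
```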

I'm a bit disappointed: this hasn't shown a real-world improvement
(0.2% isn't meaningful), but the 2.7% without asm suggests that one
might be possible for some other application.

Can someone suggest another application whose performance matters to many users?




---Method disclosure---

Video is http://community.elphel.com/videos/1920x1072_24FPS.ogg
[gmaxwell at sonolumen libtheora-1.0]$ CFLAGS='-m32 -march=i586 -mtune=core2' ./configure --target=i586 ; make clean ; make -j5
[gmaxwell at sonolumen examples]$ (for i in `seq 1 10` ; do (time -p ./dump_video < 1920x1072_24FPS.ogg > /dev/null) 2>&1 | grep 'user' ; done ) | awk '{sum+=$2} END { print sum/NR}'
5.872
[gmaxwell at sonolumen libtheora-1.0]$ CFLAGS='-m32 -march=i686 -mtune=core2' ./configure --target=i686 ; make clean ; make -j5
[gmaxwell at sonolumen examples]$ (for i in `seq 1 10` ; do (time -p ./dump_video < 1920x1072_24FPS.ogg > /dev/null) 2>&1 | grep 'user' ; done ) | awk '{sum+=$2} END { print sum/NR}'
5.86
[gmaxwell at sonolumen libtheora-1.0]$ CFLAGS='-m32 -march=i686 -mtune=core2' ./configure --disable-asm ; make clean ; make -j5
[gmaxwell at sonolumen examples]$ (for i in `seq 1 10` ; do (time -p ./dump_video < 1920x1072_24FPS.ogg > /dev/null) 2>&1 | grep 'user' ; done ) | awk '{sum+=$2} END { print sum/NR}'
9.142
[gmaxwell at sonolumen libtheora-1.0]$ CFLAGS='-m32 -march=i586 -mtune=core2' ./configure --disable-asm ; make clean ; make -j5
[gmaxwell at sonolumen examples]$ (for i in `seq 1 10` ; do (time -p ./dump_video < 1920x1072_24FPS.ogg > /dev/null) 2>&1 | grep 'user' ; done ) | awk '{sum+=$2} END { print sum/NR}'
9.396
[gmaxwell at sonolumen libtheora-1.0]$ CFLAGS='-mtune=core2' ./configure --disable-asm ; make clean ; make -j5
[gmaxwell at sonolumen examples]$ (for i in `seq 1 10` ; do (time -p ./dump_video < 1920x1072_24FPS.ogg > /dev/null) 2>&1 | grep 'user' ; done ) | awk '{sum+=$2} END { print sum/NR}'
8.04
[gmaxwell at sonolumen libtheora-1.0]$ CFLAGS='-mtune=core2' ./configure ; make clean ; make -j5
[gmaxwell at sonolumen examples]$ (for i in `seq 1 10` ; do (time -p ./dump_video < 1920x1072_24FPS.ogg > /dev/null) 2>&1 | grep 'user' ; done ) | awk '{sum+=$2} END { print sum/NR}'
5.643
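The repeated timing loop above can be wrapped in a reusable function. This is a sketch assuming bash, whose `time` keyword accepts `-p` and reports to the shell's stderr (which is why the loop wraps it in a subshell and redirects with 2>&1); the name `avg_user_time` and its argument order are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: average the "user" CPU time of a command over N runs,
# mirroring the for / time -p / grep / awk pipeline used above.
# Usage: avg_user_time RUNS INPUT_FILE COMMAND [ARGS...]
avg_user_time() {
    local runs=$1 input=$2
    shift 2
    local i
    for ((i = 0; i < runs; i++)); do
        # The command's stdout and stderr are discarded; the subshell's stderr
        # (where bash's `time -p` writes its report) is captured by 2>&1.
        ( time -p "$@" < "$input" > /dev/null 2>&1 ) 2>&1 | grep '^user'
    done | awk '{ sum += $2 } END { if (NR > 0) printf "%.3f\n", sum / NR }'
}
```

For example, from libtheora-1.0/examples: `avg_user_time 10 1920x1072_24FPS.ogg ./dump_video`. The input file is a separate argument because a `<` redirection cannot be passed through "$@".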




More information about the fedora-devel-list mailing list