On Wed, 2007-06-13 at 04:28 +1000, Mike Kearey wrote:
> I'd like to see repeatable, measurable tests not subjective 'I think
> it's faster' observations. I am not criticizing here, it's just that
> humans are actually bad at this sort of thing. Measurements are better.

I've been doing rather extensive performance profiling on OpenJPEG, using
Fedora 6/7's gcc 4.1, and I've discovered a few things:

Compiling for pentium3 rather than "generic" measurably improves
performance on both of my i386 test platforms, a mobile Celeron 1.3 (PIII
based) and a Celeron 2.1 (P4 based). This is at least partly due to
optimizing signed integer math with cmovs.

Compiling for pentium4 is slower than compiling for pentium3, even on the
pentium4!

Deriving gain from vectorization is tricky. It's not a Cray: if you're
doing just a few calculations on a data set that doesn't fit in cache,
you're bound by memory bandwidth, and vectorization will likely just slow
you down due to the additional alignment requirements. And, there's no
signed integer math in SSE2...

I need to do some more testing with generic i686 and pentium2 as well, but
the primary gain seems to come from compiling for a minimum of i686.
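
For anyone who wants to repeat the cmov observation above, here is a minimal sketch of the kind of signed-integer select that is a candidate for cmov. It is not taken from OpenJPEG; the names clamp_s32, clamp_buffer and time_clamp are made up for illustration, and whether gcc actually emits cmov depends on the compiler version and flags, so check the assembly rather than trusting this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: clamp a signed sample to [lo, hi].
 * cmov only exists on i686-class CPUs, so the compiler may use it for
 * these selects when targeting pentium3 or i686, but must fall back to
 * conditional branches when targeting plain i386. */
int32_t clamp_s32(int32_t v, int32_t lo, int32_t hi)
{
    int32_t r = v;
    if (r < lo) r = lo;
    if (r > hi) r = hi;
    return r;
}

/* Clamp a buffer in place; the returned checksum keeps the work live
 * so the optimizer cannot discard the loop. */
int64_t clamp_buffer(int32_t *buf, size_t n, int32_t lo, int32_t hi)
{
    int64_t sum = 0;
    size_t i;

    assert(lo <= hi);
    for (i = 0; i < n; i++) {
        buf[i] = clamp_s32(buf[i], lo, hi);
        sum += buf[i];
    }
    return sum;
}

/* Time `reps` passes over the buffer, in CPU seconds as reported by
 * clock(). After the first pass the data is already clamped, so later
 * passes measure the compare/select cost on in-range values. */
double time_clamp(int32_t *buf, size_t n, int reps, int64_t *checksum)
{
    clock_t t0, t1;
    int64_t sum = 0;
    int r;

    t0 = clock();
    for (r = 0; r < reps; r++)
        sum += clamp_buffer(buf, n, -128, 127);
    t1 = clock();

    *checksum = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Build the same file once with "gcc -O2 -march=i386" and once with "gcc -O2 -march=pentium3", compare the timings over a buffer that fits in cache and one that doesn't, and look at the "gcc -S" output to see whether the branches actually turned into cmovs.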