Features/ArchitectureSupport - changing what we build for

Callum Lerwick seg at haxxed.com
Thu Feb 5 12:18:27 UTC 2009


On Wed, 2009-02-04 at 12:20 -0800, Conrad Meyer wrote:
> On Wednesday 04 February 2009 09:15:47 am Callum Lerwick wrote:
> > On Wed, 2009-02-04 at 15:00 +0100, Kevin Kofler wrote:
> > > Callum Lerwick wrote:
> > > > Going -O3 rather than -O2 is going to make a bigger difference than
> > > > anything else. If you want to improve performance, you need to run
> > > > profiles, locate performance critical bits of code, figure out if -O3
> > > > is beneficial, and/or write some hand tuned assembly/intrinsic code.
> > > >
> > > > Not to mention, the biggest performance problem on modern processors is
> > > > memory. Minimizing cache thrashing is way more important than what
> > > > instructions you use. Optimize data structures before code.
> > >
> > > That's actually an argument for investigating -Os, not -O3.

There are two caches, the instruction cache and the data cache. -Os optimizes the
icache footprint, but the icache is rarely the problem. The problem is
the dcache.

The two are getting confused here.

> > I don't think code size is what's making Firefox eat up 1gb RAM.
> 
> Nor is Firefox's 1GB of ram causing cache thrashing...

You don't think a bigger RAM footprint directly corresponds to a bigger
cache footprint? The bigger the dataset, the more dcache misses you're
going to see. The more RAM you eat, the less is available for the kernel
to use for block cache. And of course swapping will render all your
hours spent tweaking data structures and squeezing every last
instruction out of your inner loops quite moot.
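
To make the footprint point concrete, here is a minimal sketch in C. The
record types and the footprint() helper are hypothetical, made up purely for
illustration, and the exact sizes depend on your ABI. It shows how merely
reordering fields can shrink each record, and with it the total dataset
that has to move through the dcache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, fields in "natural" declaration order.
 * The compiler has to pad after each char to align the double. */
struct record_padded {
    char   flag;
    double value;
    char   kind;
};

/* Same fields, widest first, so less alignment padding is needed. */
struct record_reordered {
    double value;
    char   flag;
    char   kind;
};

/* Bytes occupied by `count` records of `record_size` bytes each. */
size_t footprint(size_t record_size, size_t count)
{
    /* Guard against overflow in the multiplication. */
    assert(count == 0 || record_size <= SIZE_MAX / count);
    return record_size * count;
}
```

On a typical x86-64 Linux ABI the first struct comes out at 24 bytes and the
second at 16, so a million records shrink by roughly a third without touching
a single instruction in the inner loop. Check sizeof() on your own target
rather than trusting those numbers.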

The optimization priority list goes like this:

1) Minimize macro-level memory usage. The less overall memory usage, the
better. Less swapping, more RAM available for disk cache, less upgrade
treadmill. Everyone wins, big and small, from OLPC on up to a 32-core Xeon
godbox.
2) Optimize cache usage. Minimize cache thrashing. You can't even
measure if your cycle shaving is doing anything at all if your memory
access patterns aren't reasonably optimal.
3) Only then do you optimize instruction usage.
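
As a rough illustration of item 2, here is a minimal sketch in C. The function
names and the fill pattern are hypothetical, chosen only for this example.
Both sum functions execute the same arithmetic on the same data; the only
difference is the order in which memory is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a rows x cols matrix, stored row by row, with a known pattern. */
void fill_grid(int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            m[r * cols + c] = (int)((r + c) & 0xff);
}

/* Walks memory sequentially, so consecutive accesses share cache lines. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same additions, but each access jumps cols * sizeof(int) bytes, so on a
 * large matrix nearly every access lands on a different cache line. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Both functions return the same sum. On a matrix much larger than your cache,
I would expect the column-major walk to be noticeably slower, but measure it
on your own hardware before believing me. The compiler flags are identical in
both cases, which is the whole point.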


More information about the fedora-devel-list mailing list