Improving security

Wed Jan 19 19:45:44 UTC 2005

Hans de Goede <j.w.r.degoede <at> hhs.nl> writes:

> 
> Hi,
> 
> I just read this interesting article on lwn:
> http://lwn.net/Articles/106214/
> (lwn subscriber only)
> 

Yay :)

> This talks about things like:
> 1 Stack Smash Protection
> 2 PAX (alternative Exec Shield)
> 3 Position Independent Executables.
> 
> Stack Smash Protection sounds like a cool feature to me. I don't know
> what the performance impact is, but as a developer, even if it is too slow
> to use by default, I would love to have it integrated into the gcc
> shipped by Fedora to make debugging easier.
> 

I use it.

The performance impact isn't noticeable, but it's highly variable.  Its
theoretical maximum is something like 8%, and it drops as functions
get bigger or when functions aren't protected.

You can use heuristics (-fstack-protector) to protect only functions
with a local character array, or just protect all functions
(-fstack-protector-all).  With the heuristics, you likely won't
encounter protected functions very often.
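
As a rough illustration of the heuristic: with -fstack-protector, a
function like copy_name() below (local char array) would be a candidate
for protection, while add_ints() would not be unless you use
-fstack-protector-all.  The function and file names are made up for
this example, and exactly which functions get a guard depends on the
GCC version and patch in use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example functions; names are illustrative only.
 * Build sketch (file name is an assumption):
 *   gcc -fstack-protector     example.c
 *   gcc -fstack-protector-all example.c
 */

/* Has a local character array, so the heuristic would select it. */
size_t copy_name(const char *name)
{
    char buf[32];

    /* Bounded copy, always NUL-terminated. */
    strncpy(buf, name, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strlen(buf);
}

/* No local array: only guarded under -fstack-protector-all. */
int add_ints(int a, int b)
{
    return a + b;
}
```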

My best guess looking at the design (not the code) is that it takes
about 4 instructions to protect a function as a baseline, plus one
more per passed argument, based on the reasoning below.

SSP rearranges variables at compile time.  This produces no runtime
overhead.

SSP protects a function with a local char[] using a __guard value.
This would require expanding the stack frame.  The best way is to
allocate the entire stack frame at once, so I assume the GCC devs
are smart and that this produces 0 extra instructions.  Next, you
need to check the GOT for __guard (I'm assuming O(1), so one
instruction), and use MOV to copy __guard to local_guard (1 insn).

At return in a protected function, local_guard must be compared
against __guard with CMP.  If it's fine, then JE past the code that
calls __stack_smash_handler().  Two more instructions.  This totals 4
for a __guard protection on a function.
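
To make the prologue/epilogue idea concrete, here is a hand-written
emulation of it in plain C.  This is only a sketch of the concept, not
what the compiler emits: demo_guard, demo_smash_handler() and
struct demo_frame are hypothetical stand-ins (the struct is there to
force the guard to sit right after the buffer), and the handler just
sets a flag where the real one would abort the program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the global __guard value. */
unsigned long demo_guard = 0x5bd1e995UL;
int smash_detected = 0;

/* Stand-in for __stack_smash_handler(); the real one would abort. */
void demo_smash_handler(void)
{
    smash_detected = 1;
}

/* Forces the layout "buffer first, guard right after it". */
struct demo_frame {
    char buf[16];
    unsigned long local_guard;
};

/* Returns 0 if the guard survived the copy, -1 if it was overwritten. */
int guarded_copy(const char *src, size_t n)
{
    struct demo_frame frame;

    frame.local_guard = demo_guard;      /* "prologue": load guard, MOV */

    if (n > sizeof frame)                /* keep the demo inside the struct */
        n = sizeof frame;
    memcpy((char *)&frame, src, n);      /* n > 16 runs past buf into the guard */

    if (frame.local_guard != demo_guard) {   /* "epilogue": CMP, JE */
        demo_smash_handler();
        return -1;
    }
    return 0;
}
```

A short copy such as guarded_copy("short", 6) leaves the guard alone;
copying 24 or more bytes of 'A' overwrites it and trips the handler.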

To protect passed arguments, the stack frame has to be made bigger
(no overhead?).  Each argument then has to be copied with a MOV from
an offset off an address in some register (1 insn for each argument).

I'm guessing at internals, but I think most of this is possible.  I'm
not sure about the O(1) GOT lookup.  I'm probably wrong there.
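
Written out as arithmetic, the estimate above looks like this.  It is
purely a back-of-envelope model of my guess, not measured data, and
the function name is made up.

```c
/* Estimated extra instructions for one call, per the guess above:
 * 4 for the guard (GOT lookup, MOV, CMP, JE) plus one MOV per argument.
 * Unprotected functions cost nothing extra.
 */
int ssp_extra_insns(int is_protected, int nargs)
{
    if (!is_protected)
        return 0;
    if (nargs < 0)          /* treat nonsense input as "no arguments" */
        nargs = 0;
    return 4 + nargs;
}
```

So a protected function taking three arguments would cost about
7 extra instructions under these assumptions.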

> PAX uses tricks to get a non-executable stack, and assigns random
> addresses to PIE executables, which Fedora already has in the form of
> Exec Shield, good! But if I understand it correctly, PAX does more; for
> example, it also makes data pages non-executable. This might be
> something worth looking into.

PaX makes a strict separation between Writable and Executable memory.
It also has more accurate NX emulation on x86.  Ingo has admitted in
recent LKML posts that PaX is competitive with Exec Shield:

(but no doubt PaX is fine and protects against exploits at least as
effectively as (and in some cases more effectively than) exec-shield,
so you've definitely not made a bad choice.)

It's also older and still actively developed (older and inactive is
bad, older and active is good, younger and active is not quite as good
unless your developer is quite a lot more competent on the subject).

SEGMEXEC on x86 splits the address space in half to emulate an NX bit;
I've never seen this cause a problem in anything.  PAGEEXEC used to
use kernel-assisted MMU walking, which can be very high overhead
depending on memory access patterns; now it uses the same method
Exec Shield uses, but falls back to kernel-assisted MMU walking if
that fails (due to mprotect()ing a higher address with PROT_EXEC).
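
A toy model of the SEGMEXEC split, assuming the usual 3 GiB user
address space on x86, so each half is 1.5 GiB and executable mappings
show up mirrored in the upper half.  The constants and function names
here are made up for illustration and are not taken from the PaX
source.

```c
#include <stdint.h>

/* Assumption: 3 GiB of user address space, split into two equal halves. */
#define DEMO_TASK_SIZE      0xC0000000u
#define DEMO_SEGMEXEC_SIZE  (DEMO_TASK_SIZE / 2)    /* 0x60000000 */

/* Data accesses are confined to the lower half. */
int in_data_half(uint32_t addr)
{
    return addr < DEMO_SEGMEXEC_SIZE;
}

/* Where the executable mirror of a lower-half address would live. */
uint32_t code_mirror(uint32_t addr)
{
    return addr + DEMO_SEGMEXEC_SIZE;
}
```

Under these assumptions a binary mapped at 0x08048000 would have its
executable mirror at 0x68048000.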

> 
> PIE we already have, good!
> 

Feh, your PIE is a joke.  My ENTIRE SYSTEM is PIE, save for what
won't compile PIE. I don't use Fedora, but last I heard, only a
few programs were PIE.
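
If you want to see whether a given build is actually PIE, a tiny probe
like the following is one way to check.  The names are hypothetical;
the idea is to print anchor_address() from main() over several runs.
Built with -fPIE -pie on a kernel that randomizes load addresses, the
value should change between runs; a normal build gives the same value
every time.

```c
#include <stdint.h>

/* Build sketch (file name is an assumption):
 *   gcc -fPIE -pie where.c -o where_pie
 */

/* Any function in the main executable will do as a reference point. */
int anchor_fn(void)
{
    return 42;
}

/* Address of code inside the executable itself (not a library). */
uintptr_t anchor_address(void)
{
    return (uintptr_t)&anchor_fn;
}
```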

The overhead of PIC (based on nbyte-bench) is something like
0.99002% on x86, and 0.02% on amd64.  There's another caveat:
-fomit-frame-pointer usually gives a -5% overhead (i.e. it removes
overhead and thus programs use less CPU).  This gain is lost on x86;
there is no effect (ok, 0.01%) on amd64.

That being said, you have to understand that libraries are ALL PIC.
You lose NOTHING in libraries by going to PIE.  All plug-ins to
anything (except gimp?), all encoder and decoder libraries,
libtheora, libogg, libvorbis, libvorbisenc, liblame, libmad,
libzlib and libbzip2, ALL YOUR HEAVY LIFTING is done in libraries.

I haven't profiled it but I'm fairly sure that any large amounts of
CPU are typically spent in PIC code anyway.  I'm pretty certain
that the real-world system-wide impact of PIE is dismally low due
to the low amount of time the actual executable load module spends
on the CPU versus the libraries it uses.
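
The argument boils down to a weighted product: libraries are already
PIC, so only the time spent in the executable itself pays anything
extra.  A sketch of that arithmetic, with a made-up function name; any
CPU fraction you plug in is your own guess, since I haven't profiled it.

```c
/* System-wide PIE cost = (fraction of CPU time spent in the executable's
 * own code) * (PIC overhead on that code).  Library time is already PIC
 * and contributes nothing extra.
 */
double system_pie_overhead(double exe_cpu_fraction, double pic_overhead)
{
    return exe_cpu_fraction * pic_overhead;
}
```

For instance, if a program spent 10% of its CPU time in its own code
(an invented figure) and PIC cost about 1% there, the overall hit
would be system_pie_overhead(0.10, 0.01), i.e. about 0.1%.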

> Regards,
> 
> Hans
>