compiler bug turning up in cmake package?

Matthew Woehlke mw_triad at users.sourceforge.net
Tue Aug 26 04:28:00 UTC 2008


I'm getting this SEGV trying to install kdelibs on my machine (cmake from 
koji, package 2.6.1-1.fc10.i386):

#0  cmELF::GetRPath (this=0xbf986708)
     at /usr/src/debug/cmake-2.6.1/Source/cmELF.cxx:787
#1  0x080e9c07 in cmSystemTools::CheckRPath (file=@0xbf986800, newRPath=@0xbf9867fc)
     at /usr/src/debug/cmake-2.6.1/Source/cmSystemTools.cxx:2617
#2  0x08134378 in cmFileCommand::HandleRPathCheckCommand (this=0x9ad3db8, args=@0xbf986874)
     at /usr/src/debug/cmake-2.6.1/Source/cmFileCommand.cxx:1557
#3  0x0815dd6a in cmFileCommand::InitialPass (this=0x9ad3db8, args=@0xbf986874)
     at /usr/src/debug/cmake-2.6.1/Source/cmFileCommand.cxx:121
#4  0x081620cc in cmCommand::InvokeInitialPass (this=0x9ad3db8, args=@0x9ad3fd4, status=@0xbf986918)
     at /usr/src/debug/cmake-2.6.1/Source/cmCommand.h:68
#5  0x080c65a8 in cmMakefile::ExecuteCommand (this=0x9ac2730, lff=@0x9ad3fc8, status=@0xbf986918)
     at /usr/src/debug/cmake-2.6.1/Source/cmMakefile.cxx:399
#6  0x08150baa in cmIfFunctionBlocker::IsFunctionBlocked (this=0x9ad1170, lff=@0x9ad7f98, mf=@0x9ac2730, inStatus=@0xbf9869f8)
     at /usr/src/debug/cmake-2.6.1/Source/cmIfCommand.cxx:116
#7  0x080b95dc in cmMakefile::IsFunctionBlocked (this=0x9ac2730, lff=@0x9ad7f98, status=@0xbf9869f8)
     at /usr/src/debug/cmake-2.6.1/Source/cmMakefile.cxx:2303

The relevant code is pretty boring:

785     bool cmELF::Valid() const
786     {
787       return this->Internal && this->Internal->GetFileType() != FileTypeInvalid;
788     }

...but the disassembly is unnerving:

0x819d4c0 <_ZN5cmELF8GetRPathEv>:       push   %ebp
0x819d4c1 <_ZN5cmELF8GetRPathEv+1>:     mov    %esp,%ebp
0x819d4c3 <_ZN5cmELF8GetRPathEv+3>:     sub    $0x8,%esp
0x819d4c6 <_ZN5cmELF8GetRPathEv+6>:     mov    0x8(%ebp),%eax
0x819d4c9 <_ZN5cmELF8GetRPathEv+9>:     mov    (%eax),%edx
0x819d4cb <_ZN5cmELF8GetRPathEv+11>:    test   %edx,%edx
0x819d4cd <_ZN5cmELF8GetRPathEv+13>:    je     0x819d510 <_ZN5cmELF8GetRPathEv+80>
0x819d4cf <_ZN5cmELF8GetRPathEv+15>:    mov    0x10(%edx),%eax
0x819d4d2 <_ZN5cmELF8GetRPathEv+18>:    test   %eax,%eax
0x819d4d4 <_ZN5cmELF8GetRPathEv+20>:    je     0x819d500 <_ZN5cmELF8GetRPathEv+64>
0x819d4d6 <_ZN5cmELF8GetRPathEv+22>:    cmp    $0x2,%eax
0x819d4d9 <_ZN5cmELF8GetRPathEv+25>:    je     0x819d4e2 <_ZN5cmELF8GetRPathEv+34>
0x819d4db <_ZN5cmELF8GetRPathEv+27>:    cmp    $0x3,%eax
0x819d4de <_ZN5cmELF8GetRPathEv+30>:    xchg   %ax,%ax
0x819d4e0 <_ZN5cmELF8GetRPathEv+32>:    jne    0x819d500 <_ZN5cmELF8GetRPathEv+64>
0x819d4e2 <_ZN5cmELF8GetRPathEv+34>:    mov    (%edx),%eax
0x819d4e4 <_ZN5cmELF8GetRPathEv+36>:    movl   $0xf,0x4(%esp)
0x819d4ec <_ZN5cmELF8GetRPathEv+44>:    mov    %edx,(%esp)
0x819d4ef <_ZN5cmELF8GetRPathEv+47>:    call   *0x14(%eax)
0x819d4f2 <_ZN5cmELF8GetRPathEv+50>:    leave
0x819d4f3 <_ZN5cmELF8GetRPathEv+51>:    nop
0x819d4f4 <_ZN5cmELF8GetRPathEv+52>:    lea    0x0(%esi,%eiz,1),%esi
0x819d4f8 <_ZN5cmELF8GetRPathEv+56>:    ret
0x819d4f9 <_ZN5cmELF8GetRPathEv+57>:    lea    0x0(%esi,%eiz,1),%esi
0x819d500 <_ZN5cmELF8GetRPathEv+64>:    xor    %eax,%eax
0x819d502 <_ZN5cmELF8GetRPathEv+66>:    leave
0x819d503 <_ZN5cmELF8GetRPathEv+67>:    nop
0x819d504 <_ZN5cmELF8GetRPathEv+68>:    lea    0x0(%esi,%eiz,1),%esi
0x819d508 <_ZN5cmELF8GetRPathEv+72>:    ret
0x819d509 <_ZN5cmELF8GetRPathEv+73>:    lea    0x0(%esi,%eiz,1),%esi
0x819d510 <_ZN5cmELF8GetRPathEv+80>:    mov    0x10(%edx),%eax
0x819d513 <_ZN5cmELF8GetRPathEv+83>:    nop
0x819d514 <_ZN5cmELF8GetRPathEv+84>:    lea    0x0(%esi,%eiz,1),%esi
0x819d518 <_ZN5cmELF8GetRPathEv+88>:    jmp    0x819d4db <_ZN5cmELF8GetRPathEv+27>

Look particularly at the test at +11 and the jump at +13, and then at 
offsets +80 and +15. If I read this right, it tests whether 
"this->Internal" is NULL, and then *dereferences it either way* (at +15 
on the non-NULL path, at +80 on the NULL path). This is clearly not what 
the source listing says (and is clearly wrong), so I wonder where this 
generated code came from.
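
For comparison, here is a minimal sketch of what the source line asks for. 
The types below (Elf, ElfInternal, and the two-value FileType enum) are 
hypothetical stand-ins, not CMake's real classes; only the shape of the 
"&&" expression is taken from the listing above. The point is that "&&" 
short-circuits, so a NULL Internal must yield false without the right-hand 
side ever being evaluated.

```cpp
#include <cassert>

// Hypothetical stand-ins; only the && expression mirrors the listing.
enum FileType { FileTypeInvalid, FileTypeOther };

struct ElfInternal {
  FileType type;
  FileType GetFileType() const { return type; }
};

struct Elf {
  ElfInternal* Internal;

  bool Valid() const {
    // The right operand is evaluated only when Internal is non-null,
    // so a null Internal can never be dereferenced here.
    return this->Internal &&
           this->Internal->GetFileType() != FileTypeInvalid;
  }
};

// Exercises the three cases: null, valid, and invalid file type.
void checkValid() {
  Elf none{nullptr};
  assert(!none.Valid());        // null pointer: false, no dereference

  ElfInternal good{FileTypeOther};
  Elf ok{&good};
  assert(ok.Valid());           // non-null and not invalid: true

  ElfInternal bad{FileTypeInvalid};
  Elf invalid{&bad};
  assert(!invalid.Valid());     // non-null but invalid: false
}
```

Correctly generated code for the NULL case should therefore go straight 
to the "return 0" path.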

Hmm, actually, staring at it, trying to figure out how to hot-hack it so 
the install will finish, it looks like the jump address is wrong (should 
be going to +64, not +80). Or else, something funny is happening w.r.t. 
"Internal"'s vtable.
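
To make the two readings concrete, here is a rough model of the branch 
structure in the listing, written as a sketch under my own reading of the 
disassembly. The function name walk, the Outcome enum, and the 
patchedJump flag are made up for illustration; the offsets in the 
comments are the ones from the listing. It does not touch real memory, 
it only reports which path each input would take.

```cpp
#include <cassert>

// What happens for one input when following the listing's branches.
enum Outcome { ReturnsZero, CallsVirtual, NullDeref };

// internalIsNull: whether this->Internal (edx after +9) is zero.
// fileType:       the word at 0x10(%edx); only used when non-null.
// patchedJump:    false = je at +13 goes to +80 (as emitted),
//                 true  = je at +13 goes to +64 (the hypothesised fix).
Outcome walk(bool internalIsNull, int fileType, bool patchedJump) {
  if (internalIsNull) {                 // +11 test, +13 je taken
    if (patchedJump) {
      return ReturnsZero;               // +64: xor %eax,%eax; ret
    }
    return NullDeref;                   // +80: mov 0x10(%edx),%eax, edx == 0
  }
  int eax = fileType;                   // +15: mov 0x10(%edx),%eax
  if (eax == 0) return ReturnsZero;     // +18/+20: je +64
  if (eax == 2) return CallsVirtual;    // +22/+25: je +34
  if (eax != 3) return ReturnsZero;     // +27/+32: jne +64
  return CallsVirtual;                  // +34..+47: call *0x14(%eax)
}

// Checks the NULL case under both jump targets, plus one normal case.
void checkWalk() {
  assert(walk(true, 0, false) == NullDeref);     // as emitted: faults
  assert(walk(true, 0, true) == ReturnsZero);    // with +64: returns 0
  assert(walk(false, 3, false) == CallsVirtual); // non-null path unchanged
}
```

Under this model the only input that changes behaviour between the two 
jump targets is a NULL Internal, which matches the crash.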

Note that "this" looks like:

(gdb) p *this
$3 = {Internal = 0x0, ErrorMessage = {static npos = 4294967295,
     _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
     _M_p = 0xa0b402c "Error reading ELF identification."}}}

(edx is indeed 0x0, so that's definitely why it SEGV'd. And eax==this, 
so I don't think I'm too far in the bushes guessing what went wrong.)

Note also that I first spotted this in 2.6.0-1.fc10.i386; my .rpm which 
I kept around is timestamped 2008-05-06 (it also SEGV'd, but I upgraded 
to 2.6.1 before digging into it, so I can't absolutely confirm the same 
bug).

Because it looks like the generated code is bad, I'm inclined to blame 
this first on the distro (Fedora) but I'm also CC'ing the cmake folk 
(though I guess really I should be blaming either gcc, or come to think 
of it, possibly gas), and also gcc-help as I figure they're most likely 
to be able to make sense of the disassembly. Can anyone shed some light 
on this?

I'll likely try to debug this further (please feel free to request 
additional information), but for now it's a heads-up of a bug in the 
Fedora package.

(To the gcc folk: I know it's not a STC*, or even full code; sorry for 
that, though in my experience compiler bugs like this disappear as soon 
as the code is touched in the slightest manner, plus I don't have direct 
access to the machine this was built on anyway. Hopefully I'll be able 
to work with the Fedora people on that if it's needed. What I'm mainly 
looking for from y'all is a second opinion on whether the assembly is 
clearly whacked, or whether there is an obvious flaw in my analysis.)

(*Simple Test Case)

-- 
Matthew
ENOWIT: .sig file for this machine not set up yet



