OT: Requesting C advice

Matthew Saltzman mjs at clemson.edu
Fri Jun 1 23:02:15 UTC 2007


On Fri, 1 Jun 2007, Les wrote:

> On Fri, 2007-06-01 at 07:36 -0400, Matthew Saltzman wrote:
>>
>>> I know why their programs failed.  I also know that C uses a pushdown
>>                                                       ^some particular
>>                                                        implementations of
>>> stack for variables in subroutines.  You can check it out with a very
>>> simple program using pointers:
>>>
>>>    #include <sttlib.h>
>>>
>>>    int i,j,k;
>>>
>>>    main()
>>>    {
>>>        int mi,mj,mk;
>>>        int *x;
>>>        mi=4;mj=5;mk=6;
>>>        x=&mk;
>>>        printf ("%d  %d  %d\n",*x++,*X++;*X++);
>>>        x=&i;
>>>        printf ("%d  %d  %d\n",*x++,*x++,*x++);
>>>        i-1;j=2;k=3;
>>>        printf ("%d  %d  %d\n",*x++,*x++,*x++);
>>>  )
>>>
>>> Just an exercise you understand.  compile and run this with several c
>>> packages, or if the package you choose supports it, have it compile K&R.
>>> and try it.
>>
>> Of course, several constructs here are undefined, so there is no such
>> thing as "correct" or "incorrect" behavior.
>>
>> After correcting obvious typos and adding #include <stdio.h> so it would
>> compile, I got (using gcc-4.1.1-51.fc6 with no options):
>>
>>      $ ./a.out
>>      5  4  6
>>      0  0  0
>>      0  0  0
>
> OOPS, forgot to reset the X pointer between the last two print
> statements.  This bit of code is intended to show that globals are on a
> heap and locals are on a stack.

Fixed that.  Now I get:

$ ./a.out
5  4  6
0  0  0
0  2  1

But I confess, I don't see how this code proves your point.  It does 
demonstrate that globals are initialized by default, though.

>
>>
>> Was that what you were expecting?
>>
>>
>>>
>>> I cannot vouch for every compiler, only Microsoft, Sun, and Instant C
>>> off the top of my head.  I have used a few other packages as well.  But
>>> any really good programmer NEVER relies on system initialization.  It is
>>> destined to fail you at bad times.
>>
>> How much effort are you willing to expend to defend against potentially
>> buggy compilers (as opposed to undefined or implementation-defined
>> behaviors)?  The Intel fdiv bug would seem to prove that you should NEVER
>> rely on arithmetic instructions to provide the correct answer.  There's an
>> economic tradeoff between protecting yourself from all conceivable errors
>> and actually getting work done.
>>
>
> There is a difference between implementation differences and hardware
> errors, which was the Intel error.  They had
> a bug in their silicon compiler that caused that IIRC.

I could just as easily reference some other obscure compiler bug or 
implementation-defined behavior and make the same point.  The thing about 
a standard is that there are clear requirements about what is 
implementation-defined and what is not.  Static initialization in ISO C is 
not one of those implementation-defined things.

I will concede that explicit initializations--even to default 
values--might be a useful self-documentation tool.

>
>>>                                     One case is as has been pointed out
>>> here, that NULL is sometimes 0, sometimes 0x80000000, and sometimes
>>> 0xffffffff.  Even NULL as a char may be 0xFF 0xFF00 or 0x8000 depending
>>> on the implementation.  But strings always end in a character NULL or
>>> 0x00 for 8 bit ascii, if you use GNU, Microsoft, or Sun C compilers.
>>> They may do otherwise on some others.  It can byte (;-) you if you are
>>> not careful.
>>
>> In your source code, NULL is *always* written 0 (or sometimes (void *) 0
>> to indicate that it's intended to stand for a null pointer value, not a
>> NUL character value).  The string terminator character is *always* written
>> '\0'.  The machine's representation of that value is immaterial.  If you
>> type-pun to try to look at the actual machine's representation, your
>> program's behavior is undefined and you deserve what you get.  It's the
>> compiler's responsibility to ensure that things work as expected, no
>> matter what the machine's representation is.  (For example, '\0' == 0 must
>> return 1.)
>>
>
> '\0' is an escape forcing the 0, so of course this will be equal.

OK.  But the main point is that it doesn't matter what bit pattern 
represents a null pointer.  Your source code will always use the value 0 
to represent it.  For example,

 	int *p;
 	/* ...code that sets p... */
 	if ( p == 0 ) /* *not*  if ( p == 0x80000000 ) or
 				if ( p == 0xffffffff ) */
 	{ /* ...handle null pointer value... */ }

>
>>>
>>>    And since that is so, how are those variables initialized? and to
>>> what value?  What is a pointer set to when it is initialized?  Hint: on
>>> Cyber, the supposed default for assigned pointers used to be the address
>>> of the pointer.  Again, system dependencies may get you.
>>
>> Pre-ANSI/ISO compilers might have initialized static memory to
>> all-bits-zero even when that was not the correct representation of the
>> default for the type being initialized.  ANSI/ISO compilers are not
>> allowed to do that.  The required default initializations are well
>> defined.  (This is the sort of thing that motivates the creation of
>> standards in the first place.)
>>
>>>
>>>    And those systems that used the first location to store the return
>>> address are not re-entrant, without other supporting code in the
>>> background.  I think I used one of those once as well.
>>
>> There's no requirement for re-entrancy in K&R or ANSI/ISO.  In fact
>> several standard library routines are known to not be re-entrant.
>>
>
> This is true, but knowing that the base code is not reentrant due to
> design constraints or due to hardware constraints makes the difference
> on modern multithreaded systems, where the same executable memory can be
> used for the program (if the hardware allows that).

Sure, you need to know that you can compile re-entrant code if you need 
it.

>
>>>
>>>    PS.  A stack doesn't necessarily mean a processor call and return
>>> stack.  It is any mechanism of memory address where the data is applied
>>> to the current location, then the pointer incremented (or decremented
>>> depending on the architecture).
>>
>> But usually in the context of discussions about compiler architectures,
>> call stacks are exactly what is meant.
>>
>
> I am not sure that is true, because in some implementations, the data
> heap and stack are in the same segment of memory, while the runtime
> stack for the processor is somewhere else.  For high-security systems,
> this should be a requirement.  It prevents obvious means of
> inserting malicious code through variable initialization, and then stack
> manipulation.  I say should be, because it has been tossed around from
> time to time, but I am unsure if it has ever been formalized.
>
> One system I worked on looked like this:
>    init jump
>    heap
>    variable stack (push down)
>    program entrance
>    program
>    local libraries
>    relocation table
>    symbol table (if not removed)
>    machine stack
>
>    Unfortunately I no longer remember which system that was.  Just the
> fact that some standard libraries at that time would not run on it
> because they did manipulate the stack.
>
> Regards,
> Les H
>

-- 
 		Matthew Saltzman

Clemson University Math Sciences
mjs AT clemson DOT edu
http://www.math.clemson.edu/~mjs



