Saturday, February 18, 2012

A note on char* initialization

Again, this is nothing new :). I only noticed it recently. You can assign a constant string (a string literal) to a char* pointer even after the declaration. The restriction, of course, is that you cannot modify the characters the pointer points to.

So what is the point? gcc puts string literals in the read-only data section (.rodata), which on a typical Linux build is part of the read-only text segment. Consider the example:

int main()
{
        char* str;
        str = "It is me";
        printf("%s\n", str);
}


If you think the above code results in a segmentation fault, nope. It works fine. Here is the output:

nandakumar@HERAMBA:~$ gcc test.c -o test
nandakumar@HERAMBA:~$ ./test
It is me

So how does gcc manage it? Here is a snippet of the assembly output:

nandakumar@HERAMBA:~$ gcc -S test.c
nandakumar@HERAMBA:~$ head -5 test.s
    .file    "test.c"
    .section    .rodata
.LC0:
    .string    "It is me"
    .text

      
<snip>

You can even reassign the pointer to another literal. It works fine:

int main()
{
        char* str;
        str = "It is me";
        printf("%s\n", str);
        str = "It is me again :-)";
        printf("%s\n", str);
}


Here is the output:

nandakumar@HERAMBA:~$ gcc test.c -o test
nandakumar@HERAMBA:~$ ./test
It is me
It is me again :-)


So the string data is stored in the .rodata section, inside the read-only text segment, and the pointer str simply holds the address of a location in .rodata. In the second example there are two entries in the .rodata section:

nandakumar@HERAMBA:~$ gcc -S test.c
nandakumar@HERAMBA:~$ head -6 test.s
    .file    "test.c"
    .section    .rodata
.LC0:
    .string    "It is me"
.LC1:
    .string    "It is me again :-)"


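Can I modify it? Not at all. How can you modify rodata :-)? It makes sense for the compiler to put literals in the read-only part of the text segment, since we do not want anything scribbling over them. Strictly speaking, the C standard says that modifying a string literal is undefined behavior; with gcc on Linux the write hits a read-only page and the program gets a segmentation fault. Here is a sample: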

int main()
{
        char* str;
        str = "It is me";
        printf("%s\n", str);
        str = "It is me again :-)";
        printf("%s\n", str);

        str[0] = 'P';
        printf("%s\n", str); /* Get LOST!! */
}


nandakumar@HERAMBA:~$ gcc test.c -o test
nandakumar@HERAMBA:~$ ./test
It is me
It is me again :-)
Segmentation fault


Note that every distinct string literal takes up space in the executable, so lots of them increase its size. Also be careful not to modify the contents. How can we pin down more precisely where the literal lives in rodata? Consider this case:

int main()
{
        char* str;
        str = "It is me";
        printf("str address is %p and content \"%s\"\n", (void *)str, str); /* %p expects a void * */
}


The output:

nandakumar@HERAMBA:~$ gcc test.c -o test
nandakumar@HERAMBA:~$ ./test
str address is 0x80484f0 and content "It is me"

Here is a snippet of the objdump output:

nandakumar@HERAMBA:~$ objdump -D test

<snip>

Disassembly of section .rodata:

080484e8 <_fp_hw>:
 80484e8:    03 00                    add    (%eax),%eax
    ...

080484ec <_IO_stdin_used>:
 80484ec:    01 00                    add    %eax,(%eax)
 80484ee:    02 00                    add    (%eax),%al
 80484f0:    49                       dec    %ecx
 80484f1:    74 20                    je     8048513 <_IO_stdin_used+0x27>
 80484f3:    69 73 20 6d 65 00 25     imul   $0x2500656d,0x20(%ebx),%esi
 80484fa:    70 2d                    jo     8048529 <_IO_stdin_used+0x3d>
 80484fc:    25                       .byte 0x25
 80484fd:    73 0a                    jae    8048509 <_IO_stdin_used+0x1d>
    ... 


<snip>

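The virtual address 0x80484f0 holds "It is me". Ignore the instruction mnemonics: objdump -D disassembles the data bytes as if they were code. The bytes 49 74 20 69 73 20 6d 65 00 are simply 'I', 't', ' ', 'i', 's', ' ', 'm', 'e' and the terminating '\0'; objdump -s -j .rodata test shows the same section as a more readable hex and ASCII dump. Whether the address changes between runs depends on the build: a non-PIE executable like this one gets a fixed address at link time, while a position-independent executable (the default on many newer distributions) is loaded at a random address on each run because of ASLR.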

That is it. I know this may seem like a silly thing, but it was new to me. Leave your suggestions too!

Note:
1) The C headers (#include <stdio.h>) are omitted from the code.
2) All code was tested on a Linux box with the gcc compiler.
3) You can also run Valgrind to check the code for memory errors.

Small correction: earlier this post said the literals live in a read-only data segment; they are actually in a read-only section of the text segment. My apologies for the mistake :-). Corrections have been made at the appropriate places in the write-up.