Again, this is nothing new :). I just happened to notice it recently. A char* pointer can be assigned a string literal even after its declaration. The restriction, of course, is that you cannot modify the string the pointer points to.
So what is the point? gcc puts string literals in a read-only section (.rodata), which here is loaded along with the text segment. Consider the example:
int main()
{
    char* str;
    str = "It is me";
    printf("%s\n", str);
}
If you think the above code results in a segmentation fault, it does not. It works fine. Here is the output:
nandakumar@HERAMBA:~$ gcc test.c -o test
nandakumar@HERAMBA:~$ ./test
It is me
So how does gcc manage it? Here is the snipped assembly output:
nandakumar@HERAMBA:~$ gcc -S test.c
nandakumar@HERAMBA:~$ head -5 test.s
.file "test.c"
.section .rodata
.LC0:
.string "It is me"
.text
<snip>
You can even do this. It works great:
int main()
{
    char* str;
    str = "It is me";
    printf("%s\n", str);
    str = "It is me again :-)";
    printf("%s\n", str);
}
Here is the output:
nandakumar@HERAMBA:~$ gcc test.c -o test
nandakumar@HERAMBA:~$ ./test
It is me
It is me again :-)
So the data is stored in the read-only section of the text segment, and the pointer str points to that location in .rodata. In the second example there are two entries in the .rodata section:
nandakumar@HERAMBA:~$ gcc -S test.c
nandakumar@HERAMBA:~$ head -6 test.s
.file "test.c"
.section .rodata
.LC0:
.string "It is me"
.LC1:
.string "It is me again :-)"
Can I modify it? Not at all. How can you modify rodata :-)? It makes sense for the compiler to put it in the read-only section of the text segment, since we do not want stray writes messing up memory. Writing to a string literal is undefined behavior in C; on this system it shows up as a crash. Here is a sample:
int main()
{
    char* str;
    str = "It is me";
    printf("%s\n", str);
    str = "It is me again :-)";
    printf("%s\n", str);
    str[0] = 'P';
    printf("%s\n", str); /* Get LOST!! */
}
nandakumar@HERAMBA:~$ gcc test.c -o test
nandakumar@HERAMBA:~$ ./test
It is me
It is me again :-)
Segmentation fault
Note that every additional string literal adds to the size of the executable. Also, be careful not to modify the contents. How can we be more precise about where rodata lives? Consider this case:
int main()
{
    char* str;
    str = "It is me";
    /* %p expects a void*, hence the cast */
    printf("str address is %p and content \"%s\"\n", (void*)str, str);
}
The output:
nandakumar@HERAMBA:~$ gcc test.c -o test
nandakumar@HERAMBA:~$ ./test
str address is 0x80484f0 and content "It is me"
Here is the snipped objdump output:
nandakumar@HERAMBA:~$ objdump -D test
<snip>
Disassembly of section .rodata:
080484e8 <_fp_hw>:
80484e8: 03 00 add (%eax),%eax
...
080484ec <_IO_stdin_used>:
80484ec: 01 00 add %eax,(%eax)
80484ee: 02 00 add (%eax),%al
80484f0: 49 dec %ecx
80484f1: 74 20 je 8048513 <_IO_stdin_used+0x27>
80484f3: 69 73 20 6d 65 00 25 imul $0x2500656d,0x20(%ebx),%esi
80484fa: 70 2d jo 8048529 <_IO_stdin_used+0x3d>
80484fc: 25 .byte 0x25
80484fd: 73 0a jae 8048509 <_IO_stdin_used+0x1d>
...
<snip>
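The virtual memory address 0x80484f0 holds "It is me": the bytes 49 74 20 69 73 20 6d 65 00 starting there are the ASCII codes of the string followed by its terminating NUL. Ignore the disassembled instructions; objdump -D decodes the data as if it were code. Note that the address may differ on your system, and with a position-independent executable and address space randomization it can change on every run.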
That is it. I know this may feel like a silly thing to you, but it was new to me. Leave your suggestions too!
Note:
1) Skipped C headers in code.
2) All code was tested on a Linux box with the gcc compiler.
3) You can even run Valgrind to sanity-check the code.
Small correction: Earlier this was described as the read-only data segment; it is actually the read-only section of the text segment. My apologies for the mistake :-). The write-up has been corrected in the appropriate places.