Thursday, December 12, 2013

The 'cd' command

As everyone knows, the 'cd' command in Linux changes the current working directory to a new directory. Really?! To be precise, it changes the current working directory ('cwd') of the process that invokes it. It cannot change the 'cwd' of any other process. So what's the big deal?

Internally, 'cd' uses the chdir() system call to change the current working directory. As mentioned before, chdir() can only change the 'cwd' of the process that calls it, not of any other process. So what? :-) Now think of bash, which is executing the 'cd' command. How does it change its own 'cwd' to the new directory? Other commands like 'ls' and 'dir' are usually executed with the fork+exec combination, i.e. by spawning a new process. In the case of 'cd' you cannot spawn a new process, since a child process cannot change the 'cwd' of bash.

Aha! Here is the interesting part. How do you work around it? Bash does it by embedding the implementation of 'cd' in its own executable, i.e. 'cd' is a builtin command of bash itself rather than a stand-alone executable like 'ls' or 'dir'. Bash implements 'cd' internally and exposes it as a command in the terminal, so the user still perceives it as a stand-alone command ;-). In other words, besides the executables it finds through the PATH variable, bash also has its own pool of builtin commands, and 'cd' is one of them. Since 'cd' runs inside the bash process, changing to the new working directory is straightforward :-). You can check whether your bin directory has any executable named 'cd', as below ;-) (Some systems do ship a stand-alone 'cd' for POSIX conformance, but it cannot change the directory of the shell that runs it.)

nandakumar@heramba ~ $ which ls || echo -e "get lost :-)"
/bin/ls
nandakumar@heramba ~ $ which cd || echo -e "get lost :-)"
get lost :-)

Even I was not aware of this until I recently read the book Linux System Programming by Robert Love. The beauty of the book is how accurately Mr. Love presents such minute details. At the end of the day, there was a happy learner!

Thursday, December 5, 2013

Thread local storage with gcc __thread keyword

gcc provides the __thread keyword to make a global variable (or, in general, any variable that lives in the data segment) local to each thread.

This is useful when you want thread-specific data in your code.

Consider an example:

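#include <stdio.h>
#include <pthread.h>

void iterate(void)
{
        /* One copy shared by all threads; the increment is not synchronized. */
        static int i=0;
        i++;
        /* pthread_t is an integer type on Linux/glibc; the cast matches %x. */
        printf("Thread id: %x, i=%d\n", (unsigned int)pthread_self(), i);
}

void* thread_func (void* data)
{
        (void)data;     /* unused */
        iterate();
        return NULL;
}

int main(void)
{
        pthread_t tid[5];
        int i=0;

        /* Start five threads that all call iterate() once. */
        for (i=0; i<5; i++)
                pthread_create(&tid[i], NULL, thread_func, NULL);

        for (i=0; i<5; i++)
                pthread_join(tid[i], NULL);

        return 0;
}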


Here is output:

Thread id: 6ebcc700, i=1
Thread id: 6e3cb700, i=2
Thread id: 6d3c9700, i=4
Thread id: 6cbc8700, i=5
Thread id: 6dbca700, i=3


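A static variable lives in the data segment, so there is a single copy shared by all threads, and incrementing it without synchronization is not thread safe. Adding the __thread keyword gives every thread its own copy. Here is the modified snippet of the program; everything else remains the same.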

<snip>

void iterate(void)
{
        /* Each thread now gets its own copy of i. */
        static __thread int i=0;
        i++;
        printf("Thread id: %x, i=%d\n", (unsigned int)pthread_self(), i);
}

<snip>


And the output:

Thread id: 31339700, i=1
Thread id: 2f335700, i=1
Thread id: 30b38700, i=1
Thread id: 30337700, i=1
Thread id: 2fb36700, i=1

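As far as I can tell from the gcc documentation, __thread is not limited to plain old C types: it can be applied to global, file-scope static and function-scope static variables, including pointer and struct types. The restrictions are that it cannot be used on automatic (block-scope, non-static) variables and that the initializer must be a constant expression. When you need more than that, for example a destructor that runs when a thread exits, the APIs mentioned in the next statement are the answer. There is also some overhead in using the __thread keyword, since every access has to locate the calling thread's copy of the variable first (on x86-64 Linux this is typically done relative to the %fs register; a simple dig into the assembly code will reveal it, IMHO).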

POSIX also provides the pthread_getspecific() and pthread_setspecific() APIs for thread-specific data. I will try to experiment with them in the future.

Wednesday, July 10, 2013

Typechecking with gcc typeof

Some more fun with gcc! The more I look into the Linux kernel source, the more new things I discover about gcc. As far as I know, many gcc specific extensions are driven by the needs of the Linux kernel community; some are later picked up by other compiler writers, and a few eventually find their way into the ISO C standard :-). That is the power of open source!

Here is a macro defined in one of the Linux kernel source headers. Strict type checking is part of the language in C++. C is more permissive, and since many C users are driver writers, it is kept lightweight. But you can approximate a type check with the following macro if required. It generates a warning by default if the types do not match, and the warning can be turned into an error with the -Werror switch! One more gcc specific feature I learnt from it is the compound statement within parentheses.

#define typecheck(type,x) \
({  type __dummy; \
    typeof(x) __dummy2; \
    (void)(&__dummy == &__dummy2); \
    1; \
})


Let me dig into the macro.

The first line is a normal declaration of a variable of the expected type.
The second line declares a variable with the datatype of 'x'.
The third line compares the addresses of the two variables, i.e. two pointers whose types differ whenever the two types differ.
The fourth line is the final value of the compound statement.

There are intentions behind lines #3 and #4. Line 3 exists only to make the compiler warn about a comparison of distinct pointer types; the result of the comparison is not used for any logic. The 'void' cast tells the compiler to discard the comparison value. The final '1' signifies success, always. The value of a compound statement within parentheses is the value of its last expression; if the last statement is not an expression, the whole construct is treated as void. The macro above is only for comparing types. This is the reason why the result of the pointer comparison is thrown away and the final expression is *always* made to evaluate to true.

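Here is a slightly different macro with an example. I use it to typecheck pointers: pass the pointee type as the first argument, since the macro adds the '*' itself. (With 'struct test_struct*' as the first argument, __dummy would become a 'struct test_struct **' and the check would warn even for a correct pointer.) The null pointer is written as 0 so that the snippet compiles without any header. This may be handy for checking what is really being passed for void* arguments in functions.

#define typecheck(type,x) \
({  type *__dummy=0; \
    typeof(x) __dummy2; \
    (void)(__dummy == __dummy2); \
    1; \
})

struct test_struct {
        int k;
};

int main()
{
        int y = 10;
        typecheck(struct test_struct, &y);

}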


Here is sample output from gcc.

nandakumar@HERAMBA ~/codes $ gcc test.c
test.c: In function ‘main’:
test.c:15:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]


If you want a hard error instead, here is the output from gcc with the -Werror switch:

nandakumar@HERAMBA ~/codes $ gcc -Werror test.c
test.c: In function ‘main’:
test.c:15:2: error: comparison of distinct pointer types lacks a cast [-Werror]
cc1: all warnings being treated as errors


This macro is really handy, and because the check lives in one place, it can be swapped for another compiler's equivalent in a single change. Note that the same macro implementation may not work across all compilers.

Extra Notes:

1) The typeof operator and the compound statement within parentheses (a "statement expression") are GNU C extensions, so they depend on gcc or a compiler that implements the same extensions.

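2) A nice property of the compound statement within parentheses is that it opens its own block scope. Variables declared inside it do not collide with variables of the same name in the enclosing code; they shadow them instead, whereas a macro that expanded to bare declarations would cause a redeclaration error. For example, the code below still compiles. There is one catch: the macro argument is expanded inside that inner scope, so '&__dummy' here refers to the macro's own __dummy and not to the int in main(). The warning is still printed, but it comes from comparing against the wrong variable, which is a good reason to keep such odd names inside macros and out of normal code.

#define typecheck(type,x) \
({  type *__dummy=0; \
    typeof(x) __dummy2; \
    (void)(__dummy == __dummy2); \
    1; \
})

struct test_struct {
        int k;
};

int main()
{
        int __dummy = 10;
        typecheck(struct test_struct, &__dummy);

}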

And here is the output!

nandakumar@HERAMBA ~/codes $ gcc test.c
test.c: In function ‘main’:
test.c:15:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]


References:

1) typeof: http://gcc.gnu.org/onlinedocs/gcc/Typeof.html
2) Compound statements: http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs

Sunday, June 16, 2013

Branch optimization in gcc


With the -O2 option, gcc does some branch optimizations by default. I discovered this recently while trying to understand the __builtin_expect() gcc builtin.

Consider following example:

#include <stdio.h>
#include <stdlib.h>

int main()
{
        char *c = malloc(10);
        if (c ==NULL) {
                printf("MALLOC FAILED\n");
                return 1;
        }
        free(c);
        printf("MALLOC SUCCEEDED\n");
}

I compiled it both as 'gcc -S test.c' and as 'gcc -O2 -S test.c' and compared the assembly (both listings are below). In the optimized version, gcc treats the malloc() success case as the likely one: the success code is laid out as the fall-through path right after the test, and the failure handling is moved out of line, after it. How does gcc decide this? It can hardly be based on the memory footprint of the build machine, since that would make no sense when cross compiling. Is it based on the early exit in the failure branch, or on the code being small enough to fit within a cache line anyway? I had assumed this kind of layout decision was left to the user through __builtin_expect(), even with the -O2 switch. The most likely explanation I have found is in the gcc documentation: when no profile feedback is available, -fguess-branch-probability (enabled at -O and above) makes gcc guess branch probabilities from static heuristics, and a NULL check on a pointer looks like exactly the kind of branch such heuristics would treat as unlikely.

Look at the difference in the assembly output:

1) Optimized one

.section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "MALLOC FAILED"
.LC1:
        .string "MALLOC SUCCEEDED"
        .section        .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB34:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $10, %edi
        call    malloc
        testq   %rax, %rax
        je      .L4
        movq    %rax, %rdi
        call    free
        movl    $.LC1, %edi
        addq    $8, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        jmp     puts

.L4:
        .cfi_restore_state
        movl    $.LC0, %edi
        call    puts
        movl    $1, %eax
        popq    %rdx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE34:
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2"
        .section        .note.GNU-stack,"",@progbits


2) Non-Optimized

.section        .rodata
.LC0:
        .string "MALLOC FAILED"
.LC1:
        .string "MALLOC SUCCEEDED"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movl    $10, %edi
        call    malloc
        movq    %rax, -8(%rbp)
        cmpq    $0, -8(%rbp)
        jne     .L2
        movl    $.LC0, %edi
        call    puts
        movl    $1, %eax
        jmp     .L1

.L2:
        movq    -8(%rbp), %rax
        movq    %rax, %rdi
        call    free
        movl    $.LC1, %edi
        call    puts
.L1:
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2"
        .section        .note.GNU-stack,"",@progbits



If you look at the optimized assembly code, the test

testq    %rax, %rax
je    .L4

jumps to the failure code only when malloc() returns NULL. Right after 'je .L4' we see the success code as the fall-through path:

movq    %rax, %rdi
call    free
movl    $.LC1, %edi
addq    $8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
jmp    puts


That means gcc has arranged the code so that the expected (success) path falls straight through and only the failure case takes a jump. So we may not have to use __builtin_expect() for such cases ;-). I am not sure whether this applies to all sorts of branches or only to branches with little code.

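The non-optimized version simply follows the source order: the failure code falls through and the success path takes the jump.

subq    $16, %rsp
movl    $10, %edi
call    malloc
movq    %rax, -8(%rbp)

cmpq    $0, -8(%rbp)
jne    .L2   // malloc succeeded: jump to the success path
movl    $.LC0, %edi // fall through: failure
call    puts
movl    $1, %eax
jmp    .L1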


Let me know if you have alternate thoughts. The code was compiled on Linux Mint, x86_64 architecture. I am not sure whether the same behavior is seen on all architectures.

Sunday, May 5, 2013

Format specifiers for definitive types

In one of my previous posts, I mentioned an ugly #ifdef workaround for printing definitive (fixed-width) types like uint64_t. However, this is not required at all :-). There are already format specifiers for these types in inttypes.h! It contains architecture independent format specifiers for many fixed-width types like uint64_t, uint32_t, uint16_t etc. and their signed counterparts too!

A quick glance around the header file is below! It is located in "/usr/include/inttypes.h"

/* Unsigned decimal notation.  */
# define SCNu8          "hhu"
# define SCNu16         "hu"
# define SCNu32         "u"
# define SCNu64         __PRI64_PREFIX "u"


and __PRI64_PREFIX is defined as

# if __WORDSIZE == 64
#  define __PRI64_PREFIX        "l"
#  define __PRIPTR_PREFIX       "l"
# else
#  define __PRI64_PREFIX        "ll"
#  define __PRIPTR_PREFIX
# endif



There are several such macros for many types and variants (hex, octal etc.). Glance through the header if you are interested in more!

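A quick short program. Note that the SCN* macros shown above are meant for the scanf() family; for printf() the header defines matching PRI* macros (PRIu64, PRIx64 and so on), and those are the ones to use here.

#include <stdio.h>
#include <inttypes.h>

int main()
{
        uint64_t test = 0x12345678;
        /* PRIu64/PRIx64 expand to the right length modifier for this arch. */
        printf("%"PRIu64" : %"PRIx64"\n", test, test);
        return 0;
}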


Output: 305419896 : 12345678

and no warnings too ;-)