Wednesday, June 23, 2021

Function pointers vs Normal calls

I was looking for mechanism to have zero overhead logging in my personal project. The first approach that struck my brain was creating a dummy function and calling it whenever logging is disabled. However I had many surprises while doing simple experiments. I am writing down here. Hope it helps you too while making decision.

Lets begin with problem. Here is usual way of implementing logging mechanism.

if (debug_flag_enabled & logger_mask)

    //emit my logging function

where logger_mask is represented as set of bits with enabled/disabled logs

There is catch here. Even if the log does not need to be emitted, there is always that "if" condition overhead for every call.

My approach was simple which was to sink the disabled logs to "do-nothing" logger function. The approach I took was to store function pointers in an array mapped to relevant logging function. Whenever the particular log level is disabled, the respective logging function would be mapped to "do-nothing" function which would ultimately be inlined by the compiler optimization (which turned out to be bad assumption).

Assuming that all logger functions are disabled, here is code with "do-nothing" approach

#include <stdio.h>

typedef void (*logging_func)();


#define DEBUG 0

#define INFO 1

#define WARN 2

#define ERR 3

#define CRIT 4

#define MAX (CRIT+1)


void null_logger()

{

    // do nothing

}

logging_func app_loggers[MAX] = {null_logger, null_logger, null_logger, null_logger, null_logger};

int main()

{

    int iteration = 1000;

    for (int i=0; i < iteration; i++)

    {

        app_loggers[DEBUG]();

        app_loggers[INFO]();

        app_loggers[WARN]();

        app_loggers[ERR]();

        app_loggers[CRIT]();

    }

}

Unfortunately, my assumptions were hit back by the surprises. I compared timing of the function pointers with normal calls and the timing results were startling :-D.

Here is code with "if" condition' approach.

int main()

{

    int iteration = 1000;

    for (int i=0; i < iteration; i++)

    {

        if (logger_mask & DEBUG)

            null_logger();


        if (logger_mask & INFO)

            null_logger();


        if (logger_mask & WARN)

            null_logger();


        if (logger_mask & ERR)

            null_logger();


        if (logger_mask & CRIT)

            null_logger();

    }

}

The timing observations for both approaches are as below for 1k iterations. First one is for function pointer approach while second is plain "if" condition approach

For 1k iterations

nanda@heramba:~/experiment/log_c_o1$ gcc -O2 logger_blog.c 

nanda@heramba:~/experiment/log_c_o1$ ./a.out 


===============

START = 2778 - 241274892

END   = 2778 - 241286144


===============


===============

START = 2778 - 241286213

END   = 2778 - 241286283


===============


With function pointer: 11252ns

With direct call: 70ns


For 10k iterations


nanda@heramba:~/experiment/log_c_o1$ gcc -O2 logger_blog.c 

nanda@heramba:~/experiment/log_c_o1$ ./a.out 


===============

START = 2850 - 781701427

END   = 2850 - 781759262


===============


===============

START = 2850 - 781759300

END   = 2850 - 781759326


===============

With function pointer: 57835ns

With direct call: 26ns


Do you see the vast difference of latency provided by function pointers? The one with "if" condition remains relatively constant for 10k iteration also but the assumed O(1) function pointer approach increased the latency :-)

Rather than improving, the situation aggravated. Both of the approach call the same null_logger. One via stored function pointers and other via direct calls with "if" condition. Note also that this code was compiled with "-O2"     

The function pointers apparently introduces lot of overhead in our code. As seen below from assembly, the function pointer has to be dereferenced and then called. Hence we have latency in this procedure. The latency becomes worst if we have multiple iterations of logging

         movq    app_loggers(%rip), %rdx

        movl    $0, %eax

        call    *%rdx

Also because I stored null_logger in array, the compiler did not optimize the null_logger function. When I removed the references of app_loggers[] array, the compiler did wonderful optimization. It was evident from generated assembly code which is stripped for brevity. There is no call to null_logger at all :-)

main:

.LFB24:

        .cfi_startproc

        leaq    time_store(%rip), %rbx

        leaq    64(%rbx), %r12

        call    clock_gettime@PLT

        leaq    48+time_store(%rip), %rsi

        movl    $1, %edi

        call    clock_gettime@PLT

Complete compliable code is in my github page:  https://github.com/nkumar85/experiments/tree/main/fn_ptr_if_direct


There are plenty of things to learn in basics which we fail to experiment. This experiment is one good example. It brings to mind a corollary "does the virtual functions in C++ also have similar overhead?"