Wednesday, June 23, 2021

Function pointers vs Normal calls

I was looking for a mechanism to get zero-overhead logging in my personal project. The first approach that struck me was to create a dummy function and call it whenever logging is disabled. However, I ran into many surprises while doing simple experiments, so I am writing them down here. Hope it helps you too when you are making a similar decision.

Let's begin with the problem. Here is the usual way of implementing a logging mechanism:

if (debug_flag_enabled & logger_mask)

    //emit my logging function

where logger_mask is a set of bits marking which log levels are enabled or disabled.

There is a catch here. Even if the log does not need to be emitted, there is always the overhead of that "if" condition on every call.

My approach was simple: sink the disabled logs into a "do-nothing" logger function. I stored function pointers in an array, one per logging function. Whenever a particular log level is disabled, the respective entry is mapped to the "do-nothing" function, which I expected would ultimately be inlined away by compiler optimization (which turned out to be a bad assumption).

Assuming that all logger functions are disabled, here is the code with the "do-nothing" approach.

#include <stdio.h>

typedef void (*logging_func)();

#define DEBUG 0
#define INFO  1
#define WARN  2
#define ERR   3
#define CRIT  4

#define MAX (CRIT+1)

void null_logger()
{
    // do nothing
}

logging_func app_loggers[MAX] = {null_logger, null_logger, null_logger, null_logger, null_logger};

int main()
{
    int iteration = 1000;

    for (int i=0; i < iteration; i++)
    {
        app_loggers[DEBUG]();
        app_loggers[INFO]();
        app_loggers[WARN]();
        app_loggers[ERR]();
        app_loggers[CRIT]();
    }
}

Unfortunately, my assumptions were knocked down by surprises. I compared the timing of the function pointer approach against normal calls, and the results were startling :-D.

Here is the code with the "if" condition approach (the level constants double as bit positions in logger_mask, which is zero here since every log level is disabled).

unsigned int logger_mask = 0;   /* all log levels disabled, same as the array version */

int main()
{
    int iteration = 1000;

    for (int i=0; i < iteration; i++)
    {
        if (logger_mask & (1u << DEBUG))
            null_logger();

        if (logger_mask & (1u << INFO))
            null_logger();

        if (logger_mask & (1u << WARN))
            null_logger();

        if (logger_mask & (1u << ERR))
            null_logger();

        if (logger_mask & (1u << CRIT))
            null_logger();
    }
}

The timing observations for both approaches are below. In each run, the first START/END pair is the function pointer approach and the second is the plain "if" condition approach.

For 1k iterations

nanda@heramba:~/experiment/log_c_o1$ gcc -O2 logger_blog.c 

nanda@heramba:~/experiment/log_c_o1$ ./a.out 


===============

START = 2778 - 241274892

END   = 2778 - 241286144


===============


===============

START = 2778 - 241286213

END   = 2778 - 241286283


===============


With function pointer: 11252ns

With direct call: 70ns


For 10k iterations


nanda@heramba:~/experiment/log_c_o1$ gcc -O2 logger_blog.c 

nanda@heramba:~/experiment/log_c_o1$ ./a.out 


===============

START = 2850 - 781701427

END   = 2850 - 781759262


===============


===============

START = 2850 - 781759300

END   = 2850 - 781759326


===============

With function pointer: 57835ns

With direct call: 26ns


Do you see the vast difference in latency introduced by the function pointers? The "if" condition version stays essentially constant even at 10k iterations, while the supposedly cheap function pointer approach only grows in latency :-)

Rather than improving, the situation got worse. Both approaches call the same null_logger: one through stored function pointers, the other through direct calls guarded by an "if" condition. Note also that this code was compiled with "-O2".

Function pointers apparently introduce a lot of overhead into our code. As seen in the assembly below, the function pointer has to be loaded from memory and then called indirectly, and that is where the latency comes from. The latency only gets worse with more iterations of logging.

        movq    app_loggers(%rip), %rdx
        movl    $0, %eax
        call    *%rdx

Also, because I stored null_logger in the array, the compiler did not optimize the null_logger calls away. When I removed the references to the app_loggers[] array, the compiler did a wonderful optimization. It is evident from the generated assembly below (stripped for brevity) that there is no call to null_logger at all :-)

main:
.LFB24:
        .cfi_startproc
        leaq    time_store(%rip), %rbx
        leaq    64(%rbx), %r12
        call    clock_gettime@PLT
        leaq    48+time_store(%rip), %rsi
        movl    $1, %edi
        call    clock_gettime@PLT

The complete compilable code is on my GitHub page: https://github.com/nkumar85/experiments/tree/main/fn_ptr_if_direct


There are plenty of things to learn in the basics that we never get around to experimenting with, and this experiment is one good example. It also brings to mind a corollary question: do virtual functions in C++ have similar overhead?
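
For what it is worth, a virtual call compiles down to a very similar indirect call through a table of function pointers, so the question is not an idle one. Here is a minimal sketch of my own (not part of the experiment above) of the kind of call I mean:

#include <cstdio>

struct Logger
{
    virtual void log() { std::puts("real logger"); }
    virtual ~Logger() = default;
};

struct NullLogger : Logger
{
    void log() override { /* do nothing */ }
};

// The compiler generally cannot tell which override will run here, so it
// loads the function address from the vtable and performs an indirect
// call, much like the app_loggers[] lookup above.
void emit(Logger& l)
{
    l.log();
}

int main()
{
    NullLogger n;
    emit(n);
}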

Thursday, August 8, 2019

free()ing an invalid pointer

One of my friends asked this question: what happens if you free() an invalid pointer, i.e. a pointer that was not returned by malloc()? I thought about it for a while. I remembered from skimming the glibc malloc code a few years back that there was some kind of lookup based on the pointer. So I replied saying it would be a memory leak.

But wait a minute: free() returns nothing. So while the programmer is happy with his bad code, something is quietly going wrong underneath without his notice. When a customer reports a memory leak after a long run, the furnace erupts and the developer comes under intense pressure. So free() must be doing something else. Let's try a sample program.

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *ptr = (char *) malloc(100);

    ptr++;

    free(ptr);
}

And the output is...

nanda@nanda-MS-7640:/media/nanda/PERSONAL/c_code/const$ ./a.out 
free(): invalid pointer
Aborted (core dumped)

Yes, abort is called by glibc, and the process goes down as per the SIGABRT disposition of the calling process (hence "Aborted (core dumped)"). This was a fantastic thing to learn. Now let's look at the glibc code.

If you try to free an invalid pointer, this is the check that produces that result:

if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
      || __builtin_expect (misaligned_chunk (p), 0))
    malloc_printerr ("free(): invalid pointer");

this guy calls abort

malloc_printerr (const char *str)
{
  __libc_message (do_abort, "%s\n", str);
  __builtin_unreachable ();
}

So what's the point here?

Look at the start of free() function : https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#3092

In line 3109 it tries to get the metadata associated with the allocated memory

p = mem2chunk (mem);

In line 3131

ar_ptr = arena_for_chunk (p);
  _int_free (ar_ptr, p, 0);

The _int_free has this comment

/* Little security check which won't hurt performance: the
     allocator never wrapps around at the end of the address space.
     Therefore we can exclude some size values which might appear
     here by accident or by "design" from some intruder.  */

  if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
      || __builtin_expect (misaligned_chunk (p), 0))
    malloc_printerr ("free(): invalid pointer");

In our case it was probably the misaligned_chunk check that fired; there is no way the ptr in this small program would have wrapped around the address space.
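
To see why, remember that malloc() returns pointers suitably aligned for any object type (16 bytes on a typical x86-64 glibc), and free() expects that same alignment when it converts the pointer back into a chunk. A tiny sketch of my own to illustrate the alignment difference:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *p = (char *) malloc(100);

    /* the original pointer satisfies the allocator's alignment,
       but p + 1 clearly does not */
    printf("p     mod 16 = %d\n", (int)((uintptr_t) p % 16));
    printf("p + 1 mod 16 = %d\n", (int)((uintptr_t)(p + 1) % 16));

    free(p);    /* free the pointer malloc() actually returned */
    return 0;
}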


Ending Note:

This is specific to the glibc implementation. The man pages do not mandate any specific behaviour in such a case; they just say the pointer passed to free() must have been returned by malloc() (or calloc()/realloc()), and anything else is undefined behaviour.

More deep dives into the malloc/free code in subsequent posts.

Sunday, May 19, 2019

How does is_same work?

Sometimes I turn inquisitive about the type traits of modern C++. A deeper dig into libstdc++ unfolds the mystery behind their implementation. So how does is_same work, then?

See the code below. I have shamelessly copied the libstdc++ code :-) to learn from it myself.

Find it here: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/type_traits

#include <iostream>

template<typename _Tp, _Tp __v>
struct integral_constant
{
    static constexpr _Tp                  value = __v;
    typedef _Tp                           value_type;
    typedef integral_constant<_Tp, __v>   type;
    constexpr operator value_type() const noexcept { return value; }
};

/// The type used as a compile-time boolean with true value.
typedef integral_constant<bool, true>     true_type;

/// The type used as a compile-time boolean with false value.
typedef integral_constant<bool, false>    false_type;

/// is_same
template<typename, typename>
    struct is_same
    : public false_type {
        is_same() { std::cout << "False Type" << std::endl;}
    };

template<typename _Tp>
    struct is_same<_Tp, _Tp>
    : public true_type
{
        is_same() { std::cout << "True Type" << std::endl;}
};

int main()
{
        is_same<int,int> test1;
        is_same<int,unsigned char> test2;
        std::cout << is_same<int,int>::value << std::endl;
        std::cout << is_same<int,unsigned char>::value << std::endl;
}


The fact is that is_same works on template specialization! Hurray! The whole decision is made at compile time, so it costs nothing at runtime. So how is that achieved?

consider is_same<int,int>

As you know, modern C++ allows template specialization for particular cases. The same mechanism is at work here as well!

is_same<> with the same data type for both parameters resolves to the specialized template:

template<typename _Tp>
    struct is_same<_Tp, _Tp>
    : public true_type
{
        is_same() { std::cout << "True Type" << std::endl;}
};

which derives from true_type, which is itself an integral_constant<bool, true>. Note the static constexpr value member, which is evaluated at compile time, i.e. no runtime overhead at all. Also note the slightly odd template parameter list: the first parameter is a type and the second is a value of that type (a non-type template parameter). I did not even know it was possible to write templates in this fashion.
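
For anyone who, like me, had not seen a plain value used as a template argument before, here is a tiny standalone sketch of a non-type template parameter (the names are my own):

template<int N>
struct Square
{
    static constexpr int value = N * N;   // computed by the compiler
};

static_assert(Square<5>::value == 25, "evaluated at compile time");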

Needless to say, is_same<> with two different types falls back to the primary template through ordinary substitution, and tracing it on the same basis gives value = false.

I was scratching my head over why static is required for the first member of integral_constant. I think it is because value is a property of the type itself: static makes every use of is_same<T, U>::value refer to the same compile-time constant instead of a per-object copy, so you never need to construct an object to read it. The other members do not consume per-object memory anyway!
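
A quick way to convince yourself that value lives at the type level is that it can be consumed in a purely compile-time context, without ever constructing an object (here using the real std::is_same, which behaves the same way):

#include <type_traits>

static_assert(std::is_same<int, int>::value, "same type");
static_assert(!std::is_same<int, unsigned char>::value, "different types");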

Most importantly, the output of program:

True Type
False Type
1
0

Thursday, October 11, 2018

A trivial note on vector::erase()

Recently, I encountered a problem with the vector::erase() function. Consider the code below.

#include <vector>
#include <iostream>

struct SpookyStruct
{
    int k = 10;
    std::string blank = "blank";

    void operator=(const SpookyStruct& rhs)
    {
        k = rhs.k;
        blank = blank;
    }
};

std::vector<SpookyStruct> spookyVector =
{
    {1, "This"},
    {2, "is"},
    {3, "Disaster"},
    {4, "You"},
    {5, "Know"}
};

int main(int argc, char** argv)
{
    for (auto entry = spookyVector.begin(); entry != spookyVector.end(); ++entry)
    {
        std::cout << "Address of iterator before deletion : " << &(*entry) << std::endl;
    }

    for (auto entry = spookyVector.begin(); entry != spookyVector.end();)
    {
        if (entry->k % 2)
        {
            ++entry;
            continue;
        }

        entry = spookyVector.erase(entry);
    }

    for (auto entry = spookyVector.begin(); entry != spookyVector.end(); ++entry)
    {
        std::cout << "Address of iterator after deletion : " << &(*entry) << std::endl;
    }

    for (const auto& oddOnly : spookyVector)
    {
        std::cout << oddOnly.k << std::endl;
        std::cout << oddOnly.blank << std::endl;
        std::cout << std::endl;
    }

    std::cout << spookyVector.capacity() << std::endl;
}

What should the expected output be? Let's run the program.

nanda@nanda-MS-7640:/media/nanda/PERSONAL/c14_code/vec_erase$ g++ -std=c++14 SpookyVector.cpp 
nanda@nanda-MS-7640:/media/nanda/PERSONAL/c14_code/vec_erase$ ./a.out 

1
This

3
is

5
Disaster

Yes, the result is a disaster. This is not the output we were expecting. So what's wrong with the code?

void operator=(const SpookyStruct& rhs)
{
    k = rhs.k;
    blank = blank;
}

Yes! It is a trivial bug, but sometimes hard to track down if you don't know what happens inside the erase function.

What is happening during erase?

You know that one of the overloaded erase functions removes the element at the given position and returns an iterator to the next valid element. But the element is not really deleted so much as overwritten: vector::erase shifts the following entries up by one position, and in the process the object's overloaded assignment operator is invoked, if present. Here is the catch: we must be careful to write an accurate assignment overload, or else the results can be absurd and hard to debug quickly. Our assignment implementation is partially tainted!

Let's look at the addresses the iterators point to, before and after the deletion.

Address of iterator before deletion : 0x55586894be70
Address of iterator before deletion : 0x55586894be98
Address of iterator before deletion : 0x55586894bec0
Address of iterator before deletion : 0x55586894bee8
Address of iterator before deletion : 0x55586894bf10

Address of iterator after deletion : 0x55586894be70
Address of iterator after deletion : 0x55586894be98
Address of iterator after deletion : 0x55586894bec0

Yes, the addresses remain the same because the trailing elements are copied one slot up. Conceptually it is as simple as:

copyElements(dest = erasedElemPtr, src = nextElemPtr, count = numOfTrailingElements);

In the process of copying, the assignment operator is invoked, and ours has buggy logic: each overwritten entry keeps its old "blank" string instead of taking it from the source entry.
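
For the curious, here is a rough sketch of my own (a simplification, not the real libstdc++ code) of what a single-element erase boils down to. The shifting is done by assignment, which is exactly where our buggy operator= gets invoked:

#include <algorithm>
#include <vector>

template <typename T>
typename std::vector<T>::iterator erase_at(std::vector<T>& v,
                                           typename std::vector<T>::iterator pos)
{
    std::move(pos + 1, v.end(), pos);   // shift the tail up, one assignment per element
    v.pop_back();                       // destroy the now-stale last element
    return pos;                         // next valid position, like erase() returns
}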

The Details!

Here the 2nd element is erased and the 3rd element takes the second position, but because of the buggy assignment it keeps the old string "is". Similarly, the 4th element moves into the 3rd position with the string "Disaster", and the 5th moves into the 4th with the string "You". On the next pass, the element now sitting in the 3rd position (k = 4) is erased, and the last element moves up into that slot, again keeping the slot's old string "Disaster" [a disaster both in the output and for the coder :-)].

What's the capacity of the vector at the end of this exercise?

5

Yes, it's still 5. The memory is not released, on the assumption that the space will be filled again soon. This is an optimization strategy employed by the library writers.
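
If you actually want that memory back, C++11 added a (non-binding) request for it:

spookyVector.shrink_to_fit();   // asks the vector to drop the unused capacity; the request may be ignored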

After correcting the assignment operator, we come to the end of the disaster and it's time to recoup!

void operator=(const SpookyStruct& rhs)
{
    k = rhs.k;
    blank = rhs.blank;
}

Here is the desired output:

1
This

3
Disaster

5
Know


Hope you learned something today!

PS:

Yeah, if you have a copy assignment operator, you should have a copy constructor as well (the rule of three). That is a gap by the book, but it is not the bug here; I just left it out for brevity.

Wednesday, September 19, 2018

The delete operator rant

I have seen many instances of the code below (written against an older standard, before nullptr existed).

if (myAllocatedObject != NULL)
{
    delete  myAllocatedObject;
}

This is absolutely ridiculous. The C++ standard guarantees that delete on a NULL pointer is harmless and has no effect. Of course, delete is only valid for memory that was allocated with new in the first place; deallocating with delete something that did not come from new is bad code. The same pairing rule applies to every allocator/deallocator combination.

If you supply your own allocator, the matching deallocator also needs to be supplied; it would be absurd to provide only the allocator.
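
A quick illustration of the pairing rule, with my own throwaway names:

int* one  = new int(42);
int* many = new int[10];

delete   one;     // matches new
delete[] many;    // matches new[]; mixing the two up is undefined behaviour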

In conclusion, the redundant NULL check is worthless, and it makes the code look downright ugly.
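
In other words, this is all that is needed:

delete myAllocatedObject;      // perfectly fine even when the pointer is NULL
myAllocatedObject = NULL;      // optional habit: protects against a later double delete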

How does libstdc++ behave?

While peeking at the source tree, this is what you get from del_op.cc:

_GLIBCXX_WEAK_DEFINITION void
operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT
{
     std::free(ptr);
}

And std::free leads to <cstdlib>, which in turn includes glibc's stdlib.h, and from there to:

void
__libc_free (void *mem)


if (mem == 0)                              /* free(0) has no effect */
    return; 

Phew! We landed and also freed safely :-)

PS: I have heard that a few static analyzers also complain if you don't wrap delete in a NULL check. I cannot vouch for that as of now, since I have never stumbled upon such an instance myself.

Saturday, April 14, 2018

C++ : Tampering with the private class variables

I have a lot of things to write about in the C++/OS/network arena, but I don't feel the urge to write unless I have gained clarity over the subject. Thanks for waiting! Hopefully the mute period is now broken, with more technical topics in the coming days.

For the past 3 weeks, I have been intensely pondering rvalue/lvalue semantics, and to a large extent I could comprehend them as well. In the process of learning, I accidentally wrote the code below. Hey! You cannot modify a private member in C++, but we can trick our way in via pointers. And if you argue that C++ has references instead, take a look at the code below.


#include <iostream>
#include <string>

class PrivateTest
{
    public:
        PrivateTest() = default;
        std::string& getMessage() { return message; }

    private:
        std::string message = "Nanda";
};

int main()
{
    PrivateTest test;

    std::cout << test.getMessage() << std::endl;
    std::string& corrupted = test.getMessage();
    corrupted = "gotcha";
    std::cout << test.getMessage() << std::endl;
}


Let's compile and execute!

nandakumar@heramba ~ $ g++ -std=c++14 PrivateTest.cpp
nandakumar@heramba ~ $ ./a.out
Nanda
gotcha


The private member is exposed :-). Perhaps this is an example of bad/vulnerable C++ class design: the getter hands out a non-const reference, so the encapsulation that private is supposed to provide is broken. The code is essentially the classic pointer trick with the pointer replaced by a reference. Ultimately, both boil down to an address in the generated assembly; the distinction between pointers and references exists only at the language level and has little significance at the assembly level.
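
One way to actually close the hole (my suggestion, not part of the snippet above) is to stop handing out a mutable reference in the first place:

#include <string>

class PrivateTest
{
    public:
        PrivateTest() = default;
        // read-only access: callers can look at the string but cannot assign through it
        const std::string& getMessage() const { return message; }

    private:
        std::string message = "Nanda";
};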

This is not something new, but the revelation reaches some coders slowly ;-)

Monday, October 10, 2016

A strange note on bash aliases

Today I encountered a strange scenario (rather common for most programmers) while adding a function to my .bashrc file. For confidentiality purposes, I have slightly modified the use case to present the scenario. Here is what I did.

First, I added an alias. But it did not suit my needs: I wanted to append a folder name to the complete path.

alias my_func="echo /home/nandakumar/funny/"

What I achieved with this alias was

nandakumar@heramba:~$ source ~/.bashrc
nandakumar@heramba:~$ my_func nanda
/home/nandakumar/funny/ nanda


My expectation was to print /home/nandakumar/funny/nanda, i.e. without the added space. However, aliases have no such capability: the arguments are simply pasted after the expanded text.

After scouting the digital knowledge book (aka the internet), I realized my use case needed a function.

So I transformed the alias into a bash function, as below.
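
my_func() {
    echo /home/nandakumar/funny/"$1"
}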

And the output:

nandakumar@heramba:~$ source ~/.bashrc
nandakumar@heramba:~$ my_func nanda
/home/nandakumar/funny/ nanda


Strange!! Output unchanged!!! I pondered for almost 20 minutes wondering what went wrong. I checked the syntax and the commands umpteen times. Alas! No luck :-(. Eventually I closed and re-opened the shell and, to my surprise, here was the output:

nandakumar@heramba:~$ my_func nanda
/home/nandakumar/funny/nanda


The output is clean now 8-).

Later, when I re-did the experiment, the alias entry was still present in bash despite being wiped out from the .bashrc file:

nandakumar@heramba:~$ alias my_func
alias my_func='echo /home/nandakumar/funny/'

I stumbled upon a Stack Overflow question later in the day and figured out that this is expected. Here is a quote from http://stackoverflow.com/questions/7131670/make-bash-alias-that-takes-parameter

"If you are changing an alias to a function, sourceing your .bashrc will add the function but it won't unalias the old alias. Since aliases are higher precedent than functions, it will try to use the alias. You need to either close and reopen your shell, or else call unalias <name>. Perhaps I'll save someone the 5 minutes I just wasted."

Ha!! HA LOL :D. All I needed was an explicit unalias my_func to achieve what I wished for. One more important lesson learned. Hope you gained some insight too.

This may look familiar to many, but for me it's new learning.