I was looking for mechanism to have zero overhead logging in my personal project. The first approach that struck my brain was creating a dummy function and calling it whenever logging is disabled. However I had many surprises while doing simple experiments. I am writing down here. Hope it helps you too while making decision.
Lets begin with problem. Here is usual way of implementing logging mechanism.
if (debug_flag_enabled & logger_mask)
//emit my logging function
where logger_mask is represented as set of bits with enabled/disabled logs
There is catch here. Even if the log does not need to be emitted, there is always that "if" condition overhead for every call.
My approach was simple which was to sink the disabled logs to "do-nothing" logger function. The approach I took was to store function pointers in an array mapped to relevant logging function. Whenever the particular log level is disabled, the respective logging function would be mapped to "do-nothing" function which would ultimately be inlined by the compiler optimization (which turned out to be bad assumption).
Assuming that all logger functions are disabled, here is code with "do-nothing" approach
#include <stdio.h>
typedef void (*logging_func)();
#define DEBUG 0
#define INFO 1
#define WARN 2
#define ERR 3
#define CRIT 4
#define MAX (CRIT+1)
void null_logger()
{
// do nothing
}
logging_func app_loggers[MAX] = {null_logger, null_logger, null_logger, null_logger, null_logger};
int main()
{
int iteration = 1000;
for (int i=0; i < iteration; i++)
{
app_loggers[DEBUG]();
app_loggers[INFO]();
app_loggers[WARN]();
app_loggers[ERR]();
app_loggers[CRIT]();
}
}
Unfortunately, my assumptions were hit back by the surprises. I compared timing of the function pointers with normal calls and the timing results were startling :-D.
Here is code with "if" condition' approach.
int main()
{
int iteration = 1000;
for (int i=0; i < iteration; i++)
{
if (logger_mask & DEBUG)
null_logger();
if (logger_mask & INFO)
null_logger();
if (logger_mask & WARN)
null_logger();
if (logger_mask & ERR)
null_logger();
if (logger_mask & CRIT)
null_logger();
}
}
The timing observations for both approaches are as below for 1k iterations. First one is for function pointer approach while second is plain "if" condition approach
For 1k iterations
nanda@heramba:~/experiment/log_c_o1$ gcc -O2 logger_blog.c
nanda@heramba:~/experiment/log_c_o1$ ./a.out
===============
START = 2778 - 241274892
END = 2778 - 241286144
===============
===============
START = 2778 - 241286213
END = 2778 - 241286283
===============
With function pointer: 11252ns
With direct call: 70ns
For 10k iterations
nanda@heramba:~/experiment/log_c_o1$ gcc -O2 logger_blog.c
nanda@heramba:~/experiment/log_c_o1$ ./a.out
===============
START = 2850 - 781701427
END = 2850 - 781759262
===============
===============
START = 2850 - 781759300
END = 2850 - 781759326
===============
With function pointer: 57835ns
With direct call: 26ns
Do you see the vast difference of latency provided by function pointers? The one with "if" condition remains relatively constant for 10k iteration also but the assumed O(1) function pointer approach increased the latency :-)
Rather than improving, the situation aggravated. Both of the approach call the same null_logger. One via stored function pointers and other via direct calls with "if" condition. Note also that this code was compiled with "-O2"
The function pointers apparently introduces lot of overhead in our code. As seen below from assembly, the function pointer has to be dereferenced and then called. Hence we have latency in this procedure. The latency becomes worst if we have multiple iterations of logging
movq app_loggers(%rip), %rdx
movl $0, %eax
call *%rdx
Also because I stored null_logger in array, the compiler did not optimize the null_logger function. When I removed the references of app_loggers[] array, the compiler did wonderful optimization. It was evident from generated assembly code which is stripped for brevity. There is no call to null_logger at all :-)
main:
.LFB24:
.cfi_startproc
leaq time_store(%rip), %rbx
leaq 64(%rbx), %r12
call clock_gettime@PLT
leaq 48+time_store(%rip), %rsi
movl $1, %edi
call clock_gettime@PLT
Complete compliable code is in my github page: https://github.com/nkumar85/experiments/tree/main/fn_ptr_if_direct
There are plenty of things to learn in basics which we fail to experiment. This experiment is one good example. It brings to mind a corollary "does the virtual functions in C++ also have similar overhead?"
No comments:
Post a Comment