
C Tricks that I use every day

Some time ago, I gave a presentation at 42 school about 'advanced' C language tricks. Since then, I've made some new discoveries about "magic macros", and I thought writing a blog post about them would be a good idea.

So, in this post, I will go over some compiler flags that I always use in my projects, some C{99,11} standard macros, and finally some GCC-only macros / attributes.

Compiler flags

-Wall -Wextra -Werror

The holy trinity. -Wall enables a large set of common warnings (not literally all of them, despite the name), and -Wextra adds some more on top. -Werror is there to actually stop the compilation when a warning is raised. I sometimes leave -Werror out of my compilation options, mainly when I compile external software (e.g. sqlite3) in my tree.

-fstack-protector -fstack-protector-strong

Buffer overflow protection. Consider the following function:

void function(void)
{
    char *buf = alloca(256);

    /* Don't allow gcc to optimise away the buf */
    asm volatile("" :: "m" (buf));
}

Compiled with -fno-stack-protector:

08048404 <function>:
push   %ebp              ; prologue
mov    %esp,%ebp

sub    $0x128,%esp       ; reserve 0x128B on the stack
lea    0xf(%esp),%eax    ; eax = esp + 0xf
and    $0xfffffff0,%eax  ; align eax
mov    %eax,-0xc(%ebp)   ; save eax in the stack frame

leave                    ; epilogue
ret

Compiled with -fstack-protector:

08048464 <function>:
push   %ebp              ; prologue
mov    %esp,%ebp

sub    $0x128,%esp       ; reserve 0x128B on the stack

mov    %gs:0x14,%eax     ; load stack canary using gs
mov    %eax,-0xc(%ebp)   ; save it in the stack frame
xor    %eax,%eax         ; clear the register

lea    0xf(%esp),%eax    ; eax = esp + 0xf
and    $0xfffffff0,%eax  ; align eax
mov    %eax,-0x10(%ebp)  ; save eax in the stack frame

mov    -0xc(%ebp),%eax   ; load canary
xor    %gs:0x14,%eax     ; compare against one in gs
je     8048493 <fun+0x2f>
call   8048340 <__stack_chk_fail@plt>

leave                    ; epilogue
ret

As you can see here, the compiler did add some protection around the stack allocation: after the function prologue, a canary is loaded and saved on the stack. Later, just before the epilogue, the canary is checked against the original. If the values mismatch, the __stack_chk_fail branch is taken and results in a pretty crash:

*** stack smashing detected ***

It is worth noting that the -fstack-protector-strong flag is fairly recent (it first shipped in GCC 4.9), so you can't use it on old GCC cross-compile toolchains. There's also a performance cost to using the -fstack-protector flags, but I personally think that the benefit of the protection outweighs it.
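To make the mechanism concrete, here's a hand-written imitation of the check the compiler inserts. It's only a sketch: the struct, the constant and the function name are made up for the illustration, and the real canary is a random per-process value set up by the C library, not a hard-coded constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the value the real code loads from %gs:0x14 */
#define REFERENCE_CANARY ((uintptr_t)0xdeadc0deu)

/* A fake "stack frame": the canary sits right after the buffer */
struct guarded_frame {
    char        buf[16];
    uintptr_t   canary;
};

/**
 * Copy len bytes into the frame and report whether the canary survived.
 * Returns 1 if the canary is intact, 0 if the copy smashed it.
 */
int copy_into_frame(const void *src, size_t len)
{
    struct guarded_frame frame;

    /* Stay inside the fake frame so the demo itself is well defined */
    assert(len <= sizeof(frame));

    frame.canary = REFERENCE_CANARY;            /* "prologue": store it */
    memcpy(&frame, src, len);                   /* possibly oversized write */
    return frame.canary == REFERENCE_CANARY;    /* "epilogue": verify it */
}
```

A copy that fits in the 16 bytes of buf leaves the canary alone; anything longer overwrites it, and that is exactly the condition the real epilogue turns into a call to __stack_chk_fail.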

-pipe

This flag does not affect the compiled code at all: it simply makes the compiler use pipes instead of temporary files between the compilation stages, which speeds up the build a bit. I think modern build tools like CMake or Ninja use it by default.

-Wshadow

This flag will make the compiler raise a warning on variable shadowing. Here's a simple example of that:

int function(void)
{
    int     a = 0;

    [...]
    while (condition)
    {
        /**
         * Here, this variable shadows the
         * function global scope one
         */
        int         a = 10;
    }
}

I mainly use this flag to force myself to keep my codebase clean. I've found that shadowed variables can be pretty confusing for an external reader, especially in big functions.
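Shadowing is not only a readability problem, it can hide real bugs. Here's a small sketch (both function names are invented for the example) where an accidental redeclaration makes the function silently return the wrong value:

```c
#include <assert.h>
#include <stddef.h>

/* Buggy on purpose: the inner `total` shadows the outer one */
int sum_shadowed(const int *values, size_t count)
{
    int total = 0;

    for (size_t i = 0; i < count; i++)
    {
        int total = values[i];  /* -Wshadow warns on this line */
        (void)total;            /* the outer total is never updated */
    }
    return total;
}

/* Fixed: there is only one `total` in the function */
int sum_fixed(const int *values, size_t count)
{
    int total = 0;

    assert(values != NULL);
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Without -Wshadow, the first version compiles silently and always returns 0.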

-Wunreachable-code

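The compiler will warn if it finds code that can never be reached. Be aware that GCC has accepted this flag without acting on it since version 4.5, so in practice you need Clang to get the warning. The example is pretty simple: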

int function(void)
{
    [...]
    return 0;
    printf("Hello !\n"); /* We will never go there */
}

With an example like that, you may think that nobody with half a brain needs the compiler's help to notice that the printf(3) call is unreachable, but I've found this flag to be pretty useful on a large codebase, when you're working on big functions and miss something:

int very_big_function(void)
{
    [...]

+    if (condition)
+           return 1;
+    else
+           return 0;

    [...] /* Some legacy code */
}

Here we can see that my patch is gonna change the behavior of the function: any code after the if/else is never gonna be executed. Again, if you're fully awake when you write this, you will see it before the compiler does, but if you're not, it's nicer to get an error at compilation than a surprise at runtime.

-Wswitch-enum

This flag will force the developer to reference every enum entry when used in a switch:

typedef enum {
    OUT_FORMAT_JSON,
    OUT_FORMAT_CSV,
    OUT_FORMAT_HTML
} out_format_t;

[...]

const char *format_data(void *some_data, out_format_t format)
{
    switch (format)
    {
        case OUT_FORMAT_JSON:
            return format_data_to_json(some_data);
        case OUT_FORMAT_CSV:
            return format_data_to_csv(some_data);
    }
}

Here we can see that I forgot to handle the OUT_FORMAT_HTML entry in my switch, and the compiler will warn me about it. The warning is less about "referencing every entry of the enum in the switch" than about asking a question about this function. Indeed, my use case may be: "This data will never be formatted to HTML, so there's no need to handle it in the switch". In that case, as a good practice, an assert is probably a good thing to add:

case OUT_FORMAT_HTML:
    assert(!"You can't format this data in HTML");

Or maybe you just forgot to write the HTML format function, and the compiler will remind you that the API supports three output formats, and that you must handle all of them:

case OUT_FORMAT_HTML:
    return format_data_to_html(some_data);

-Wstrict-prototypes

This flag will make the compiler raise a warning on nonsense legacy C prototypes:

int function(a, b)
{
    /* You guessed it, a & b are integers */
}

I do not want that in my codebases.

-D_FORTIFY_SOURCE=2

GNU libc (glibc) hardening. Note that it only takes effect when optimization is enabled (-O1 or higher). Consider the following function:

void function(const char *s)
{
    char buf[256];

    strcpy(buf, s);

    /* Don't allow gcc to optimise away the buf */
    asm volatile("" :: "m" (buf));
}

Compiled without the flag:

08048450 <function>:
push   %ebp               ; prologue
mov    %esp,%ebp

sub    $0x118,%esp        ; reserve 0x118B on the stack
mov    0x8(%ebp),%eax     ; load parameter `s` to eax
mov    %eax,0x4(%esp)     ; save parameter for strcpy
lea    -0x108(%ebp),%eax  ; load address of `buf` into eax
mov    %eax,(%esp)        ; save parameter for strcpy
call   8048320 <strcpy@plt>

leave                     ; epilogue
ret

With the flag:

08048470 <function>:
push   %ebp               ; prologue
mov    %esp,%ebp

sub    $0x118,%esp        ; reserve 0x118B on the stack
movl   $0x100,0x8(%esp)   ; save value 256 as parameter
mov    0x8(%ebp),%eax     ; load parameter `s` to eax
mov    %eax,0x4(%esp)     ; save parameter for strcpy
lea    -0x108(%ebp),%eax  ; load address of `buf` into eax
mov    %eax,(%esp)        ; save parameter for strcpy
call   8048370 <__strcpy_chk@plt>

leave                     ; epilogue
ret

You can see that the compiler called __strcpy_chk instead of strcpy. Those __XXX_chk functions are checked variants of the original libc ones (GCC knows them as builtins, and glibc provides the implementation). They are safer mainly because the compiler passes the size of the destination buffer as an extra argument (the 0x100 above), so the function can check at runtime that the buffer is big enough to receive the data; otherwise the program crashes with a pretty:

*** buffer overflow detected ***:

Of course, in my example, one should already use strncpy instead of strcpy to be safer. Actually, GCC does this automatically these days (when the destination buffer size is known), hence the bit of inline assembly that prevents GCC from optimizing the buffer away.

There is a good number of GCC builtins that are safer than their libc counterparts; you can find the full list here.
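To give an idea of what such a checked variant does with the extra size argument, here's a minimal sketch. checked_strcpy is a hypothetical helper of mine, not a glibc function, and unlike __strcpy_chk it reports the problem with a return code instead of aborting the program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/**
 * Hypothetical helper, not part of glibc: copy src into dst only if it
 * fits (terminating NUL included) in dst_size bytes.
 * Returns 0 on success, -1 if the copy would overflow dst.
 */
int checked_strcpy(char *dst, size_t dst_size, const char *src)
{
    size_t  needed;

    assert(dst != NULL && src != NULL);

    needed = strlen(src) + 1;
    if (needed > dst_size)
        return -1;      /* __strcpy_chk would abort the program here */
    memcpy(dst, src, needed);
    return 0;
}
```

The difference with the fortified version is that there, you don't have to pass the size yourself: the compiler fills it in whenever it knows it.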

-fPIE -pie

Build a position-independent executable (PIE), which is what lets ASLR randomize the load address of the binary itself.

-Wl,-z,noexecstack

Linker flag that disables stack execution on the binary. See the ld(1) manual for more information.

-Wl,-z,relro -Wl,-z,now

Linker ELF Hardening flags. See ld(1) manual for more information.

Macros

COUNT_OF
#define COUNT_OF(ptr) (sizeof(ptr) / sizeof((ptr)[0]))

Consider the following C snippet:

static const char *strings[] = {
    "Hello",
    "World",
    NULL
};

for (size_t i = 0; strings[i] != NULL; i++)
    printf("Value is = %s\n", strings[i]);

Pretty basic stuff. The key thing here is the NULL value in the array. This entry's sole purpose is to stop the reading loop. Thing is, the compiler (and thus you) already knows the size of the array, you just need to use it:

static const char *strings[] = {
    "Hello",
    "World"
};

for (size_t i = 0; i < COUNT_OF(strings); i++)
    printf("Value is = %s\n", strings[i]);

We just lose the NULL in the array, and the behavior is the same.
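One thing to keep in mind: COUNT_OF only works on real arrays. Once an array is passed to a function it decays to a pointer, and sizeof no longer sees the elements. Here's a small sketch of the usual way around it (the array and the function names are invented for the example):

```c
#include <assert.h>
#include <stddef.h>

#define COUNT_OF(ptr) (sizeof(ptr) / sizeof((ptr)[0]))

static const int primes[] = { 2, 3, 5, 7, 11 };

/* OK: `primes` is a real array here, so sizeof sees all five elements */
size_t primes_count(void)
{
    return COUNT_OF(primes);
}

/* The callee only receives a pointer, so the length must travel with it */
int sum(const int *values, size_t count)
{
    int total = 0;

    assert(values != NULL);
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Compute the count where the array is still an array, then pass it on */
int sum_of_primes(void)
{
    return sum(primes, COUNT_OF(primes));
}
```

Using COUNT_OF(values) inside sum would compile, but it would divide the size of a pointer by the size of an int, which has nothing to do with the number of elements.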

STATIC_ARRAY_FOREACH
#define STATIC_ARRAY_FOREACH(item, array)           \
    for (size_t keep = 1, index = 0;                \
        keep && index < COUNT_OF(array);            \
        keep = !keep, index++)                      \
    for (item = &array[index]; keep; keep = !keep)

Consider the previous iteration snippet:

static const char *strings[] = {
    "Hello",
    "World"
};

for (size_t i = 0; i < COUNT_OF(strings); i++)
    printf("Value is = %s\n", strings[i]);

In this particular case, we don't need the value of i, besides for the array access. The STATIC_ARRAY_FOREACH macro is there to iterate over an array with a more 'pure' approach:

const char *str[] = {
    "Hello",
    "World!"
};

STATIC_ARRAY_FOREACH(const char **ptr, str)
{
    printf("%s\n", *ptr);
}
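The two nested for loops and the keep flag may look odd, so here's a sketch showing what they buy us: break and continue behave like in a regular loop. The samples array and the function are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define COUNT_OF(ptr) (sizeof(ptr) / sizeof((ptr)[0]))
#define STATIC_ARRAY_FOREACH(item, array)           \
    for (size_t keep = 1, index = 0;                \
        keep && index < COUNT_OF(array);            \
        keep = !keep, index++)                      \
    for (item = &array[index]; keep; keep = !keep)

static const int samples[] = { 4, 8, -1, 15, 16 };

/* Sum the samples up to (but not including) the first negative value */
int sum_until_negative(void)
{
    int total = 0;

    STATIC_ARRAY_FOREACH(const int *value, samples)
    {
        if (*value < 0)
            break;      /* leaves the inner loop with keep still set */
        total += *value;
    }
    return total;
}
```

On a normal iteration, the inner loop clears keep and the outer loop sets it again before moving to the next index. On a break, keep is still set when the outer loop flips it, so it becomes 0 and the outer loop stops as well.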

static_assert
#define static_assert _Static_assert

Consider the following code:

typedef enum {
    ERROR_ONE,
    ERROR_TWO,
    ERROR_THREE,
    ERROR_FOUR
} error_t;

const char *error_to_str(error_t error)
{
    static const char *str[] = {
        [ERROR_ONE] = "Error One",
        [ERROR_TWO] = "Error Two",
        [ERROR_THREE] = "Error three"
    };

    return str[error];
}

Pretty basic error handling: we've got an enum that's gonna be used by functions to return an error, if any, and we've got the helper error_to_str to translate this error code into a more human-friendly string. Thing is, my error_to_str function isn't safe, at all. As you can see, I forgot to put the fourth entry (ERROR_FOUR). Thus, if one asks for the translated string of ERROR_FOUR, the function reads past the end of the array, and the caller is most likely greeted with a segfault. There are three possible approaches here:

We're gonna use the last option, a static_assert, since I think it is the most elegant approach, and big bonus: it happens at compilation.

typedef enum {
    ERROR_ONE,
    ERROR_TWO,
    ERROR_THREE,
    ERROR_FOUR,
    /* Always keep this one last */
    ERROR_MAX
} error_t;


const char *error_to_str(error_t error)
{
    static const char *str[] = {
        [ERROR_ONE] = "Error One",
        [ERROR_TWO] = "Error Two",
        [ERROR_THREE] = "Error three"
    };

    static_assert(COUNT_OF(str) == ERROR_MAX, "An error string is missing");

    return str[error];
}

Let's try to compile this:

$> gcc static_assert.c
[...]
static_assert.c:20:5: error: negative width in bit-field ‘__error_if_negative’
static_assert(COUNT_OF(str) == ERROR_MAX, "An error string is missing");

As you can see, we catch the mistake at compilation, meaning that the patch will never be merged in this state (you need a build bot / CI for that, but that's pretty common). In a large codebase, it's only human to add an entry to an enum like that and forget to look at every function that must be updated along with it. Keep in mind that this is only one of the various static_assert usages, but it is the one I use the most.
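For reference, here's a sketch of what the fixed version could look like. I've prefixed the names with my_ / MY_ (my own choice for this example) to stay clear of the error_t that glibc can define, and added a runtime bound check, since the static_assert only validates the table, not the values callers pass in.

```c
#include <assert.h>     /* provides static_assert in C11 */
#include <stddef.h>

#define COUNT_OF(ptr) (sizeof(ptr) / sizeof((ptr)[0]))

typedef enum {
    MY_ERROR_ONE,
    MY_ERROR_TWO,
    MY_ERROR_THREE,
    MY_ERROR_FOUR,
    /* Always keep this one last */
    MY_ERROR_MAX
} my_error_t;

const char *my_error_to_str(my_error_t error)
{
    static const char *str[] = {
        [MY_ERROR_ONE] = "Error one",
        [MY_ERROR_TWO] = "Error two",
        [MY_ERROR_THREE] = "Error three",
        [MY_ERROR_FOUR] = "Error four"
    };

    static_assert(COUNT_OF(str) == MY_ERROR_MAX, "An error string is missing");

    /* The compile-time check cannot catch a bogus value at runtime */
    if ((size_t)error >= COUNT_OF(str))
        return "Unknown error";
    return str[error];
}
```

One limit of the count-based check: it only notices a missing entry at the end of the table. If you forget one in the middle, the array still has the right size, and the hole is a NULL pointer.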

STR
#define QUOTE_ME(x) #x
#define STR(x) QUOTE_ME(x)

Take the following snippet:

#define MAX_ARGC 3

int main(int ac, const char **av)
{
    if (ac > MAX_ARGC)
    {
        printf("Maximum argc number is: %d\n", MAX_ARGC);
        return 1;
    }

    printf("All good! %s\n", av[1]);
    return 0;
}

Pretty common: we've got a macro that defines an integer, and we need to print the value of that macro. Thing is, the preprocessor already knows the value of the define, so we just need to put quotes around it in order to print it. Instead of having printf format it at runtime with %d, we can simply do:

printf("Maximum argc number is: " STR(MAX_ARGC) "\n");
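If you wonder why STR needs the extra QUOTE_ME level, here's a short sketch (the three function names are made up for the example). The # operator stringifies its argument as written, so the argument has to go through one more macro call to be expanded first.

```c
#include <assert.h>
#include <string.h>

#define QUOTE_ME(x) #x
#define STR(x) QUOTE_ME(x)

#define MAX_ARGC 3

/* The whole message is a single string literal, built at compilation */
static const char too_many_args[] =
    "Maximum argc number is: " STR(MAX_ARGC) "\n";

const char *too_many_args_message(void)
{
    return too_many_args;
}

/* One level stringifies the name of the macro... */
const char *quoted_once(void)
{
    return QUOTE_ME(MAX_ARGC);  /* "MAX_ARGC" */
}

/* ...two levels stringify its value */
const char *quoted_twice(void)
{
    return STR(MAX_ARGC);       /* "3" */
}
```

Note that this stringifies the replacement text as is: with a define like (1 + 2), you would get the string "(1 + 2)", not "3".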

__printf_function
#define __printf_function(...) __attribute__((format (printf, __VA_ARGS__)))

Consider the following function:

static void custom_log(const char *fmt, ...)
{
    va_list     ap;

    va_start(ap, fmt);
    vfprintf(stdout, fmt, ap);
    va_end(ap);
}

As you can see, we've got a wrapper over vfprintf for some reason, and this function takes a printf format and its arguments (with stdarg). Thing is, GCC does not know that this function uses the printf format, so the following snippet will compile without a warning:

custom_log("Value is %s\n", 10);

We just need to tell the compiler that the function uses the printf format:

__printf_function(1, 2)
static void custom_log(const char *fmt, ...)

And we are greeted with:

error: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’
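The two numbers given to the attribute are the position of the format string and the position of the first variadic argument, both counted from 1. Here's a sketch with a hypothetical variant of the logger that writes into a caller-provided buffer, which moves both positions:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define __printf_function(...) __attribute__((format (printf, __VA_ARGS__)))

/**
 * Hypothetical variant of custom_log that writes into a caller buffer.
 * The format string is parameter 3 and the variadic arguments start at
 * parameter 4, hence (3, 4).
 * Returns what vsnprintf returns: the length the full message needs.
 */
__printf_function(3, 4)
int log_to_buffer(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int     written;

    va_start(ap, fmt);
    written = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return written;
}
```

With the attribute in place, a call like log_to_buffer(buf, sizeof(buf), "Value is %s", 10) gets the same format warning as a plain printf would.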

__unused
#define __unused __attribute__((unused))

Pretty straightforward: if you need to tell the compiler that a variable isn't used, instead of doing (void)variable; you can do:

void *callback(void *ctx, __unused void *data)
{
    [...]
}

typeof
#define typeof __typeof__

Pretty much self explanatory:

#define auto(n, val) typeof(val) n = val

int         a;
typeof(a)   b = 10;

auto(c, 150);

printf("%d\n", b);
printf("%d\n", c);
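Where typeof really shines is in type-generic macros. Here's a sketch with a SWAP macro (my own helper, not something from a standard header) that works for any pair of lvalues of the same type:

```c
#include <assert.h>

#define typeof __typeof__

/* Hypothetical helper: swap two lvalues of the same type */
#define SWAP(a, b) do {             \
        typeof(a) swap_tmp = (a);   \
        (a) = (b);                  \
        (b) = swap_tmp;             \
    } while (0)

/* Make sure *low holds the smaller of the two values */
void sort_pair(double *low, double *high)
{
    if (*low > *high)
        SWAP(*low, *high);
}
```

The temporary gets the type of the first argument, so the same macro swaps ints, doubles, pointers or whole structs without a line of change.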

offsetof & container_of
#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
#define container_of(ptr, type, member) ({                  \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
    (type *)( (char *)__mptr - offsetof(type,member) );})

It is worth noting that <stddef.h> already provides a standard offsetof, and that GCC has the __builtin_offsetof builtin (which the Linux kernel wraps as __compiler_offsetof), so offsetof does not need to be declared with the 0 trick like I did above. container_of computes the address of the parent structure from a pointer to one of its members, using an offset known at compilation. It is known for its use in the Linux kernel linked lists. Here's a really good blog post that explains how the macro works.
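Here's a minimal usage sketch, in the spirit of the kernel lists. The list_node and task structures are invented for the example, and the macro relies on GCC statement expressions, so this is GNU C, not standard C.

```c
#include <assert.h>
#include <stddef.h>     /* standard offsetof */

#define container_of(ptr, type, member) ({                      \
    const __typeof__( ((type *)0)->member ) *__mptr = (ptr);    \
    (type *)( (char *)__mptr - offsetof(type, member) );})

/* Hypothetical intrusive list node, embedded in the objects it links */
struct list_node {
    struct list_node *next;
};

struct task {
    int                 id;
    struct list_node    node;
};

/* Get back to the task from a pointer to its embedded node */
int task_id_from_node(struct list_node *node)
{
    struct task *task = container_of(node, struct task, node);

    return task->id;
}
```

The list code only ever manipulates list_node pointers and knows nothing about struct task; container_of is what lets the caller recover the full object.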

Tricks

Optional named parameters
typedef struct {
    int         do_print;
    char        *name;
} some_function_args_t;

void some_function(int required_a, int required_b, some_function_args_t *args)
{
    printf("required_a = %d\n", required_a);
    printf("required_b = %d\n", required_b);
    printf("opt.do_print = %d\n", args->do_print);
    printf("opt.name = %s\n\n", args->name);
}
#define some_function(a, b, ...) \
    some_function(a, b, &(some_function_args_t){__VA_ARGS__})

int main(void)
{
    some_function(1, 2);
    some_function(1, 2, .do_print = 1);
    some_function(1, 2, .name = "Louis");
    some_function(1, 2, .name = "Louis", .do_print = 3);
    return 0;
}

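This trick is pretty much self-explanatory; the above code does compile with GCC. The main trick is to combine two C99 features, compound literals and designated initializers, to get optional / orderless named arguments, like in Python. Strictly speaking, the first call (with no optional argument at all) leaves both the variadic macro argument and the initializer braces empty, which the standard only allows since C23; GCC accepts it as an extension.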

defer

Defers the cleanup of a variable to the moment it goes out of scope

#define __defer_fd __attribute__((__cleanup__(cleanup_fd)))

static void cleanup_fd(int *fd)
{
    if (*fd != -1)
        close(*fd);
}

int file = open("/etc/passwd", O_RDONLY);

If you compile something like that, the file descriptor needs to be closed at the end of the function in order to avoid leaks:

==25924== FILE DESCRIPTORS: 4 open at exit.
==25924== Open file descriptor 3: /etc/passwd
==25924==    at 0x497F4C2: open (in /usr/lib/libc-2.28.so)
==25924==    by 0x10918D: main (in /tmp/a.out)

But if you add the __defer_fd attribute, the cleanup_fd function will be called as soon as the variable goes out of scope:

int __defer_fd file = open("/etc/passwd", O_RDONLY);

[...]

==24113== FILE DESCRIPTORS: 3 open at exit
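The same attribute works for any resource. Here's a sketch applying it to heap memory; __defer_free, cleanup_free and the counter are my own names, and the counter is only there to show that the cleanup runs on every exit path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define __defer_free __attribute__((__cleanup__(cleanup_free)))

/* Only here so the example can show that the cleanup really ran */
static int cleanup_calls = 0;

/* The cleanup function receives a pointer to the variable itself */
static void cleanup_free(char **ptr)
{
    free(*ptr);         /* free(NULL) is a no-op */
    cleanup_calls++;
}

/* Returns the length of a heap copy of s, or -1 on allocation failure */
long copy_length(const char *s)
{
    size_t              size = strlen(s) + 1;
    __defer_free char   *copy = malloc(size);

    if (copy == NULL)
        return -1;              /* cleanup_free(&copy) runs here... */
    memcpy(copy, s, size);
    return (long)strlen(copy);  /* ...and here, once the value is computed */
}
```

The return expression is evaluated before the cleanup function is called, so returning something computed from the buffer is fine; returning the buffer itself obviously is not.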

Explicit code breakpoint

Put an explicit breakpoint in the code

#include <sys/types.h>
#include <signal.h>
#include <stdbool.h>    /* bool, true, false */
#include <unistd.h>     /* getpid */

static bool breakpoint_initialized = false;
static bool breakpoint_under_debug = false;

static inline void trap(int __unused signum)
{
    breakpoint_initialized = true;
}

static inline void breakpoint_init(void)
{
    struct sigaction    old, __new;

    __new.sa_handler = trap;
    __new.sa_flags = 0;
    sigemptyset(&__new.sa_mask);
    sigaction(SIGTRAP, &__new, &old);
    kill(getpid(), SIGTRAP);
    sigaction(SIGTRAP, &old, NULL);

    if (!breakpoint_initialized)
    {
        breakpoint_initialized = true;
        breakpoint_under_debug = true;
    }
}

/**
 * Use this macro in order to set a breakpoint in the code, for GDB
 */
# define BREAKPOINT() breakpoint()

static inline void breakpoint(void)
{
    if (!breakpoint_initialized)
        breakpoint_init();
    if (breakpoint_under_debug)
        kill(getpid(), SIGTRAP);
}

Consider this example:

printf("Hello!\n");
BREAKPOINT();
printf("World!\n");

If you run it without debugging:

$> ./a.out 
Hello!
World!

Under gdb:

(gdb) r
Starting program: /tmp/a.out 
Hello!

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7df307b in kill () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7df307b in kill () from /usr/lib/libc.so.6
#1  0x0000555555555208 in breakpoint_init () at main.c:25
#2  0x000055555555526b in breakpoint () at main.c:43
#3  0x000055555555529f in main () at main.c:51

Optimisation

likely/unlikely
#define likely(x) __builtin_expect((x), 1)
#define unlikely(x) __builtin_expect((x), 0)

Inform the compiler about the probability of a condition

char *s = malloc(1024);
if (unlikely(s == NULL))
{
    /* ERROR */
}
else
{
    /* OK */
}
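Here's a slightly more complete sketch, with the hint inside a loop. Note one deliberate difference with the macros above: I've added a !! so that any truthy value (a pointer, for example) is normalized to 1 before being compared to the expected value. The function name is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Variant of the macros above: !! normalizes the condition to 0 or 1 */
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sum the valid (non-negative) samples; invalid ones are assumed rare */
long sum_valid(const int *samples, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++)
    {
        if (unlikely(samples[i] < 0))
            continue;   /* cold path, kept out of the way */
        total += samples[i];
    }
    return total;
}
```

The hint never changes the result, only the code layout. If your guess is wrong, the program is still correct, just possibly slower.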

hot_function/cold_function
#define __hot_function __attribute__((hot))
#define __cold_function __attribute__((cold))

Inform the compiler about functions that are called often or rarely. Hot could be used for event callbacks, for example, and cold for functions that are almost never called, like a panic() function.
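A minimal sketch of how the two attributes could be used together (the function names and the failure counter are invented for the example):

```c
#include <assert.h>

#define __hot_function __attribute__((hot))
#define __cold_function __attribute__((cold))

static int failures = 0;

/* Rarely called: optimized for size, and calls to it count as unlikely */
__cold_function
static void record_failure(void)
{
    failures++;
}

/* Called for every event: optimized more aggressively */
__hot_function
int handle_event(int value)
{
    if (value < 0)
    {
        record_failure();
        return -1;
    }
    return value * 2;
}
```

A nice side effect of the cold attribute is that the branch leading to record_failure is treated as unlikely, without having to write an explicit unlikely() around the condition.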

No copyright - louis at ne02ptzero dot me
Any and all opinions listed here are my own and not representative of my employers; future, past and present.