C Tricks that I use every day
Some time ago, I gave a presentation at 42 school about 'advanced' C language tricks. Since then, I've made some new discoveries about "magic macros", and I thought writing a blog post about them would be a good idea.
So, in this post, I will go over some compiler flags I always use in my projects, some C{99,11} standard macros, and finally some GCC-only macros / attributes.
Compiler flags
-Wall -Wextra -Werror
The holy trinity. -Wall enables a large set of common warnings (despite its name, not all of them), and -Wextra adds some extra ones on top. -Werror turns warnings into errors, so compilation actually stops when a warning is raised. I sometimes leave -Werror out of my compilation options, mainly when I compile external software (e.g. sqlite3) in my tree.
-fstack-protector -fstack-protector-strong
Buffer overflow protection. Consider the following function:
void function(void)
{
char *buf = alloca(256);
/* Don't allow gcc to optimise away the buf */
asm volatile("" :: "m" (buf));
}
Compiled with -fno-stack-protector:
08048404 <function>:
push %ebp ; prologue
mov %esp,%ebp
sub $0x128,%esp ; reserve 0x128B on the stack
lea 0xf(%esp),%eax ; eax = esp + 0xf
and $0xfffffff0,%eax ; align eax
mov %eax,-0xc(%ebp) ; save eax in the stack frame
leave ; epilogue
ret
Compiled with -fstack-protector:
08048464 <function>:
push %ebp ; prologue
mov %esp,%ebp
sub $0x128,%esp ; reserve 0x128B on the stack
mov %gs:0x14,%eax ; load stack canary using gs
mov %eax,-0xc(%ebp) ; save it in the stack frame
xor %eax,%eax ; clear the register
lea 0xf(%esp),%eax ; eax = esp + 0xf
and $0xfffffff0,%eax ; align eax
mov %eax,-0x10(%ebp) ; save eax in the stack frame
mov -0xc(%ebp),%eax ; load canary
xor %gs:0x14,%eax ; compare against the one in gs
je 8048493 <function+0x2f> ; equal: skip __stack_chk_fail
call 8048340 <__stack_chk_fail@plt>
leave ; epilogue
ret
As you can see, the compiler added some protection around the stack allocation: after the function prologue, a canary is loaded (from %gs:0x14) and saved in the stack frame. Later, just before the epilogue, the canary is checked against the original. If the values mismatch, the __stack_chk_fail branch is taken, which results in a pretty crash:
*** stack smashing detected ***
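It is worth noting that the -fstack-protector-strong flag is fairly recent (it appeared in GCC 4.9), so you can't use it with old GCC cross-compilation toolchains. It also protects more functions: -fstack-protector only instruments functions that call alloca (as above) or have buffers of at least 8 bytes, while -fstack-protector-strong also covers functions with any local array or that take the address of a local variable. There's also a performance cost to using -fstack-protector, but I personally think the benefit of the protection outweighs it.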
-pipe
Like the warning flags, this one does not change the compiled code: it simply makes GCC pass data between compilation stages through pipes instead of temporary files, which speeds up the build a bit. I think modern build tools like CMake or Ninja use it by default.
-Wshadow
This flag will make the compiler raise a warning on variable shadowing. Here's a simple example of that:
int function(void)
{
int a = 0;
[...]
while (condition)
{
/**
* Here, this variable shadows the
* function-scope one
*/
int a = 10;
}
}
I mainly use this flag to force myself to keep my codebase clean. I've found that shadowed variables can be pretty confusing for an external reader, especially in big functions.
-Wunreachable-code
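The compiler will warn if it finds code that can never be executed. Keep in mind that this flag is mostly useful with Clang: current GCC versions still accept -Wunreachable-code for backward compatibility, but silently ignore it. The example is pretty simple: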
int function(void)
{
[...]
return 0;
printf("Hello !\n"); /* We will never go there */
}
With an example like that, you might think that nobody needs the compiler's help to notice that the printf(3) call is unreachable, but I've found this flag pretty useful on a large codebase, when working on big functions where something is easy to miss:
int very_big_function(void)
{
[...]
+ if (condition)
+ return 1;
+ else
+ return 0;
[...] /* Some legacy code */
}
Here my patch changes the behavior of the function: no code after the if/else will ever be executed. Again, if you're fully awake when you write this, you will see it before the compiler does, but if you're not, it's nice to get a warning (an error with -Werror) at compilation rather than a bug at runtime.
-Wswitch-enum
This flag makes the compiler warn when a switch on an enum value does not have a case for every entry of the enum:
typedef enum {
OUT_FORMAT_JSON,
OUT_FORMAT_CSV,
OUT_FORMAT_HTML
} out_format_t;
[...]
const char *format_data(void *some_data, out_format_t format)
{
switch (format)
{
case OUT_FORMAT_JSON:
return format_data_to_json(some_data);
case OUT_FORMAT_CSV:
return format_data_to_csv(some_data);
}
}
Here I forgot to handle the OUT_FORMAT_HTML entry in my switch, and the compiler warns me about it. (-Wall already enables -Wswitch, which catches this as long as the switch has no default label; -Wswitch-enum also warns when a default label is present.) The warning is not really about "referencing every entry of the enum in the switch"; it is more about asking a question about this function. Indeed, my use case may be: "This data will never be formatted to HTML, so there's no need to handle it in the switch". In that case, as a good practice, an assert is a good thing to add:
case OUT_FORMAT_HTML:
assert(!"You can't format this data in HTML");
Or maybe you just forgot to write the HTML format function, and the compiler reminds you that the API does support three output formats, and you must handle all of them:
case OUT_FORMAT_HTML:
return format_data_to_html(some_data);
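Putting the pieces together, a version of format_data that handles every entry might look like the sketch below. The format_data_to_* functions are hypothetical stubs returning fixed strings, only there so the example compiles on its own. There is deliberately no default label: with one, -Wswitch (from -Wall) would stop reporting missing entries, and only -Wswitch-enum would still catch them.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    OUT_FORMAT_JSON,
    OUT_FORMAT_CSV,
    OUT_FORMAT_HTML
} out_format_t;

/* Hypothetical formatters: stubs returning fixed strings */
static const char *format_data_to_json(void *data) { (void)data; return "{}"; }
static const char *format_data_to_csv(void *data)  { (void)data; return ""; }
static const char *format_data_to_html(void *data) { (void)data; return "<p></p>"; }

static const char *format_data(void *some_data, out_format_t format)
{
    /* One case per enum entry, and no default label */
    switch (format)
    {
        case OUT_FORMAT_JSON:
            return format_data_to_json(some_data);
        case OUT_FORMAT_CSV:
            return format_data_to_csv(some_data);
        case OUT_FORMAT_HTML:
            return format_data_to_html(some_data);
    }
    /* Only reached for a value outside the enum, e.g. after a bad cast */
    return NULL;
}
```

The trailing return NULL also keeps -Wreturn-type quiet, since the compiler cannot prove that format only holds one of the three values.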
-Wstrict-prototypes
This flag will make the compiler warn about old-style (K&R) function declarations and definitions that do not specify their argument types:
int function(a, b)
{
/* You guessed it, a & b are integers */
}
I do not want that in my codebases.
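The fix is simply to always write full prototypes, including (void) for functions that take no parameters. Before C23, an empty parameter list such as int get_answer(); does not declare a prototype at all, so the compiler accepts calls with any arguments, and -Wstrict-prototypes flags that declaration. A minimal sketch; get_answer and add are illustrative names:

```c
#include <assert.h>

/* "int get_answer();" would not be a prototype before C23: even a call
 * like get_answer(1, 2) would be accepted. (void) makes it explicit. */
static int get_answer(void)
{
    return 42;
}

/* Full prototype: the number and types of arguments are checked at each call */
static int add(int a, int b)
{
    return a + b;
}
```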
-D_FORTIFY_SOURCE=2
GNU libc hardening. Note that it only has an effect when optimizations are enabled (-O1 or higher). Consider the following function:
void function(const char *s)
{
char buf[256];
strcpy(buf, s);
/* Don't allow gcc to optimise away the buf */
asm volatile("" :: "m" (buf));
}
Compiled without the flag:
08048450 <function>:
push %ebp ; prologue
mov %esp,%ebp
sub $0x118,%esp ; reserve 0x118B on the stack
mov 0x8(%ebp),%eax ; load parameter `s` into eax
mov %eax,0x4(%esp) ; pass it as 2nd argument of strcpy
lea -0x108(%ebp),%eax ; compute the address of `buf` in eax
mov %eax,(%esp) ; pass it as 1st argument of strcpy
call 8048320 <strcpy@plt>
leave ; epilogue
ret
With the flag:
08048470 <function>:
push %ebp ; prologue
mov %esp,%ebp
sub $0x118,%esp ; reserve 0x118B on the stack
movl $0x100,0x8(%esp) ; pass 256 (size of buf) as 3rd argument
mov 0x8(%ebp),%eax ; load parameter `s` into eax
mov %eax,0x4(%esp) ; pass it as 2nd argument
lea -0x108(%ebp),%eax ; compute the address of `buf` in eax
mov %eax,(%esp) ; pass it as 1st argument
call 8048370 <__strcpy_chk@plt>
leave ; epilogue
ret
You can see that the compiler called __strcpy_chk instead of strcpy. These __XXX_chk functions are checked variants of the libc functions, provided by glibc (GCC also knows them as __builtin___XXX_chk builtins): the compiler passes them the size of the destination buffer when it can determine it at compile time (the 256 above), and they check at runtime that the data actually fits. If it doesn't, the program crashes with a pretty:
*** buffer overflow detected ***:
Of course, in my example, one should use a bounded copy such as strncpy instead of strcpy to be safer in the first place. The asm volatile statement is only there to prevent GCC from optimising away the buffer (and the copy into it), since it is otherwise never read.
There are a good number of these checked variants; the full list is in the "Object Size Checking" section of the GCC manual.
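If you do want a bounded copy, note that strncpy has its own trap: when the source is too long, it does not NUL-terminate the destination. Here's a minimal sketch of an alternative, assuming a hypothetical copy_string helper (not part of any libc) built on snprintf, which always terminates its output when the size is not zero and returns the length it would have written:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst (always NUL-terminated) and
 * returns true if the whole string fit, false if it was truncated. */
static bool copy_string(char *dst, size_t dst_size, const char *src)
{
    int written;

    assert(dst != NULL && dst_size > 0 && src != NULL);
    written = snprintf(dst, dst_size, "%s", src);
    /* written is the full length of src, regardless of truncation */
    return written >= 0 && (size_t)written < dst_size;
}
```

Unlike _FORTIFY_SOURCE, this does not crash on overflow; it truncates and lets the caller decide what to do.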
-fPIE -pie
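Build a position-independent executable, so that ASLR can also randomize the address of the program's own code, not only the stack, heap and shared libraries.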
-Wl,-z,noexecstack
Linker flag that marks the stack as non-executable, so code injected into a stack buffer cannot be executed directly. See the ld(1) manual for more information.
-Wl,-z,relro -Wl,-z,now
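Linker ELF hardening flags: -z relro makes some sections read-only once relocations have been applied, and -z now resolves all symbols at load time instead of lazily, so that the GOT can be made entirely read-only as well (often called "full RELRO"). See the ld(1) manual for more information.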
Macros
COUNT_OF
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))
Consider the following C snippet:
static const char *strings[] = {
"Hello",
"World",
NULL
};
for (size_t i = 0; strings[i] != NULL; i++)
printf("Value is = %s\n", strings[i]);
Pretty basic stuff. The key thing here is the NULL value in the array: this entry's sole purpose is to stop the reading loop. Thing is, the compiler (and thus you) already knows the size of the array; you just need to use it:
static const char *strings[] = {
"Hello",
"World"
};
for (size_t i = 0; i < COUNT_OF(strings); i++)
printf("Value is = %s\n", strings[i]);
We drop the NULL sentinel from the array, and the behavior is the same.
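COUNT_OF has one trap: given a pointer instead of an array (for instance an array parameter, which decays to a pointer), sizeof returns the size of the pointer and the result is silently wrong. A compile-time guard can be sketched with GCC/Clang builtins, in the spirit of the Linux kernel's ARRAY_SIZE. BUILD_BUG_ON_ZERO, IS_POINTER and COUNT_OF_CHECKED are names chosen for this illustration, and the zero-sized struct trick relies on a GNU C extension:

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates to 0 when e is false; when e is true, the bit-field width
 * becomes -1 and compilation fails (GNU C: empty structs have size 0). */
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int : -!!(e); }))

/* 1 when a is a pointer (same type as &a[0]), 0 when it is an array */
#define IS_POINTER(a) \
    __builtin_types_compatible_p(__typeof__(a), __typeof__(&(a)[0]))

#define COUNT_OF_CHECKED(arr) \
    (sizeof(arr) / sizeof((arr)[0]) + BUILD_BUG_ON_ZERO(IS_POINTER(arr)))
```

Calling COUNT_OF_CHECKED on a pointer (say, inside a function taking const char **) then fails to compile with a negative bit-field width error, instead of returning a wrong count.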
STATIC_ARRAY_FOREACH
#define STATIC_ARRAY_FOREACH(item, array) \
for (size_t keep = 1, index = 0; \
keep && index < COUNT_OF(array); \
keep = !keep, index++) \
for (item = &array[index]; keep; keep = !keep)
Consider the previous iteration snippet:
static const char *strings[] = {
"Hello",
"World"
};
for (size_t i = 0; i < COUNT_OF(strings); i++)
printf("Value is = %s\n", strings[i]);
In this particular case, we don't need the value of i except for the array access. The STATIC_ARRAY_FOREACH macro is there to iterate over an array with a more 'pure' approach:
const char *str[] = {
"Hello",
"World!"
};
STATIC_ARRAY_FOREACH(const char **ptr, str)
{
printf("%s\n", *ptr);
}
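How does the macro work? The outer loop walks the indexes, and the inner loop runs exactly once per element, only to declare item. The keep flag is what makes break behave as expected: a break skips the inner loop's keep = !keep, so keep is still 1 when the outer loop flips it, which makes it 0 and ends the outer loop too. A small sketch checking that behavior (sum_until_negative is a hypothetical example function, and array is parenthesized here for safety):

```c
#include <assert.h>
#include <stddef.h>

#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

#define STATIC_ARRAY_FOREACH(item, array)           \
    for (size_t keep = 1, index = 0;                \
         keep && index < COUNT_OF(array);           \
         keep = !keep, index++)                     \
        for (item = &(array)[index]; keep; keep = !keep)

/* Sums the values up to (not including) the first negative one */
static int sum_until_negative(void)
{
    static const int values[] = { 1, 2, 3, -1, 100 };
    int total = 0;

    STATIC_ARRAY_FOREACH(const int *value, values)
    {
        if (*value < 0)
            break;      /* stops the whole iteration, not just one step */
        total += *value;
    }
    return total;
}
```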
static_assert
#define static_assert _Static_assert
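Since C11, <assert.h> already provides static_assert as a macro for _Static_assert, so this define is only useful if you don't include that header. Consider the following code: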
typedef enum {
ERROR_ONE,
ERROR_TWO,
ERROR_THREE,
ERROR_FOUR
} error_t;
const char *error_to_str(error_t error)
{
static const char *str[] = {
[ERROR_ONE] = "Error One",
[ERROR_TWO] = "Error Two",
[ERROR_THREE] = "Error three"
};
return str[error];
}
Pretty basic error handling: we've got an enum that functions use to return an error, if any, and the helper error_to_str to translate an error code into a more human-friendly string.
Thing is, my error_to_str function isn't safe at all. As you can see, I forgot to add the fourth entry (ERROR_FOUR). Thus, if one asks for the translated string of ERROR_FOUR, the function reads past the end of the array: that's undefined behavior, and with some luck you are greeted with a segfault. There are three possible approaches here:
- Change the function into a switch/case or if/else if/else tree. That way, we can handle out-of-bounds error codes.
- Add a condition before the return to check that the requested entry exists before accessing it.
- Add a static assertion (i.e. a condition tested at compilation) to check that the string array is not missing any entries.
I'm going to use the last option, since I think it is the most elegant approach, with a big bonus: it happens at compilation.
typedef enum {
ERROR_ONE,
ERROR_TWO,
ERROR_THREE,
ERROR_FOUR,
/* Always keep this one last */
ERROR_MAX
} error_t;
const char *error_to_str(error_t error)
{
static const char *str[] = {
[ERROR_ONE] = "Error One",
[ERROR_TWO] = "Error Two",
[ERROR_THREE] = "Error three"
};
static_assert(COUNT_OF(str) == ERROR_MAX, "An error string is missing");
return str[error];
}
Let's try to compile this:
$> gcc static_assert.c
[...]
static_assert.c:20:5: error: negative width in bit-field ‘__error_if_negative’
static_assert(COUNT_OF(str) == ERROR_MAX, "An error string is missing");
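As you can see, we catch the mistake at compilation, meaning that the patch will never be merged in this state (you do need a build bot / CI, but that's pretty common). In a large codebase, it's only human to add an entry to an enum like that and forget to check every function that must be updated with it. Keep in mind that this is just one of the many uses of static_assert, but it is the one I use the most. (The "negative width in bit-field" message above comes from an older, bit-field-based static_assert fallback; with a real C11 _Static_assert, GCC reports "static assertion failed" followed by your message.)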
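For completeness, here's a sketch of the fixed function that also keeps a runtime check (the second option). The static assertion only counts slots: if an entry is missing in the middle of the table (say ERROR_TWO), the designated initializers still produce four slots, one of them NULL, so the NULL test is what catches that case. The enum is renamed error_code_t here because glibc's <errno.h> already defines error_t when _GNU_SOURCE is set, and the "Unknown error" fallback is an assumption of this sketch:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

typedef enum {
    ERROR_ONE,
    ERROR_TWO,
    ERROR_THREE,
    ERROR_FOUR,
    /* Always keep this one last */
    ERROR_MAX
} error_code_t;

static const char *error_to_str(error_code_t error)
{
    static const char *str[] = {
        [ERROR_ONE]   = "Error One",
        [ERROR_TWO]   = "Error Two",
        [ERROR_THREE] = "Error Three",
        [ERROR_FOUR]  = "Error Four",
    };

    /* Compile time: one slot per enum entry (catches entries missing at the end) */
    static_assert(COUNT_OF(str) == ERROR_MAX, "An error string is missing");

    /* Run time: values outside the enum, or a hole left by a missing middle entry */
    if ((size_t)error >= COUNT_OF(str) || str[error] == NULL)
        return "Unknown error";
    return str[error];
}
```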
STR
#define QUOTE_ME(x) #x
#define STR(x) QUOTE_ME(x)
Take the following snippet:
#include <stdio.h>

#define MAX_ARGC 3

int main(int ac, char **av)
{
    if (ac < 2)
    {
        printf("Missing argument\n");
        return 1;
    }
    if (ac > MAX_ARGC)
    {
        printf("Maximum argc number is: %d\n", MAX_ARGC);
        return 1;
    }
    printf("All good! %s\n", av[1]);
    return 0;
}
Pretty common: we've got a macro that defines an integer, and we need to print the value of that macro. Thing is, since the preprocessor already knows the value of the define, we just need to turn it into a string literal in order to print it. So instead of having printf format the value at runtime with %d, we can simply do:
printf("Maximum argc number is: " STR(MAX_ARGC) "\n");
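Why two macros? The # operator stringifies its argument exactly as written, before any macro expansion, so QUOTE_ME(MAX_ARGC) gives "MAX_ARGC". Going through STR first lets the preprocessor expand MAX_ARGC to 3, and only then stringify it. A small sketch; the static_assert lines check the results at compile time:

```c
#include <assert.h>   /* static_assert */

#define QUOTE_ME(x) #x
#define STR(x) QUOTE_ME(x)

#define MAX_ARGC 3

/* # is applied before expansion: this is the string "MAX_ARGC" */
static const char direct[] = QUOTE_ME(MAX_ARGC);
/* MAX_ARGC is expanded to 3 first: this is the string "3" */
static const char expanded[] = STR(MAX_ARGC);
/* Adjacent string literals are concatenated at compile time */
static const char message[] = "Maximum argc number is: " STR(MAX_ARGC) "\n";

static_assert(sizeof direct == sizeof "MAX_ARGC", "QUOTE_ME does not expand");
static_assert(sizeof expanded == sizeof "3", "STR expands MAX_ARGC first");
```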
__printf_function
#define __printf_function(...) __attribute__((format (printf, __VA_ARGS__)))
Consider the following function:
static void custom_log(const char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
vfprintf(stdout, fmt, ap);
va_end(ap);
}
As you can see, we've got a wrapper around vfprintf, which takes a printf-style format and its arguments (through stdarg). Thing is, GCC does not know that this function uses a printf format, so the following snippet compiles without any warning:
custom_log("Value is %s\n", 10);
We just need to tell the compiler that the function uses a printf format (argument 1 is the format string, and the variadic arguments start at position 2):
__printf_function(1, 2)
static void custom_log(const char *fmt, ...)
And we are greeted with:
error: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’
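The same attribute also covers the va_list flavour of such wrappers: for a function that receives a va_list instead of ..., the third argument of format is 0, meaning there are no arguments to check at that level, and GCC only checks the format string itself. A sketch with hypothetical format_into / format_into_v helpers built on vsnprintf:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define __printf_function(...) __attribute__((format (printf, __VA_ARGS__)))

/* va_list flavour: argument 3 is the format, 0 = no variadic arguments here */
__printf_function(3, 0)
static int format_into_v(char *buf, size_t size, const char *fmt, va_list ap)
{
    return vsnprintf(buf, size, fmt, ap);
}

/* Variadic flavour: argument 3 is the format, the variadic ones start at 4 */
__printf_function(3, 4)
static int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = format_into_v(buf, size, fmt, ap);
    va_end(ap);
    return written;
}
```

With these attributes, a call like format_into(buf, sizeof buf, "%s", 10) triggers a -Wformat warning at the call site.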
__unused
#define __unused __attribute__((unused))
Pretty straightforward: if you need to tell the compiler that a variable or parameter is intentionally unused, instead of writing (void)variable; you can do:
void *callback(void *ctx, __unused void *data)
{
[...]
}
typeof
#define typeof __typeof__
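Pretty much self-explanatory (keep in mind that auto is a C keyword: redefining it as a macro is fragile, and C23 gives auto a type-inference meaning of its own):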
#define auto(n, val) typeof(val) n = val
int a;
typeof(a) b = 10;
auto(c, 150);
printf("%d\n", b);
printf("%d\n", c);
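A classic use of typeof is a MAX macro that evaluates each argument exactly once, unlike the naive ((a) > (b) ? (a) : (b)), which evaluates one of them twice. This sketch relies on a GCC statement expression (({ ... }), also supported by Clang); the max_a_ / max_b_ names are arbitrary and only need to not clash with the caller's variables:

```c
#include <assert.h>

/* GNU C statement expression: each argument is evaluated exactly once,
 * and __typeof__ gives each temporary the type of its argument. */
#define MAX(a, b) ({                \
    __typeof__(a) max_a_ = (a);     \
    __typeof__(b) max_b_ = (b);     \
    max_a_ > max_b_ ? max_a_ : max_b_; })
```

With the naive macro, MAX(i++, 0) would increment i twice whenever i++ is the larger value; here it is incremented once.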
offsetof & container_of
#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
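It is worth noting that GCC provides the __builtin_offsetof builtin (which is what <stddef.h>'s offsetof expands to, and what the Linux kernel wraps as __compiler_offsetof), so offsetof does not need to be defined with the 0 trick like I did above: just include <stddef.h>. container_of determines the address of the parent structure from a pointer to one of its members; the member offset is a compile-time constant, so at runtime it boils down to a single subtraction. It is well known for its use in the Linux kernel linked lists. There's a really good blog post that explains how the macro works.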
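To see container_of in action, here's a minimal intrusive list in the style of the kernel's. struct list_node, struct task and sum_task_ids are hypothetical names for this sketch, which uses <stddef.h>'s offsetof rather than a hand-rolled one, and the same GNU C statement expression as the definition above:

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

#define container_of(ptr, type, member) ({                     \
    const __typeof__(((type *)0)->member) *__mptr = (ptr);     \
    (type *)((char *)__mptr - offsetof(type, member)); })

/* Hypothetical intrusive list node, embedded in the structures it links */
struct list_node {
    struct list_node *next;
};

struct task {
    int id;
    struct list_node node;   /* deliberately not the first member */
};

/* Walks the list of nodes and sums the ids of the enclosing tasks */
static int sum_task_ids(const struct list_node *head)
{
    int total = 0;

    for (const struct list_node *it = head; it != NULL; it = it->next)
        total += container_of(it, struct task, node)->id;
    return total;
}
```

The list code only ever manipulates struct list_node, yet each node can be turned back into its struct task without storing any back pointer.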
Tricks
Optional named parameters
#include <stdio.h>

typedef struct {
    int do_print;
    const char *name;
} some_function_args_t;

void some_function(int required_a, int required_b, some_function_args_t *args)
{
    printf("required_a = %d\n", required_a);
    printf("required_b = %d\n", required_b);
    printf("opt.do_print = %d\n", args->do_print);
    /* name is NULL when the caller did not pass .name */
    printf("opt.name = %s\n\n", args->name ? args->name : "(null)");
}
#define some_function(a, b, ...) \
some_function(a, b, &(some_function_args_t){__VA_ARGS__})
int main(void)
{
some_function(1, 2);
some_function(1, 2, .do_print = 1);
some_function(1, 2, .name = "Louis");
some_function(1, 2, .name = "Louis", .do_print = 3);
return 0;
}
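This trick is pretty much self-explanatory. The main idea is to combine two C99 features, compound literals and designated initializers, to get optional, order-independent named arguments, like in Python; every field that is not named is zero-initialized. The code above does compile with GCC, but strictly speaking it is not standard C11: some_function(1, 2) passes no variadic arguments and expands to an empty initializer ({}), which GCC accepts as extensions (empty braces only became standard in C23).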
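Since omitted fields are zero-initialized, the callee can treat 0 / NULL as "use the default value". A minimal sketch, with a hypothetical rule() helper that writes a line of characters into a buffer; the fill and prefix options and their defaults are made up for the illustration, and the call without options relies on GCC accepting an empty variadic list and an empty initializer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical options: a zero / NULL field means "use the default" */
typedef struct {
    char fill;           /* default: '-' */
    const char *prefix;  /* default: no prefix */
} rule_opts_t;

/* Writes prefix + count fill characters into buf, NUL-terminated and
 * truncated to fit; returns the number of characters written. */
static size_t rule_impl(char *buf, size_t size, size_t count, const rule_opts_t *opts)
{
    char fill = opts->fill != '\0' ? opts->fill : '-';
    const char *prefix = opts->prefix != NULL ? opts->prefix : "";
    size_t n = 0;

    assert(size > 0);
    while (*prefix != '\0' && n + 1 < size)
        buf[n++] = *prefix++;
    for (size_t i = 0; i < count && n + 1 < size; i++)
        buf[n++] = fill;
    buf[n] = '\0';
    return n;
}

#define rule(buf, size, count, ...) \
    rule_impl(buf, size, count, &(rule_opts_t){ __VA_ARGS__ })
```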
defer
Defer the cleanup of a variable until it goes out of scope, using GCC's cleanup attribute (Clang supports it too):
#define __defer_fd __attribute__((__cleanup__(cleanup_fd)))
static void cleanup_fd(int *fd)
{
if (*fd != -1)
close(*fd);
}
int file = open("/etc/passwd", O_RDONLY);
If you write something like that, the file descriptor must be closed explicitly before the function returns, otherwise it leaks (here, as reported by valgrind --track-fds=yes):
==25924== FILE DESCRIPTORS: 4 open at exit.
==25924== Open file descriptor 3: /etc/passwd
==25924== at 0x497F4C2: open (in /usr/lib/libc-2.28.so)
==25924== by 0x10918D: main (in /tmp/a.out)
But if you add the __defer_fd attribute, cleanup_fd is called automatically, with a pointer to file, as soon as the variable goes out of scope:
int __defer_fd file = open("/etc/passwd", O_RDONLY);
[...]
==24113== FILE DESCRIPTORS: 3 open at exit
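The same attribute works for heap memory, and it runs on every return path, including early ones. A minimal sketch: __defer_buffer, cleanup_buffer and copy_first_char are hypothetical names, and the cleanups_run counter only exists so the cleanup can be observed. The attribute is placed after the declarator here, a position GCC also accepts:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter, only there so the cleanup can be observed */
static int cleanups_run = 0;

/* Receives a pointer to the annotated variable, i.e. a char ** here */
static void cleanup_buffer(char **buf)
{
    free(*buf);          /* free(NULL) is a harmless no-op */
    cleanups_run++;
}

#define __defer_buffer __attribute__((__cleanup__(cleanup_buffer)))

/* Returns the first character of src, or -1 when asked to fail early.
 * copy is released on every return path without an explicit free(). */
static int copy_first_char(const char *src, int fail_early)
{
    char *copy __defer_buffer = malloc(2);

    assert(src != NULL);
    if (copy == NULL || fail_early)
        return -1;
    copy[0] = src[0];
    copy[1] = '\0';
    return copy[0];   /* the return value is computed before the cleanup runs */
}
```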
Explicit code breakpoint
Put an explicit breakpoint in the code that only stops the program when it runs under a debugger:
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Written from the signal handler, hence volatile sig_atomic_t */
static volatile sig_atomic_t breakpoint_initialized = 0;
static bool breakpoint_under_debug = false;

/* __unused is the macro defined in the Macros section */
static inline void trap(int __unused signum)
{
    breakpoint_initialized = 1;
}

static inline void breakpoint_init(void)
{
    struct sigaction old, act;

    act.sa_handler = trap;
    act.sa_flags = 0;
    sigemptyset(&act.sa_mask);
    sigaction(SIGTRAP, &act, &old);
    /* Without a debugger, trap() runs; GDB does not pass SIGTRAP to the program */
    kill(getpid(), SIGTRAP);
    sigaction(SIGTRAP, &old, NULL);
    if (!breakpoint_initialized)
    {
        breakpoint_initialized = 1;
        breakpoint_under_debug = true;
    }
}

/**
 * Use this macro in order to set a breakpoint in the code, for GDB
 */
# define BREAKPOINT() breakpoint()
static inline void breakpoint(void)
{
    if (!breakpoint_initialized)
        breakpoint_init();
    if (breakpoint_under_debug)
        kill(getpid(), SIGTRAP);
}
Consider this example:
printf("Hello!\n");
BREAKPOINT();
printf("World!\n");
If you run it without debugging:
$> ./a.out
Hello!
World!
Under gdb:
(gdb) r
Starting program: /tmp/a.out
Hello!
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7df307b in kill () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff7df307b in kill () from /usr/lib/libc.so.6
#1 0x0000555555555208 in breakpoint_init () at main.c:25
#2 0x000055555555526b in breakpoint () at main.c:43
#3 0x000055555555529f in main () at main.c:51
Optimisation
likely/unlikely
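#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
Inform the compiler about the expected outcome of a condition, so it can lay out the generated code in favor of the likely path. The !! turns any value (for instance a pointer or a count) into 0 or 1, since __builtin_expect compares the expression against the exact expected value: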
char *s = malloc(1024);
if (unlikely(s == NULL))
{
/* ERROR */
}
else
{
/* OK */
}
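Here's a small sketch using the !!-normalized form of the macros (as the Linux kernel does), so that any truthy value maps to 1. The hypothetical sum_digits parser marks its error path as unlikely, which hints the compiler to keep the digit-summing code as the fall-through path:

```c
#include <assert.h>
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical parser: sums the decimal digits of s, or returns -1
 * as soon as it meets a character that is not a digit. */
static int sum_digits(const char *s)
{
    int total = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        if (unlikely(*s < '0' || *s > '9'))
            return -1;          /* rare error path */
        total += *s - '0';
    }
    return total;
}
```

The hint only affects code layout and branch prediction; the result is the same with or without it.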
hot_function/cold_function
#define __hot_function __attribute__((hot))
#define __cold_function __attribute__((cold))
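Inform the compiler about functions that are called often or rarely. GCC optimizes hot functions more aggressively and cold ones for size, and treats the paths leading to a call to a cold function as unlikely. Hot could be used for event callbacks, for example, and cold for functions that are never called in the normal flow, like a panic() function.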