Lesser known tricks, quirks and features of C

A few lesser-known tricks, quirks and features of C

There are some tricks, quirks and features (some quite fundamental to the language!)
that seem to throw even experienced developers off track. So I did a sloppy
job of gathering some of them in this post (in no particular order), with even sloppier
short explanations and/or examples (or quotes thereof).

Array pointers #

Array-to-pointer decay means that proper pointers to arrays are usually not needed:

int arr[10];

int* ap0 = arr;        // array decay-to-pointer
// ap0[2] = ...

int (*ap1)[10] = &arr; // proper pointer to array
// (*ap1)[2] = ...

But the ability to allocate a big multi-dimensional array on the heap is nice:

int (*ap3)[90000][90000] = malloc(sizeof *ap3);

With pointers, even VLAs can find a use (more here):

int (*ap4)[n] = malloc(sizeof *ap4);
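
As a rough sketch of why this is handy: a pointer to a VLA row lets a single malloc() back a two-dimensional array whose dimensions are only known at run time, while keeping the m[i][j] syntax. The function name fill_and_sum is made up for illustration, and the compiler is assumed to support VLAs (they are optional since C11).

```c
#include <stdlib.h>

/* Hypothetical helper: builds a rows x cols matrix on the heap,
 * fills it so that m[i][j] == i * cols + j, and returns the sum
 * of all elements (or -1 if the allocation fails).
 * Both dimensions are assumed to be greater than zero. */
long fill_and_sum(size_t rows, size_t cols)
{
    /* Pointer to an array of `cols` ints; one malloc for everything. */
    int (*m)[cols] = malloc(rows * sizeof *m);
    if (m == NULL)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            m[i][j] = (int)(i * cols + j);
            sum += m[i][j];
        }
    }

    free(m);
    return sum;
}
```

Only the pointer lives on the stack here, so the usual worry about VLAs blowing the stack does not apply.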

Comma operator #

The comma operator is used to separate two or more expressions that are
included where only one expression is expected. When the set of expressions
has to be evaluated for a value, only the right-most expression is considered.

For example: b = (a=3, a+2); – this code would firstly assign value 3
to a, and then a+2 would be assigned to variable b. So, at the end,
b would contain value 5 while variable a would be 3.

Wikipedia lists a few more examples.
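
Probably the most common place to meet the comma operator is a for loop that advances two variables at once. A small sketch (the function name reverse_in_place is only illustrative):

```c
#include <stddef.h>

/* Reverses `len` ints in place. */
void reverse_in_place(int *a, size_t len)
{
    if (len == 0)
        return;

    /* The comma in "++i, --j" is the comma operator. The comma in the
     * declaration "i = 0, j = len - 1" is NOT: it merely separates
     * declarators. */
    for (size_t i = 0, j = len - 1; i < j; ++i, --j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```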

Digraphs, trigraphs and alternative tokens #

C code may not be portable, but the language itself is probably more portable
than any other; there are systems using e.g. EBCDIC encoding instead of ASCII, and
to support them C has digraphs and trigraphs: multi-character sequences
treated by the compiler as other characters.

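| Digraph | Equivalent | Trigraph | Equivalent | Macro (`<iso646.h>`) | Equivalent |
|---------|------------|----------|------------|----------------------|------------|
| `<:`    | `[`        | `??=`    | `#`        | `and`                | `&&`       |
| `:>`    | `]`        | `??(`    | `[`        | `and_eq`             | `&=`       |
| `<%`    | `{`        | `??/`    | `\`        | `bitand`             | `&`        |
| `%>`    | `}`        | `??)`    | `]`        | `bitor`              | `\|`       |
| `%:`    | `#`        | `??'`    | `^`        | `compl`              | `~`        |
| `%:%:`  | `##`       | `??<`    | `{`        | `not`                | `!`        |
|         |            | `??!`    | `\|`       | `not_eq`             | `!=`       |
|         |            | `??>`    | `}`        | `or`                 | `\|\|`     |
|         |            | `??-`    | `~`        | `or_eq`              | `\|=`      |
|         |            |          |            | `xor`                | `^`        |
|         |            |          |            | `xor_eq`             | `^=`       |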
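
To make the table a bit more tangible, here is a small sketch written with digraphs and the `<iso646.h>` macros. The function pick_element is invented for this example; it should compile with any C99 compiler without extra flags (trigraphs, on the other hand, often need to be enabled explicitly).

```c
#include <iso646.h>

/* Returns the first element of a small table when `a` is set and
 * `b` is not, and the second element otherwise.
 * <% %> stand for { }, <: :> stand for [ ]. */
int pick_element(int a, int b)
<%
    int arr<:2:> = <% 10, 20 %>;

    if (a and not b)
        return arr<:0:>;
    return arr<:1:>;
%>
```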

Designated initializer #

These allow you to specify which elements of an object (array, structure, union)
are to be initialized by the values following. The order does not matter!

struct Foo {
    int x, y;
    const char* bar;
};

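void f(void)
{
    // unlisted elements are zero-initialized; the array has 10 elements
    int arr[] = { 1, 2, [5] = 9, [9] = 5, [8] = 8 };

    struct Foo foo = { .y = 23, .bar = "barman", .x = -38 };

    // positional and designated initializers can be mixed
    struct Foo foos[] = {
        [10] = {      8,  8, "bar1" },
         [8] = {      1,  8, "bar3" },
        [12] = { .x = 9,     .y = 8 },
    };
}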

Compound literals #

A compound literal looks like a cast of a brace-enclosed initializer list.
Its value is an object of the type specified in the cast, containing the
elements specified in the initializer.

#include <stdio.h>

struct Foo { int x, y; };

void bar(struct Foo p)
{
    printf("%d, %d", p.x, p.y);
}

int main(void)
{
    bar((struct Foo){2, 3});
    return 0;
}

Compound Literals (Using the GNU Compiler Collection (GCC))

Compound literals are lvalues #

(struct Foo){};
((struct Foo){}).x = 4;
&(struct Foo){};
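
Since a compound literal is an lvalue, its address can be taken and handed to a function that expects a pointer, with no named temporary in sight. A minimal sketch (the helpers bump and bumped_sum are made up for illustration):

```c
struct Foo { int x, y; };

/* Modifies the object it is given through the pointer. */
static void bump(struct Foo *p)
{
    p->x += 1;
    p->y += 1;
}

int bumped_sum(void)
{
    /* The unnamed object lives until the end of the enclosing block,
     * so the pointer stays valid for the rest of this function. */
    struct Foo *p = &(struct Foo){ 2, 3 };

    bump(p);
    return p->x + p->y;
}
```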

Multi-character constants #

They are implementation-dependent, and even the standard itself suggests it is usually
best to avoid them. That being said, using them as self-documenting enums
can be quite handy when you may need to deal with raw memory dumps later on.

enum state {
    waiting = 'WAIT',
    running = 'RUN!',
    stopped = 'STOP',
};

For example, on my machine I could locate 'WAIT' like this:

00001120: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00  .ff...........@.
00001130: f3 0f 1e fa e9 67 ff ff ff 55 48 89 e5 48 83 ec  .....g...UH..H..
00001140: 10 c7 45 fc 54 49 41 57 8b 45 fc 89 c6 48 8d 05  ..E.TIAW.E...H..
00001150: b0 0e 00 00 48 89 c7 b8 00 00 00 00 e8 cf fe ff  ....H...........
00001160: ff b8 00 00 00 00 c9 c3 f3 0f 1e fa 48 83 ec 08  ............H...

Bit fields #

Declares a member with explicit width, in bits. Adjacent bit field members may
be packed to share and straddle the individual bytes.

struct cat {
    unsigned int legs  : 3;  // 3 bits for legs  (0-4 fit in 3 bits)
    unsigned int lives : 4;  // 4 bits for lives (0-9 fit in 4 bits)
};
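
One consequence of the explicit width is worth a quick sketch: a value that does not fit into an unsigned bit field is reduced modulo 2^width. The struct is repeated so the snippet stands on its own, and store_legs is a hypothetical helper.

```c
struct cat {
    unsigned int legs  : 3;  /* holds 0..7 */
    unsigned int lives : 4;  /* holds 0..15 */
};

/* Stores `value` in the 3-bit field and reads it back. */
unsigned int store_legs(unsigned int value)
{
    struct cat c = { .legs = 0, .lives = 9 };

    c.legs = value;   /* reduced modulo 8 because the field is unsigned */
    return c.legs;
}
```

With this definition store_legs(4) gives back 4, while store_legs(9) gives back 1.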

0 bit fields #

Description from Arm Compiler 6 docs:

A zero-length bit-field can be used to make the following changes:

Creates a boundary between any bit-fields before the zero-length bit-field
and any bit-fields after the zero-length bit-field. Any bit-fields on
opposite sides of the boundary are treated as non-overlapping memory
locations. This has a consequence for C and C++ programs. The C and C++
standards require both load and store accesses to a bit-field on one side
of the boundary to not access any bit-fields on the other side of the boundary.

Insert padding to align any bit-fields after the zero-length bit-field to
the next available natural boundary based on the type of the zero-length
bit-field. For example, char:0 can be used to align to the next available
byte boundary, and int:0 can be used to align to the next available word boundary.

An example taken from the SO answer (with slight changes):

struct bar {
    unsigned char x : 5;
    unsigned short  : 0;
    unsigned char y : 7;
};

The above in memory would look like this (assuming 16-bit short, ignoring endianness):

char pad pad      short boundary
 |    |   |        |
 v    v   v        v
 xxxxx000 00000000 yyyyyyy0

The zero-length bit field causes the position to move to the next short boundary
(or: be placed on the nearest natural alignment for the target platform).
We defined short to be 16-bit, so 16 minus 5 gives 11 bits of padding.

`volatile` type qualifier #

This qualifier tells the compiler that a variable may be accessed by means other
than the current code (e.g. when we are dealing with an MMIO device), so reads
and writes to this resource must not be optimized away.

`restrict` type qualifier #

By adding this type qualifier, a programmer hints to the compiler that for
the lifetime of the pointer, no other pointer will be used to access the object
to which it points. This allows the compiler to make optimizations (for example,
vectorization) that would not otherwise have been possible.
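
A minimal sketch of what such a hint looks like in practice (add_arrays is an invented name). Whether the compiler actually vectorizes the loop depends on the compiler and its flags.

```c
#include <stddef.h>

/* Adds src[i] to dst[i]. `restrict` is the caller's promise that the
 * two arrays do not overlap; breaking that promise is undefined
 * behavior, so the compiler is free to assume it holds. */
void add_arrays(size_t n, float *restrict dst, const float *restrict src)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}
```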

`register` storage-class specifier #

It suggests that the compiler store the declared variable in a CPU register
(or some other faster location) instead of in random-access memory.
The address of a variable declared with this specifier cannot be taken
(but the sizeof operator can be applied).

Nowadays register is usually meaningless, as modern compilers place variables
in registers when appropriate regardless of whether the hint is given. It may
sometimes be useful on embedded systems, but even then the compiler will probably
provide better optimizations on its own.

Flexible array member #

From Wikipedia:

struct vectord {
    short len;    // there must be at least one other data member
    double arr[]; // the flexible array member must be last

    // The compiler may reserve extra padding space here,
    //   like it can between struct members.
};

struct vectord *vector = malloc(...);
vector->len = ...;
for (int i = 0; i < vector->len; ++i) {
     vector->arr[i] = ...;  // transparently uses the right type (double)
}
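
The elided malloc() argument above is where the trick lives: the header and the array are requested in one allocation. A sketch of one way to do it, with the struct repeated so the snippet stands alone; vectord_new is a hypothetical constructor, not part of the quoted example.

```c
#include <stdlib.h>

struct vectord {
    short len;
    double arr[];   /* flexible array member, not counted by sizeof */
};

/* Allocates the header plus `len` doubles in a single block and
 * zeroes the elements. Returns NULL for a negative length or when
 * the allocation fails. */
struct vectord *vectord_new(short len)
{
    if (len < 0)
        return NULL;

    struct vectord *v = malloc(sizeof *v + (size_t)len * sizeof v->arr[0]);
    if (v == NULL)
        return NULL;

    v->len = len;
    for (short i = 0; i < len; ++i)
        v->arr[i] = 0.0;
    return v;
}
```

A single free(v) releases both the header and the elements.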

`%n` format specifier #

This StackOverflow answer presents it reasonably well:

%n returns the current position of the imaginary cursor used when printf() formats its output.

int pos1, pos2;
const char* str_of_unknown_len = "we don't care about the length of this";

printf("Write text of unknown %n(%s)%n length\n", &pos1, str_of_unknown_len, &pos2);
printf("%*s\\%*s/\n", pos1, " ", pos2-pos1-2, " ");
printf("%*s", pos1+1, " ");
for (int i = pos1+1; i < pos2-1; ++i) {
    putc('-', stdout);
}
putc('\n', stdout);

will have the following output:

Write text of unknown (we don't care about the length of this) length
                      \                                      /
                       --------------------------------------

Granted a little bit contrived but can have some uses when making pretty reports.

`%.*` (precision) format specifier #

Instead of this:

char fmt_buf[MAX_BUF];
snprintf(fmt_buf, MAX_BUF, "%%.%df", prec);
printf(fmt_buf, num);

do this:

printf("%.*f", prec, num);

when the precision is only known at run time. The minimum field width can be
passed the same way with `%*`, which is what you want for padding with a variable
number of characters.
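
Both stars can be combined, and they work with %s too, where the precision caps the number of characters printed. A small sketch using snprintf() so the result can be inspected (format_prefix is a made-up helper):

```c
#include <stdio.h>

/* Writes the first `n` characters of `s`, right-aligned in a field
 * `width` columns wide and wrapped in brackets, into `out`.
 * Returns whatever snprintf() returns. */
int format_prefix(char *out, size_t out_size, const char *s, int n, int width)
{
    /* The first * consumes `width` (minimum field width),
     * the .* consumes `n` (precision). */
    return snprintf(out, out_size, "[%*.*s]", width, n, s);
}
```

For instance, format_prefix(buf, sizeof buf, "abcdef", 3, 5) leaves "[  abc]" in buf.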

Other less known format specifiers #

Have a look at sections 7.21.6.1 and 7.21.6.2 of the draft of the C11 standard.
You'll find %#, %e, %-, %+, %j, %g, %a and a few other interesting specifiers.

Interlacing syntactic constructs #

The following is syntactically correct C code:

#include <stdio.h>

int main()
{
    int n = 3;
    int i = 0;

    switch (n % 2) {
        case 0:
            do {
                ++i;
        case 1:
                ++i;
            } while (--n > 0);

    }

    printf("%d\n", i); // 5
}

I know goto-phobic programmers who use it like this:

    switch (x) {
        case 1:
            // 1 specific code

      if (0) {
        case 2:
            // 2 specific code
      }

            // common for 1 and 2
    }

The most famous usage of this quirk/"feature" is Duff's device:

send(to, from, count)
    register short *to, *from;
    register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

`-->` "operator" #

The following is correct C code:

size_t n = 10;
while (n --> 0) {
    printf("%zu\n", n);
}

You may ask since when C has had such an operator, and the answer is: since never.
--> is not an operator, but two separate operators, -- and >, written
so that they look like one. This is possible because C mostly doesn't care
about whitespace.

n --> 0 is equivalent to (n--) > 0

`idx[arr]` #

Square bracket notation for accessing array elements is syntactic sugar for pointer arithmetic:

arr[5] ≡ *(arr + 5) ≡ *(5 + arr) ≡ 5[arr]

You absolutely must never use this in actual code… but it's hella fun otherwise!

// array[index]
boxes[products[myorder.product].box].weight;

// index[array]
myorder.product[products].box[boxes].weight;

Negative array indexes #

For quick and dirty debugging purposes I wanted to check if padding at the end
of an array is filled with correct value, but I didn't know where the padding
starts. Thus I did the following:

int* end = arr + (len - 1);
if (end[0] == VAL && end[-1] == VAL && end[-5] == VAL) {
    puts("Correct padding");
}
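
The same idea can be turned into a loop that finds where the padding starts. This is only a sketch under the assumption that the padding is a run of one repeated value at the end of the array; count_trailing is a hypothetical name.

```c
#include <stddef.h>

/* Returns how many trailing elements of `arr` equal `val`, walking
 * backwards from the last element with non-positive indexes. */
size_t count_trailing(const int *arr, size_t len, int val)
{
    if (len == 0)
        return 0;

    const int *end = arr + (len - 1);   /* points at the last element */
    size_t n = 0;

    /* end[-k] is *(end - k): still inside the array while k < len */
    while (n < len && end[-(ptrdiff_t)n] == val)
        ++n;
    return n;
}
```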

Constant string concatenation #

You don't need sprintf() (nor strcat()!) to concatenate string literals:

#define WORLD "World!"
const char* s = "Hello " WORLD "\n"
                "It's a lovely day, "
                "innit?";

Using `&&` and `||` as conditionals #

If you write Shell scripts, you know what I mean.

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    1 && puts("Hello");
    0 && puts("I won't");
    1 && puts("World!");
    0 && puts("be printed");
    1 || puts("I won't be printed either");
    0 || puts("But I will!");

    true && (9 > 2) && puts("9 is bigger than 2");

    isdigit('9') && puts("9 is a digit");
    isdigit('n') && puts("n is a digit") || puts("n is NOT a digit!");

    return 0;
}

The compiler will probably scream warnings at you
as it's really uncommon to do this in C code.

Compile time assumption checking using `enum`s #

#define D 1
#define DD 2

enum CompileTimeCheck
{
    MAKE_SURE_DD_IS_TWICE_D = 1/(2*(D) == (DD)),
    MAKE_SURE_DD_IS_POW2    = 1/((((DD) - 1) & (DD)) == 0)
};

Can be useful for libraries with compile-time configurable constants.

Ad hoc `struct` declaration in the return type of a function #

You can define structs in very (at first glance) random places:

#include <stdio.h>

struct Foo { int a, b, c; } make_foo(void) {
    struct Foo ret = { .c = 3 };
    ret.a = 11 + ret.c;
    ret.b = ret.a * 3;
    return ret;
}

int main()
{
    struct Foo x = make_foo();
    printf("%d\n", x.a + x.b + x.c);
    return 0;
}

"Nested" `struct` definition is not kept nested #

#include <stdio.h>

struct Foo {
    int x;
    struct Bar {
        int y;
    };
};

int main()
{
    struct Bar s = { 34 };  // correct
    // struct Foo.Bar s;    // wrong
    printf("%d\n", s.y);
    return 0;
}

Flat initializer lists #

int arr[3][3] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
//            = { {1,2,3}, {4,5,6}, {7,8,9} };


struct Foo {
    const char *name;
    int age;
};

struct Foo records[] = {
    "John",   20,
    "Bertha", 40,
    "Andrew", 30,
};

Static array indices in function parameter declarations #

Except in certain contexts, an unsubscripted array name (for example, region
instead of region[4]) represents a pointer whose value is the address of the
first element of the array, provided that the array has previously been declared.
An array type in the parameter list of a function is also converted to the
corresponding pointer type. Information about the size of the argument array
is lost when the array is accessed from within the function body.

To preserve this information, which is useful for optimization, C99 allows you
to declare the index of the argument array using the static keyword. The constant
expression specifies the minimum pointer size that can be used as an assumption
for optimizations. This particular usage of the static keyword is highly prescribed.
The keyword may only appear in the outermost array type derivation and only in
function parameter declarations. If the caller of the function does not abide
by these restrictions, the behavior is undefined.

The following examples show how the feature can be used:

int n;
void foo(int arr[static 10]);       // arr points to the first of at least 10 ints
void foo(int arr[const 10]);        // arr is a const pointer
void foo(int arr[const]);           // const pointer to int
void foo(int arr[static const n]);  // arr points to at least n ints (VLA)

void foo(int p[static 1]); is effectively a standard
way to declare that p must be a non-null pointer.
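
A short sketch of how this reads in a definition (sum3 is an illustrative name). Some compilers may warn when a call site visibly passes NULL or a shorter array, but that is a courtesy, not a guarantee.

```c
/* The caller promises that `v` points to at least 3 readable ints,
 * so all three accesses below are covered by the contract. */
int sum3(const int v[static 3])
{
    return v[0] + v[1] + v[2];
}
```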

Macro Overloading by Argument List Length #

#include <stdio.h>
#include "cmoball.h"

#define NoA(...) CMOBALL(FOO, __VA_ARGS__)
#define FOO_3(x,y,z) "Three"
#define FOO_2(x,y)   "Two"
#define FOO_1(x)     "One"
#define FOO_0()      "Zero"


int main()
{
    puts(NoA());
    puts(NoA(1));
    puts(NoA(1,1));
    puts(NoA(1,1,1));
    return 0;
}
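
The header used above is an external library, so its internals are not shown here. The underlying idea can be sketched by hand for one to three arguments; every macro name below is invented for this sketch, and the empty argument list is deliberately left out because it needs extra tricks.

```c
/* Pick the 4th argument; the trailing numbers shift to the right
 * as more arguments are placed in front of them. */
#define PICK_4TH(a, b, c, n, ...) n
#define COUNT_ARGS(...) PICK_4TH(__VA_ARGS__, 3, 2, 1, 0)

/* Two-step paste so that COUNT_ARGS is expanded before ## is applied. */
#define PASTE_(a, b) a##b
#define PASTE(a, b)  PASTE_(a, b)

/* SUM(...) dispatches to SUM_1, SUM_2 or SUM_3 by argument count. */
#define SUM(...) PASTE(SUM_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
#define SUM_1(x)       (x)
#define SUM_2(x, y)    ((x) + (y))
#define SUM_3(x, y, z) ((x) + (y) + (z))
```

With these definitions SUM(1, 2) expands to SUM_2(1, 2) and then to ((1) + (2)).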

Function types #

Function pointers ought to be well known, but as we know the syntax is a bit awkward.
On the other hand, fewer people know that you can (as with most types in C) create
a typedef for a function type.

#include <stdio.h>

int main()
{
    typedef double fun_t(double);
    fun_t sin, cos, sqrt;
    fun_t* ftpt = &sqrt;

    printf("%lf\n", ftpt(4)); // 2.000000

    return 0;
}

X-Macros #
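
In short, an X-macro is a list written once and expanded several times with different definitions of X, which keeps related tables in sync. A minimal sketch with made-up names (COLOR_LIST, color_name):

```c
/* The list is written once... */
#define COLOR_LIST \
    X(RED)         \
    X(GREEN)       \
    X(BLUE)

/* ...expanded into an enum (COLOR_COUNT ends up equal to 3)... */
#define X(name) name,
enum color { COLOR_LIST COLOR_COUNT };
#undef X

/* ...and into a matching table of strings. */
#define X(name) #name,
static const char *const color_names[] = { COLOR_LIST };
#undef X

/* Returns the name of a color, or "?" for out-of-range values. */
const char *color_name(enum color c)
{
    return ((unsigned)c < COLOR_COUNT) ? color_names[c] : "?";
}
```

Adding a new color now means touching a single line, and the enum and the name table cannot drift apart.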

Named function parameters #

struct _foo_args {
    int num;
    const char* text;
};

#define foo(...) _foo((struct _foo_args){ __VA_ARGS__ })
int _foo(struct _foo_args args)
{
    puts(args.text);
    return args.num * 2;
}

int main(void)
{
    int result = foo(.text = "Hello!", .num = 8);
    return 0;
}

Combining default, named and positional arguments #

Using compound literals and macros to create named arguments (…):

typedef struct { int a,b,c,d; } FooParam;
#define foo(...) foo((FooParam){ __VA_ARGS__ })
void (foo)(FooParam p);

Adding default arguments is also quite easy:

#define foo(...) foo((FooParam){ .a=1, .b=2, .c=3, .d=4, __VA_ARGS__})

But now positional arguments don't work anymore, and there may be situations
where you want to support both options. But I recently realized that you can
make them work by adding a dummy parameter:

typedef struct { int _; int a,b,c,d; } FooParam;
#define foo(...) foo((FooParam){ .a=1, .b=2, .c=3, .d=4, ._=0, __VA_ARGS__})

Now, foo can be called in the following ways:

foo();           // a=1, b=2, c=3, d=4
foo(.a=4, .b=5); // a=4, b=5, c=3, d=4
foo(4, 5);       // a=4, b=5, c=3, d=4
foo(4, 5, .d=8); // a=4, b=5, c=3, d=8

The dummy parameter isn't needed when you have arguments that are required to be passed by name:

typedef struct { int alwaysNamed; int a,b,c,d; } FooParam;
#define foo(...) foo((FooParam){.a=1,.b=2,.c=3,.d=4, .alwaysNamed=5, __VA_ARGS__})

Abusing unions for grouping things into namespaces #

Suppose that you have a struct with a bunch of fields, and you want to deal
with some of them all together at once under a single name; perhaps you want
to conveniently copy them as a block through struct assignment.

By using unions you can access both a.field2 and a.sub (and a.field2
is the same as a.sub.field2) without any macros.

struct a {
    int field1;
    union {
        struct {
            int field2;
            int field3;
        };
        struct {
            int field2;
            int field3;
        } sub;
    };
};

Matching character classes with `sscanf()` #

From this comment on Reddit:

sscanf() can be used as an ersatz "regex" (not really, only character classes) matcher.
For example, one can write something like this to check if the input consists of letters or underscores:

int len = 0;
char buf[256];
int read_token = sscanf(input, "%255[a-zA-Z_]%n", buf, &len);
if (read_token == 1) { /* do something */ }

or skip whitespace characters:

int len = 0;
char buf[256];
sscanf(input, "%255[\r\n]%n", buf, &len);
input += len;
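
Wrapped into a function, the first variant might look like the sketch below. read_word is a hypothetical helper, and note that the meaning of ranges such as a-z inside a scanset is implementation-defined, even though common C libraries treat them as expected.

```c
#include <stdio.h>

/* Copies a leading run of letters and underscores from `input` into
 * `buf` (which must hold at least 256 bytes). Returns the length of
 * the run, or 0 if the input does not start with such a character. */
int read_word(const char *input, char buf[static 256])
{
    int len = 0;

    /* sscanf() returns 1 only if the scanset matched something;
     * it returns 0 on a mismatch and EOF on empty input. */
    if (sscanf(input, "%255[a-zA-Z_]%n", buf, &len) != 1)
        return 0;
    return len;
}
```

Comparing the result with 1 rather than testing it for truth matters here: EOF is non-zero, so a plain if would treat empty input as a match.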

Garbage collector #

Boehm GC is a library providing a garbage collector for C and C++.

Cosmopolitan Libc #

Description from project's website:

Cosmopolitan Libc makes C a build-once run-anywhere language, like Java,
except it doesn't need an interpreter or virtual machine. Instead, it
reconfigures stock GCC and Clang to output a POSIX-approved polyglot format
that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + NetBSD + BIOS
with the best possible performance and the tiniest footprint imaginable.

Inline assembly #

For a high-level language, C communicates quite well with the low-level world. You
can write assembly code and link it against a program written in C quite easily.
In addition to that, many compilers offer, as an extension (listed as common
in Annex J of the C Standard), a feature called inline assembly, typically
introduced to the code by the asm keyword.

Object Oriented Programming #

C11 added _Generic to the language, but it turns out that metaprogramming
by inhumanely abusing the preprocessor is possible even in pure C99:
meet the Metalang99 library.

#include <datatype99.h>

datatype(
    BinaryTree,
    (Leaf, int),
    (Node, BinaryTree *, int, BinaryTree *)
);

int sum(const BinaryTree *tree) {
    match(*tree) {
        of(Leaf, x) return *x;
        of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
    }

    return -1;
}

`sizeof` twice as case labels to get an error telling the size at compile time #

int foo(int c)
{
    switch (c) {
        case sizeof (struct Foo): return c + 1;
        case sizeof (struct Foo): return c + 2;
    }
}

Adding such a simple function anywhere in your code may (depending on the compiler)
produce an error message like:

error: duplicate case value '16'
        case sizeof(struct Foo): return c + 2;
             ^

Which tells us that sizeof (struct Foo) is 16.

Cello #

A snippet says more than a thousand words:

#include "Cello.h"

int main(int argc, char** argv) {

  /* Stack objects are created using "$" */
  var i0 = $(Int, 5);
  var i1 = $(Int, 3);
  var i2 = $(Int, 4);

  /* Heap objects are created using "new" */
  var items = new(Array, Int, i0, i1, i2);

  /* Collections can be looped over */
  foreach (item in items) {
    print("Object %$ is of type %$\n",
      item, type_of(item));
  }

  /* Heap objects destructed via Garbage Collection */
  return 0;
}

2023-02-19