An esoteric exploration on the target independence of compiler IRs.

Target-independent IRs and target-dependent optimizations

The main reason to have a target-independent IR is that there are many
optimizations who are independent of the target. The idea, then, is to
implement these optimizations once, on a target-independent IR,
and not for every source-target combination.

This idea works great when we are dealing with optimizations which are
actually target-independent. A canonical example is
dead-code elimination. The characteristics of the target don’t matter
when deleting code. You just delete as much as you can and it will
always be beneficial no matter what.

Things get more complicated when we have transformations which
are target-independent but the decision-making involved is not. This
requires a little analysis. When talking about a compiler
optimization, it makes sense to think of it as having at least three
components: the transformation, the legality constraints and the
profitability analysis (shameless
plug of my thesis because it talks about this distinction
extensively
). The transformation is the kind of changes we perform
to the code, like moving, inserting or deleting code. For example, in
dead-code elimination, we literally delete statements. In loop tiling,
we strip-mine loops by certain factors and interchange them. Legality
is concerned with when it is legal to perform the transformation, and
profitability analysis is concerned with when it is profitable to do
it, and with what parameters.

For many optimizations, the transformation is target-independent but
the profitability analysis is not. For example, you don’t need to know
anything about the target to perform loop-tiling. This implies that the
language/IR does not need to capture target details. For example, you
can perform loop tiling in e.g., Rust. However, to do loop-tiling
effectively, you
do need to know about the target, for example, you should at
least know the size of the caches.

This gave rise to a popular direction in optimizing compilers where
the IR, on which transformations are performed, is target-independent,
but there is also target-specific information flowing around so
that passes can perform the transformations effectively. In that way,
we have a nice trade-off where we still have one IR, but we
perform optimizations specialized per target.

That’s at least the narrative I had in my mind until recently…

How target-independent is the LLVM IR?

Let’s now take the most popular compiler IR ever, the LLVM IR, and examine its
target (and later, source) independence. But first, we should think a
bit harder about the definition of target independence.

The simple, and conventional, definition is that the IR does not have
entities that are tied to a particular target. In that sense, the LLVM
IR is mostly target-independent. For example, its core
operations like add are
not tied to a particular target.

In practice, the LLVM IR does not fully abide to this definition. For
example, the inreg
is target-specific and the same is true for a ton of intrinsics. But I
would say these are not big problems, or at least, they are not that
intellectually intriguing.

The more interesting problem is that the definition above is too
simplistic to capture some adverse effects of target dependence
that an independent IR is supposed to avoid. In particular, for an IR
to be independent, we better add the constraint that whatever
front-end lowers to it, does not need to know the resulting target.
This is due to one of the main goals of target-independent IRs which
is to reduce the explosion of source-target combinations.

In that sense, the LLVM IR is not target-independent, but the
reasons are subtler. Consider this
example
. We have some simple C code translated, by Clang,
to LLVM IR, for 3 different targets. Theoretically, the IR should
have been the same for all three. The simplistic reason they are
different is the Application Binary Interface (ABI). For example, with
the
x86-64
ABI
(section 3.2.3), it’s ok to treat the struct as a single
integer, both for passing it and returning it 1. On the other hand, the MIPS
ABI
(“Argument Passing”) e.g., tells us that if the function
returns a struct, then the first argument should be a pointer to which
the result will be stored.

But the question is: Why does the front-end have to deal with
it? Notice that in LLVM IR we could both pass, and return, a
struct by value. So, we would expect Clang to emit this
IR for all targets and let each different back-end translate
it to the target-specific form. To understand the discrepancy, we need
to look deeper into what ABIs define.

Notice that the need for an ABI, for a target, arises only when we are
dealing with concepts that are higher-level than, or anyway not
existent, in the assembly language (of that same target). For example,
the reason we need calling conventions is that the concept of an
argument does not exist e.g., in x86 assembly. It is a higher-level
concept, which can however be implemented using this assembly. The
problem is that it can be implemented in many different ways and so to
have programs that function correctly, all compilers need to agree to
implement it in the same way.

The problem of dependence in the IR arises when the concept is
higher-level, or non-existent, in the IR too! For example, the LLVM IR
does not have classes (but one of course can implement them in many
ways). To see why this is a problem, suppose the ABI defines calling
conventions specific to classes. This is e.g., a reality in the x86-64
ABI
. It states: “If a C++ object has either a non-trivial copy
constructor or a non-trivial destructor, it is passed by invisible
reference …”.

The front-end can’t implement classes arbitrarily, and then expect the
back-end to apply the calling convention, because the concept of a
class is lost in the translation! In other words, it will now be very
hard for a back-end to, essentially, reverse-engineer, that certain
sequences of instructions implement the concept of a class.

A subtler but similar example is something like the int
type in C, whose bit-width depends on the target. The LLVM IR does not
have such a type. It only has integers of a clearly specified width. A
front-end, then, needs to know the target, so that it knows what
bit-width to use for an int.

Given this problem, there are two solutions. The first is to translate
such concepts using predictable (by the back-end) patterns, such that
the back-end can recognize them. But this solution is unreliable and
we need to be sure that the ABI will be applied perfectly. If, for example, the
back-end fails to identify a class and apply a calling convention, the
program can literally crash. The other solution is basically what LLVM
employs, which is to have every front-end deal with the ABI.

Now, let’s consider the drawbacks of the situation we are facing. The
first and obvious one is that of the engineering effort and code
duplication involved in all these front-ends replicating similar code
just to deal with the ABI. This gave rise to libraries like llvm-abi.

Second, because the back-end is not responsible for the calling
conventions, it actually is agnostic. This means that it will
translate anything given to it, no matter if it agrees with the
conventions. Since it is not the back-end that is responsible, it has
to assume that whoever is, did the job correctly.

Finally, to return to our original problem, because we need different
calling conventions based on the architecture, it means that LLVM IR
is not really target-independent. You cannot generate IR without
knowing the target. Furthermore, can you ship your LLVM IR to a
different target than the one the front-end translated to? No. Can you
combine two IR files generated for different targets? No.

Bonus: source independence

But at least the IR is source-independent right? Right? Not
necessarily. Similarly to our definition of target independence, I
would say that an IR is source independent if it can express any
source and if IR files for different sources can be combined. Given
our discussion above, we can already see that the latter is not
necessarily true. For example, can you combine IRs generated for the
same target but originating from different source languages? Not
necessarily. For example, if there was an ABI based on Rust, then even
if we were compiling to the same target, it wouldn’t necessarily match
with the IR from C.

The former is not a given either. For example, apparently Rust
had a hard time translating to LLVM IR
. One example is that
infinite loops are fine in Rust in cases where they are not fine in
C++. Because the LLVM IR uses the C++ memory model, Rust simply could
not translate to it.

  • Something to keep in mind is that this article is meant to be more an
    abstract intriguing exploration than a showcase of practical problems.
    In particular, it is unclear to me whether these problems bother
    anyone too much. As I mentioned, there are libraries like llvm-abi and a bunch
    of threads in the LLVM mailing list, but I’m not sure anyone really
    cares. The reason is probably that there are just too few ABIs for the
    combinatorics to make a mess.
  • This article was heavily inspired by a discussion I had with Fabian Giesen. In fact,
    it was him who showed me the example with the struct.

Read More