In this cc, I’d like to present a small language based on the lambda calculus
extended with stackful coroutines. I’ll discuss a virtual machine designed for
the language’s execution, and an efficient compilation scheme.

We’ll see that the compilation scheme

  • naturally supports tail-call optimization
  • eliminates all heap allocation of closures
  • eliminates the need for indirect calls

without restriction of the language’s expressiveness – which may be interesting in its own right.

This post is intended for anyone broadly interested in implementing coroutines,
compilers, or virtual machines. I am not an expert in any of those, but I hope
that something in this post may be useful to you. The post will assume basic
familiarity with unification-based type inference and code generation schemes.

First, let you let me show you a playground exemplifying the language and its
compiler/VM. You can hover over values in the input buffer of the playground to
see inferred types. The default program comes from Cyber’s playground,
computing the 20th fibonacci number, and how many times fib was called.


Recently, I’ve been studying the efficient implementation of effect
, which I
currently believe are the most promising avenue for the future of managed
effects in programming languages.

Effect handlers have a lot to do with coroutines – in fact, one way to implement
effect handlers is as stackful coroutines, like OCaml does.

In this cc, I’ll use the terms “coroutine”, “green thread”, and “fiber” all to
refer to the specific idea of first-class, stackful, non-presumptive coroutines.


  • Coroutines are segments of a
    program whose execution can be paused and resumed at-will. They can be thought
    of threads of execution controlled in userspace, rather than in the kernel.

    • If you are familiar with goroutines, coroutines are like goroutines, except

      • The point at which coroutines are preempted must be explicitly defined by
        the programmer; in the language we’ll look at below, via the yield
      • Dually, coroutines do not imply a concurrent runtime; it is up the
        programmer whether to, and how to, execute coroutines in a concurrent
        fashion. In the language we’ll look at below, this is done the spawn and
        resume keywords.
  • A first-class coroutine can be used like any other value in a language,
    unrestricted in its behavior; think of first-class functions.
  • Stackful coroutines can arbitrarily pause their execution, including
    within nested function calls. Stackful coroutines require a preserved call
    stack while they are preserved, in contrast to stackless
    C++ (via co_await/co_yield), Rust (via async/await), and JavaScript
    (via async/await) all implement stackless coroutines – with the
    restriction that you cannot suspend in nested, non-co-routine call in those
    languages. The latter behavior induces the often-named function color

    for async/await.
  • Non-presumptive coroutines are those that cannot be resumed more than
    once. In the language we’ll discuss below, the resume keyword resumes
    execution of a coroutines, and returns a new handle to the coroutine. It is a
    runtime error to call resume on the same handle to a coroutine more than
  • Coroutines are cooperative – unlike threads, which may be preempted
    arbitrarily by an operating systems, coroutines require programmer intervention
    to specify where execution should be paused and resumed (think async/await
    in JavaScript/Python/Rust). Moreoever, coroutines do not imply a concurrent
    runtime, and it is up the programmer to define how to execute coroutines.

    • Again, contrast this to goroutines, where the runtime of goroutines is
      well-defined and suspension points are largely implicit.

Why am I talking about coroutines here?
I’ve seen a few good articles discussing implementing coroutines against a
machine-language target
and recent projects like Cyber and
I’d like to present a small model and implementation of coroutines
against a virtual machine target, which may be a nice base for launching more
experiments without designing a programming language intended for general use.

Co implements coroutines as continuations – which I believe
is also the most promising technique for an efficient implementation of
effect handlers, but I won’t expand on that here. Continuations often go hand-in-hand
with closures, which are typically heap-allocated and called
indirectly. To show off something interesting, we’ll also have our implementation
compile closures unboxed, and eliminate all indirect calls via

Finally, we’ll see that tail-call optimization falls out naturally with other
aspects of the implementation.

Okay, let’s get started!

Source language

Our source language will be the simply-typed lambda calculus extended with
tuples and fibers. Fibers will be our source-language-level representation of
coroutines, and can be used like any other value. Fibers can be spawned,
yielded, and resumed by the author to enable cooperative multitasking.

As an example, let’s consider a program (taken from Cyber’s
) that calculates the 20th
fibonacci number in a fiber that yields to its parent fiber on every iteration.
This usage of yield and resume lets us compute how many times fib
recursively calls itself. I hope you can see how you could run more complicated
tasks concurrently in this style.


let rec fib = n ->


if n < 2

then n

else (fib (n - 1)) + (fib (n - 2))


let rec exec = state ->

stat state.0

| `Pending ->

let fib1 = resume state.0 in

exec {fib1, 0, state.2 + 1}

| `Done n -> {state.0, n, state.2}


let runFib = spawn (fib 20) in

let result = exec {runFib, 0, 0} in

{result.1, result.2}

This language’s syntax is very similar to OCaml’s. I’ll point out a few key
features of its semantics:

  • let rec defines a recursive binding; by default, let bindings are

  • {e1, e2, ..., en} defines a n-ary tuple. Tuple elements are accessed by
    0-based indices; for example, t.0.

  • Every program starts on a unique main fiber. A program terminates when the
    main fiber terminates, even if other fibers have not terminatedcontrast
    this to non-joined kernel threads in e.g. C
    . It’s up to the programmer to
    decide whether or not all child fibers should terminatewhether this is
    a good design design is a different question

  • spawn (fib 20) executes the call fib 20 on a new fiber, which can yield to
    the parent fiber (in this case the main fiber) arbitrarily. spawn returns
    to the calling fiber a handle to the child fiber. These handles have type Fiber.
    Fibers have two states – they are either in the pending state, or have
    terminated with a value.

  • Fibers can be passed around arbitrarily, including to other fibers. Fibers are
    first-class values.

  • yield suspends a fiber’s execution and returns execution to the current
    parent fiber. Note that since fibers can be passed around arbitrarily, the
    parent of a fiber may change during a program’s execution. yield is a
    no-op on the main fiberI believe you could
    detect yields that are only reached by the main fiber with a full-program
    static analysis, but it’s not clear to me whether a more localized analysis
    would be suitable.

  • resume returns execution to a child fiber, which executes until its next
    yield point or completion. Like spawn, resume returns a Fiber.
    It is a runtime to resume the same Fiber twice – in our example above, for
    example, it would not be legal to define


    - let fib1 = resume state.0 in

    - exec {fib1, 0, state.2 + 1}

    + let _ = resume state.0 in

    + exec {state.0, 0, state.2 + 1}

    A affine type system could enforce this requirement at compile-time, but we
    won’t discuss that here.

    Resuming a completed Fiber is effectively a no-op; the Fiber is
    immediately returned as-is.

  • A stat expression queries the state of a Fiber, and matches on a
    `Pending branch if the fiber has not yet terminated, or a
    `Done branch if it has. The `Done binds the
    return value of the fiber to a variable name.

The source language has four types:

  • Integers
  • Booleans
  • Tuples composed of the language types, inductively
  • Fibers composed of the language types, inductively

Virtual machine strategy

There are a lot of ways you could design a virtual machine for the language
described above. Here, I’ll show one design for a stack-based

First off, let’s define the stack part of the stack machine. Suppose that we can
represent all our types as integerstuples are just blocks of integers next to
each other, and I’ll explain the representation of fibers in a bit
; then our
stack is just a dynamically-growing array of integers. For the sake of
simplicity we’ll have the stack deal with 64-bit integers, rather than bytes.
This is wasteful, but that’s okay for this discussion.

We’ll allow a program to arbitrarily push integers onto the stack, pop integers
off the stack, and store integers into arbitrary indices in the stack.

Representing function frames

Since a function in our source language might have local variables referenced
multiple times, it’d be nice to support some idea of local variables in the
runtime representation of a function call frame.

The standard approach here is to keep track of a frame pointer in the
runtime, which points to the place that was the top of the stack when a function
call was enteredTypically, when targeting machine code, frame pointers are
stored in CPU registers.
When a function is entered, it can ask the runtime to reserve some space on the
stack that it will use for storing local variables. Then, those local values
can be accessed relative to an offset from the frame pointer.

Example of compiling a function


let f = x ->

n = 1

m = 2

n + m

can compile to something like

f: sp-add 2 # reserve two cells on the stack, for n and m. # n will live at fp[0], i.e. at the index of the frame pointer # m will live at fp[1] push 1 store-into fp[0] # push, store n=1 push 2 store-into fp[1] # push, store m=2 push fp[0] push fp[1] add ret

Representing function calls

Before calling a function, the caller needs to

  • allocate space for the callee’s return value
  • push on the callee’s argumentsthis includes any captured values, which will be discussed later
  • store the program counter of the caller (since this is global)
  • store the frame pointer of the caller (since this is global, and will be
    modified in the new function frame)

We can have the virtual machine runtime take care of storing the program counter
and frame pointer, and restoring them when a function is returned, but our
compiler will need to take care of the first two line items.

In all, the procedure for calling functions is:

  • Before call entry:

    • allocate space for the return value

    • push on the callee’s arguments

    • push on the caller’s program counter and frame pointer

      arg1 ... argn @caller_pc @caller_fp --- < stack top

  • During call:

    • Set the new program counter and frame pointer for the callee

      arg1 ... argn @caller_pc @caller_fp --- < stack top = fp

  • Upon return:

    • Reset the top of the stack to the callee’s frame pointer.
      That means the caller’s frame pointer is now the top value on the stack.

      arg1 ... argn @caller_pc @caller_fp --- < fp <------- | | --- < reset to --/

    • Pop and restore the caller’s frame pointer and program

      arg1 ... argn < ---------- @caller_pc | @caller_fp | --- < reset to -/

    • The caller deallocates the provided arguments, leaving the return value on
      the top of the stack.

      <------- arg1 | ... | argn < reset to -/

Here’s how a full function call would be represented in the VM.

Representing fibers

In this model, each fiber is identified by a unique stack. Spawning a fiber
allocates a new call stack. The virtual machine needs to bookkeep

  • each unique fiber’s call stack
  • what the current stack of fibers is, relative to the main fiber

Spawning or resuming a fiber pushes onto the stack of active fibers.
Yielding pops the current fiber off the top of the stack, leaving the yielded
fiber’s representation on the parent fiber’s stack, as the return value of the
matching spawn or resume.

Because our fibers cannot be resumed multiple times, we also need to keep track
of a dirty bit in both the runtime representation of a fiber’s stack, and the
Fiber value we return to the parent fiber. When a fiber is resumed, we check
that its dirty bit lines up with the fiber’s dirty bit state; if these values
are out-of-line, we know the fiber has already been resumed and can’t be resumed
again. Note that the dirty bit would not needed if the type system were to enforce that a
fiber can be resumed at most once. It also would not be needed if multi-shot
resumptions were supported; one strategy to enable that is to create a new fiber
on each resumption, and reference-count fiber stacks.

Put all together, we can represent a Fiber that terminates with type T as
the runtime tuple


{ bit: int, value: T, fibidx: int, fibdirty: int }

Where bit=1 when the fiber is done, and 0 otherwise; value is the value the
fiber terminated with, or else undefined; fibidx is the unique reference to
the fiber’s call stack; fibdirty is the fiber’s dirty bit.

The values in the runtime tuple cannot be accessed directly from our source
language. In fact, fibidx and fibdirty are purely implementation details.
bit and value can be accessed conditionally, via the stat keyword.

Representing tuples

Speaking of stat, it’s worth noting how tuples (including Fibers) are
represented on the stack. The directionality is somewhat arbitrary, but here
we’ll have lower-indexed elements in a tuple stored at the top of the stack. For
example, a Fiber<{int, int}> will be stored as

fiber.fibdirty fiber.fibidx fiber.value.1 fiber.value.0 fiber.bit < stack top

This has the advantage of making stat‘s job easier – once a fiber is loaded
onto the top of the stack, the pending/done bit can be immediately popped off,
and if needed, the terminating value is right on top of the stack.

Instruction set

All that described, we can construct an instruction set for our virtual machine.


type op =

| Eq

| Lt

| Sub

| Add

| Mul

| Yield

| Spawn of { proc : label; args_size : int; ret_size : int }

| Resume of int (** size of return value *)

| Push of locator

| Store of int (** store-into fp[offset] *)

| SpAdd of int

| SpSub of int

| SpRestoreFp (** restore to the frame pointer. Used for tail calls *)

| Jmp of label

| Jmpz of label

| Jmprel1 (** jump relative to 1 + the integer on the top of the stack. *)

| Call of label

| Ret

type basic_block = label * op list

type proc = {

name : label;

blocks : basic_block list;

debug_frame : debug_frame;


Instructions are organized into basic blocks,
which are identified by a label and contain zero or more instructions that are
executed in a linear fashion. A basic block only branches out via a jump or
return at its exit.
Procedures consist of a name and one or more basic blocks.

Most of the bytecode instructions are standard, though there are a few
interesting ones:

  • Along with taking the procedure to call, Spawn requires the size of the
    procedure’s arguments. That’s because the arguments will need to be copied
    over from the parent stack into the stack allocated for the child fiber.
  • Both Spawn and Resume require the size of the fiber’s termination value,
    because upon yield or termination, the runtime will need to know how
    large to make the Fiber value representation.
  • SpAdd allocates space on the stack (for local arguments and return values),
    SpSub deallocates space on the stack (so callers can reclaim space used
    for call argument), and SpRestoreFp restores the top of the stack to the
    current frame pointer. We’ll see that the latter instruction is useful for
    implementing tail call elimination; it is otherwise unnecessary, since such
    restoration happens by the runtime during a return.
  • Call requires a named procedure, and there is no support for indirect calls.
    This is not a joke, and there is no restiction here – we’ll see below how to
    compile co_lc, unrestricted in the kinds of closures you can create,
    with only direct calls.

Compilation strategy

Type inference

There are a few type systems co_lc could use; here we use the type system of
the simply-typed lambda calculus, with complete type inference via unification.
Here’s one implementation of co_lc inference algorithm;
I won’t go into depth about it here, but the approach is pretty standard.

A tip for inference of tuples

One thing that may be interesting, which I don’t see typically covered in
discussions of unification-based inference, is how to deal with the inference of
tuples. The question has to do with an expression like


let f = t ->

t.1 == 10

Clearly, t is a tuple with an arity of at least two, where the second element
is an int. But the precise arity of the tuple, and its other elements’ types,
cannot be known at this point.

One solution is to have both dense representations of tuple types (for when
a tuple literal is seen), and a sparse representation for cases like the above.
That is, our type definition might look like


type ty =


| TupleDense of ty list (* list of element types *)

| TupleSparse of (int * ty) list (* list of known indices' types *)


When a TupleSparse unifies with a TupleDense, its respective elements (and
arities) must unify, and then the TupleSparse must become a TupleDense,
since that is the point of the concrete type’s instantiation.

Eliminating indirect calls and heap-allocated closures via defunctionalization

I did promise no indirect function calls, and I must deliver. We’ll use
type-directed defunctionalization, via lambda sets, to guarantee that
all calls can be compiled directlyDefunctionalization (and devirtualization) are typically done as optimizations in later stages of a compiler; however, by embedding call information in the type system, we can guarantee the optimization is applied.,
with conditional dispatch via a jump table rather than a function pointer.

Lambda sets are due to William Brandon,, unpublished manuscript. Lambda
sets are also used in Roc to compile closures
without indirect calls or heap allocation.

Why does eliminating indirect calls matter?

An indirect call occurs when a machine has to load the address to call at
runtime, rather than it being statically known.

Function pointers

In our source language, it’s reasonable to write something like


let caller = f -> f 1 in

let id = x -> x in

caller id

A direct compilation of this code might produce bytecode like

id: ... caller: # suppose the argument is at fp[-3] push 1 push fp[-3] call-indirect ret main: push &id call caller ret

where the address of the function passed to caller has to be loaded, popped,
and dereferenced at runtime in order for call-indirect to determine what
function should ultimately be called. It’s clear that in a program like the one
above, a smarter analysis could reduce the compiled code to

id: ... caller: push 1 call id ret main: call caller ret

The advantages of the latter include

  • The call site is statically known, enabling further compiler analyses and
    optimizations like inlining id in caller.
  • If this were compiled to target a CPU, there would be no reliance on the CPU’s
    branch target predictor at runtime.

The biggest downside of this approach I’m aware of is that if caller is called
with multiple times with unique functions,

  • a unique copy of caller must be stamped out for each unique argument (monomorphization, increasing code size)
  • or, direct calls can be used, but must be guarded by a jump table

Speaking of jump tables –

Conditionally-determined functions

A second case has to do with the representation of conditionally-determined
functions. Let’s extend our previous example to


let caller = f -> f 1 in

let id = x -> x in

let mul2 = x -> x * 2

caller (if 0 == 0 then id else mul2)

Again, a direct compilation of this code might produce bytecode like

id: ... caller: # suppose the argument is at fp[-3] push 1 push fp[-3] call-indirect main: push 0 ifz jmp else then: push &id call caller ret else: push &mul call caller ret

As before, this compilation determines a function pointer to dispatch to, and
passes that into caller. An alternative is to push down the conditional into
the body of caller, enabling a direct call within each branch of the

id: ... caller: push 1 # suppose the argument is at fp[-3] push fp[-3] ifz jmp call_mul_ call_id_: call id ret call_mul_: call mul ret main: push 0 call caller ret

The advantage here is that we’ve eliminated one source of indirection – we still
need a conditional, but we’ve gotten rid of function pointers. This
may make inline analyses more fruitful, and places less pressure on a CPU’s
branch target predictor.

One thing I’m not sure about is the efficacy of the jump-table-based caller
implementation relative to the function-pointer-based one, when caller‘s argument is static. In
that case, the main performance consideration would be

  • for the function-pointer-based caller, how long a dereference takes, and how well
    branch target prediction works on a CPU
  • for the jump-table-based caller, how long a conditional and jump takes, and how well
    branch prediction works on a CPU

I am not aware of any empirical studies on the efficacy of CPU’s branch predictors
vs their branch target predictorsThough,
there is this nice analysis by Cloudflare.

please let me know if you know of any!

Eliminating heap-allocated captures

Lambda sets also enable a natural way to eliminate heap
allocation of closure captures. This is pretty nice, especially in functional
paradigms where closures are common.

Let’s consider the program

where f returns a function that either captures x, x and y, or x and
y and z (notice that f itself must capture all three). At runtime, the
captures must live alongside the representation of the function to call – which
means that the runtime representation of the captures must also be the same
size, no matter what function is returnedBecause the return
value of f can be used indiscriminately, including being passed around to arbitrary
locations, it must have a statically-known size.

The typical way this requirement of uniformityOf course, one
alternative is to disallow the conditional selection of a closure. C++ and Rust
enforce this restriction by giving each closure a unique type, which is
incompatible with every other closure type.

is achieved is by compiling closures to a representation like the following C struct


struct closure {

void* fn_addr;

void* captures;


where each capturing function takes an opaque captures pointer as an argument,
which it can then unpack to the concrete type of captures it knows.

For the function f returns, this is needlessly generic – f‘s returned closure
can have the stricter representation


struct f_closure {

int tag; // 1, 2, or 3

struct f_closure_captures captures;


union f_closure_captures {

struct captures1;

struct captures2;

struct captures3;


struct captures1 {

int x;


struct captures2 {

int x; int y;


struct captures3 {

int x; int y; int z;


That is, we can keep a tag of three states telling us which function to dispatch
to (this is the elimination of function pointers discussed in previous
sections). And, we can reduce the representation of captures to a three-state
union – which can all be stored on the stack, at the expense of paying for the
largest captures size!

As before, we are making tradeoffs here. While this helps avoid heap-allocation,
it means that

  • The smallest captures in a representation like the above take as much space on
    the stack as the largest captures, which can be exceptionally wasteful if
    the size discrepancy is large.
  • This scheme makes incremental and separate compilation more challenging, as
    the addition of a new function can change the memory representation of
    values far away.

Inference and compilation semantics of lambda sets

Actually, the f_closure struct representation I gave above is exactly where
lambda sets take us! If you already see how to get there, feel free to skip this

I must again say that the original idea of lambda sets in this type-directed
form is due to William Brandon, in an unpublished manuscript. We’ve also
used them in Roc, and learned a lot about their behavior (many potential blog
posts about that!).

The idea is

  • Each syntactic function (a lambda x -> ...) gets a unique name

  • When inferring the type of a function, include the name of that function and
    its set of captures in a set adjacent to the function type; this is the
    lambda set.
    For example,


    x = 1

    y = 1

    f = n -> n + x + y

    may be given type int -[f {x: int, y: int}]-> int, where [f {x: int, y: int}] is the lambda set.

  • When two functions unify, their lambda sets union.

Take a look at the inferred types of our example from above. Feel free to
hover over f, or look at the elaborated output.

Notice that the inferred type is saying that f returns a function of type int -> int whose lambda set is [lam {x} | lam1 {x, y} | lam2 {x, y, z}] – exactly
the three functions that we might dispatch to, and what their captures are! This
is all we need in order to compile the return value of f to the
f_closure-like representation described previously.

Compiling functions, including lambda sets

Put all together, our compilation scheme for functions consists of
  • When compiling a function (e.g. lam2 of the example above)
    • Determine the stack size of its captures based on the largest captures of
      any lambda in the lambda set it’s involved in.

      • Compile the function to unpack the passed captures appropriately.
      • For the runtime representation of the function-to-call, store the captures on the
    • If the lambda set of the function is non-unary, the function to call must be
      determined conditionally; store a tag for this function on the top of the
      stack (e.g. 2, since lam2 appears in index 2 of the set).
  • When calling a function
    • Load the argument, then the function. That way, the stack consists of the
      argument, followed by any captures, followed by optionally the function
      tag on the top of the stack.

      < stack top

    • If the lambda set is unary, there is only one function that can be called –
      compile the direct call.

    • If the lambda set is non-unary, build a jump table (one implementation)
      using the integer tag of the lambda on the top of the stack. Each branch of
      the jump table can perform the appropriate direct call to the matched

Tip: deciding where to store values

I don’t know if this is well-known, but one thing I find helpful during
compilation is to have the procedure that compiles an expression take a
parameter indicating where the expression should be compiled, rather than
(in this case) always compiling to the stack and storing the result elsewhere
later. This eliminates a lot of trivially-reducable load and stores.

For example, my implementation
has the recursive bytecode compiler take an optional target destination, which
looks like


type opt_target =

[ `Any

| `FpOffset of int (** store as a local offset from the frame pointer *)

| `Stack (** store on the top of the stack *) ]

Sometimes, the target truly can be arbitrary. For example, when compiling a
tuple access
we don’t really care if the tuple is loaded onto the top of stack or is in an
offset from the frame pointer – in the former case we can pop values off until
we get to the field we’re looking for, and in the latter case, the access can be
made relative to where the tuple lives.

To facilitate the arbitrary-target case, the procedure also returns where
exactly the expression ended up being stored, as a type that’s opt_target
without the Any variant.

Tail-call optimization

Passing an optional target makes tail-call optimization very natural – when
we’re compiling a function, we set up the target of the function body to a
special return local, which is the location where the caller allocated space
for the return value.

During the compilation of a function, if we see a call to the same function, and
whose target is the return local, we know that we can perform a tail-call
optimization! Thanks to lambda sets, the destination of a function call is in the function
type, so only the type of the function being applied needs to be examined to
understand whether the optimization can be made.

To actually perform the optimization, compile the call argument into the
location of the argument in the current function frame, reset the stack pointer
to the frame pointer, and jump back up to the top of the function. See, I
promised a use of SpRestoreFp!

Unexplored extensions

Thanks for making it this far! You must be pretty interested in this kind of
stuff. There are various natural extensions to this project, which may make for
interesting projects:

  • Make these fibers do something interesting. Our language and compiler is
    enough to show off that the implementation of fibers works, but doesn’t
    actually demonstrate its use in an interesting one. One idea would be to add
    support for making non-blocking network requests and show an example of
    cooperative multitasking where multiple fibers fetch many webpages.
  • Implement an affine type system to enforce the one-shot behavior of our fibers
    at compile-time.
  • Design and implement an analysis to detect what spawned expressions will never
    yield. Those expressions can be optimized to happen as direct evaluations,
    rather than needing to happen on a child fiber.
  • Design an efficient scheme for multi-resumable fibers.
  • Design a system for implicitly-yielding userland threads, and a preempting
    runtime scheduler of userland threads. Modify the VM and compiler to support
    this system.
  • Implement additional optimizations over the virtual machine, and optimizations
    of the bytecode in the compiler.
  • My virtual machine is implemented in OCaml (and compiled to JavaScript via JSOO).
    Using a tighter language, like C/Rust/Zig/whatever, for the VM (and compiling
    to WASM) may result in a much faster machine.

Playground & Implementation

I hope you enjoyed this post, and learned something. You can find my implementation
of the language compiler and VM on GitHub.
As you saw previously in this post, I’ve also made a playground for the
language, embdded again below.

Ad: Are you interested in compiling structural subtypes?

If you know of ways, or are interested in, compiling languages based on
structural subtyping to non-uniform representations,
please email me! This has been an ongoing project of mine
and I hope to write about it soon, and would love to chat through with others

In particular, the idea here is to combine

  • the programming flexibility of structural subtyping
  • (non-principal, possibly incomplete) type inference, a-la MLsub
  • row polymorphism and polymorphic variants

to design subtyping-based source languages with compilation to target languages

  • do not support runtime object polymorphism (i.e. all procedures are monomorphized)
  • have non-uniform, unboxed representations
  • do not require introducing implicit conversions at re-binding or usage sites

Read More