- Introduction
- Source language
- Virtual machine strategy
- Compilation strategy
- Unexplored extensions
- Playground & Implementation
- Ad: Are you interested in compiling structural subtypes?
In this cc, I’d like to present a small language based on the lambda calculus
extended with stackful coroutines. I’ll discuss a virtual machine designed for
the language’s execution, and an efficient compilation scheme.
We’ll see that the compilation scheme
- naturally supports tail-call optimization
- eliminates all heap allocation of closures
- eliminates the need for indirect calls
without restriction of the language’s expressiveness – which may be interesting in its own right.
This post is intended for anyone broadly interested in implementing coroutines,
compilers, or virtual machines. I am not an expert in any of those, but I hope
that something in this post may be useful to you. The post will assume basic
familiarity with unification-based type inference and code generation schemes.
First, let you let me show you a playground exemplifying the language and its
compiler/VM. You can hover over values in the input buffer of the playground to
see inferred types. The default program comes from Cyber’s playground,
computing the 20th fibonacci number, and how many times fib
was called.
Introduction
Recently, I’ve been studying the efficient implementation of effect
handler, which I
currently believe are the most promising avenue for the future of managed
effects in programming languages.
Effect handlers have a lot to do with coroutines – in fact, one way to implement
effect handlers is as stackful coroutines, like OCaml does.
In this cc, I’ll use the terms “coroutine”, “green thread”, and “fiber” all to
refer to the specific idea of first-class, stackful, non-presumptive coroutines.
Specifically,
- Coroutines are segments of a
program whose execution can be paused and resumed at-will. They can be thought
of threads of execution controlled in userspace, rather than in the kernel.- If you are familiar with goroutines, coroutines are like goroutines, except
that- The point at which coroutines are preempted must be explicitly defined by
the programmer; in the language we’ll look at below, via theyield
keyword. - Dually, coroutines do not imply a concurrent runtime; it is up the
programmer whether to, and how to, execute coroutines in a concurrent
fashion. In the language we’ll look at below, this is done thespawn
and
resume
keywords.
- The point at which coroutines are preempted must be explicitly defined by
- If you are familiar with goroutines, coroutines are like goroutines, except
- A first-class coroutine can be used like any other value in a language,
unrestricted in its behavior; think of first-class functions. - Stackful coroutines can arbitrarily pause their execution, including
within nested function calls. Stackful coroutines require a preserved call
stack while they are preserved, in contrast to stackless
coroutines.
C++ (viaco_await
/co_yield
), Rust (viaasync
/await
), and JavaScript
(viaasync
/await
) all implement stackless coroutines – with the
restriction that you cannot suspend in nested, non-co-routine call in those
languages. The latter behavior induces the often-named function color
problem
for async/await. - Non-presumptive coroutines are those that cannot be resumed more than
once. In the language we’ll discuss below, theresume
keyword resumes
execution of a coroutines, and returns a new handle to the coroutine. It is a
runtime error to callresume
on the same handle to a coroutine more than
once. - Coroutines are cooperative – unlike threads, which may be preempted
arbitrarily by an operating systems, coroutines require programmer intervention
to specify where execution should be paused and resumed (thinkasync
/await
in JavaScript/Python/Rust). Moreoever, coroutines do not imply a concurrent
runtime, and it is up the programmer to define how to execute coroutines.- Again, contrast this to goroutines, where the runtime of goroutines is
well-defined and suspension points are largely implicit.
- Again, contrast this to goroutines, where the runtime of goroutines is
Why am I talking about coroutines here?
I’ve seen a few good articles discussing implementing coroutines against a
machine-language target
(1,
2),
and recent projects like Cyber and
Co.
I’d like to present a small model and implementation of coroutines
against a virtual machine target, which may be a nice base for launching more
experiments without designing a programming language intended for general use.
Co implements coroutines as continuations – which I believe
is also the most promising technique for an efficient implementation of
effect handlers, but I won’t expand on that here. Continuations often go hand-in-hand
with closures, which are typically heap-allocated and called
indirectly. To show off something interesting, we’ll also have our implementation
compile closures unboxed, and eliminate all indirect calls via
defunctionalization.
Finally, we’ll see that tail-call optimization falls out naturally with other
aspects of the implementation.
Okay, let’s get started!
Source language
Our source language will be the simply-typed lambda calculus extended with
tuples and fibers. Fibers will be our source-language-level representation of
coroutines, and can be used like any other value. Fibers can be spawn
ed,
yield
ed, and resume
d by the author to enable cooperative multitasking.
As an example, let’s consider a program (taken from Cyber’s
examples) that calculates the 20th
fibonacci number in a fiber that yields to its parent fiber on every iteration.
This usage of yield
and resume
lets us compute how many times fib
recursively calls itself. I hope you can see how you could run more complicated
tasks concurrently in this style.
ocaml
let rec fib = n ->
yield;
if n < 2
then n
else (fib (n - 1)) + (fib (n - 2))
in
let rec exec = state ->
stat state.0
| `Pending ->
let fib1 = resume state.0 in
exec {fib1, 0, state.2 + 1}
| `Done n -> {state.0, n, state.2}
in
let runFib = spawn (fib 20) in
let result = exec {runFib, 0, 0} in
{result.1, result.2}
This language’s syntax is very similar to OCaml’s. I’ll point out a few key
features of its semantics:
-
let rec
defines a recursive binding; by default,let
bindings are
non-recursive. -
{e1, e2, ..., en}
defines a n-ary tuple. Tuple elements are accessed by
0-based indices; for example,t.0
. -
Every program starts on a unique
main
fiber. A program terminates when the
main
fiber terminates, even if other fibers have not terminatedcontrast
this to non-joined kernel threads in e.g. C. It’s up to the programmer to
decide whether or not all child fibers should terminatewhether this is
a good design design is a different question. -
spawn (fib 20)
executes the callfib 20
on a new fiber, which can yield to
the parent fiber (in this case themain
fiber) arbitrarily.spawn
returns
to the calling fiber a handle to the child fiber. These handles have typeFiber
.
Fibers have two states – they are either in the pending state, or have
terminated with a value. -
Fibers can be passed around arbitrarily, including to other fibers. Fibers are
first-class values. -
yield
suspends a fiber’s execution and returns execution to the current
parent fiber. Note that since fibers can be passed around arbitrarily, the
parent of a fiber may change during a program’s execution.yield
is a
no-op on the main fiberI believe you could
detectyield
s that are only reached by the main fiber with a full-program
static analysis, but it’s not clear to me whether a more localized analysis
would be suitable.. -
resume
returns execution to a child fiber, which executes until its next
yield
point or completion. Likespawn
,resume
returns aFiber
.
It is a runtime to resume the sameFiber
twice – in our example above, for
example, it would not be legal to definediff
- let fib1 = resume state.0 in
- exec {fib1, 0, state.2 + 1}
+ let _ = resume state.0 in
+ exec {state.0, 0, state.2 + 1}
A affine type system could enforce this requirement at compile-time, but we
won’t discuss that here.Resuming a completed
Fiber
is effectively a no-op; theFiber
is
immediately returned as-is. -
A
stat
expression queries the state of aFiber
, and matches on a
`Pending
branch if the fiber has not yet terminated, or a
`Done
branch if it has. The`Done
binds the
return value of the fiber to a variable name.
The source language has four types:
- Integers
- Booleans
- Tuples composed of the language types, inductively
- Fibers composed of the language types, inductively
Virtual machine strategy
There are a lot of ways you could design a virtual machine for the language
described above. Here, I’ll show one design for a stack-based
machine.
First off, let’s define the stack part of the stack machine. Suppose that we can
represent all our types as integerstuples are just blocks of integers next to
each other, and I’ll explain the representation of fibers in a bit; then our
stack is just a dynamically-growing array of integers. For the sake of
simplicity we’ll have the stack deal with 64-bit integers, rather than bytes.
This is wasteful, but that’s okay for this discussion.
We’ll allow a program to arbitrarily push integers onto the stack, pop integers
off the stack, and store integers into arbitrary indices in the stack.
Representing function frames
Since a function in our source language might have local variables referenced
multiple times, it’d be nice to support some idea of local variables in the
runtime representation of a function call frame.
The standard approach here is to keep track of a frame pointer in the
runtime, which points to the place that was the top of the stack when a function
call was enteredTypically, when targeting machine code, frame pointers are
stored in CPU registers..
When a function is entered, it can ask the runtime to reserve some space on the
stack that it will use for storing local variables. Then, those local values
can be accessed relative to an offset from the frame pointer.
Example of compiling a function
ocaml
let f = x ->
n = 1
m = 2
n + m
can compile to something like
f: sp-add 2 # reserve two cells on the stack, for n and m. # n will live at fp[0], i.e. at the index of the frame pointer # m will live at fp[1] push 1 store-into fp[0] # push, store n=1 push 2 store-into fp[1] # push, store m=2 push fp[0] push fp[1] add ret
Representing function calls
Before calling a function, the caller needs to
- allocate space for the callee’s return value
- push on the callee’s argumentsthis includes any captured values, which will be discussed later
- store the program counter of the caller (since this is global)
- store the frame pointer of the caller (since this is global, and will be
modified in the new function frame)
We can have the virtual machine runtime take care of storing the program counter
and frame pointer, and restoring them when a function is returned, but our
compiler will need to take care of the first two line items.
In all, the procedure for calling functions is:
-
Before call entry:
-
allocate space for the return value
-
push on the callee’s arguments
-
push on the caller’s program counter and frame pointer
arg1 ... argn @caller_pc @caller_fp --- < stack top
-
-
During call:
-
Set the new program counter and frame pointer for the callee
arg1 ... argn @caller_pc @caller_fp --- < stack top = fp
-
-
Upon return:
-
Reset the top of the stack to the callee’s frame pointer.
That means the caller’s frame pointer is now the top value on the stack.arg1 ... argn @caller_pc @caller_fp --- < fp <------- | | --- < reset to --/ -
Pop and restore the caller’s frame pointer and program
counter.arg1 ... argn < ---------- @caller_pc | @caller_fp | --- < reset to -/ -
The caller deallocates the provided arguments, leaving the return value on
the top of the stack.<------- arg1 | ... | argn < reset to -/
-
Here’s how a full function call would be represented in the VM.
Representing fibers
In this model, each fiber is identified by a unique stack. Spawning a fiber
allocates a new call stack. The virtual machine needs to bookkeep
- each unique fiber’s call stack
- what the current stack of fibers is, relative to the main fiber
Spawning or resuming a fiber pushes onto the stack of active fibers.
Yielding pops the current fiber off the top of the stack, leaving the yielded
fiber’s representation on the parent fiber’s stack, as the return value of the
matching spawn
or resume
.
Because our fibers cannot be resumed multiple times, we also need to keep track
of a dirty bit in both the runtime representation of a fiber’s stack, and the
Fiber
value we return to the parent fiber. When a fiber is resumed, we check
that its dirty bit lines up with the fiber’s dirty bit state; if these values
are out-of-line, we know the fiber has already been resumed and can’t be resumed
again. Note that the dirty bit would not needed if the type system were to enforce that a
fiber can be resumed at most once. It also would not be needed if multi-shot
resumptions were supported; one strategy to enable that is to create a new fiber
on each resumption, and reference-count fiber stacks.
Put all together, we can represent a Fiber
that terminates with type T
as
the runtime tuple
ocaml
{ bit: int, value: T, fibidx: int, fibdirty: int }
Where bit=1
when the fiber is done, and 0 otherwise; value
is the value the
fiber terminated with, or else undefined; fibidx
is the unique reference to
the fiber’s call stack; fibdirty
is the fiber’s dirty bit.
The values in the runtime tuple cannot be accessed directly from our source
language. In fact, fibidx
and fibdirty
are purely implementation details.
bit
and value
can be accessed conditionally, via the stat
keyword.
Representing tuples
Speaking of stat
, it’s worth noting how tuples (including Fiber
s) are
represented on the stack. The directionality is somewhat arbitrary, but here
we’ll have lower-indexed elements in a tuple stored at the top of the stack. For
example, a Fiber<{int, int}>
will be stored as
fiber.fibdirty fiber.fibidx fiber.value.1 fiber.value.0 fiber.bit < stack top
This has the advantage of making stat
‘s job easier – once a fiber is loaded
onto the top of the stack, the pending/done bit can be immediately popped off,
and if needed, the terminating value is right on top of the stack.
Instruction set
All that described, we can construct an instruction set for our virtual machine.
ocaml
type op =
| Eq
| Lt
| Sub
| Add
| Mul
| Yield
| Spawn of { proc : label; args_size : int; ret_size : int }
| Resume of int (** size of return value *)
| Push of locator
| Store of int (** store-into fp[offset] *)
| SpAdd of int
| SpSub of int
| SpRestoreFp (** restore to the frame pointer. Used for tail calls *)
| Jmp of label
| Jmpz of label
| Jmprel1 (** jump relative to 1 + the integer on the top of the stack. *)
| Call of label
| Ret
type basic_block = label * op list
type proc = {
name : label;
blocks : basic_block list;
debug_frame : debug_frame;
}
Instructions are organized into basic blocks,
which are identified by a label and contain zero or more instructions that are
executed in a linear fashion. A basic block only branches out via a jump or
return at its exit.
Procedures consist of a name and one or more basic blocks.
Most of the bytecode instructions are standard, though there are a few
interesting ones:
- Along with taking the procedure to call,
Spawn
requires the size of the
procedure’s arguments. That’s because the arguments will need to be copied
over from the parent stack into the stack allocated for the child fiber. - Both
Spawn
andResume
require the size of the fiber’s termination value,
because upon yield or termination, the runtime will need to know how
large to make theFiber
value representation. SpAdd
allocates space on the stack (for local arguments and return values),
SpSub
deallocates space on the stack (so callers can reclaim space used
for call argument), andSpRestoreFp
restores the top of the stack to the
current frame pointer. We’ll see that the latter instruction is useful for
implementing tail call elimination; it is otherwise unnecessary, since such
restoration happens by the runtime during a return.Call
requires a named procedure, and there is no support for indirect calls.
This is not a joke, and there is no restiction here – we’ll see below how to
compileco_lc
, unrestricted in the kinds of closures you can create,
with only direct calls.
Compilation strategy
Type inference
There are a few type systems co_lc
could use; here we use the type system of
the simply-typed lambda calculus, with complete type inference via unification.
Here’s one implementation of co_lc
inference algorithm;
I won’t go into depth about it here, but the approach is pretty standard.
A tip for inference of tuples
One thing that may be interesting, which I don’t see typically covered in
discussions of unification-based inference, is how to deal with the inference of
tuples. The question has to do with an expression like
ocaml
let f = t ->
t.1 == 10
Clearly, t
is a tuple with an arity of at least two, where the second element
is an int. But the precise arity of the tuple, and its other elements’ types,
cannot be known at this point.
One solution is to have both dense representations of tuple types (for when
a tuple literal is seen), and a sparse representation for cases like the above.
That is, our type definition might look like
ocaml
type ty =
...
| TupleDense of ty list (* list of element types *)
| TupleSparse of (int * ty) list (* list of known indices' types *)
...
When a TupleSparse
unifies with a TupleDense
, its respective elements (and
arities) must unify, and then the TupleSparse
must become a TupleDense
,
since that is the point of the concrete type’s instantiation.
Eliminating indirect calls and heap-allocated closures via defunctionalization
I did promise no indirect function calls, and I must deliver. We’ll use
type-directed defunctionalization, via lambda sets, to guarantee that
all calls can be compiled directlyDefunctionalization (and devirtualization) are typically done as optimizations in later stages of a compiler; however, by embedding call information in the type system, we can guarantee the optimization is applied.,
with conditional dispatch via a jump table rather than a function pointer.
Lambda sets are due to William Brandon, et.al., unpublished manuscript. Lambda
sets are also used in Roc to compile closures
without indirect calls or heap allocation.
Why does eliminating indirect calls matter?
An indirect call occurs when a machine has to load the address to call at
runtime, rather than it being statically known.
Function pointers
In our source language, it’s reasonable to write something like
ocaml
let caller = f -> f 1 in
let id = x -> x in
caller id
A direct compilation of this code might produce bytecode like
id: ... caller: # suppose the argument is at fp[-3] push 1 push fp[-3] call-indirect ret main: push &id call caller ret
where the address of the function passed to caller
has to be loaded, popped,
and dereferenced at runtime in order for call-indirect
to determine what
function should ultimately be called. It’s clear that in a program like the one
above, a smarter analysis could reduce the compiled code to
id: ... caller: push 1 call id ret main: call caller ret
The advantages of the latter include
- The call site is statically known, enabling further compiler analyses and
optimizations like inliningid
incaller
. - If this were compiled to target a CPU, there would be no reliance on the CPU’s
branch target predictor at runtime.
The biggest downside of this approach I’m aware of is that if caller
is called
with multiple times with unique functions,
- a unique copy of
caller
must be stamped out for each unique argument (monomorphization, increasing code size) - or, direct calls can be used, but must be guarded by a jump table
Speaking of jump tables –
Conditionally-determined functions
A second case has to do with the representation of conditionally-determined
functions. Let’s extend our previous example to
ocaml
let caller = f -> f 1 in
let id = x -> x in
let mul2 = x -> x * 2
caller (if 0 == 0 then id else mul2)
Again, a direct compilation of this code might produce bytecode like
id: ... caller: # suppose the argument is at fp[-3] push 1 push fp[-3] call-indirect main: push 0 ifz jmp else then: push &id call caller ret else: push &mul call caller ret
As before, this compilation determines a function pointer to dispatch to, and
passes that into caller
. An alternative is to push down the conditional into
the body of caller
, enabling a direct call within each branch of the
conditional.
id: ... caller: push 1 # suppose the argument is at fp[-3] push fp[-3] ifz jmp call_mul_ call_id_: call id ret call_mul_: call mul ret main: push 0 call caller ret
The advantage here is that we’ve eliminated one source of indirection – we still
need a conditional, but we’ve gotten rid of function pointers. This
may make inline analyses more fruitful, and places less pressure on a CPU’s
branch target predictor.
One thing I’m not sure about is the efficacy of the jump-table-based caller
implementation relative to the function-pointer-based one, when caller
‘s argument is static. In
that case, the main performance consideration would be
- for the function-pointer-based
caller
, how long a dereference takes, and how well
branch target prediction works on a CPU - for the jump-table-based
caller
, how long a conditional and jump takes, and how well
branch prediction works on a CPU
I am not aware of any empirical studies on the efficacy of CPU’s branch predictors
vs their branch target predictorsThough,
there is this nice analysis by Cloudflare. –
please let me know if you know of any!
Eliminating heap-allocated captures
Lambda sets also enable a natural way to eliminate heap
allocation of closure captures. This is pretty nice, especially in functional
paradigms where closures are common.
Let’s consider the program
where f
returns a function that either captures x
, x
and y
, or x
and
y
and z
(notice that f
itself must capture all three). At runtime, the
captures must live alongside the representation of the function to call – which
means that the runtime representation of the captures must also be the same
size, no matter what function is returnedBecause the return
value of f
can be used indiscriminately, including being passed around to arbitrary
locations, it must have a statically-known size..
The typical way this requirement of uniformityOf course, one
alternative is to disallow the conditional selection of a closure. C++ and Rust
enforce this restriction by giving each closure a unique type, which is
incompatible with every other closure type.
is achieved is by compiling closures to a representation like the following C struct
c
struct closure {
void* fn_addr;
void* captures;
};
where each capturing function takes an opaque captures
pointer as an argument,
which it can then unpack to the concrete type of captures it knows.
For the function f
returns, this is needlessly generic – f
‘s returned closure
can have the stricter representation
c
struct f_closure {
int tag; // 1, 2, or 3
struct f_closure_captures captures;
};
union f_closure_captures {
struct captures1;
struct captures2;
struct captures3;
};
struct captures1 {
int x;
};
struct captures2 {
int x; int y;
};
struct captures3 {
int x; int y; int z;
};
That is, we can keep a tag of three states telling us which function to dispatch
to (this is the elimination of function pointers discussed in previous
sections). And, we can reduce the representation of captures to a three-state
union – which can all be stored on the stack, at the expense of paying for the
largest captures size!
As before, we are making tradeoffs here. While this helps avoid heap-allocation,
it means that
- The smallest captures in a representation like the above take as much space on
the stack as the largest captures, which can be exceptionally wasteful if
the size discrepancy is large. - This scheme makes incremental and separate compilation more challenging, as
the addition of a new function can change the memory representation of
values far away.
Inference and compilation semantics of lambda sets
Actually, the f_closure
struct representation I gave above is exactly where
lambda sets take us! If you already see how to get there, feel free to skip this
section.
I must again say that the original idea of lambda sets in this type-directed
form is due to William Brandon, et.al. in an unpublished manuscript. We’ve also
used them in Roc, and learned a lot about their behavior (many potential blog
posts about that!).
The idea is
-
Each syntactic function (a lambda
x -> ...
) gets a unique name -
When inferring the type of a function, include the name of that function and
its set of captures in a set adjacent to the function type; this is the
lambda set.
For example,ocaml
x = 1
y = 1
f = n -> n + x + y
may be given type
int -[f {x: int, y: int}]-> int
, where[f {x: int, y: int}]
is the lambda set. -
When two functions unify, their lambda sets union.
Take a look at the inferred types of our example from above. Feel free to
hover over f
, or look at the elaborated output.
Notice that the inferred type is saying that f
returns a function of type int -> int
whose lambda set is [lam {x} | lam1 {x, y} | lam2 {x, y, z}]
– exactly
the three functions that we might dispatch to, and what their captures are! This
is all we need in order to compile the return value of f
to the
f_closure
-like representation described previously.
Compiling functions, including lambda sets
Put all together, our compilation scheme for functions consists of
- When compiling a function (e.g.
lam2
of the example above)- Determine the stack size of its captures based on the largest captures of
any lambda in the lambda set it’s involved in.- Compile the function to unpack the passed captures appropriately.
- For the runtime representation of the function-to-call, store the captures on the
stack.
- If the lambda set of the function is non-unary, the function to call must be
determined conditionally; store a tag for this function on the top of the
stack (e.g.2
, sincelam2
appears in index 2 of the set).
- Determine the stack size of its captures based on the largest captures of
- When calling a function
-
Load the argument, then the function. That way, the stack consists of the
argument, followed by any captures, followed by optionally the function
tag on the top of the stack.< stack top -
If the lambda set is unary, there is only one function that can be called –
compile the direct call. -
If the lambda set is non-unary, build a jump table (one implementation)
using the integer tag of the lambda on the top of the stack. Each branch of
the jump table can perform the appropriate direct call to the matched
function.
-
Tip: deciding where to store values
I don’t know if this is well-known, but one thing I find helpful during
compilation is to have the procedure that compiles an expression take a
parameter indicating where the expression should be compiled, rather than
(in this case) always compiling to the stack and storing the result elsewhere
later. This eliminates a lot of trivially-reducable load and stores.
For example, my implementation
has the recursive bytecode compiler take an optional target destination, which
looks like
ocaml
type opt_target =
[ `Any
| `FpOffset of int (** store as a local offset from the frame pointer *)
| `Stack (** store on the top of the stack *) ]
Sometimes, the target truly can be arbitrary. For example, when compiling a
tuple access,
we don’t really care if the tuple is loaded onto the top of stack or is in an
offset from the frame pointer – in the former case we can pop values off until
we get to the field we’re looking for, and in the latter case, the access can be
made relative to where the tuple lives.
To facilitate the arbitrary-target case, the procedure also returns where
exactly the expression ended up being stored, as a type that’s opt_target
without the Any
variant.
Tail-call optimization
Passing an optional target makes tail-call optimization very natural – when
we’re compiling a function, we set up the target of the function body to a
special return
local, which is the location where the caller allocated space
for the return value.
During the compilation of a function, if we see a call to the same function, and
whose target is the return
local, we know that we can perform a tail-call
optimization! Thanks to lambda sets, the destination of a function call is in the function
type, so only the type of the function being applied needs to be examined to
understand whether the optimization can be made.
To actually perform the optimization, compile the call argument into the
location of the argument in the current function frame, reset the stack pointer
to the frame pointer, and jump back up to the top of the function. See, I
promised a use of SpRestoreFp
!
Unexplored extensions
Thanks for making it this far! You must be pretty interested in this kind of
stuff. There are various natural extensions to this project, which may make for
interesting projects:
- Make these fibers do something interesting. Our language and compiler is
enough to show off that the implementation of fibers works, but doesn’t
actually demonstrate its use in an interesting one. One idea would be to add
support for making non-blocking network requests and show an example of
cooperative multitasking where multiple fibers fetch many webpages. - Implement an affine type system to enforce the one-shot behavior of our fibers
at compile-time. - Design and implement an analysis to detect what spawned expressions will never
yield. Those expressions can be optimized to happen as direct evaluations,
rather than needing to happen on a child fiber. - Design an efficient scheme for multi-resumable fibers.
- Design a system for implicitly-yielding userland threads, and a preempting
runtime scheduler of userland threads. Modify the VM and compiler to support
this system. - Implement additional optimizations over the virtual machine, and optimizations
of the bytecode in the compiler. - My virtual machine is implemented in OCaml (and compiled to JavaScript via JSOO).
Using a tighter language, like C/Rust/Zig/whatever, for the VM (and compiling
to WASM) may result in a much faster machine.
Playground & Implementation
I hope you enjoyed this post, and learned something. You can find my implementation
of the language compiler and VM on GitHub.
As you saw previously in this post, I’ve also made a playground for the
language, embdded again below.
Ad: Are you interested in compiling structural subtypes?
If you know of ways, or are interested in, compiling languages based on
structural subtyping to non-uniform representations,
please email me! This has been an ongoing project of mine
and I hope to write about it soon, and would love to chat through with others
interested.
In particular, the idea here is to combine
- the programming flexibility of structural subtyping
- (non-principal, possibly incomplete) type inference, a-la MLsub
- row polymorphism and polymorphic variants
to design subtyping-based source languages with compilation to target languages
that
- do not support runtime object polymorphism (i.e. all procedures are monomorphized)
- have non-uniform, unboxed representations
- do not require introducing implicit conversions at re-binding or usage sites
Leave A Comment