Cyber
v0.3 Release

New JIT compiler, Embed API, and more.

December 10, 2023

This release includes the following:

  • A novel JIT compiler.
  • Cycle detection.
  • Embed API.
  • cbindgen.
  • Type system.
  • Operator overloading.
  • Compiler rewrite.
  • WASI target.
  • Doc gen.
  • Afterword.

    A novel JIT compiler.

    If you didn't already know, Cyber has arguably the fastest VM among higher-level languages. Here are various benchmarks that showcase this on a MacBook Pro M2.

    The need for a JIT.

    Having a fast VM is nice for embedded applications, but a JIT would be even better for scripting on desktops and servers. Plus, it's always fun to make things faster.

    The results.

    Let's first take a look at the results of Cyber's JIT compiler. On the fibonacci benchmark, it yielded a 4x speedup over the VM (roughly 19ms down to 5ms, per the tables in this post)! Here is a comparison with other JIT runtimes:

    Recursive Fibonacci (JIT) source
    Script time, load time, and peak memory usage:

    runtime      script time   load time   peak memory
    cyber        5ms           -           2.9 MB
    luajit       5ms           -           1.6 MB
    luau         20ms          -           2.3 MB
    java         3ms           31ms        38.3 MB
    node         6ms           35ms        34.0 MB
    ruby-yjit    13ms          30ms        30.1 MB

    Edit: Someone mentioned that Cyber's performance falls off for higher fib inputs. This is because Cyber starts with a smaller initial stack size, and at some point it has to realloc its virtual stack. With a larger initial stack size, the performance numbers scale properly. The initial stack size has to be changed in a local build since it is not currently configurable from the command line.

    Cyber's JIT was able to match luajit's performance! Java performs so many optimizations that it's essentially in the range of a C-compiled program (for this particular benchmark). Considering how much effort went into creating Cyber's JIT, the results exceeded expectations. So how was it built?

    The criteria when designing Cyber's JIT compiler included:

    1. Make no compromises to the VM interpreter.
    2. Compile very fast, ideally about as fast as the bytecode compiler.
    3. Deliver reasonable performance for a non-optimizing JIT compiler.

    Repurposing bytecode to build a JIT.

    Precompiled machine code is already very fast. What if we could cut out the machine code segment for each bytecode from the binary and stitch them together at runtime? It would bypass bytecode dispatch overhead. This might not sound like much, but consider that each bytecode can have multiple operands as input. Removing the need to read operands from memory saves CPU cycles.

    This is not a new idea. In a recent publication, Haoran Xu and Fredrik Kjolstad proposed a technique called copy-and-patch, where machine code templates (stencils) are generated beforehand and patched at runtime by filling in holes recorded in object relocation entries.

    However, it did not address architectures like arm64, where instructions are limited to 4 bytes and cannot directly encode constants such as NaN-tagged values and pointer addresses. Cyber solves this problem by relying on a small assembler and inserting instructions into holes rather than patching them in place.

    Burning constants.

    Compiling each bytecode segment with clang's musttail attribute guarantees that each stencil maps the same physical registers to each input operand. This allows the assembler to generate any instruction(s) as long as the final value ends up in an operand's register. Typically, a VM relies on stack slots to load operands, but now the assembler can also load constants directly into the operands.

    Fast compilation.

    A Cyber script was written to automate the creation of stencils with libLLVM. Since stencils are pregenerated, the bulk of the JIT compilation process is performing memcpys. This makes compilation fast. It is so fast that it can replace bytecode generation without a noticeable difference in load time. This means that running scripts can be faster by default, with the trade-off of additional memory for the JIT code.

    Proof of concept.

    As you can see from the results above, copy-and-patch is a compelling JIT technique. It is fast to compile, has great performance, and is easy to develop and maintain. However, this is only a proof of concept, and Cyber's JIT won't work on arbitrary code yet.

    To try out the JIT on your computer, clone the repo and run the command:

    cyber -jit test/bench/fib/fib.cy

    Currently, there are backends for arm64 and x64. For this release, Windows x64 JIT support was excluded. However, it does work on WSL.

    Cycle detection.

    Cyber's heap memory is managed by ARC (automatic reference counting). It's a nice alternative to a pure GC solution because the work of freeing memory is distributed throughout execution rather than left to a garbage collector. However, the downside is dealing with reference cycles.

    Dealing with cycles.

    Eventually, Cyber will support weak references, which are one solution to reference cycles. For now, we have implemented a mark-sweep collector that simply detects abandoned objects and frees them. It can be manually invoked with `performGC`.
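
    As a minimal sketch (assuming the standard list `append` method and the `none` value), a cycle that ARC alone would never reclaim looks like this:

    -- Two lists that reference each other form a reference cycle.
    my a = []
    my b = [a]
    a.append(b)

    -- Drop the roots. The pair is now abandoned, but each list still holds a
    -- reference to the other, so their reference counts never reach zero.
    a = none
    b = none

    -- Manually invoke the mark-sweep to detect and free the abandoned pair.
    performGC()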

    Doing less work.

    The interesting bit about Cyber's mark-sweep is how it can skip work by observing whether an object is non-cyclable. A non-cyclable object is one the compiler has proven can never participate in a reference cycle. Each object created is encoded with a non-cyclable bit in its NaN-tagged value, which allows the mark-sweep to skip visits without needing to dereference an object pointer.
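
    For illustration, a type whose fields can never hold references is a natural candidate for this proof (a sketch with a hypothetical `Point` type; the exact analysis is up to the compiler):

    type Point object:
        var x float
        var y float

    my p = [Point x: 1.0, y: 2.0]  -- only primitive fields, so it can be marked non-cyclable
    my xs = []                     -- a list can hold arbitrary references, so it stays cyclable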

    Embed API.

    Cyber now has an Embed API which is defined in cyber.h. Details on how to use it can be found in the embedding docs.

    The Embed API was used to build Cyber's CLI and core library.

    Why embed Cyber?

    With the Embed API, Cyber is now an interesting alternative to Lua and other embeddable languages. Cyber's VM interpreter is fast; here is one benchmark that demonstrates this:

    Recursive Fibonacci (VM) source
    This tests how fast function calls are with a growing call stack.
    Script time, load time, and peak memory usage:

    runtime      script time   load time   peak memory
    cyber        19ms          -           2.9 MB
    luajit       21ms          -           1.4 MB
    wasm3        31ms          -           1.4 MB
    luau         34ms          -           2.1 MB
    lua          39ms          -           1.3 MB
    quickjs      57ms          -           1.9 MB
    wren         71ms          -           1.5 MB
    java         44ms          31ms        35.2 MB
    python3      70ms          15ms        10.2 MB
    ruby         54ms          31ms        29.6 MB
    node         56ms          39ms        31.9 MB
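
    For reference, the recursive fib being measured is the classic doubly recursive form. A rough sketch in Cyber (the exact script is the linked source):

    func fib(n int) int:
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    print(fib(30))   -- the benchmark's exact input is in the linked source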

    It's worth mentioning that this benchmark and others were measured on modern hardware (MacBook Pro M2), which dilutes the difference since a VM is often limited by how fast the CPU can read/write memory. On older hardware the performance difference is even greater.

    So performance is nice, but what about the language? Cyber offers a modern syntax that's easy to learn. Since both dynamic and static types are supported, it accommodates users with different programming styles.

    cbindgen.

    cbindgen.cy has been upgraded to use `libclang` to automatically generate Cyber bindings to C libraries from just a C header file.

    Since libclang does not expose the final expansion for each macro definition, `cbindgen.cy` creates a temporary C++ header file which uses `auto` to evaluate each macro and resolve its type. It's a bit hacky but it saves us the trouble of manually parsing C macros.

    Some examples of bindings created with `cbindgen.cy` include: Raylib and LLVM. The Raylib repo also comes with sample games that can be run without any dependencies.

    Type system.

    Why dynamic types?

    A scripting language benefits from dynamic types because they reduce friction and are easier for beginners to pick up. It's also convenient to interact with external data and APIs when the types are not known at compile time.

    Optional static typing.

    Cyber also wants static types for writing code that is performant and sustainable, but it was unclear how they would be introduced into the language.

    After a few design iterations, Cyber has settled on optional static typing by default. This means that dynamic code can be written without feeling the weight of compile-time constraints. However, we also want the benefits that fully statically typed code provides, so in the future this default will be changeable via a command-line flag or configuration.

    We also found it to be more ergonomic to separate dynamic and static variable declarations. When scripting with a preference for dynamic types, always use the `my` declaration and opt into `var` when desired:

    my a = 123

    When scripting with a preference for static types, always use the `var` declaration and opt into `my` when desired:

    var a = 123

    Since `var` only ever refers to static types, using it without a type specifier infers the static type from the right-hand side. If the right-hand side is `dynamic`, then the type inferred is `any`. Unlike `dynamic`, `any` still requires casting before it can be copied to a narrowed destination.
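
    A small sketch of how the two declaration forms behave together, following the inference rules above:

    my a = 123     -- dynamic: later assignments have no compile-time constraint
    a = 1.5        -- fine, `a` is free to change its type at runtime

    var b = 123    -- static: the type is inferred from the right-hand side
    var c = a      -- the right-hand side is dynamic, so `c` is inferred as `any`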

    At this time, it is recommended to use dynamic types. Static types will only feel ergonomic when optionals, union types, and generics are implemented. Until then, it may be necessary to insert additional type casts to satisfy compile-time constraints.

    Operator overloading.

    Operators can now be overloaded with method declarations. This can be useful for encapsulating common operations on objects and simplifying the syntax at the call site.

    type Vec2 object:
        var x float
        var y float
    
        func '$infix+'(o Vec2) Vec2:
            return [Vec2
                x: x + o.x,
                y: y + o.y,
            ]
    
    var a = [Vec2 x: 1, y: 2]
    var b = a + [Vec2 x: 3, y: 4]

    Read more about operator overloading.

    Compiler rewrite.

    The initial source code of Cyber's compiler wasn't pleasant to look at. It has since been rewritten for readability. This was also an opportunity to create an IR stage since Cyber's static types and language choices warranted a second compiler pass.

    Consuming the result of semantic analysis makes it easier to develop multiple backends. This has proven useful in creating our JIT backend and will come in handy later when we add more backends.

    WASI Target.

    The WASI target is now automatically built for releases and for Cyber tip (the latest development build). With WASI, the Cyber CLI can be built once and run on any platform that has a WASM/WASI runtime. Enabling this was surprisingly simple using Zig's build system.

    Doc gen.

    Docs are now automatically generated for the builtin and std modules. The generation is specific to the markdown template used in Cyber's documentation, but it wouldn't be difficult to extend it to user scripts and expose it as a CLI tool.

    Afterword.

    Thanks for reading! Cyber is still evolving so any input or concerns would be greatly appreciated. Be sure to file a GitHub issue or join our Discord. You might also want to follow the Twitter account.