Cyber
v0.3 Release

New JIT compiler, Embed API, and more.

December 10, 2023

This release includes the following:

  • A novel JIT compiler.
  • Cycle detection.
  • Embed API.
  • cbindgen.
  • Type system.
  • Operator overloading.
  • Compiler rewrite.
  • WASI target.
  • Doc gen.
  • Afterword.

    A novel JIT compiler.

    If you didn't already know, Cyber has arguably the fastest VM among higher-level languages. Here are various benchmarks that showcase this on a MacBook Pro M2.

    The need for a JIT.

    Having a fast VM is nice for embedded applications, but a JIT would be even better for scripting on desktops and servers. Plus, it's always fun to make things faster.

    The results.

    Let's first take a look at the results of Cyber's JIT compiler. On the fibonacci benchmark, it yielded a 4x speedup over the VM (roughly 19ms down to 5ms, per the tables in this post)! Here is a comparison with other JIT runtimes:

    Recursive Fibonacci (JIT) source
    Script time, load time, and peak memory usage:

    runtime      script time   load time   peak memory
    cyber        5ms           -           2.9 MB
    luajit       5ms           -           1.6 MB
    luau         20ms          -           2.3 MB
    java         3ms           31ms        38.3 MB
    node         6ms           35ms        34.0 MB
    ruby-yjit    13ms          30ms        30.1 MB

    Edit: Someone mentioned that Cyber's performance falls off for higher fib inputs. This is because Cyber starts with a smaller initial stack size, and at some point it has to realloc its virtual stack. With a larger initial stack size, the performance numbers scale properly. The initial stack size has to be changed in a local build since it is not currently configurable from the command line.

    Cyber's JIT was able to match luajit's performance! Java performs so many optimizations that it's essentially in the range of a C-compiled program (for this particular benchmark). Considering how much effort went into creating Cyber's JIT, the results exceeded expectations. So how was it built?

    The criteria when designing Cyber's JIT compiler included:

    1. Make no compromises to the VM interpreter.
    2. Compile very fast, ideally about as fast as the bytecode compiler.
    3. Deliver reasonable performance for a non-optimizing JIT compiler.

    Repurposing bytecode to build a JIT.

    Precompiled machine code is already very fast. What if we could cut out the machine code segment for each bytecode from the binary and stitch them together at runtime? It would bypass bytecode dispatch overhead. This might not sound like much, but consider that each bytecode can have multiple operands as input. Removing the need to read operands from memory saves CPU cycles.

    This is not a new idea. In a recent publication, Haoran Xu and Fredrik Kjolstad proposed a technique called copy-and-patch, where machine code templates (stencils) are generated beforehand and patched at runtime by filling in holes recorded in object relocation entries.

    However, it did not address architectures like arm64, where instructions are limited to 4 bytes and cannot directly encode constants such as NaN-tagged values and pointer addresses. Cyber solves this problem by relying on a small assembler and inserting instructions into holes rather than patching them in place.

    Burning constants.

    Compiling each bytecode segment with clang's musttail attribute guarantees that each stencil maps the same physical registers to each input operand. This allows the assembler to generate any instruction(s) as long as the final value ends up in an operand's register. Typically, a VM relies on stack slots to load operands, but now the assembler can also load constants directly into the operands.

    Fast compilation.

    A Cyber script was written to automate the creation of stencils with libLLVM. Since stencils are pregenerated, the bulk of the JIT compilation process is performing memcpys. This makes compilation fast. It is so fast that it can replace bytecode generation without a noticeable difference in load time. This means that running scripts can be faster by default, with the trade-off of additional memory for the JIT code.

    Proof of concept.

    As you can see from the results above, copy-and-patch is a compelling JIT technique. It is fast to compile, has great performance, and is easy to develop and maintain. However, this is only a proof of concept, and Cyber's JIT won't work on arbitrary code yet.

    To try out the JIT on your computer, clone the repo and run the command:

    cyber -jit test/bench/fib/fib.cy

    Currently, there are backends for arm64 and x64. For this release, Windows x64 JIT support was excluded. However, it does work on WSL.

    Cycle detection.

    Cyber's heap memory is managed by ARC (automatic reference counting). It's a nice alternative to a pure GC solution because the work of freeing memory is distributed throughout execution rather than left to a garbage collector. However, the downside is dealing with reference cycles.

    Dealing with cycles.

    Eventually, Cyber will support weak references, which are one solution to reference cycles. For now, we have implemented a mark-sweep collector that simply detects abandoned objects and frees them. It can be manually invoked with `performGC`.
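
    As a minimal sketch (assuming the standard list `append` method and the `none` value), a cycle that ARC alone would never reclaim looks like this:

    -- Two lists that reference each other form a reference cycle.
    my a = []
    my b = [a]
    a.append(b)

    -- Drop the roots. The pair is now abandoned, but each list still holds a
    -- reference to the other, so their reference counts never reach zero.
    a = none
    b = none

    -- Manually invoke the mark-sweep to detect and free the abandoned pair.
    performGC()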

    Doing less work.

    The interesting bit about Cyber's mark-sweep is how it can skip work by observing whether an object is non-cyclable. A non-cyclable object is one the compiler has proven can never participate in a reference cycle. Each object created is encoded with a non-cyclable bit in its NaN-tagged value, which allows the mark-sweep to skip visits without needing to dereference an object pointer.
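
    For illustration, a type whose fields can never hold references is a natural candidate for this proof (a sketch with a hypothetical `Point` type; the exact analysis is up to the compiler):

    type Point object:
        var x float
        var y float

    my p = [Point x: 1.0, y: 2.0]  -- only primitive fields, so it can be marked non-cyclable
    my xs = []                     -- a list can hold arbitrary references, so it stays cyclable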

    Embed API.

    Cyber now has an Embed API which is defined in cyber.h. Details on how to use it can be found in the embedding docs.

    The Embed API was used to build Cyber's CLI and core library.

    Why embed Cyber?

    With the Embed API, Cyber is now an interesting alternative to Lua and other embeddable languages. Cyber's VM interpreter is fast; here is one benchmark that demonstrates this:

    Recursive Fibonacci (VM) source
    This tests how fast function calls are with a growing call stack.
    Script time, load time, and peak memory usage:

    runtime      script time   load time   peak memory
    cyber        19ms          -           2.9 MB
    luajit       21ms          -           1.4 MB
    wasm3        31ms          -           1.4 MB
    luau         34ms          -           2.1 MB
    lua          39ms          -           1.3 MB
    quickjs      57ms          -           1.9 MB
    wren         71ms          -           1.5 MB
    java         44ms          31ms        35.2 MB
    python3      70ms          15ms        10.2 MB
    ruby         54ms          31ms        29.6 MB
    node         56ms          39ms        31.9 MB
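
    For reference, the recursive fib being measured is the classic doubly recursive form. A rough sketch in Cyber (the exact script is the linked source):

    func fib(n int) int:
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    print(fib(30))   -- the benchmark's exact input is in the linked source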

    It's worth mentioning that this benchmark and others were measured on modern hardware (MacBook Pro M2), which dilutes the difference since a VM is often limited by how fast the CPU can read/write memory. On older hardware the performance difference is even greater.

    So performance is nice, but what about the language? Cyber offers a modern syntax that's easy to learn. Since both dynamic and static types are supported, it accommodates users with different programming styles.

    cbindgen.

    cbindgen.cy has been upgraded to use `libclang` to automatically generate Cyber bindings to C libraries from just a C header file.

    Since libclang does not expose the final expansion for each macro definition, `cbindgen.cy` creates a temporary C++ header file which uses `auto` to evaluate each macro and resolve its type. It's a bit hacky but it saves us the trouble of manually parsing C macros.

    Some examples of bindings created with `cbindgen.cy` include: Raylib and LLVM. The Raylib repo also comes with sample games that can be run without any dependencies.

    Type system.

    Why dynamic types?

    A scripting language benefits from dynamic types because they reduce friction and are easier for beginners to pick up. It's also convenient to interact with external data and APIs when the types are not known at compile time.

    Optional static typing.

    Cyber also wants static types for writing code that is performant and sustainable, but it was unclear how they would be introduced into the language.

    After a few design iterations, Cyber has settled on optional static typing by default. This means that dynamic code can be written without feeling the weight of compile-time constraints. However, we also want the benefits that fully statically typed code provides, so in the future this default will be changeable via a command-line flag or configuration.

    We also found it to be more ergonomic to separate dynamic and static variable declarations. When scripting with a preference for dynamic types, always use the `my` declaration and opt into `var` when desired:

    my a = 123

    When scripting with a preference for static types, always use the `var` declaration and opt into `my` when desired:

    var a = 123

    Since `var` only ever refers to static types, using it without a type specifier infers the static type from the right-hand side. If the right-hand side is `dynamic`, then the type inferred is `any`. Unlike `dynamic`, `any` still requires casting before it can be copied to a narrowed destination.
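
    A small sketch of how the two declaration forms behave together, following the inference rules above:

    my a = 123     -- dynamic: later assignments have no compile-time constraint
    a = 1.5        -- fine, `a` is free to change its type at runtime

    var b = 123    -- static: the type is inferred from the right-hand side
    var c = a      -- the right-hand side is dynamic, so `c` is inferred as `any`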

    At this time, it is recommended to use dynamic types. Static types will only feel ergonomic when optionals, union types, and generics are implemented. Until then, it may be necessary to insert additional type casts to satisfy compile-time constraints.

    Operator overloading.

    Operators can now be overloaded with method declarations. This can be useful for encapsulating common operations on objects and simplifying the syntax at the call site.

    type Vec2 object:
        var x float
        var y float
    
        func '$infix+'(o Vec2) Vec2:
            return [Vec2
                x: x + o.x,
                y: y + o.y,
            ]
    
    var a = [Vec2 x: 1, y: 2]
    var b = a + [Vec2 x: 3, y: 4]

    Read more about operator overloading.

    Compiler rewrite.

    The initial source code of Cyber's compiler wasn't pleasant to look at. It has since been rewritten for readability. This was also an opportunity to create an IR stage since Cyber's static types and language choices warranted a second compiler pass.

    Consuming the result of semantic analysis makes it easier to develop multiple backends. This has proven useful in creating our JIT backend and will come in handy later when we add more backends.

    WASI Target.

    The WASI target is now automatically built for releases and for Cyber tip (the latest development build). With WASI, the Cyber CLI can be built once and run on any platform that has a WASM/WASI runtime. Enabling this was surprisingly simple using Zig's build system.

    Doc gen.

    Docs are now automatically generated for the builtin and std modules. The generation is specific to the markdown template used in Cyber's documentation, but it wouldn't be difficult to extend it to user scripts and expose it as a CLI tool.

    Afterword.

    Thanks for reading! Cyber is still evolving so any input or concerns would be greatly appreciated. Be sure to file a GitHub issue or join our Discord. You might also want to follow the Twitter account.